The 100-year-old April Fool's joke

posted by warren on Mon, 04/01/2013 - 13:15

Perhaps you think you're funny?

I'd like to see you pull a prank that still causes problems 100 years after the fact.

Take a look at this group of soldiers from the Canadian Expeditionary Force (CEF). They pulled a prank on their enlistment officer about their date of birth. A good one.

For example, examine the records of Scott Mcconnell, born Feb 31, 1884; Joseph Carriere, born Feb 31, 1895; Harold Bennington, born Feb 31, 1890; and Frederick Handley, born Feb 31, 1874, among many others. Notice anything in common? They all wrote down impossible, but valid-looking, birthdates on their enlistment forms as a joke that keeps on creating chaos to this day.

Harry Baird was one of the soldiers of the Canadian Expeditionary Force with a sense of humour, and he likely enjoyed pulling a fast one on an officer at enlistment. Some of his letters are available online at Canadian Letters, where some of his more colourful descriptions of the war are recorded. Other reasons for recording impossible dates might have included hiding one's true identity while re-enlisting, or simply showing dissent at being drafted.

And so, 100 years after the fact, every time someone tries to create a database about the Great War, what happens?

muninn=> select date('1989-02-31');
ERROR:  date/time field value out of range: "1989-02-31"
LINE 1: select date('1989-02-31');
muninn=>

This has likely driven more than one database to distraction. When Library and Archives Canada (LAC) originally indexed the CEF papers online, they stored the birthdates as strings to get around this problem, as well as the other data quality problems that come with historical dates. Most SQL databases will not allow impossible dates to be entered in a date column, and dealing with imprecise dates requires time intervals or really creative and specialized schemas. Also, many date implementations in databases and operating systems are limited in range to post-1901 or post-1970 dates, which makes the process especially painful. (Note: Date::Calc is a good tool to get around this problem in Perl.)
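As a rough illustration of the store-it-as-a-string approach (a sketch only, not LAC's actual implementation), the Python below keeps the date exactly as it was written and attaches a calendar value only when the standard library accepts it. The names `RecordedDate` and `parse_recorded_date` are hypothetical, and the `YYYY-MM-DD` input format is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RecordedDate:
    raw: str               # the string exactly as it appears in the record
    value: Optional[date]  # a real calendar date, or None if impossible


def parse_recorded_date(raw: str) -> RecordedDate:
    """Keep the original string; attach a date only if it is valid."""
    try:
        year, month, day = (int(part) for part in raw.split("-"))
        return RecordedDate(raw, date(year, month, day))
    except ValueError:
        # date() raises ValueError for Feb 31; malformed input lands here too
        return RecordedDate(raw, None)


birth = parse_recorded_date("1893-02-31")
print(birth.raw, birth.value)
```

The raw string stays usable as a lookup key for the archival record, while anything that needs real date arithmetic can skip entries whose `value` is `None`.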

This is actually an interesting problem in balancing archival integrity and data quality: the date is impossible, but it has perpetuated itself into the bureaucracy and behaves a little bit like a primary key that can be used for additional document retrieval. As shown above, a normal SQL date type has no hope of working with this type of data. One of the great things about using linked open data is that we can use ontologies to record information that is wrong without creating logical or syntactic errors. Using the W3C Time Ontology, for example, we can record an invalid date without committing to a date value and without losing the associated date string:

<owl:time rdf:about="Birth">
 <dc:source rdf:resource="LAC Entry"/>
 <rdfs:label>Birth of Harry Baird on 31/02/1893.</rdfs:label>
 <rdf:value>1893-02-31</rdf:value>
 <foaf:name>1893-02-31</foaf:name>
 <mil:hasPrincipal rdf:resource="Harry Baird"/>
</owl:time>

However, if we believe that part of the date information is true, and that Harry was actually born somewhere in 1893, we can use only part of the date information by adding a partial Date Time description:

<owl:time rdf:about="Birth">
 <dc:source rdf:resource="LAC Entry"/>
 <time:hasDateTimeDescription>
  <time:DateTimeDescription rdf:about="...">
   <time:year rdf:datatype="&xsd;gYear">1893</time:year>
  </time:DateTimeDescription>
 </time:hasDateTimeDescription>
 <rdfs:label>Birth of Harry Baird on 31/02/1893.</rdfs:label>
 <rdf:value>1893-02-31</rdf:value>
 <foaf:name>1893-02-31</foaf:name>
 <mil:hasPrincipal rdf:resource="Harry Baird"/>
</owl:time>

By doing this, we can leverage any partial information that is available, even though its accuracy leaves something to be desired.
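The same idea can be sketched outside RDF. Assuming the year is the only part of the string we trust, the Python below turns an impossible date into the interval covering that whole year, which is one way of handling the imprecise dates mentioned earlier. The function name `year_interval` and the `YYYY-MM-DD` input format are assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Tuple


def year_interval(raw: str) -> Optional[Tuple[date, date]]:
    """Return the first and last day of the recorded year, or None."""
    try:
        year = int(raw.split("-")[0])
        return date(year, 1, 1), date(year, 12, 31)
    except (ValueError, OverflowError):
        # non-numeric year, or a year outside what datetime.date supports
        return None


# Harry Baird's recorded birthdate is impossible, but the year is still usable
print(year_interval("1893-02-31"))
```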

But remember, the best of jokes are those that keep on giving for years to come.
