My Canada does not include Newfoundland (and other provocative LOD statements)


A given in working with historical data is that things will have changed since the data was created, which means that some interpretation is necessary to put the data in the right context. In Muninn's case, the world of the 1910s was very different from the world of today in terms of things, places and people.

I'm picking on Newfoundland, its involvement in the Great War and its relation to the Dominion of Canada as an example of the issues that pop up. At the time of the Great War, Canada did not include Newfoundland.

Newfoundland and its mainland part, Labrador, formed a colony of the British Empire that became a Dominion in 1907, having decided not to join Canadian Confederation in 1869. Like the Dominion of Canada, it had limited autonomy above that of a colony. It fielded its own battalion-sized regiment, the 1st Newfoundland Regiment (nicknamed The Blue Puttees), as its contribution to the Great War. By the end of the war, the Regiment would be renamed the Royal Newfoundland Regiment. The names and Linked Open Data URIs of these soldiers can be retrieved with this query.

Open Street Map uses relation 391196 (image right) for the boundaries of Newfoundland and Labrador to mark them as provincial entities. The current Linked Geo Data server isn't exporting this relation as a URL right now, but the geometry remains closely matched to the older Dominion boundaries.

The current dbpedia Dominion of Newfoundland term is an appropriate Linked Open Data URI for us to use with Muninn, though it is missing start and stop dates. There are also dbpedia terms for the Island of Newfoundland and the Labrador coast that reference the place and not the political entity itself. Geonames similarly provides terms for the Province of Newfoundland and Labrador (current province) and the Island of Newfoundland (the location) but not the Labrador coast. 

Trying to define the Dominion of Canada during the Great War is equally challenging. It would eventually become modern Canada, but only after going through several national and provincial border adjustments post-1921 and absorbing the Dominion of Newfoundland in 1949, which then became the Province of Newfoundland. To make things even more confusing, Canada became a Commonwealth realm instead of a Dominion with the Statute of Westminster in 1931, though the term Dominion of Canada remained in everyday use until late in the 1950s. Canada's constitution was repatriated in 1982 from Great Britain, but the references to Realm and Dominion still remain in some federal documents. Some border disputes still exist about Canada's geography, notably over Hans Island, which is in the high arctic and is likely at the bottom of the list of things that Canada and Denmark worry about.

Getting a Linked Open Data term for the dominion is a bit difficult: the dbpedia Dominion of Canada is redirected to Canada after a small edit war about the appropriate way to represent Canadian history ended with the page getting locked. The result is a Wikipedia article organization that might look pleasant but does not work well from dbpedia's standpoint.

Open Street Map relation 1428125 maps out the current borders of Canada (image left), which definitely don't correspond to what they looked like in 1914. For its part, the FAO Political Ontology's version of Canada is marked as valid since 1985 and valid until the year 9999. We can't make use of this term as it is problematic: it entails that the entity exists outside the year range of the Great War and it is defined as an independent political entity.

This FAO term is actually an example of the issues that pop up in the shift from database schema to ontology definition. The validSince property is defined as "the area's first year of validity... ...validSince = 1985, this indicates that the area is/was valid since 1985." instead of "the term's first year of validity". This confuses whether the property is about the entity itself or about the description of the entity. The validUntil value of 9999 is also a 'Magic Number' rather than a machine readable term. A machine has no way to tell that 9999 is a placeholder, so the data can easily be read as meaning that Canada was created in 1985 and will cease to exist in the year 9999.
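The magic number problem can be made concrete with a small sketch. This is only an illustration, not how the FAO ontology or Muninn process dates: the function names and the `GREAT_WAR` constant are hypothetical, and the only values taken from the text above are 1985 and 9999. It contrasts a consumer that reads 9999 as a real year with one that represents "no known end" explicitly.

```python
from typing import Optional, Tuple

# Hypothetical constant: the years of the Great War as a closed interval.
GREAT_WAR: Tuple[int, int] = (1914, 1918)


def naive_lifespan(valid_since: int, valid_until: int) -> int:
    """What a consumer computes if it trusts the sentinel as a real year."""
    return valid_until - valid_since


def overlaps(valid_since: int, valid_until: Optional[int],
             period: Tuple[int, int]) -> bool:
    """Closed-interval overlap test; valid_until=None means 'no known end'."""
    start, end = period
    if valid_since > end:
        return False
    return valid_until is None or valid_until >= start


# The FAO-style values quoted above, read literally.
print(naive_lifespan(1985, 9999))       # 8014 "years" of validity
# The same start year with the end left explicitly open.
print(overlaps(1985, None, GREAT_WAR))  # False: the term starts after 1918
```

Leaving the end open does not fix the other half of the problem (whether the dates describe the entity or the term), but it at least stops a machine from doing arithmetic on a placeholder.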

Geonames lists Canada as a modern day country with a bounding box encompassing Newfoundland. Thus, if we want to refer to the Dominion of Canada in the Great War context, we will need to create our own Linked Open Data URI. Defining a geometry for both dominions is a lot of work because of the large polygons that we have to create to encompass areas whose shape we aren't completely certain about. So we postpone this work by omitting the shape information and creating an empty URI for the geometry. This does not mean that the URI is useless: we can leverage the power of the semantic web to use the existing Linked Open Data URIs to make sure our terms aren't confused with those of another historical period and to narrow down the range of possible locations.

Defining what something is based on what it isn't

Linked open data allows us to create things, entities or concepts without needing to be completely certain about all of the facts. In this case, we want terms defining both the political and geographical aspects of the Dominion of Canada and the Dominion of Newfoundland. This involves the political entity, their features and the geometry of the features. Given the use cases of the Muninn Project, we made the decision to merge both political entity and feature within the same URI and keep the geometry separate. Therefore, we use the Dominion term provided by the Muninn Organization ontology, the Feature term from the GeoSPARQL ontology and some glue from the neoGeo vocabulary.

Given that Geonames, Dbpedia and Linked Geo Data all model their URIs and concepts using different variants of Entities, Features and Geometries, it is not necessarily easy to use them all in a way that will keep a reasoner from malfunctioning. Since they all contain some type that references Geometry, we will use their terms aggressively as reference points to compare our 'make believe' geometry against. The terms defined here are abridged to be readable, but the links will load the full RDF.

Defining Canada (which does not include Newfoundland):

<rdf:Description rdf:about="Canada">
 <rdf:type rdf:resource="Dominion"/>
 <rdf:type rdf:resource="Feature"/>
 <foaf:name>Canada</foaf:name>
 <rdfs:label xml:lang="en">Dominion of Canada</rdfs:label>
 <owl:differentFrom rdf:resource="dbpedia:Canada"/>
 <owl:differentFrom rdf:resource="geonames:Canada"/>
 <owl:differentFrom rdf:resource="fao:Canada"/>
 <geom:geometry>
  <rdf:Description rdf:about="GeometryCanada">
   <rdf:type rdf:resource="Geometry"/>
   <rdfs:label xml:lang="en">Geometry of the Dominion of Canada</rdfs:label>
   <ogc:sfDisjoint rdf:resource="GeometryNewfoundland"/>
   <ogc:sfWithin rdf:resource="geonames:NorthAmerica"/>
   <ogc:sfTouches rdf:resource="dbpedia:LabradorCoast"/>
  </rdf:Description>
 </geom:geometry>
</rdf:Description>

Thus, our Canada is not the same as the one described in the FAO ontology, DBpedia or Geonames. Canada is a Dominion; its geometry isn't defined, but we know that it is in North America, that it is disjoint from the geometry of the Dominion of Newfoundland and that it touches the Labrador coast. This effectively prevents anyone from confusing what we think Canada is with another political entity called 'Canada', while generally placing it in North America and enforcing the fact that its geometry does not encompass Newfoundland.

In a similar fashion, we can define the Dominion of Newfoundland as:

<rdf:Description rdf:about="Newfoundland">
 <rdf:type rdf:resource="Dominion"/>
 <rdf:type rdf:resource="Feature"/>
 <foaf:name>Newfoundland</foaf:name>
 <rdfs:label xml:lang="en">Dominion of Newfoundland</rdfs:label>
 <owl:differentFrom rdf:resource="dbpedia:Newfoundland"/>
 <owl:differentFrom rdf:resource="geonames:Newfoundland"/>
 <owl:sameAs rdf:resource="dbpedia:DominionOfNewfoundland"/>
 <geom:geometry>
  <rdf:Description rdf:about="GeometryNewfoundland">
   <rdf:type rdf:resource="Geometry"/>
   <rdfs:label xml:lang="en">Geometry of the Dominion of Newfoundland</rdfs:label>
   <ogc:sfDisjoint rdf:resource="GeometryCanada"/>
   <ogc:sfContains rdf:resource="dbpedia:IslandNewfoundland"/>
   <ogc:sfContains rdf:resource="dbpedia:LabradorCoast"/>
   <ogc:sfContains rdf:resource="geonames:IslandNewfoundland"/>
   <ogc:sfWithin rdf:resource="geonames:NorthAmerica"/>
  </rdf:Description>
 </geom:geometry>
</rdf:Description>

Here, our term is the same entity as the Dominion described in dbpedia, though it is not the same as the current province, nor is it the same provincial entity described by Geonames. We still don't have a full geometry described for the dominion, but it includes the Labrador coast and the island of Newfoundland (as described by Geonames and Dbpedia) and is within North America. Of course, it is separate from the geometry of Canada.
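Because both definitions lean on negative statements, it is worth checking that they don't contradict each other. The following is a rough sketch of such a check and not part of Muninn: it copies the identity and disjointness statements from the two abridged descriptions into plain Python tuples, and the helper names `same_and_different` and `unreciprocated` are hypothetical. It only looks at directly asserted pairs; a real OWL reasoner would also follow chains of owl:sameAs statements.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

# Identity and disjointness statements copied from the two abridged terms.
TRIPLES: Set[Triple] = {
    ("Canada", "owl:differentFrom", "dbpedia:Canada"),
    ("Canada", "owl:differentFrom", "geonames:Canada"),
    ("Canada", "owl:differentFrom", "fao:Canada"),
    ("Newfoundland", "owl:differentFrom", "dbpedia:Newfoundland"),
    ("Newfoundland", "owl:differentFrom", "geonames:Newfoundland"),
    ("Newfoundland", "owl:sameAs", "dbpedia:DominionOfNewfoundland"),
    ("GeometryCanada", "ogc:sfDisjoint", "GeometryNewfoundland"),
    ("GeometryNewfoundland", "ogc:sfDisjoint", "GeometryCanada"),
}


def same_and_different(triples: Set[Triple]) -> Set[FrozenSet[str]]:
    """Pairs directly asserted to be both the same and different."""
    same = {frozenset((s, o)) for s, p, o in triples if p == "owl:sameAs"}
    diff = {frozenset((s, o)) for s, p, o in triples
            if p == "owl:differentFrom"}
    return same & diff


def unreciprocated(triples: Set[Triple], predicate: str) -> Set[Tuple[str, str]]:
    """(s, o) pairs where a symmetric relation is only stated one way."""
    return {(s, o) for s, p, o in triples
            if p == predicate and (o, predicate, s) not in triples}


print(same_and_different(TRIPLES))                # set()
print(unreciprocated(TRIPLES, "ogc:sfDisjoint"))  # set()
```

Both checks come back empty for the terms as written: no pair is asserted to be both identical and different, and the disjointness between the two geometries is stated from both sides.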

Linked Open Data enables you to have missing data

Our problem was that we wished to define two distinct historical entities that are the ancestors of current entities and that can be easily confused with them. Their geometry has changed over time and isn't known in sufficient detail for us to provide a GIS polygon. Using Linked Open Data, we were able to describe both historical entities while avoiding ambiguity with other terms. We were even able to define URIs for their geometry without knowing what the geometry actually was, by comparing against past and modern geometries.
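To illustrate how much an undefined geometry can still tell us, here is a minimal sketch of a three-valued location test for the Dominion of Newfoundland. It rests on loud assumptions: bounding boxes stand in for the real geometries, the coordinates are rough placeholder values chosen for illustration rather than taken from Geonames or dbpedia, and `Box` and `classify` are hypothetical names. A box is larger than the shape it encloses, so "inside" here really means "inside the box of a contained region".

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Longitude/latitude bounding box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.west <= lon <= self.east
                and self.south <= lat <= self.north)


# Rough placeholder boxes (assumptions for illustration only).
NORTH_AMERICA = Box(-170.0, 5.0, -50.0, 85.0)
ISLAND_OF_NEWFOUNDLAND = Box(-59.5, 46.5, -52.5, 51.7)


def classify(lon: float, lat: float, within: Box,
             contained: Sequence[Box]) -> str:
    """Use the sfWithin bound to exclude and the sfContains bounds to include."""
    if not within.contains(lon, lat):
        return "outside"   # ruled out by the sfWithin statement
    if any(box.contains(lon, lat) for box in contained):
        return "inside"    # falls in a region the geometry contains
    return "unknown"       # the stated relations cannot decide


parts = [ISLAND_OF_NEWFOUNDLAND]
print(classify(-52.7, 47.6, NORTH_AMERICA, parts))  # inside (near St. John's)
print(classify(-75.7, 45.4, NORTH_AMERICA, parts))  # unknown (near Ottawa)
print(classify(-0.1, 51.5, NORTH_AMERICA, parts))   # outside (near London)
```

The "unknown" answer is the useful one: the term admits that it cannot place the point instead of guessing, and further statements such as the sfDisjoint relation with the geometry of Canada could later be used to shrink that undecided area.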

A fear in historical modeling is that using imprecise or unknown data will lead scholars to make erroneous decisions. Tools have had limited support for these situations in the past, but the use of Linked Open Data approaches allows us to make statements about things while allowing for some measure of managed uncertainty.