What does your project do?
We extract data from digital images of WWI-era documents, put that data into several kinds of databases, and then do research with it.
How many documents do you want to extract data from?
Millions. We want to use every document that fits our search criteria, no matter how many there are.
What kinds of documents do you want to extract data from?
We are primarily interested in documents that are a) tightly structured, preferably written on pre-printed forms, and b) available in digital image format.
Are you interested in semi-structured or unstructured documents like letters, poems and the like?
Not initially. Our first goal is to process tightly structured documents, such as those written on pre-printed forms. In the long run, we may be interested in semi-structured documents, as these have interesting parallels in other large archives. Nick, one of our researchers, wrote his thesis using semi-structured documents, so if you’re interested in this kind of application you should send him an email.
How can you read and process millions of documents? Isn’t that really expensive?
We have access to very powerful computers which are being loaned to us, so we don’t have to pay for them. We hope that our research will make future projects of this kind much less expensive and data-intensive, because we will work out efficient ways to do the data extraction and data organisation.
Can you read manuscript (hand-written) documents, or are you limited to typescript?
We think that we will be able to handle many kinds of manuscript documents. Since a very large percentage of our documents are manuscript, one of the major research goals of our project is to work out better and more accurate ways of reading these documents. To a certain extent, it’s a matter of applying computational power to the problem, and computational power is one thing that we have a lot of. So yes, we’re interested in extracting data from manuscript.
What does ‘Muninn’ mean? What do the bird pictures mean?
Muninn is a supernatural raven from Norse mythology. According to the 13th-century Prose Edda, the god Odin sent out two ravens every morning to collect news from the whole world and bring it back to him at dusk. To Vikings, these birds were apt symbols for the god of war and poetry: dark, foreboding and intelligent.
Yet for all his inherent violence, Odin was not a thoughtlessly cruel deity. Indeed, he was said to be gravely concerned for the fate of everyone who went to war, a concern symbolically expressed through daily fear for the safety of his raven-messengers, especially his favourite, Muninn, whose name meant ‘memory’.
Those ancient warriors who believed themselves watched by Odin’s ravens were alive to the paradoxes of war. By selecting Muninn as our own emblem, we hope to be equally aware of this painful and conflicting duty: to accurately observe the ‘whole world’ of the records in our study and yet never to lose sight of our compassionate duty to the memory of each individual.