by warren | November 11th, 2010
On the occasion of Remembrance Day, Muninn is launching the web interface to its catalog. The data available so far is what has been processed from the Library and Archives Canada data and the Red Cross Wounded and Missing files from the Australian War Memorial.
Most of our initial work has gone into identifying what is actually in the image files. This sounds odd, but the documents were scanned to computer files without anyone noting what each document was. So while we might know that a series of 10 images concerns a particular soldier, we don’t know which piece of paper is recorded in each of those files. We need to know this before we try to extract the information from those documents, and that’s what has been taking our time.
Muninn gets its computer time on the SHARCNET computer cluster (a division of Compute Canada). Instead of using people to review the archives, we use computer programs that extract the data from scanned copies. This is more complicated than run-of-the-mill Optical Character Recognition: the quality of some documents is very poor, the text is sometimes handwritten, and the forms have check-boxes, rubber stamps and other oddities that require different approaches. Please bear with us while we iron out the bugs in the interface and make it a little more pleasing than its current basic look.
