Contributed by Kelly Spring (second in the series about ECU's migration to ArchivesSpace)
Wouldn’t it be nice to have turn-by-turn directions to those vexing errors you instinctively know are lurking in your legacy data? Sometimes charting exactly how to identify messy data can feel like using an astrolabe to fix your position. But, pirate jokes aside, it is possible to define a method and move forward.
At ECU, our initial approach is to test migrating everything we have in Archivists’ Toolkit (AT) to ArchivesSpace (AS). Obvious, right? However, since our container lists and authorities live in databases outside of AT, we’ll also run additional tests: one series to push authorities into AT and migrate them to AS, and another to add container lists to AT and migrate those to AS.
During test migrations, the programmer will keep a captain’s log of anything that fails, which will provide a list of data that could seriously capsize our ship. The rest of the crew will divide into three subgroups, one for each of our repositories, to pinpoint further errors. Mapping discrepancies will be identified by employing the ol’ view-in-source vs. view-in-target method. Style and content errors will also be recorded by the subgroups, but only after referencing our archival description guidelines and explicitly defining what to look for. Using a handy template provided by the Orbis Cascade Alliance, the subgroups will note elements including the problem, priority, extent, and clean-up strategy.
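To keep that captain’s log consistent across the three subgroups, the template could live in a shared CSV. Here is a minimal sketch of that idea in Python; the column names are loosely modeled on the template fields mentioned above (problem, priority, extent, clean-up strategy), not the Orbis Cascade Alliance’s exact layout, and the sample row is purely hypothetical:

```python
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class CleanupIssue:
    # Illustrative columns; not the Alliance's exact template.
    repository: str   # which of the three repositories found the issue
    problem: str      # description of the error
    priority: str     # e.g. "high" / "medium" / "low"
    extent: str       # how widespread the problem is
    strategy: str     # proposed clean-up approach

def write_log(issues, path="cleanup_log.csv"):
    """Write the shared clean-up log as a CSV the whole crew can edit."""
    cols = [f.name for f in fields(CleanupIssue)]
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=cols)
        writer.writeheader()
        writer.writerows(asdict(i) for i in issues)

# Hypothetical example entry:
write_log([CleanupIssue(
    repository="University Archives",
    problem="Dates entered as free text in unitdate",
    priority="high",
    extent="~200 finding aids",
    strategy="Normalize to ISO 8601 before migration",
)])
```

A flat file like this keeps the log versionable and easy to merge once the subgroups compare notes.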
Sounds like smooth sailing! But what about shifts in the wind or turbulent seas? In other words, how are we going to catch the data that falls through the cracks? Let’s say our sea-monster of a container list database simply won’t go into AT. In that case, our migration team would test importing EADs extracted from our .NET Web system into AS to find errors, and/or run the Harvard EAD checker and Yale Schematron over our files. What about an authority entanglement? For that, we would export the authority records from AT for evaluation and, if necessary, use a tool like OpenRefine to reconcile them against LCNAF and LCSH.
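In the same spirit as those validators, a home-grown first pass could flag obvious gaps before the heavier tools run. This is a hedged sketch only, not the Harvard checker or Yale Schematron: it assumes unnamespaced EAD 2002 and checks just two rules (every component has a unittitle and a container), and the sample document is invented for illustration:

```python
import xml.etree.ElementTree as ET

def check_ead(xml_text):
    """Flag EAD components that lack a <unittitle> or <container>.

    Assumes unnamespaced EAD 2002; real validators enforce far more rules.
    """
    root = ET.fromstring(xml_text)
    problems = []
    # EAD 2002 allows both unnumbered <c> and numbered <c01>..<c12> components.
    comp_tags = {"c"} | {f"c{n:02d}" for n in range(1, 13)}
    for comp in root.iter():
        if comp.tag not in comp_tags:
            continue
        did = comp.find("did")
        if did is None or did.find("unittitle") is None:
            problems.append((comp.get("id"), "missing unittitle"))
        if did is not None and did.find("container") is None:
            problems.append((comp.get("id"), "missing container"))
    return problems

# Invented sample: one well-formed component, one missing its title.
sample = """<ead><archdesc><dsc>
  <c01 id="ref1"><did><unittitle>Correspondence</unittitle>
    <container type="box">1</container></did></c01>
  <c01 id="ref2"><did><container type="box">2</container></did></c01>
</dsc></archdesc></ead>"""
print(check_ead(sample))  # [('ref2', 'missing unittitle')]
```

A quick script like this won’t replace schema validation, but it gives the subgroups a fast way to triage which finding aids need attention first.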
Our team has a few more resources to define before we heave down. Soon, though, we’ll lift our eye patches and begin looking for those pesky barnacles that need cleaning.