Scurvy Dog

Contributed by Kelly Spring

[This is the 4th in a series -- see also the 1st, 2nd, and 3rd installments about ECU's migration to ArchivesSpace.]

When our manuscript container list database showed signs of Roman numerals mixed with Arabic numerals, duplicate boxes, sub-sub-series, container lists that start with box #3, and, I’m not kidding, an entire series devoted to "Empty Photo Albums" (really?!?), we knew we needed to go straight to the naval surgeon.

Enter our Lead Programmer and months of painstaking work. It was immediately apparent that updating the container lists from EAD would have required replacing the whole record and breaking accession links. We also considered importing the container lists from our local database into AT and then pushing them through the AT-AS migration tool. In testing, however, that migration would run for hours before producing results. Blimey!

So, the Lead Programmer tested the Harvard Excel import template’s ability to handle hierarchy, instance types, and date strings with multiple dates (of which there were thousands). He ran reports to assess the number and scale of issues, often conferring with the migration team on the value of retaining data as-is. This, too, turned out not to be a viable option for our migration.

Ultimately, our Lead Programmer studied the AS database schema while the migration team created and updated test records directly in AS to illustrate structure, allowing him to work backward to the migration code. He developed a console application that restructured the container lists from our local database, wiped existing AS container lists, and generated a ship-load of SQL commands that were saved and then run to insert the container lists into AS. As the team ran the script, we checked that series were implemented properly, that boxes spanning multiple series were handled correctly, and that merged container lists for partially processed collections were accurate. We had a surgical scare when we thought that the top container relationships had flatlined and gone missing. Thankfully, though, a full re-index was all it took to bring them back from the brink.
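For readers curious what "restructure, then generate SQL" might look like, here is a minimal Python sketch of the general idea. It is not our Lead Programmer's code, and it does not use the real AS schema: the table name `hypothetical_container_row`, its columns, and the sample titles are all made up for illustration. It flattens a small series/box/folder hierarchy into rows with parent links, then writes one INSERT statement per row so the statements can be saved and run later.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a container list: a series, box, or folder."""
    level: str
    title: str
    children: list = field(default_factory=list)


def sql_quote(value: str) -> str:
    """Quote a string literal by doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def flatten(node, parent_id=None, rows=None):
    """Walk the tree depth-first, assigning sequential ids and parent links."""
    if rows is None:
        rows = []
    row_id = len(rows) + 1
    rows.append((row_id, parent_id, node.level, node.title))
    for child in node.children:
        flatten(child, row_id, rows)
    return rows


def to_sql(rows, table="hypothetical_container_row"):
    """Build one INSERT statement per row (table and columns are hypothetical)."""
    statements = []
    for row_id, parent_id, level, title in rows:
        parent = "NULL" if parent_id is None else str(parent_id)
        statements.append(
            f"INSERT INTO {table} (id, parent_id, level, title) "
            f"VALUES ({row_id}, {parent}, {sql_quote(level)}, {sql_quote(title)});"
        )
    return statements


# Made-up sample data: one series holding one box holding one folder.
series = Node("series", "Correspondence", [
    Node("box", "Box 1", [Node("folder", "Captain's letters")]),
])
statements = to_sql(flatten(series))
for statement in statements:
    print(statement)
```

Saving the statements to a file before running them, as we did, leaves a record that can be reviewed or re-run if something goes overboard.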

Now that the container lists are in AS, it’s back to data cleanup and quality control for this crew. We’ve identified 50 collections (out of about 2,000) that came out of SQL surgery with known errors such as duplicate box and folder instances. Currently, a sub-team is looking at data mapping for the resource descriptions, consulting our online collection guides to validate the container lists against AS, and double-checking the physical material for numbering discrepancies. Our sails might be shortened, but we’ve replenished our stock of vitamin C and will soon be able to haul wind towards our digital objects.
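A check for duplicate box and folder instances can be sketched in a few lines of Python. This is only an illustration, assuming each instance can be reduced to a (collection, box, folder) tuple; the collection numbers and values below are invented, and our actual quality control relies on the collection guides and the physical material as well.

```python
from collections import Counter


def find_duplicates(instances):
    """Return every (collection, box, folder) tuple that appears more than once."""
    counts = Counter(instances)
    return sorted(key for key, n in counts.items() if n > 1)


# Invented sample: folder 2 of box 1 in collection "0001" is listed twice.
sample = [
    ("0001", "1", "1"),
    ("0001", "1", "2"),
    ("0001", "1", "2"),
    ("0002", "3", "1"),
]
duplicates = find_duplicates(sample)
print(duplicates)
```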
