Imagine you have several digital folders. These folders follow a naming convention, say ‘Jeffrey_Birthday_2017-2018’, ‘Alan_Holiday_2018-2019’, and so on. Imagine each of these folders has files inside it. These files, like many digital files, haven’t been thoughtfully named. Instead, they have names like ‘KarenInvite.tiff’, ‘FlightTicketsIMPORTANT.pdf’ and ‘Img_0587.jpeg’. To archive these very important files, you must impose a naming convention that allows them to be identified and to pass through your digital preservation system.
Now I would like you to imagine that there are 5,500 digital folders, with 65,000 digital files. You have three weeks to process these files, because you are getting close to the end of the financial year. How do you do it?
In 2018 I gained my postgraduate qualification in information management from the University of Glasgow. This qualification gave me a crash course in archives and records management, information legislation, and digital preservation. I had more opinions on authenticity and integrity than you could shake a Schellenberg at. After this I worked as an archive assistant at the National Library of Scotland, before moving to Historic Environment Scotland as a Digital Archivist.
In my current role I work as a project archivist, with the goal of digitising 375,000 records over three years. That is roughly one digitised record every four minutes, every day, for three years (not counting sleeping, eating, or pivoting to working from home during global pandemics). To put it another way, by the end of day one we are already behind schedule. My training at the University of Glasgow prepared me for handling data at scale, but this was a whole different ball game.
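The rate quoted above is a back-of-envelope figure. As a quick sanity check, this Python sketch recomputes it, assuming three 365-day years (leap days ignored) and work continuing around the clock, as the paragraph's joke about not sleeping implies.

```python
# Back-of-envelope check of the project's digitisation rate.
RECORDS = 375_000
YEARS = 3
DAYS = YEARS * 365  # assumption: leap days ignored

records_per_day = RECORDS / DAYS
# Assumption: work continues 24 hours a day, with no breaks.
minutes_per_record = (DAYS * 24 * 60) / RECORDS

print(f"{records_per_day:.0f} records per day")            # prints: 342 records per day
print(f"one record every {minutes_per_record:.1f} minutes")  # prints: one record every 4.2 minutes
```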
To achieve the goals of this project, I have developed my own and my archive assistant’s skills in handling data and digital objects at scale. We have been creative, taking deep dives into GitHub forums and taking a magpie approach to gathering tools and techniques. In practice, this has meant improving our Excel skills, writing PowerShell commands, and learning to code in Python.
Consider the problem I described earlier. It is very similar to an issue we had with one of our digitised collections, which required the mass renaming of 36,000 files according to a series of variables. If we manually renamed one file every 10 seconds, it would take 100 hours of staff time to rename this collection. Instead, we spent an afternoon writing a Python script that would run across the whole collection. The script was a success, renaming every file within minutes, and we have saved it as a reusable resource.
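Our actual script and its series of variables aren’t reproduced here. As an illustration only, here is a minimal Python sketch of the same idea, assuming a hypothetical convention in which each file takes its parent folder’s name plus a four-digit sequence number (for example ‘Jeffrey_Birthday_2017-2018_0001.tiff’). The names `build_new_name` and `rename_collection` are made up for this example.

```python
from pathlib import Path


def build_new_name(folder_name: str, index: int, suffix: str) -> str:
    # Hypothetical convention: <folder name>_<4-digit sequence><extension>,
    # with the extension lower-cased for consistency.
    return f"{folder_name}_{index:04d}{suffix.lower()}"


def rename_collection(root: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan (and optionally apply) a rename of every file in each subfolder of root."""
    plan: list[tuple[Path, Path]] = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Skip hidden files (e.g. .DS_Store) and sort the rest so the
        # sequence numbers are reproducible between runs.
        files = sorted(
            f for f in folder.iterdir() if f.is_file() and not f.name.startswith(".")
        )
        for index, old in enumerate(files, start=1):
            new = old.with_name(build_new_name(folder.name, index, old.suffix))
            plan.append((old, new))

    if not dry_run:
        # Check every target before touching anything, so a clash stops the
        # whole run instead of leaving the collection half renamed.
        for old, new in plan:
            if new != old and new.exists():
                raise FileExistsError(f"refusing to overwrite {new}")
        for old, new in plan:
            old.rename(new)
    return plan
```

Calling `rename_collection(Path("collection"))` with the default `dry_run=True` only returns the planned (old, new) pairs, so they can be reviewed before a second call with `dry_run=False` renames anything. The folder name `collection` is a placeholder. Refusing to overwrite existing files is a deliberately cautious choice for archival material.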
Our training has not been confined to a classroom. We have used webinars and YouTube tutorials, the Enki code-learning app, and a variety of forums and blog posts. On top of this, we have spent time talking with colleagues who have specialist database and data management skills. This informal learning has also served as networking and advocacy, raising the profile of the digital archive throughout our organisation. Now we are just as likely to have staff knocking on our door asking for advice and training!
My advice to any new professional working in digital archives would be to pay attention to what your colleagues are doing. If they are invested in data mining, learn about data mining. If they are spending resources on cyber-security, learn about cyber-security. You don’t need to be an expert, but do learn the fundamentals. This way you will be able to advocate for your archive in a way that aligns with your colleagues’ and your organisation’s priorities.
The digital landscape moves very quickly. I learned a lot as a postgraduate, but I still need to keep my skills and knowledge sharp. The ARA Professional Development Programme is a great foundation for evaluating your current development, but even a basic ‘SWOT’ assessment can go a long way. For myself, I want to spend 2020 learning SQL and developing a metadata extraction command for our quarantine machine. What are your goals?
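A metadata extraction command could take many forms, and the one mentioned above was still a goal at the time of writing. Purely as a sketch, and assuming that “metadata” here means basic technical metadata (relative path, size, last-modified time in UTC and a SHA-256 checksum) written out as CSV, a Python version might look like this. The function names and the CSV column names are hypothetical.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import TextIO

# Hypothetical column names for the output manifest.
FIELDS = ["path", "size_bytes", "modified_utc", "sha256"]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files are never read whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_metadata(root: Path) -> list[dict[str, str]]:
    """Collect basic technical metadata for every file below root, in sorted order."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": str(info.st_size),
            "modified_utc": modified.isoformat(timespec="seconds"),
            "sha256": sha256_of(path),
        })
    return rows


def write_csv(rows: list[dict[str, str]], out: TextIO) -> None:
    """Write the rows as CSV; open output files with newline="" as the csv module expects."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Passing `extract_metadata(Path("incoming"))` and an open output file to `write_csv` would produce one row per file, where `incoming` is a placeholder for whatever folder the quarantine machine receives transfers into. Recording checksums at this stage gives a baseline for later fixity checks.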