ArkA-D - a tool for the digitization of the archival collections of research libraries

ArkA-D is a database designed for the registration and digitization of manuscript collections on four different levels, from a simple general overview to a complete digital edition with commentaries and transcriptions. ArkA-D will be built as part of the digital platform "Alvin", which is already used for the picture database "Bildsök" as well as the provenance and bookbinding database "ProBok". Alvin is a system originally designed to be jointly developed, used and administered by several libraries in co-operation with one another. ArkA-D will make it possible to digitize and publish the vast manuscript collections of the research libraries in different ways. The system will not only give scholars and the general public full access to the documents themselves but will also invite these groups to enrich the material with their own commentaries, transcriptions, or even full-text editions. Although ArkA-D is a typical infrastructure project, the research competence of the staff and their frequent contacts with scholars from different fields will ensure its future value. There is an increasing demand for a general system for the digitization of manuscripts. Older databases and the results of earlier research projects need to be taken care of and administered, while at the same time there are ambitions and technical possibilities to start an almost endless number of new projects. Sustainable technical solutions for long-term storage, administration, and addition of metadata to manuscript material have hitherto been missing. In short, several advantages are gained by building and further developing a well-functioning technical solution, together with the suggested models for co-operation with other institutions.
Funding for two years was granted in 2011 by Riksbankens Jubileumsfond (RJ) for the project ArkA-D - a tool for digitizing archival collections in research libraries.
ArkA-D aimed to develop a database for rapid digitization of archival materials without the need for extensive metadata attached to each image. The project was called ArkA-D after its four distinct modules (A-D):
A - a model for the registration of archival collections
B - an image capture module to connect the images to the structure already established in A.
C - a model for metadata marking.
D - an opportunity for users to annotate and enrich the material by crowdsourcing (CS).
ArkA-D was a joint project between Uppsala University Library (UUB), the City Library in Linköping (SBL) and Lund University Library (LUB). Gothenburg University Library (GUB) joined the project later and financed its own participation.
2012 - Working methods and requirements specifications
During the project's first year, a steering committee was created, with representatives from the participating libraries.
In parallel, a working group at Uppsala University Library maintained regular contact between the unit responsible for technical development, the Electronic Publishing Centre (EDP), and the Special Collections Department, both at Uppsala University Library. EDP hired two developers to build the database, which in turn became the focus of the project.
A project budget was decided on in spring 2012.

Initially, the project was divided into distinct subject areas to be developed. UUB put together a requirements specification for the A and C modules. It suggested a model for recording not only archives, but also books, manuscripts, maps, pictures, objects, sound recordings, sheet music, video and software. This was necessary for the more comprehensive metadata in the C module, since these various entities all require different metadata, while each category can make use of the same features for registering names, organizations and places.
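The idea of type-specific metadata combined with shared name, organization and place registers can be illustrated with a minimal sketch. It is not the actual Alvin data model: the class names (`RecordType`, `Authority`, `Record`), the field names and the example values are all hypothetical and chosen only to show the principle.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RecordType(Enum):
    # The material categories listed in the requirements specification.
    ARCHIVE = "archive"
    BOOK = "book"
    MANUSCRIPT = "manuscript"
    MAP = "map"
    PICTURE = "picture"
    OBJECT = "object"
    SOUND_RECORDING = "sound recording"
    SHEET_MUSIC = "sheet music"
    VIDEO = "video"
    SOFTWARE = "software"


@dataclass(frozen=True)
class Authority:
    # Hypothetical shared register entry: a person, organization or place.
    kind: str
    label: str


@dataclass
class Record:
    record_type: RecordType
    title: str
    # Shared feature: every category links to the same authority entries.
    authorities: List[Authority] = field(default_factory=list)
    # Type-specific metadata (assumed here to be simple key/value pairs).
    type_metadata: Dict[str, str] = field(default_factory=dict)


def records_linked_to(records: List[Record], authority: Authority) -> List[Record]:
    """Return all records, of any type, that reference the given authority."""
    return [r for r in records if authority in r.authorities]


# Invented example data, for illustration only.
person = Authority("person", "Example Person")
place = Authority("place", "Example Place")
records = [
    Record(RecordType.ARCHIVE, "Example letter collection", [person]),
    Record(RecordType.MAP, "Example map", [person, place], {"scale": "1:50000"}),
    Record(RecordType.PICTURE, "Example photograph", [place]),
]
```

In this sketch an archive and a map carry different type-specific fields but can be found through the same person entry, which is the property the specification aimed for.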

Plans to run the database collaboratively were part of the project from the very beginning; therefore, the database was built to allow individual logins for the participating parties.

The B section, on image capture, was run by Gothenburg University Library, which developed a model for a digitization workflow eventually resulting in a METS file that could be uploaded directly into the system.

CS was dealt with in a separate client group with researchers from Lund and Gothenburg University (Elisabet Göransson and Anna Nordenstam), Mathias von Wachenfeldt from SBL, and Maria Berggren and Per Cullhed from UUB. Since CS could not be implemented before the system had functions for recording and uploading digital objects, the latter were prioritized. The CS group nevertheless held meetings in 2012, and its requirements specification was delivered in January 2013. At that time, CS was seen as a technique (see more below).

2013 - development and consortium
Development work continued in the second year, and it became increasingly clear that modules A-D delivered functionality not only for the publication of archival collections, but for all kinds of collections within the ALM (archives, libraries and museums) sector. As no other system for this type of publication offered anything similar, it was natural for the project to develop a database of heritage collections adapted for the entire ALM sector. This idea had already existed in the RJ-funded ProBok project, and the database now came to be called Alvin.

Future funding
In 2013, work was done to secure future financing. Since a publishing platform such as Alvin would surely interest many cultural heritage institutions, the ArkA-D project developed a consortium model that could bear the costs of the publication platform after the end of the project. This was an important aspect for digital repositories in the proposed certification standard, ISO 16363:2012 "Space data and information transfer systems - Audit and certification of trustworthy digital repositories".
For these reasons, a business model for a future consortium was developed and agreed on in 2013. It was partly based on experiences from the publishing platform DiVA, on the wish that smaller institutions should pay less than larger ones, and on the principle that there should be no disincentives to publishing digital content, i.e. no increased costs due to increased publishing. The aim was to safeguard an incentive to freely publish as much as possible.

To further improve the prospects for financing, the project applied for funding from VR (the Swedish Research Council) to bridge the gap between the end of the project and the time when a consortium could stand on its own (Alvin 2014-16 Consolidating a collaborative database for digitisation). This was part of the effort to retain competence within the project. Within the steering group there were concerns about what would happen if the application was not granted, but Uppsala guaranteed the project's survival. Unfortunately, the application was not granted. This was in late 2013, and at this point UUB turned to the Vice-Chancellor of Uppsala University for additional funding. In practice, Uppsala University Library financed Alvin during the part of 2014 when the RJ project funds were depleted, but finally Alvin was granted support of 1.2 million SEK per year for five years. In 2015, the consortium began to take shape, with SBL as the first member after Uppsala. Lund and Gothenburg University Libraries joined during the autumn of 2015, and the Hagströmer Library will become a member in 2016. The University of Stockholm, Göteborg University, the Royal Academy of Sciences and the Polar Research Institute have all contributed collections (pictures from polar expeditions) and financing, as has Umeå University in an RJ-funded project on the digitization and transcription of the diaries of J.A. Nensén.

Technological development and migrations
During the autumn of 2013, the migration of collections was planned and the first technical solutions were presented, so that it was finally possible to see how the system actually worked. A first user interface (UI) was presented towards the end of 2013, and it was decided to begin by migrating the archive collections of Uppsala and Lund, hitherto published in Ediffah. The migration began in spring 2014, when the Uppsala Ediffah entries could be seen in Alvin (which was still on a test server, available only to the project participants).
When the first solutions were presented in autumn 2013, it became clear that the technological development had become more extensive, and certainly more time-consuming, than originally intended. For the development work, a project leader was hired in 2013-14, partly after the end of project funding. The first UI was not satisfactory, and in 2014 it was rebuilt. The second UI (still in use) was presented in 2014, and a final, comprehensive functions specification was also approved in 2014.

Crowdsourcing (CS)
In 2013, a thorough survey and testing of CS tools was carried out, and we found that a variety of tools could be linked directly to all types of content in Alvin. However, a review of existing CS projects showed that these always contextualized their content and addressed a public suited to work with that particular context, which could be anything from labels on bumble-bees to diaries. Therefore, CS should not be regarded as a mere technology for unspecified materials in bulk; in order to attract the public to participate, it needs to be supported by administrative procedures and a technology for specific selection and monitoring. Bumble-bees and diaries have different audiences and must be presented accordingly.

The model that now emerged was that a digital archive such as Alvin is the necessary stable and long-term base layer in a digital universe, where all further processing is best carried out with external tools in a layer above it. To these tools, images and metadata can be linked or uploaded from Alvin for analysis and further processing. This benefits CS, contextualization, the technological development of external tools, etc., and it turned out to be an interesting question of principle, since it serves the dual purpose of supporting both digital preservation and research on digital material. All of digital humanities would benefit from this fundamental model, as analysis, annotation, TEI marking etc. can be done in the layer above the base layer, while the base layer material rests securely in the digital repository and does not disappear even if temporary UIs, annotations and derivative works cannot be maintained. However, it is also important that enrichments that need to be saved are ingested into the digital archive for long-term storage. This can be called a digital circulation. Transcription is a good example: a handwritten text is first published in Alvin, then processed by external tools, and finally the resulting transcription is published in Alvin. This has led to new demands for a UI where refined data can be published alongside the original image (to be implemented in 2016). CS proved to be more comprehensive than a simple technique and was therefore not included in the functions specification in 2014. It will instead be developed as a separate tool in the layer above Alvin.
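The digital circulation described above can be outlined in a short sketch. It only illustrates the principle and says nothing about how Alvin is actually implemented: the classes `BaseRecord` and `Repository`, the function `circulate` and the stand-in transcription tool are hypothetical names, and the record data is invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BaseRecord:
    # A published item in the base layer (hypothetical structure).
    record_id: str
    image_ref: str
    enrichments: List[str] = field(default_factory=list)


class Repository:
    """Stand-in for the stable base layer; not the real Alvin API."""

    def __init__(self) -> None:
        self._records: Dict[str, BaseRecord] = {}

    def publish(self, record: BaseRecord) -> None:
        self._records[record.record_id] = record

    def get(self, record_id: str) -> BaseRecord:
        return self._records[record_id]

    def ingest_enrichment(self, record_id: str, text: str) -> None:
        # Enrichments worth keeping are stored alongside the original.
        self._records[record_id].enrichments.append(text)


def circulate(repo: Repository, record_id: str,
              external_tool: Callable[[str], str]) -> str:
    """Fetch from the base layer, process externally, ingest the result."""
    record = repo.get(record_id)
    result = external_tool(record.image_ref)    # work done in the layer above
    repo.ingest_enrichment(record_id, result)   # result returns to the archive
    return result


def fake_transcriber(image_ref: str) -> str:
    # Placeholder for an external transcription tool.
    return "transcription of " + image_ref


repo = Repository()
repo.publish(BaseRecord("rec-1", "images/rec-1.tif"))
circulate(repo, "rec-1", fake_transcriber)
```

The point of the design is that the external tool can be replaced or retired without affecting the base record, while the transcription it produced remains stored with the original image.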

Alvin was launched at the conference "Digitalisera - men sen då?" at the Nordic Museum on November 28, 2014. At that time there were 2541 entries. Now, a year later, the number has increased to 47459.

