Urdar. A research infrastructure for archaeological excavation data
The rich heritage legacy from archaeological excavations in Sweden is largely inaccessible for data-driven research. The Urdar project will ensure that digitally born documentation from excavations is not lost to posterity and that it is findable for researchers through linked data and open archives. Semantic linking of field documentation and research data will enable information to be optimized for Digital Humanities and the sciences. This will contribute to interdisciplinary research as well as strengthen the position of archaeology in academic research. Urdar will bridge the divide between the heritage sector and the universities and facilitate research on the main empirical information for archaeology. Digital excavation documentation is a prime resource for exploring long-term perspectives in many different fields of research. Urdar will incorporate the FAIR principles (Findable, Accessible, Interoperable and Reusable), ensuring that the results from field archaeology are primed for incorporation in a wider European framework of archaeological infrastructures through the use of common open standards and formats. By making the complex contextual information from ancient sites FAIR and possible to link with museum collections and analytical results (e.g. paleoecology, osteology, 14C dating and genomics), archaeology will gain increased relevance and contribute to a greater understanding of human history and prehistory in order to inform the present and future.
Final report
Final report, Urdar. A research infrastructure for archaeological excavation data. In19-0135:1
Urdar - the purpose and development of the infrastructure
The purpose of Urdar was to preserve and make available the National Heritage Board's (NHB) archive of digitally born documentation from archaeological investigations that were created in the Intrasis software. The goal was to make data from NHB's excavation unit (UV) freely available and usable according to the FAIR data principles, as part of improving access to archaeological information and initiating a dialogue around the reuse of data. Archaeological research is to a large extent based on the empirical material produced in connection with commissioned archaeological investigations. Being able to use more of that information in digital format opens up all the possibilities for analysis and knowledge production made possible by the development of information technology in recent decades.
Result
All 3,696 Intrasis databases from UV were transferred from NHB to Uppsala University to be processed according to the FAIR principles. They were recreated in a PostgreSQL/PostGIS environment to enable processing and export to open formats. Processing mainly involved correcting coordinate systems so that all geodata are in the same system (SWEREF99 TM, EPSG:3006). The information was also supplemented with the official ID number for archaeological projects, so that each database can be related to NHB's Historic Environment Record. Beyond that, only very limited edits were made, for example where a vertex in a polygon was obviously wrong and could lie thousands of kilometres away. In a couple of such cases, the incorrect vertex was removed so that the databases are easier to reuse. Otherwise, the basic principle is that all data remain as originally delivered from the excavation.
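The kind of vertex check described above can be sketched in a few lines of Python. This is an illustration only and not the procedure used in the project: the envelope values are rough, assumed limits for Sweden in SWEREF99 TM, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres

# Rough, ASSUMED envelope for Sweden in SWEREF99 TM (EPSG:3006).
# Illustrative limits only, not the official bounds of the system.
E_MIN, E_MAX = 200_000.0, 1_000_000.0
N_MIN, N_MAX = 6_100_000.0, 7_700_000.0


def is_plausible(pt: Point) -> bool:
    """True if the point lies inside the assumed envelope."""
    easting, northing = pt
    return E_MIN <= easting <= E_MAX and N_MIN <= northing <= N_MAX


def drop_implausible_vertices(ring: List[Point]) -> List[Point]:
    """Return a closed ring without vertices outside the envelope."""
    # Work on the open ring (without the repeated closing vertex).
    is_closed = len(ring) > 1 and ring[0] == ring[-1]
    open_ring = ring[:-1] if is_closed else list(ring)
    kept = [pt for pt in open_ring if is_plausible(pt)]
    if len(kept) < 3:
        raise ValueError("fewer than three plausible vertices remain")
    return kept + [kept[0]]  # close the ring again


# A small rectangle where one vertex has a stray extra digit in the easting.
ring = [
    (647000.0, 6640000.0),
    (647010.0, 6640000.0),
    (5647010.0, 6640010.0),  # obviously wrong: far outside Sweden
    (647000.0, 6640010.0),
    (647000.0, 6640000.0),
]
cleaned = drop_implausible_vertices(ring)
```

A check like this only flags candidates. Whether a flagged vertex should actually be removed remains a manual decision, in line with the principle of leaving the delivered data unchanged wherever possible.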
After processing, all databases were exported in GeoPackage (GPKG) and comma-separated values (CSV) formats. The GPKG format is well suited for reuse within GIS programs and is a stable format suitable for long-term storage, but as a safeguard the data was also exported as CSV. To convert from Intrasis to GPKG/CSV, a plugin was developed for QGIS. This plugin, Swedigarch Geotools, enables easy export via QGIS directly from a PostgreSQL/PostGIS server without specialist skills. The plugin was developed in collaboration with Sweco, which was procured for this purpose, and the development was co-financed by two NHB R&D projects (RAÄ FoU) aimed at developing ways to analyse the content of the databases in QGIS. An advantage of this solution, compared with running export scripts directly on the server, is that it makes it easy to collect more data in the future: other archaeological organisations holding data can quickly and easily export it themselves. It also opens the possibility of making data available in connection with the reporting of archaeological investigations in the future.
The National Heritage Board's e-archive (Iipax) stores digital archive objects, from publications and documents to images and photographs. In connection with DAP (the Digital Archaeological Process development project 2015-2019), the e-archive was integrated with a web service through which archaeological contractors can deliver geodata to the Historic Environment Record (geometries of remains, assignments and excavated surfaces) as well as reports and publications to the e-archive. In addition, find lists can be delivered as tabular data (CSV) to the e-archive. Each type of object needs its own metadata template in the e-archive, which categorises it, specifies which metadata should be used to describe it and specifies which file formats are accepted. Within the Urdar project, a category and metadata template was developed for "Documentation data", which accepts file formats such as GeoPackage and CSV and also ensures that the files are linked to project IDs in the Historic Environment Record.
Of the 3,696 Intrasis databases, 3,460 were ultimately deemed relevant to preserve and make available. The others are copies or completely empty databases. For each database, two exports were made:
* CSV is the most archive-stable format: pure tabular data with coordinates that can be converted to geodata or analysed outside GIS environments. To access the information, it is first necessary to define the relations between the tables according to the schema provided in text format with each database.
* GPKG is an open file format adapted for GIS platforms, provided to facilitate reuse. In the GPKG file, all relations are pre-defined.
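As an illustration of what defining the relations between CSV tables can involve, the sketch below joins two small tables on a shared key using only the Python standard library. The table contents and the column names (feature_id, find_id and so on) are hypothetical; the actual names are given in the schema text file delivered with each database.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL file contents. Real table and column names come from the
# schema text file that accompanies each exported database.
FEATURES_CSV = """feature_id,feature_type,easting,northing
101,posthole,647000.0,6640000.0
102,hearth,647012.5,6640003.0
"""

FINDS_CSV = """find_id,feature_id,material
1,101,ceramics
2,102,bone
3,102,flint
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def finds_per_feature(features, finds):
    """Join finds to features on the shared feature_id key."""
    by_feature = defaultdict(list)
    for find in finds:
        by_feature[find["feature_id"]].append(find["material"])
    return [
        {**feature, "materials": by_feature.get(feature["feature_id"], [])}
        for feature in features
    ]


joined = finds_per_feature(read_rows(FEATURES_CSV), read_rows(FINDS_CSV))
for row in joined:
    print(row["feature_id"], row["feature_type"], row["materials"])
```

With real exports, the same pattern would be applied to the files on disk, with the key columns taken from the accompanying schema.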
In December 2024, the GPKG and CSV files were imported into the e-archive together with their metadata and supplemented with relevant metadata from archives and the Historic Environment Record. All content in the e-archive is made available in the search service Archive Search, so the files are now searchable. In addition, Fornsök, the public web search for archaeological sites and projects, has been developed so that there is now a direct link from each project to all available material in the e-archive.
Goal Achievement:
Findable:
* Documentation data is searchable in the public Archive Search
* The content of the e-archive is published as OAI-PMH with an open API
* Documentation data can be found via Fornsök
* Documentation data can be found via Swedigarch/AGES
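For readers unfamiliar with OAI-PMH, the following sketch shows how a harvesting request is typically built and how a response can be parsed in Python. The verb, the oai_dc metadata prefix and the XML namespaces come from the OAI-PMH standard; the base URL and the sample response are placeholders, since the endpoint address of the e-archive is not given here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH and Dublin Core standards.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument, so it replaces
    metadataPrefix when a harvest is continued.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=OAI_NS)
        pairs.append((identifier, title))
    return pairs


# PLACEHOLDER endpoint and response, for illustration only.
url = list_records_url("https://example.org/oai")

SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example documentation data</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

records = parse_titles(SAMPLE_RESPONSE)
```

A real harvest would fetch the URL over HTTP and repeat the request with the returned resumption token until the server stops issuing one.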
Accessible:
* Each file has a persistent identifier in the e-archive
* Documentation data can be ordered for download via Archive Search. A check takes place before delivery to ensure that the data does not contain classified information. The check is quick thanks to the preliminary work on bulk data already carried out within the project.
* Information about the databases is also published via AGES within Swedigarch, via a WebGIS, as a WMS service and as an index published on Zenodo.
Interoperable:
* Coordinate systems have been corrected to SWEREF99 TM (EPSG:3006).
* The internal Intrasis format has been converted to a completely new, open format so that all information can be made available as GPKG and CSV.
* Each file has received associated metadata in the e-archive
* In relevant cases, the files are linked to project IDs and site IDs with additional information in the Historic Environment Record/Fornsök
* Metadata from the NHB e-archive is compatible with Dublin Core and CIDOC-CRM
Reusable:
* The CSV files can be opened in any program, but the relationships between the different tables need to be defined before they can be analysed.
* The GeoPackage files can be opened in GIS programs that support the Open Geospatial Consortium's GPKG definitions. Currently, ArcGIS does not support these, but the files work well in, for example, QGIS, which is open source.
* License: CC0 (the publications for the excavations are licensed CC BY)
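Because a GeoPackage is an SQLite database, its contents can also be inspected without GIS software. The sketch below lists the entries registered in the standard gpkg_contents table using the Python standard library. The function name is hypothetical, and no assumption is made about which layers an Urdar export contains.

```python
import sqlite3
from contextlib import closing


def list_gpkg_layers(path):
    """Return (table_name, data_type) for every entry in gpkg_contents.

    gpkg_contents is the table in which a GeoPackage registers its user
    data tables; data_type is, for example, 'features' for geometry
    layers or 'attributes' for non-spatial tables.
    """
    with closing(sqlite3.connect(path)) as conn:
        return conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents "
            "ORDER BY table_name"
        ).fetchall()
```

Calling list_gpkg_layers with the path to a downloaded .gpkg file returns the table names, which can then be queried with ordinary SQL or loaded as layers in QGIS.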
Use of the infrastructure
Data from Urdar was launched in December 2024, so use has so far been limited. One example is the REICOR project, where a preliminary version of the material was used for analyses and, above all, to explore future possibilities with similar material. REICOR (Rational and efficient ground investigations for industrialised construction of new railways) aims to develop methods that provide a better basis for planning railway routes, where there is a need to estimate the amount of archaeological remains in different alternative corridors. This is important for calculating the extent of cultural-historical remains and the need for investigations. It is noted that the detailed information on individual features and finds, which can be analysed with access to GIS data from the surveys, provides significant gains for this type of analysis. Another important use of the results is that they form a central basis for the development of Swedigarch.
Deviations
Some shifts in the schedule occurred due to the pandemic that broke out when the project started, which made it difficult to hold physical workshops and meetings. Another factor that came to affect the project was that the Swedish Research Council decided in autumn 2021 to finance the national infrastructure Swedigarch. It therefore became relevant to develop long-term solutions for that infrastructure within the Urdar project, for example technical methods for exporting Intrasis data, so that export can be done more efficiently for large volumes of other databases in the future.
Mapping of metadata against CIDOC-CRM was moved to the Swedigarch project because the infrastructure is developing a completely new version of the data model for SOCH (K-Samsök). It was deemed not relevant to map to a system that was about to be replaced. This meant less need for technical development at NHB, and these resources were instead used to contract external consultants (Sweco) to develop export functions for the QGIS plugin Swedigarch Geotools. As a result, the budget was changed: SEK 210,308 was transferred from salary to operations, with the approval of finance director Anna Mogård (2024-02-06).
Integration and long-term perspective
Publishing the material in the e-archive makes it available in the long term. The updates and routines developed there also improve the ability to receive and manage this type of information in the future, which is a significant gain, as this was previously an obstacle to taking care of these types of information. The export functions in Swedigarch Geotools open completely new possibilities for collecting, processing and making available geodata from archaeological investigations, and this work continues within the Swedigarch infrastructure, where data from several excavations are made FAIR. Within Swedigarch, technical solutions are also being developed to aggregate data from all investigations so that they can be analysed together and linked to external registers and databases, for example for finds, C14, environmental analyses and aDNA.
Accessibility and Open Science
All data can be searched for and ordered via NHB's e-archive. Because geodata can be sensitive, especially where it concerns infrastructure or facilities in the vicinity of protected objects, an extra security check is made when a dataset is ordered. A display service is available on Swedigarch, and an index of all databases (AGES_index.gpkg) is available on Zenodo. All code developed within the project is openly available on GitHub (see links below).
International collaborations
Within the project, informal contacts have been developed that can form the basis for continued collaboration. Above all, we have been in contact with the Norwegian infrastructure ADED (Archaeological Digital Excavation Documentation), which has processed the same type of data (Intrasis) but on the basis of partly different principles. We have also been in contact with the English infrastructure ADS (Archaeology Data Service), which also handles Intrasis data. Representatives from both ADED and ADS have been part of the reference group for the project.
Links
An overview of all databases on NHB's e-archive: https://app.raa.se/open/arkivsok/results?arkiv_samling=Avdelningen%20f%C3%B6r%20arkeologiska%20unders%C3%B6kningar%20(UV)%201994-2014&searchtype=filter&page=0&pagesize=100
The Swedigarch AGES page, with information about data from Urdar and what continues to be collected through the infrastructure: https://swedigarch.se/index.php/swedigarch/resources/ages/
AGES_index on Zenodo: https://doi.org/10.5281/zenodo.14527340
GitHub page, with information about the Swedigarch Geotools plugin, which has all the code used to create GPKG exports: https://github.com/swedigarch/QGIS-plugin/wiki
The code is also available on Zenodo: https://doi.org/10.5281/zenodo.12158155