60 years of global environmental change 1939-1999: digitization of 1.6 million historical aerial photographs
The global environment has changed beyond recognition over the past century, presenting unprecedented challenges in environmental resource management, including biodiversity loss, deforestation and climate change. To address these challenges, we need to understand the causes and consequences of long-run environmental change. To understand long-run environmental change, we must be able to measure it. Satellite imagery allows us to measure environmental change in recent decades, but researchers have to date been unable to analyze long-run large-scale environmental change further back in time.
In partnership with the Bodleian Library in Oxford and the National Collection of Aerial Photography in Edinburgh, we will digitize, process and make accessible a remarkable archive of 1.6 million aerial photographs from across the developing world, dating back to the late 1930s. In doing so, we will expand the horizon over which data are available to study global long-run environmental change by between 30 and 60 years, or more than 50%. Access to the archive has so far been extremely limited. We will transform the archive into a digitized, merged dataset, accessible online, which will allow researchers to look back in time to the beginning of WWII and the decolonization era. We expect the data to be applicable in fields as diverse as geology, archaeology, ecology, and climatology. In particular, we will use these data to study the relationships between development and the environment.
Final report
Purpose
Following a century of rapidly accelerating global environmental change, the world faces unprecedented environmental challenges, including biodiversity loss; rapid, unplanned urbanization; deforestation; and climate change. To address these challenges, we must understand both the causes and the consequences of long-run environmental change. To understand long-run environmental change, we must first be able to measure it. However, while the widespread availability of satellite imagery has revolutionized the study of environmental change in recent decades, large-scale analysis of environmental change from earlier decades has, to date, been essentially impossible, particularly in regions with limited administrative data. In partnership with the National Collection of Aerial Photography (NCAP) in Edinburgh, we set out to digitize, process, and make accessible a remarkable and almost completely unstudied archive of 1.7 million aerial photographs, from 65 countries across the developing world, collected between the late 1930s and the 1990s by a British government body called the Department of Overseas Surveys. Making these data available extends the period over which long-run environmental change in the developing world can be studied by between 30 and 60 years, or more than 50%.
Results so far
The first step of this project was to convert the physical prints into digital images. Despite working under the extraordinary challenges of the COVID pandemic, NCAP successfully developed a novel robotic pipeline to automate the scanning of the archive. We estimate that the robotic pipeline increases the productivity of workers 30-fold. Developing and executing this pipeline required NCAP not only to complete extensive research and development but also to entirely refurbish an industrial workspace to house the robotic pipeline, which operates in a controlled air environment maintained through a system of airlocks. NCAP completed scanning of all 1.7 million images in the full archive in 2023.
The second step of the project is to merge and geolocate these images to create georeferenced mosaics. These mosaics can be matched to and compared with modern satellite imagery to track changes over time. Our team developed a novel method for stitching and georeferencing massive archives of aerial photographs. Our method can reconstruct a mosaic of aerial imagery with no metadata whatsoever, using only the appearance of features in overlapping images to identify matches and tessellate the images into mosaics. Aerial survey photographs overlap by design, which makes this approach feasible. However, the approach is extremely computationally costly when applied without metadata. Incorporating information from metadata - such as the geolocation of a subset of images or the approximate layout of the mission, as sketched at the time on hand-drawn sortie plots - allows us to vastly reduce the processing time required. We apply this approach first to the images from Africa, which constitute more than 60% of the archive. The mosaic assembly pipeline is still running at the time of writing this report, but we have to date assembled 108 fully georeferenced mosaics for 12 countries from more than 335,000 individual images.
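To illustrate why approximate layout information reduces the processing time, the following is a minimal sketch and not the project's actual pipeline. It assumes a hypothetical sortie of 4 flight lines with 5 frames each, and it assumes that the expensive step is feature matching between candidate image pairs. The function names, the grid layout, and the distance threshold are all assumptions made for illustration.

```python
from itertools import combinations
from math import hypot


def all_pairs(image_ids):
    """Without metadata, every pair of images is a candidate overlap."""
    return list(combinations(sorted(image_ids), 2))


def nearby_pairs(approx_xy, max_dist):
    """With approximate positions, keep only pairs close enough to overlap.

    The distance check is cheap; the saving comes from running the costly
    feature matching only on the pairs that survive this filter.
    """
    pairs = []
    for a, b in combinations(sorted(approx_xy), 2):
        (xa, ya), (xb, yb) = approx_xy[a], approx_xy[b]
        if hypot(xa - xb, ya - yb) <= max_dist:
            pairs.append((a, b))
    return pairs


# Hypothetical sortie plot: 4 flight lines of 5 frames, one unit apart.
approx_xy = {
    f"L{line}F{frame}": (float(frame), float(line))
    for line in range(4)
    for frame in range(5)
}

without_metadata = all_pairs(approx_xy)
with_metadata = nearby_pairs(approx_xy, max_dist=1.0)
print(len(without_metadata), len(with_metadata))  # prints: 190 31
```

In this toy layout, the number of candidate pairs grows quadratically with the number of images when no metadata is used, but only linearly once each image is compared with its immediate neighbours.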
The final step of the project is to convert the unstructured information contained in the mosaics into structured data that researchers can analyze. We have successfully built a machine-learning pipeline that converts raw imagery into predictions, and in turn stitches together these predictions into country-scale datasets using the information obtained during the second step. Supported in part by an RJ program grant, we are developing prediction maps for land use, population density, wealth, and built-up area. Among these projects, the most advanced is our pipeline to detect buildings in cities. We have successfully generated 148 extraordinarily detailed maps of buildings in 95 cities in 12 countries throughout the 20th century, with prediction performance that equals or exceeds the performance of state-of-the-art models for modern high-resolution satellite imagery.
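As a rough illustration of how per-image predictions might be combined once images have been georeferenced, the sketch below places hypothetical prediction tiles on a shared grid and averages the scores where tiles overlap. This is an assumption about one simple way to merge predictions, not a description of the project's method. The function name, the tile format, and the averaging rule are all hypothetical.

```python
from collections import defaultdict


def merge_tile_predictions(tiles):
    """Average per-cell scores from georeferenced tiles on a shared grid.

    Each tile is (row_offset, col_offset, grid). The offsets stand in for
    the georeferencing result and place the tile on the shared grid; grid
    is a list of rows of prediction scores.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for row_off, col_off, grid in tiles:
        for r, row in enumerate(grid):
            for c, score in enumerate(row):
                key = (row_off + r, col_off + c)
                total[key] += score
                count[key] += 1
    # Cells covered by several tiles receive the mean of their scores.
    return {key: total[key] / count[key] for key in total}


# Two hypothetical 2x2 tiles that overlap in one column of the shared grid.
tile_a = (0, 0, [[0.25, 0.75], [0.0, 1.0]])
tile_b = (0, 1, [[0.25, 0.5], [0.5, 0.0]])
merged = merge_tile_predictions([tile_a, tile_b])
print(merged[(0, 1)], merged[(1, 1)])  # prints: 0.5 0.75
```

Averaging is only one possible rule; a real pipeline might instead weight predictions by image quality or distance from the image edge.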
Use of infrastructure and research initiated
The infrastructure we have developed is not yet fully public, but our own team is already using the prediction maps in projects that study the response to long-run climate change, the legacies of colonial rule in Africa, and the drivers of city growth and form. More broadly, we expect these data to have applications across fields as diverse as environmental history, geomorphology, climatology, archaeology, and urbanism.
Unforeseen technical and methodological issues and deviations from the original plan
This project has required the development of entirely novel technologies and approaches at every stage. Before this project, the largest application of aerial photographs to track land-use change used a sample of 1,400 images. Our pilot study in the Caribbean alone used more than 6,000 images, a 4-fold increase. The mosaics that we are in the process of constructing represent a 700-fold increase on the status quo. Advancing this far beyond the technological frontier naturally takes time, and the project was further delayed by the COVID pandemic. We have successfully overcome many challenges in the process, including the creation of a bespoke legal agreement that governs our use of the data.
Accessible infrastructure for the long run
The datasets we are creating include more than 400 TB of raw aerial photography scans and more than 200 TB of mosaics. Storing and maintaining access to these data products is a major undertaking.
The raw aerial photography scans remain, as per our data use agreement, the responsibility of their official custodians, NCAP. NCAP has developed an entirely new website (https://www.ncap.org/) to handle the massive quantities of data created by the at-scale digitization made possible by the advances developed under this project. The raw scans will be made available from this website over the coming year. The image footprints generated in step two above are a critical input for searching for images from a particular area of interest. Since NCAP is a public organization that operates on a cost-recovery basis, users who wish to access the raw scans will be charged a small fee that funds ongoing site maintenance and service.
The georeferenced mosaics will be distributed in several ways. First, Stockholm University has agreed to provide long-run financing for the data to be hosted in perpetuity on Sunet Drive, an Amazon S3 storage service administered by Sunet, part of the Swedish Research Council, with access provided through a direct connection between an Open Science Framework (OSF) metadata record and Sunet Drive. Users will be able to download mosaics for specific countries or regions. This approach ensures free and sustainable access to the mosaics in the long run, although it may not be the most user-friendly solution. Second, we are making the data available via Google Earth Engine (GEE). This format allows users to straightforwardly integrate the mosaics into geospatial analysis. One caveat is that we are still developing plans for long-run funding of this solution, the requirements for which have evolved over the course of this project. Lastly, we have developed a prototype of a MapBox-style interface that allows casual users and interested members of the public to browse easily through the mosaics and inspect changes across time. This product is still under development. All access to and use of the mosaics will be under a Creative Commons BY-NC license, as agreed in our data use agreement. A finding aid consisting of the footprints of all mosaics will be shared along with the data to help users locate coverage of their areas of interest.
Lastly, the (relatively) modestly sized derived data products - for example, around 800 MB for the building maps produced so far - will all be made available via Google Earth Engine under Creative Commons BY licenses.
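The sketch below illustrates how a finding aid based on mosaic footprints could be used to locate coverage of an area of interest. It is illustrative only: it assumes each footprint is simplified to a bounding box of (min_lon, min_lat, max_lon, max_lat), whereas real footprints are likely to be polygons, and the mosaic names and coordinates are invented for the example.

```python
def bbox_intersects(a, b):
    """Return True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_mosaics(footprints, area_of_interest):
    """List the names of mosaics whose footprint overlaps the area of interest."""
    return sorted(
        name
        for name, box in footprints.items()
        if bbox_intersects(box, area_of_interest)
    )


# Hypothetical footprints; names and coordinates are made up for illustration.
footprints = {
    "mosaic_A_1950": (30.0, -2.0, 31.0, -1.0),
    "mosaic_B_1962": (30.5, -1.5, 32.0, 0.0),
    "mosaic_C_1971": (35.0, 5.0, 36.0, 6.0),
}
area_of_interest = (30.8, -1.2, 30.9, -1.1)
hits = find_mosaics(footprints, area_of_interest)
print(hits)  # prints: ['mosaic_A_1950', 'mosaic_B_1962']
```

Because several mosaics from different years can cover the same area, a query of this kind also shows which locations can be compared across time.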
International collaborations
This project is led by Hannah Druckenmiller at the California Institute of Technology, Solomon Hsiang at Stanford University, Andreas Madestam and Anna Tompsett at Stockholm University and the Beijer Institute for Ecological Economics, and Allan Williams at NCAP. The team also includes participating researchers from MIT, UC Berkeley, and the University of Gothenburg.
Publications
Noda, E., Huang, L. Y., Chong, T., Jain, S., Madestam, A., Tompsett, A., Druckenmiller, H., & Hsiang, S. (2024, December). A machine-learning pipeline for merging and georeferencing very large archives of historical aerial photographs. In 2024 IEEE International Conference on Big Data (BigData) (pp. 74-83). IEEE.
Masson, S., Potts, A., Williams, A., Berggreen, S., McLaren, K., Martin, S., Noda, E., Nordfors, N., Ruecroft, N., Druckenmiller, H., Hsiang, S., Madestam, A., & Tompsett, A. (2025). A robot-assisted pipeline to rapidly scan 1.7 million historical aerial photographs.