Hans Ljungberg

Analysis and Character Recognition of Tables Digitised by Statistics Sweden

Statistics Sweden (SCB) and its predecessors have been producing statistics for a long time, even by international standards. From the 19th century onwards, statistical production became increasingly systematic and comprehensive. The web has created a demand to digitise older statistics that exist only in print, so that they become more accessible. By studying these older statistics we can learn about our history and our place in it. Most of the older statistics produced by SCB have already been digitised and published on the web. The digitised versions include navigational aids such as bookmarks down to the lowest heading level, linked tables of contents, and searchable text. However, an earlier pre-study had shown that table recognition was not yet sufficiently accurate. As a result, all tables in the PDF files are available only as images and cannot easily be transferred to spreadsheets, a feature wanted by scholars who want to do their own calculations on the data. A new SCB pre-study finds that the prospects for recognising older tables have since improved. The aim of this project is to develop an environment for table recognition and to publish the tables from the already digitised material. Starting with newer material, which is easier to recognise, the components of the recognition process are gradually refined as older and more difficult material is analysed and recognised. The results of the project are recognised and published tables, together with a set of table recognition conventions.
Final report
See "Vetenskaplig redov" (in Swedish).
Grant administrator
Statistics Sweden
Reference number
IN14-0337:1
Amount
SEK 2,500,000
Funding
RJ Infrastructure for research
Subject
Other Social Sciences not elsewhere specified
Year
2014