Digitization of a selection of old Swedish newspapers
Microfilmed newspaper collections provide limited accessibility. Optical character recognition (OCR) makes it possible to convert a digital image into machine-readable text, opening up entirely new search capabilities and research strategies.
Kungl. biblioteket (KB) and the National Archives began working together in 2010 to develop a model for mass digitization of newspapers; building on the results, KB has designed an interface for accessing the collections.
Researchers require direct access via networks in order to exploit the full potential of the library's collection. However, KB's budgetary constraints limit the scope of current digitization activities to contemporary legal deposit materials. Access via the web is in this case unfortunately restricted by copyright legislation that is not congruent with the research community's requirements.
This project aims at digitizing and making available a strategic selection of newspapers from a period where the material is out of copyright, thereby improving the research infrastructure substantially despite current restrictions.
1.1. Digitization of a selection of Swedish historical newspapers IN15-0452: Final report
Background
For two centuries, newspapers have been seen as a guarantor of democracy and the free exchange of opinions, and as such they are an indispensable source of information for researchers.
The paper they were printed on, however, has poor durability. In order to preserve the information for the future, newspapers have been microfilmed since the 1950s, a solution that serves preservation well but does not improve the searchability and use of the collections.
Digital technology has radically changed the situation. The conversion of a digital image to machine-readable text (optical character recognition, OCR) enables free-text searching, which facilitates the search for specific information as well as the development of new types of research strategies.
Since 2010, the National Library (KB) and the National Archives (RA) have developed a production line for mass digitization of newspapers, and KB has designed and implemented an interface for the digitized collections.
KB's budget resources only cover the digitization of legal deposit newspapers (from 2014 onwards). The retrospective digitization of newspapers has been financed by external funding.
Project summary
The focus of this project has been the digitization of a strategic selection of newspapers for a period when the material is out of copyright, thus supporting the research infrastructure by enabling access to material without legal restrictions.
KB received a grant of 10 414 268 SEK, which will cover the digitization of approximately 44 newspaper titles for the period 1645-1895.
The original estimate was approximately 1 200 000 pages. However, this proved to be an underestimate: the exact figure was 1 395 102 pages.
The digitization was carried out by the RA/Media Conversion Center (MKC) in Fränsta and the work continued until December 2017. A review at the end of December 2017 showed that a total of five years had not been digitized (Post- och inrikes tidningar and Norrköpings tidningar, in total 7 848 pages). This material was digitized in February 2018 and the pages are included in the total sum of pages above.
The newspapers we have worked with have been of varied quality, from very good to severely degraded. The final average page price was 7.14 SEK.
1.2. Preparation, registration and delivery of newspapers at KB
Since the start of the project in January 2016, 1 395 102 pages have been prepared, registered and delivered to the RA/MKC for digitization.
KB used duplicates for digitization when available. Legal deposit copies were used when no duplicates existed. Approximately 85% of the digitized newspapers were duplicates.
During the initial phase of the project, the process of identifying and retrieving duplicates in the KB archive was time-consuming. However, in May 2016, a project was initiated to register the entire historical newspaper collection in the library depot, located outside Bålsta north of Stockholm. This initiative has greatly facilitated the latter part of the newspaper digitization project and will also contribute to future initiatives.
1.3. Access to the digitized newspapers in KB's interface "tidningar.kb.se"
The newspapers were made available on tidningar.kb.se as they were digitized, archived and passed through the quality assurance procedure. All newspapers are now accessible via the interface (see [1]).
1.4. Presentation of the project
The project and its progress have been presented on several occasions on the KB website, Twitter, Facebook and Mynewsdesk. [2][3][4] Articles covering the project were published in Biblioteksbladet in 2017 [5] and in Släkthistoriskt forum in 2017 [6][7][8]. In addition, the Swedish Genealogy Association has continually updated its website with news about the project. [9][10][11] Press releases have also been sent to selected newspapers and other stakeholders on two occasions. Progress is also reported continuously in the KB newspaper user forum. [12]
KB's efforts to keep its stakeholders updated on the project, together with the steadily growing amount of out-of-copyright material, are reflected in the visitor statistics for the search interface: usage has risen from an average of 150 users per day at the start of the project to over 500 unique users per day at present.
The employees at the KB newspaper section devote a substantial amount of time to presenting the project in different contexts. During 2017, major presentations of the project were given at the National Genealogy Days in Halmstad [13] and at the Book Fair in Gothenburg.
1.5. Has the project resulted in new research collaboration or new research tasks?
There are several instances where the results of the newspaper digitization are used for research. One example is HumLab at Umeå University, which uses the out-of-copyright Swedish newspapers from "data.kb.se" in its project "Digital lägg – om pressens gränssnitt 1800". [14]
Economics researchers Hanna Stenbacka Köhler and Isaiah Hull from the Swedish Riksbank and Hanna Armelius at the Ministry of Finance have used "tidningar.kb.se" in their work to develop an uncertainty index. [15][16]
We can also note that the number of visitors to the computer terminals in the newspaper reading room in KB, where the digitized newspapers can be accessed, has risen sharply. The number of questions via mail, telephone and social media has also increased significantly as the project has progressed.
Since January 2018, all Swedish university libraries have had the option of signing a license agreement that gives researchers and students access to the digitized newspaper collections, including the material covered by copyright legislation. So far, 16 libraries have joined. [17]
This solution is based on the agreement between KB and Bonus Copyright Access, entered into in June 2017 and running until 2019-06-30.
KB also cooperates with Språkbanken (the Swedish language bank) on a research application aimed at developing methods for improving the results of the OCR process.
Språkbanken has incorporated all the files produced in the project in their text databases.
1.6. Financial accounting 2016-01-01-2018-03-31
NLS has received funding of 10 414 268 SEK, which will cover the digitization of approximately 44 newspaper titles for the period 1645-1895.
We originally estimated the collection at approximately 1 200 000 pages. However, this proved to be an underestimate, and the exact figure was 1 395 102 pages.
The digitization was carried out by the RA/MKC in Fränsta and continued until December 2017. A review at the end of December 2017 showed that a total of five years had not been digitized (Post- och inrikes tidningar and Norrköpings tidningar, in total 7 848 pages). Riksbankens Jubileumsfond was informed of this by email on February 5, 2018. The titles were digitized during February and March 2018 and the pages are included in the final page count in the paragraph above.
The newspaper material we have worked with has been of varied quality, from very good to severely degraded. The final average page price was 7.14 SEK.
The cost of the work carried out by MKC amounts to the following:
2016 2 594 687 kr
2017 7 314 077 kr
2018 (outstanding 5 years) 48 951 kr
Total 9 957 715 kr
Out of the funds granted, KB has claimed 10 400 000 SEK and spent 9 957 715 SEK. The remaining 442 285 SEK will be returned to Riksbankens Jubileumsfond. KB therefore needs information about the bank and account number to which these funds are to be transferred.
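The figures in this section can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the amounts stated above. The variable names are illustrative, and it assumes that the average page price is the total MKC cost divided by the total number of pages, which the report does not state explicitly.

```python
# Amounts copied from this section of the report (all in SEK).
mkc_costs = {"2016": 2_594_687, "2017": 7_314_077, "2018": 48_951}
claimed = 10_400_000   # funds claimed by KB
pages = 1_395_102      # total number of digitized pages

# Sum of the yearly MKC costs.
total_cost = sum(mkc_costs.values())

# Funds to be returned: claimed amount minus what was spent.
remaining = claimed - total_cost

# Assumption: average page price = total MKC cost / total pages.
price_per_page = total_cost / pages

print(f"Total cost: {total_cost} SEK")
print(f"Remaining funds: {remaining} SEK")
print(f"Average price per page: {price_per_page:.2f} SEK")
```

Under that assumption, the computed values agree with the reported total of 9 957 715 SEK, the remaining 442 285 SEK and the average page price of 7.14 SEK.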
1.7. How will the work be integrated and transmitted in the organization?
The contribution from Riksbankens Jubileumsfond to this project has considerably improved access to Swedish historical newspapers. A significant part of Sweden's oldest newspapers is now available to researchers and the public.
The digitization of newspapers is today a central part of KB's operations. Unfortunately, the KB budget does not include regular funding for the digitization of historical newspapers. To make this possible, we are constantly seeking contributors and collaborators.
Digitization enables access for researchers and the general public to historical sources that reflect changes and developments in society at large. The question of digital infrastructure and access consequently has democratic implications.
There is great regional interest in the digitized newspaper collections as a means of access to local newspapers, a genre for which digitization funding is harder to secure and which is therefore frequently overlooked. The question of democracy is relevant in this case as well.
KB continuously seeks funding and research cooperation in order to widen the digital collections. Newspapers represent a valuable resource in this respect.
1.8. Reports from NLS to Riksbankens Jubileumsfond
• Six months report: 2016-06-13
• One year report: 2017-01-23
• Eighteen months report: 2017-07-10
• Follow-up to the partial report: 2017-07-12
• Audit report: 2017-07-21 (In portal)
• Results of a post-check in IN15-0452:1: 2018-02-05
• Digitization of a selection of Swedish historical newspapers IN15-0452: Final report: 2018-04-16
Torsten Johansson
Newspaper Division
National Library of Sweden
+46 10 7093402
torsten.johansson@kb.se
References
1. See Appendix ”RJ, titles”
2. http://www.kb.se/aktuellt/nyheter/2016/Slaktforska-och-folj-1800-talets-nyhetsrapportering--KBs-soktjanst-for-dagstidningar-vaxer/
3. http://www.kb.se/aktuellt/nyheter/2017/Annu-mer-historiska-nyheter-i-KBs-onlinetjanst2/
4. http://www.kb.se/aktuellt/nyheter/2017/En-miljon-fria-tidningssidor-i-KBs-soktjanst/
5. http://biblioteksbladet.se/skatten-i-kallarhalan/
6. Svensson, Hanna; ”En skattkista för tidningsälskare”; Släkthistoriskt forum; no. 1, 2017, pp. 18-21. https://www.genealogi.se/images/shf/SHF-1-17-digitaliserade%20dagstidningar.pdf
7. Lindström, Christer; ”Tidningarna gav svar på sekelgammal gåta”; Släkthistoriskt forum; no. 1, 2017, p. 22. https://www.genealogi.se/images/shf/SHF-1-17-digitaliserade%20dagstidningar.pdf
8. Söderström, Olle and Svensson, Hanna; ”Upphovsrätt bakom tidningstrasslet”; Släkthistoriskt forum; no. 1, 2017, p. 23. https://www.genealogi.se/images/shf/SHF-1-17-digitaliserade%20dagstidningar.pdf
9. https://www.genealogi.se/om-roetter/nyhetsarkivet/nyheter-2017/123-nyheter/2013/1792-soek-och-finn-bland-hundratusentals-tidningssidor
10. https://www.genealogi.se/123-nyheter/2013/1874-en-miljon-fria-tidningssidor-digitaliserade-hos-kb
11. https://www.genealogi.se/123-nyheter/2013/1900-nya-tidningssidor-i-kb-s-digitala-tjaenst
12. http://feedback.tidningar.kb.se/viewtopic.php?id=84
13. http://www.sfd2017.se/program/massprogram-och-tider/digitalisering-av-historiska-dagstidningar-pa-kungliga-biblioteket
14. http://www.humlab.umu.se/sv/forskning-utveckling/paagaaende-projekt/digitala-laegg/
15. http://www.policyuncertainty.com/sweden_monthly.html
16. http://www.sciencedirect.com/science/article/pii/S016517651730109X?via%3Dihub
17. http://feedback.tidningar.kb.se/viewtopic.php?id=113
Appendix
RJ, titles
Title | Issues | Pages | From year | To year
ALFWAR OCH SKÄMT | 158 | 670 | 1842 | 1843
BAROMETERN | 7 430 | 29 650 | 1841 | 1895
BORÅS TIDNING | 6 048 | 24 559 | 1839 | 1895
CARLSCRONAS TIDNINGAR | 108 | 436 | 1761 | 1764
CARLSCRONAS WEKOBLAD | 9 860 | 40 637 | 1754 | 1878
DAGLIGT ALLEHANDA | 24 445 | 161 651 | 1767 | 1849
FALKÖPINGS TIDNING | 3 670 | 14 638 | 1857 | 1896
GÖTEBORGS HANDELS- OCH SJÖFARTSTIDNING | 18 042 | 83 845 | 1832 | 1895
GÖTEBORGSPOSTEN | 11 151 | 46 833 | 1859 | 1895
GÖTHEBORGS ALLEHANDA | 9 192 | 38 305 | 1774 | 1843
GÖTHEBORGSKA NYHETER | 4 349 | 35 790 | 1765 | 1848
HÄRNÖSANDSPOSTEN | 6 682 | 25 788 | 1842 | 1895
INRIKES TIDNINGAR | 7 399 | 36 154 | 1760 | 1820
JÖNKÖPINGSPOSTEN | 3 579 | 16 075 | 1865 | 1895
KARLSHAMNS ALLEHANDA | 5 935 | 22 481 | 1848 | 1896
KARLSKRONA WECKOBLAD | 2 588 | 11 109 | 1879 | 1895
KRISTIANSTADSBLADET | 6 408 | 25 596 | 1856 | 1895
LUNDS WECKOBLAD | 8 222 | 37 257 | 1775 | 1897
MALMÖ ALLEHANDA | 6 724 | 27 778 | 1827 | 1893
NERIKES ALLEHANDA | 6 800 | 27 520 | 1844 | 1895
NORDEN | 279 | 1 326 | 1856 | 1861
NORRBOTTENSKURIREN | 2 735 | 11 128 | 1861 | 1896
NORRBOTTENSPOSTEN | 2 926 | 11 778 | 1847 | 1895
NORRKÖPINGS TIDNINGAR | 16 194 | 69 183 | 1787 | 1895
NORRKÖPINGS WECKOTIDNINGAR | 1 206 | 5 522 | 1758 | 1786
NORRLÄNDSKA KORRESPONDENTEN | 2 432 | 9 616 | 1851 | 1873
NYA DAGLIGT ALLEHANDA | 10 971 | 46 173 | 1859 | 1895
NYA KARLSKRONA WECKOBLAD | 51 | 218 | 1878 | 1878
NYA WERMLANDSTIDNINGEN | 5 287 | 22 011 | 1851 | 1895
NYA WEXJÖBLADET | 4 847 | 20 051 | 1847 | 1895
NYTT ALLVAR OCH SKÄMT | 445 | 1 786 | 1844 | 1851
NYTT OCH GAMMALT | 1 533 | 13 724 | 1783 | 1812
POST- OCH INRIKES TIDNINGAR | 21 057 | 104 446 | 1821 | 1895
POSTTIDNINGAR | 12 243 | 62 886 | 1645 | 1820
STOCKHOLMS DAGBLAD | 22 405 | 118 683 | 1824 | 1895
STOCKHOLMSPOSTEN | 16 425 | 66 005 | 1778 | 1833
SUNDSVALLS TIDNING | 2 658 | 10 889 | 1880 | 1895
SUNDSVALLS TIDNING NORRLÄNDSKA KORRESPONDENTEN | 917 | 3 740 | 1873 | 1879
UMEBLADET | 3 431 | 14 114 | 1847 | 1896
UPSALA | 6 284 | 27 061 | 1845 | 1895
WERMLANDSTIDNINGEN | 395 | 1 574 | 1844 | 1850
VESTMANLANDS LÄNS TIDNING | 5 951 | 24 066 | 1831 | 1896
WEXJÖBLADET | 2 348 | 9 580 | 1810 | 1855
ÖSTGÖTA CORRESPONDENTEN | 7 933 | 32 770 | 1838 | 1895
Total | 299 743 | 1 395 102 | |