Torsten Johansson

Digitization of a selection of old Swedish newspapers

Newspapers have long been regarded as guarantors of democratic, free exchange of ideas and serve as an indispensable source of information for the research community. Unfortunately, the paper they are printed on is far from resilient. Newspapers have been regularly microfilmed since the 1950s in order to retain their content for future readers.
Microfilmed collections, however, provide limited accessibility. Technology now makes it possible to convert a digital image into machine-readable text (OCR), opening up entirely new search capabilities and research strategies.
Kungl. biblioteket (KB) and the National Archives began working together in 2010 to develop a model for mass digitization of newspapers. Building on the results, KB has designed an interface for accessing the collections.
Researchers require direct access via networks in order to exploit the full potential of the library's collection. However, KB's budgetary constraints limit the scope of current digitization activities to contemporary legal deposit materials. Access via the web is in this case unfortunately restricted by copyright legislation that is not aligned with the research community's requirements.
This project aims at digitizing and making available a strategic selection of newspapers from a period where the material is out of copyright, thereby improving the research infrastructure substantially despite current restrictions.
Final report

 1.1.    Digitization of a selection of Swedish historical newspapers IN15-0452: Final report
    Background

    For two centuries, newspapers have been seen as a guarantor of democracy and the free exchange of opinions, and as such they are an indispensable source of information for researchers.

    The paper they were printed on, however, has poor durability. In order to preserve the information for the future, newspapers have been microfilmed since the 1950s, a solution well suited to preservation but one that does not improve the searchability and use of the collections.

    Digital technology has radically changed the situation. The conversion of a digital image to machine-readable text (i.e. optical character recognition, OCR) enables free-text searches, which facilitate the search for specific information as well as the development of new types of research strategies.

    Since 2010, the National Library (KB) and the National Archives (RA) have developed a production line for mass digitization of newspapers, and KB has designed and implemented an interface for the digitized collections.

    KB's budget covers only the digitization of legal deposit newspapers (from 2014 onwards). The retrospective digitization of newspapers has been financed by external funding.


    Project summary

    The focus of this project has been the digitization of a strategic selection of newspapers for a period when the material is out of copyright, thus supporting the research infrastructure by enabling access to material without legal restrictions.

    KB received a grant of 10 414 268 SEK, which covered the digitization of approximately 44 newspaper titles for the period 1645-1895.

    The original estimate was approximately 1 200 000 pages. This proved to be an underestimate: the exact figure was 1 395 102 pages.

    The digitization was carried out by the RA/Media Conversion Center (MKC) in Fränsta and the work continued until December 2017. A review at the end of December 2017 showed that a total of five years of issues had not been digitized (Post- och inrikes tidningar and Norrköpings tidningar, in total 7 848 pages). This material was digitized in February 2018 and the pages are included in the total number of pages above.

    The newspapers we have worked with have been of varied quality, from very good to severely degraded. The final average page price was 7.14 SEK.
1.2.    Preparation, registration and delivery of newspapers at KB

    Since the start of the project in January 2016, 1 395 102 pages have been prepared, registered and delivered to the RA/MKC for digitization.

    KB used duplicates for digitization when available. Legal deposit objects were used when no duplicates existed. Approximately 85% of the digitized newspapers were duplicates.
    During the initial phase of the project, identifying and retrieving duplicates in the KB archive was time-consuming. However, in May 2016, a project was initiated to register the entire historical newspaper collection in the library depot, located outside Bålsta north of Stockholm. This initiative has greatly facilitated the latter part of the newspaper digitization project and will also contribute to future initiatives.
1.3.    Access to the digitized newspapers in KB's interface "tidningar.kb.se"
    The newspapers were made available on tidningar.kb.se as they were digitized, archived and passed through the quality assurance procedure. All newspapers are now accessible via the interface (see [1]).
1.4.    Presentation of the project
    The project and its progress have on several occasions been presented on the KB website, Twitter, Facebook and Mynewsdesk. [2][3][4] Articles covering the project have been published in Biblioteksbladet 2017 [5] and in Släkthistoriskt forum 2017 [6][7][8]. In addition, the Swedish Genealogy Association has continually updated its website with news about the project. [9][10][11] Press releases have also been sent to selected newspapers and other stakeholders on two occasions. The progress of the project is also continuously updated in the KB newspaper user forum. [12]

    The effect of KB's ambition to keep its stakeholders updated on the project, together with the constantly increasing amount of out-of-copyright material, can be tracked in the visitor statistics from the search interface: the figure has increased from an average of 150 users per day at the start of the project to over 500 unique users per day at present.

    The employees at the KB newspaper section devote a substantial amount of time to informing about the project in various contexts. During 2017, major presentations of the project were given at the National Genealogy Days in Halmstad [13] and at the Book Fair in Gothenburg.

1.5.    Has the project resulted in new research collaboration or new research tasks?
    There are several instances where the results of the newspaper digitization are used for research. One example is HumLab at Umeå University, which uses the out-of-copyright Swedish newspapers from "data.kb.se" in its project "Digital lägg – om pressens gränssnitt 1800". [14]

    Economics researchers Hanna Stenbacka Köhler and Isaiah Hull from the Swedish Riksbank and Hanna Armelius at the Ministry of Finance have used "tidningar.kb.se" in their work to develop an uncertainty index. [15][16]

    We can also note that the number of visitors to the computer terminals in the newspaper reading room in KB, where the digitized newspapers can be accessed, has risen sharply. The number of questions via mail, telephone and social media has also increased significantly as the project has progressed.

    Since January 2018, all Swedish university libraries have had the option of signing a license agreement that gives researchers and students access to the digitized newspaper collections, including the material covered by copyright legislation. So far, 16 libraries have joined. [17]
    This solution is based on the agreement between KB and Bonus Copyright Access, entered into in June 2017 and running until 2019-06-30.

    KB also cooperates with Språkbanken (the Swedish language bank) on a research application aimed at developing methods for improving the results of the OCR process.

    Språkbanken has incorporated all the files produced in the project in their text databases.

1.6.    Financial accounting 2016-01-01 to 2018-03-31

    KB received funding of 10 414 268 SEK, which covered the digitization of approximately 44 newspaper titles for the period 1645-1895.

    We originally estimated the collection at approximately 1 200 000 pages. However, this proved to be an underestimate, and the exact figure was 1 395 102 pages.

    The digitization was carried out by the RA/MKC in Fränsta and continued until December 2017. A review at the end of December 2017 showed that a total of five years of issues had not been digitized (Post- och inrikes tidningar and Norrköpings tidningar, in total 7 848 pages). Riksbankens Jubileumsfond was informed of this by email on February 5, 2018. The titles were digitized during February and March 2018 and the pages are included in the final page count in the paragraph above.

    The newspaper material we have worked with has been of varied quality, from very good to severely degraded. The final average page price was 7.14 SEK.

    The cost of the work carried out by MKC amounts to the following:

2016    2 594 687 kr
2017    7 314 077 kr
2018 (outstanding 5 years)    48 951 kr
Total    9 957 715 kr

    Out of the funds granted, KB has claimed 10 400 000 SEK and spent 9 957 715 SEK. The remaining 442 285 SEK will be returned to Riksbankens Jubileumsfond. KB therefore needs information about the bank and account number to which these funds are to be transferred.
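
The figures in this section can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the project's accounting; the variable names are illustrative, and the numbers are copied from the cost table and the paragraphs above.

```python
# Yearly cost of the work carried out by MKC, in SEK (from the cost table).
costs_by_year = {"2016": 2_594_687, "2017": 7_314_077, "2018": 48_951}

claimed = 10_400_000   # SEK claimed by KB out of the grant
pages = 1_395_102      # total number of digitized pages

total_cost = sum(costs_by_year.values())   # total spent, SEK
remaining = claimed - total_cost           # amount to be returned, SEK
avg_page_price = total_cost / pages        # average price per page, SEK

print(f"Total cost:         {total_cost} SEK")
print(f"Remaining funds:    {remaining} SEK")
print(f"Average page price: {avg_page_price:.2f} SEK")
```

Under these inputs the total, the remaining amount and the rounded average page price agree with the values reported in this section (9 957 715 SEK, 442 285 SEK and 7.14 SEK).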
1.7.    How will the work be integrated and transmitted in the organization?
    The contribution from Riksbankens Jubileumsfond to this project has considerably improved access to Swedish historical newspapers. A significant part of Sweden's oldest newspapers is now available to researchers and to the public.

    The digitization of newspapers is today a central part of KB's operations. Unfortunately, the KB budget does not include regular funding for digitization of historical newspapers. To make this possible, we are constantly seeking contributors and collaborators.

    Digitization enables access for researchers and the general public to historical sources that reflect changes and developments in society at large. The question of digital infrastructure and access consequently has democratic implications.

    There is great regional interest in the digitized newspaper collections, particularly in access to local newspapers, a genre for which digitization funding is harder to secure and which is therefore frequently overlooked. The question of democracy is relevant in this case as well.

    KB continuously seeks funding and research cooperation in order to widen the digital collections. Newspapers represent a valuable resource in this respect.


1.8.    Reports from NLS to Riksbankens Jubileumsfond

•    Six-month report: 2016-06-13
•    One-year report: 2017-01-23
•    Eighteen-month report: 2017-07-10
•    Follow-up to the partial report: 2017-07-12
•    Audit report: 2017-07-21 (in portal)
•    Results of a post-check in IN15-0452:1: 2018-02-05
•    Digitization of a selection of Swedish historical newspapers IN15-0452: Final report: 2018-04-16





    Torsten Johansson
    Newspaper Division
    National Library of Sweden
    +46 10 7093402
    torsten.johansson@kb.se

    References

1.    See Appendix "RJ, titles"
2.    http://www.kb.se/aktuellt/nyheter/2016/Slaktforska-och-folj-1800-talets-nyhetsrapportering--KBs-soktjanst-for-dagstidningar-vaxer/
3.    http://www.kb.se/aktuellt/nyheter/2017/Annu-mer-historiska-nyheter-i-KBs-onlinetjanst2/
4.    http://www.kb.se/aktuellt/nyheter/2017/En-miljon-fria-tidningssidor-i-KBs-soktjanst/
5.    http://biblioteksbladet.se/skatten-i-kallarhalan/
6.    Svensson, Hanna; ”En skattkista för tidningsälskare”; Släkthistoriskt forum; no. 1, 2017, pp. 18-21. https://www.genealogi.se/images/shf/SHF-1-17-digitaliserade%20dagstidningar.pdf
7.    Lindström, Christer; ”Tidningarna gav svar på sekelgammal gåta”; Släkthistoriskt forum; no. 1, 2017, p. 22. https://www.genealogi.se/images/shf/SHF-1-17-digitaliserade%20dagstidningar.pdf
8.    Söderström, Olle and Svensson, Hanna; ”Upphovsrätt bakom tidningstrasslet”; Släkthistoriskt forum; no. 1, 2017, p. 23. https://www.genealogi.se/images/shf/SHF-1-17-digitaliserade%20dagstidningar.pdf
9.    https://www.genealogi.se/om-roetter/nyhetsarkivet/nyheter-2017/123-nyheter/2013/1792-soek-och-finn-bland-hundratusentals-tidningssidor
10.    https://www.genealogi.se/123-nyheter/2013/1874-en-miljon-fria-tidningssidor-digitaliserade-hos-kb
11.    https://www.genealogi.se/123-nyheter/2013/1900-nya-tidningssidor-i-kb-s-digitala-tjaenst
12.    http://feedback.tidningar.kb.se/viewtopic.php?id=84
13.    http://www.sfd2017.se/program/massprogram-och-tider/digitalisering-av-historiska-dagstidningar-pa-kungliga-biblioteket
14.    http://www.humlab.umu.se/sv/forskning-utveckling/paagaaende-projekt/digitala-laegg/
15.    http://www.policyuncertainty.com/sweden_monthly.html
16.    http://www.sciencedirect.com/science/article/pii/S016517651730109X?via%3Dihub
17.    http://feedback.tidningar.kb.se/viewtopic.php?id=113











    Appendix


    RJ, titles

Title    Number of issues    Number of pages    From year    To year
ALFWAR OCH SKÄMT    158    670    1842    1843
BAROMETERN    7 430    29 650    1841    1895
BORÅS TIDNING    6 048    24 559    1839    1895
CARLSCRONAS TIDNINGAR    108    436    1761    1764
CARLSCRONAS WEKOBLAD    9 860    40 637    1754    1878
DAGLIGT ALLEHANDA    24 445    161 651    1767    1849
FALKÖPINGS TIDNING    3 670    14 638    1857    1896
GÖTEBORGS HANDELS- OCH SJÖFARTSTIDNING    18 042    83 845    1832    1895
GÖTEBORGSPOSTEN    11 151    46 833    1859    1895
GÖTHEBORGS ALLEHANDA    9 192    38 305    1774    1843
GÖTHEBORGSKA NYHETER    4 349    35 790    1765    1848
HÄRNÖSANDSPOSTEN    6 682    25 788    1842    1895
INRIKES TIDNINGAR    7 399    36 154    1760    1820
JÖNKÖPINGSPOSTEN    3 579    16 075    1865    1895
KARLSHAMNS ALLEHANDA    5 935    22 481    1848    1896
KARLSKRONA WECKOBLAD    2 588    11 109    1879    1895
KRISTIANSTADSBLADET    6 408    25 596    1856    1895
LUNDS WECKOBLAD    8 222    37 257    1775    1897
MALMÖ ALLEHANDA    6 724    27 778    1827    1893
NERIKES ALLEHANDA    6 800    27 520    1844    1895
NORDEN    279    1 326    1856    1861
NORRBOTTENSKURIREN    2 735    11 128    1861    1896
NORRBOTTENSPOSTEN    2 926    11 778    1847    1895
NORRKÖPINGS TIDNINGAR    16 194    69 183    1787    1895
NORRKÖPINGS WECKOTIDNINGAR    1 206    5 522    1758    1786
NORRLÄNDSKA KORRESPONDENTEN    2 432    9 616    1851    1873
NYA DAGLIGT ALLEHANDA    10 971    46 173    1859    1895
NYA KARLSKRONA WECKOBLAD    51    218    1878    1878
NYA WERMLANDSTIDNINGEN    5 287    22 011    1851    1895
NYA WEXJÖBLADET    4 847    20 051    1847    1895
NYTT ALLVAR OCH SKÄMT    445    1 786    1844    1851
NYTT OCH GAMMALT    1 533    13 724    1783    1812
POST- OCH INRIKES TIDNINGAR    21 057    104 446    1821    1895
POSTTIDNINGAR    12 243    62 886    1645    1820
STOCKHOLMS DAGBLAD    22 405    118 683    1824    1895
STOCKHOLMSPOSTEN    16 425    66 005    1778    1833
SUNDSVALLS TIDNING    2 658    10 889    1880    1895
SUNDSVALLS TIDNING NORRLÄNDSKA KORRESPONDENTEN    917    3 740    1873    1879
UMEBLADET    3 431    14 114    1847    1896
UPSALA    6 284    27 061    1845    1895
WERMLANDSTIDNINGEN    395    1 574    1844    1850
VESTMANLANDS LÄNS TIDNING    5 951    24 066    1831    1896
WEXJÖBLADET    2 348    9 580    1810    1855
ÖSTGÖTA CORRESPONDENTEN    7 933    32 770    1838    1895
Total    299 743    1 395 102





Grant administrator
The National Library of Sweden
Reference number
IN15-0452:1
Amount
SEK 10,414,000.00
Funding
RJ Infrastructure for research
Subject
History
Year
2015