Rickard Domeij

SpeakingUp -- Making spoken cultural heritage accessible for research

Speech recordings comprise a seriously underutilized resource of the Swedish memory institutions. Inaccessible speech material conceals a wealth of information of great interest for the humanities and social sciences (HS). The amount of data is huge. Paradoxically, this contributes to the materials not being used: speech is extremely challenging to work with and is unmanageable without appropriate tools.

The overall aim of the project is to make Sweden's archival treasure of recorded speech accessible for HS research. SpeakingUp is conducted by the Institute for Language and Folklore (ISOF), KTH and Digisam.

Speech technology can be used to analyze large volumes of data automatically. Such methods have been successfully applied in HS research using speech recorded expressly for this purpose. Archival materials are recorded in different conditions for other purposes. The project will adapt and develop speech technological methods for analyzing archival material. Through close cooperation between HS researchers and language technologists, we ensure that these methods can be practically useful for research in speech material in memory institutions.

SpeakingUp will contribute to the use of new methods to access and process large amounts of recorded speech. This will open new paths of accessibility to ISOF's collections for researchers and other users. Other resource holders, not least Swedish memory institutions with speech archives, will benefit from the results.
Final report
Background, aims and scope

Tilltal (Tillgängligt kulturarv för forskning i tal, ‘Accessible cultural heritage for speech research’) is a multidisciplinary and methodological project that is being undertaken by the Institute for Language and Folklore, KTH Royal Institute of Technology and the Swedish National Archives. The overall goal of the project is to make Sweden’s archival treasure of recorded speech more accessible for humanities and social science research. To this end, the Tilltal project has applied existing speech and language technology methods and adapted them to archive materials. In this report we briefly present the project and its results.

Speech recordings represent a seriously underutilized resource of the Swedish memory institutions. These materials conceal a wealth of information of great interest for the humanities and social sciences (HS). The amount of data is huge; the archives of the Institute for Language and Folklore alone contain around 25,000 hours of recorded speech. Paradoxically, this contributes to material of this kind not being used: speech is extremely challenging to work with and is unmanageable without appropriate tools.

Speech technology can be used to analyze large volumes of data automatically. Such methods have been successfully applied in HS research using speech recorded expressly for this purpose. Archival materials, on the other hand, are recorded with completely different prerequisites and goals, and with considerable variation in terms of sound quality, content and other parameters that are important from a speech technology point of view.

Research design and methodological issues

The project comprises three case studies and one user study. In the case studies, three research agendas from different fields and aimed at different types of speech analysis are being pursued: Case 1: From personal experience narratives to cultural heritage builds on ethnology; Case 2: Linguistic variation in time and space is predominantly sociolinguistic; and Case 3: Interaction patterns over time and type of conversation extends previous work within interaction analysis.

Lastly, the user study of the Tilltal project is an overarching investigation and design process involving all three case studies. It has two parts. The first seeks to understand and describe the bigger picture of how spoken narratives are collected, processed and made available for research at the Institute for Language and Folklore, and which tools are used for this. The second part of the user study applies use case analysis to the collected data. Based on the researchers’ needs, digital solutions have been suggested and tried in practice. Through close cooperation between HS researchers and language technologists, we hope to ensure that the methods can be of practical use for research.

Technical and methodological issues

Automatic speech recognition (ASR) has developed rapidly in recent years, but it is still a long way from managing heavily non-standard speech from a great number of different speakers, often recorded in less than optimal conditions. In other words, as we found out, the vast majority of the audio materials cannot be transcribed automatically. On the other hand, there is other acoustic information that can be of interest, and archives contain a wide range of information in written form, including descriptions of recording situations, annotations and manual transcripts. These written data sources provide further possible pathways into the speech data.

By focusing in depth on archive materials for the purpose of making them more accessible, we have discovered a number of difficulties that we did not foresee, and that would otherwise not have been known, for example that recordings had been deleted after transcriptions had been made. At the same time, we have discovered great potential gains for research in the humanities and social sciences by continuously evaluating and discussing proposals for digital solutions. This has resulted in important conclusions and a set of promising prototypical tools that we hope will be of use for other projects continuing on this path. Not least, we hope that the work we have done will make it easier for researchers in the future to work with the actual recordings, and not only with transcriptions of them.

How the project has contributed to and been integrated at Isof

Using recordings as starting points, we have looked for written materials associated with them. By means of different methods, we have linked the texts within these materials to the timeline of the recording. In this way, the written materials belonging to a certain recording have been collected in a digitally accessible bunch. Such bunches are valuable tools for research where archive materials are used. For instance, they make it possible to go straight from a specific annotation in a text or a subject topic card to listening to the relevant portion of a recording.

Digital support of this type corresponds well with the wishes expressed by archive researchers for tools that can handle different types of related data resources – recorded interviews, letters, notes and summaries from data collectors and researchers, questionnaires, etc. – as parts of one connected collection, rather than as isolated resources. By bringing together the different material categories of the archive and making them available as digital bunches, the archive’s collections as a whole also become more transparent and accessible.

We have developed prototypes for, on the one hand, search functions which direct the researcher straight into relevant bits of a recorded interview, and, on the other hand, tools which enable the researcher to explore the other materials while listening. We have begun outlining what this might look like, using annotation tools such as ELAN and by developing prototypical tools. Some of the digital tools, pre-existing or designed within the project, are described below. Work is ongoing at Isof to develop these tools and integrate them in graphical user interfaces.

Important results in the form of digital solutions

In the project, larger-scale trials with automatic alignment have been made on archive materials. Brief text descriptions of the contents or subject topic have been linked to sections of audio recordings, so that the right portion of audio can be accessed directly via the text records, which in turn can be found by searching. This has been done for those parts of the subject topic catalogue for the archived recordings that have time stamps allowing automatic linking. Another prototype tool has been designed for researchers to link the time annotations in their documents to the correct parts of a recording under study, allowing them to simply click on a time stamp to re-listen to the audio that an annotation concerns.
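
The idea of linking time stamps in written notes to positions in a recording can be illustrated with a minimal sketch. It is not the project's prototype tool. The time stamp format ("MM:SS" or "H:MM:SS" somewhere in a line of notes) and the function names are assumptions made for the example.

```python
import re

# Assumed time stamp format: "MM:SS" or "H:MM:SS" (hypothetical; real notes may differ).
TIMESTAMP = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")


def to_seconds(match):
    """Convert a matched time stamp to an offset in seconds."""
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def link_annotations(text):
    """Return (offset_in_seconds, line) pairs for each line that has a time stamp.

    Only the first time stamp on a line is used; lines without one are skipped.
    """
    links = []
    for line in text.splitlines():
        match = TIMESTAMP.search(line)
        if match:
            links.append((to_seconds(match), line.strip()))
    return links


if __name__ == "__main__":
    notes = "03:15 talks about harvest\nno time here\n1:02:40 laughter"
    for offset, line in link_annotations(notes):
        print(offset, line)
```

Each offset could then be handed to an audio player as the position to start playback from, which is what makes a time stamp in a document clickable.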

The audio browser Edyson, developed within the scope of Tilltal, also produces timestamped annotations that may serve in the same way. It makes it possible to add other information about a recording, such as marking laughter or sections with fast or otherwise intensive exchange. Edyson is a web-based framework for browsing and annotating large amounts of speech and audio data. It is based on the idea of deconstructing an audio file into equally sized snippets of short duration. Given a set of these short sounds, one can rearrange them, and thus listen to them, in any order or manner one wants.
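
The snippet idea can be sketched in a few lines. This is an illustration only, not Edyson's implementation: the function names, the one-second default and the ordering by average amplitude are assumptions chosen to keep the example small.

```python
def split_into_snippets(samples, sample_rate, snippet_seconds=1.0):
    """Cut a sequence of audio samples into equally sized snippets.

    A trailing remainder shorter than one snippet is dropped so that all
    snippets have the same length. Start and end times are kept so that a
    snippet can always be traced back to its place in the recording.
    """
    size = int(sample_rate * snippet_seconds)
    if size <= 0:
        raise ValueError("snippet length must be at least one sample")
    snippets = []
    for start in range(0, len(samples) - size + 1, size):
        snippets.append({
            "start": start / sample_rate,
            "end": (start + size) / sample_rate,
            "samples": samples[start:start + size],
        })
    return snippets


def order_by_loudness(snippets):
    """Rearrange snippets from loudest to quietest (mean absolute amplitude).

    Loudness is only a stand-in criterion (an assumption); any other
    ordering of the snippets could be used instead.
    """
    def mean_abs(snippet):
        return sum(abs(x) for x in snippet["samples"]) / len(snippet["samples"])

    return sorted(snippets, key=mean_abs, reverse=True)


if __name__ == "__main__":
    toy_signal = list(range(10))  # stand-in for real audio samples
    for snippet in order_by_loudness(split_into_snippets(toy_signal, sample_rate=2)):
        print(snippet["start"], snippet["end"])
```

Because every snippet carries its original start and end time, a label attached to a snippet after rearranging can be written back as a timestamped annotation on the recording.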

The reason for using Edyson is at least twofold. First, it is an appropriate method for browsing audio quickly and, as such, a way for researchers to gain insight into the nature of their data. This is a task that might seem trivial at first, but it is often challenging given the large size of modern audio collections such as those at the Institute for Language and Folklore in Uppsala. It is entirely conceivable that a lot, if not most, of these data are not properly labeled, as is the case for other speech archives and audio collections that are potentially even bigger. Edyson allows for fast and efficient browsing of audio, which greatly facilitates many tasks within research and audio analysis. Secondly, Edyson can also be used for annotation. This functionality provides the user with a basic set of labels for their findings, which could, for instance, be refined in further analysis.

Interdisciplinary cooperation and dissemination of results

From the earliest planning stages of the Tilltal project, it was clear that the interdisciplinary aspects of the project needed to be taken seriously. We expected that collaboration between participants from such different academic traditions would entail some difficulties in communication, or at least require mutual readjustments. Several steps have been taken to ensure that all participants are on the same page. There have been frequent meetings to make sure everyone is up to speed with what is going on, often with a subset of participants, and several times a year all project members have come together. We have also held several offsite workshops of a couple of days, in camp school spirit, which have moved the project forward significantly. The workshops have allowed project members to meet for longer continuous time blocks to present work in progress and to jointly explore archive data, with ample time for discussions and reflections.

The project and its results have been disseminated in many ways: on the web and in presentations, papers and reports, as can be seen in the publication list. The results will be managed and further developed by the National Language Bank at Isof, KTH and the University of Göteborg.
Grant administrator
The Institute for Language and Folklore (SOFI)
Reference number
SAF16-0917:1
Amount
SEK 9,771,000.00
Funding
Collections and Research
Subject
Language Technology (Computational Linguistics)
Year
2016