The Swedish Sign Language Corpus
Wide-ranging corpus work is indispensable both to dictionary work in Swedish Sign Language and to extended research into the structure and use of sign language. The aim of the project is to begin creating the conditions needed for long-term work on a Swedish Sign Language corpus, thus providing the prerequisites for corpus-based studies of the vocabulary and grammar of the language.
The corpus project entails filming and documentation of various kinds of sign language discourse, produced by deaf signers. The recorded material will be annotated with the annotation tool ELAN, which makes it possible to link texts to video sequences. The whole corpus will be accessible to researchers and teaching staff of the Sign Language Section at the Department of Linguistics, Stockholm University, whereas parts of it will be freely accessible for use in e.g. Sign Language courses and the training of sign language interpreters.
Johanna Mesch, Institutionen för lingvistik, Stockholms universitet
2009-2011
The aim of the project The Swedish Sign Language Corpus was to construct a corpus database for Swedish Sign Language. The project included recording and documenting sign language material from deaf native or near-native users of Swedish Sign Language.
The recorded material was annotated and transcribed in ELAN (EUDICO Linguistic Annotator), an annotation tool that can be downloaded free of charge from the Max Planck Institute for Psycholinguistics, Nijmegen (http://www-lat-mpi.eu/tools/elan/). ELAN is frequently used by sign language researchers all over the world to annotate recorded material and to link transcriptions to digitized video (and audio) material.
Corpus databases for many different signed languages will become more easily available through the MPI Language Archive (http://corpus1.mpi.nl/ds/imdi_browser/) or through the universities' own web portals.
The Swedish Sign Language corpus material will be freely available on a web portal for use in research, sign language teaching and in sign language lexicography.
There were no major changes to the project aims; the work mainly followed the plan, except for the metadata description and the publication of the corpus material. The annotation work took more time than expected because it involved developing transcription conventions, resolving lemma-related problems, and manually transcribing every expression and phrase. In accordance with the project plan, only rough annotations with glosses and Swedish translations were made. The annotation work was very time-consuming: it takes about 1½ hours to annotate a one-minute video sequence, and about 1 hour to annotate a Swedish translation of the same sequence.
Only about 15% of the recorded material has been annotated with glosses and a Swedish translation, then checked and approved by the project leader.
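As a rough illustration of what these rates imply, the arithmetic can be sketched as follows. The two rates (about 1.5 hours and about 1 hour per minute of video) come from the report; treating them as additive, the function name, and the 60-minute example duration are assumptions made for illustration, not figures from the project.

```python
# Rates reported in the text (hours of work per minute of video).
GLOSS_HOURS_PER_VIDEO_MINUTE = 1.5
TRANSLATION_HOURS_PER_VIDEO_MINUTE = 1.0


def estimate_annotation_hours(video_minutes: float) -> float:
    """Estimate the hours needed to gloss and translate `video_minutes` of video.

    Assumption: glossing and translation are separate passes whose
    times simply add up.
    """
    if video_minutes < 0:
        raise ValueError("video_minutes must be non-negative")
    hours_per_minute = (
        GLOSS_HOURS_PER_VIDEO_MINUTE + TRANSLATION_HOURS_PER_VIDEO_MINUTE
    )
    return video_minutes * hours_per_minute


if __name__ == "__main__":
    # Hypothetical example: one hour of recorded video.
    print(estimate_annotation_hours(60))  # 150.0
```

Under these assumptions, a single hour of video corresponds to roughly 150 hours of annotation work, which helps explain why only part of the material could be annotated within the project period.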
The following parts of the project went according to plan:
- technical solutions concerning annotations,
- recordings,
- editing of recorded materials,
- creating a manual for transcription conventions,
- transcriptions of sign language recordings,
- categorizing the material.
The collected data consist of recordings of 42 native and near-native signers of Swedish Sign Language (40 deaf and 2 hearing signers), women and men aged between 20 and 82, from the three Swedish regions Götaland, Svealand and Norrland.
The sign language material in the corpus database consists of media files with some annotations (glosses and Swedish translations, synchronized with the video). Working with the Sign Language Corpus requires the annotation tool ELAN (current version 4.3.3, June 2012). ELAN is designed especially for the analysis of spoken language, gestures and sign language. Among other things, it supports video with annotations, time codes that synchronize annotations with the video, links between annotations, an unlimited number of annotation tiers, and export and import of annotation files as text files.
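To make the idea of tiers and time-aligned annotations more concrete, the following minimal sketch reads one tier from an ELAN annotation file (an XML document with the extension .eaf) and prints it as tab-separated text. The sample file content, the tier names "Gloss" and "Translation", and the glosses are invented for illustration and are not taken from the corpus; the sketch also assumes that every annotation is time-aligned and that every referenced time slot has a time value, which need not hold for all ELAN files.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the ELAN (.eaf) XML layout.
# Tier names and annotation values are invented for illustration.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="480"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="950"/>
  </TIME_ORDER>
  <TIER TIER_ID="Gloss">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HUS</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>STOR</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="Translation">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>Huset är stort.</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tier(eaf_xml: str, tier_id: str) -> list[tuple[int, int, str]]:
    """Return (start_ms, end_ms, value) for each annotation on one tier."""
    root = ET.fromstring(eaf_xml)
    # Map time slot ids to their time values in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, value))
    return sorted(rows)


def tier_to_text(eaf_xml: str, tier_id: str) -> str:
    """Render one tier as tab-separated lines: start, end, value."""
    return "\n".join(
        f"{start}\t{end}\t{value}" for start, end, value in read_tier(eaf_xml, tier_id)
    )


if __name__ == "__main__":
    print(tier_to_text(SAMPLE_EAF, "Gloss"))
```

For a real file, the XML string would be read from disk instead of the embedded sample. ELAN itself offers text export, so a script like this is only needed when the annotations are to be processed outside the tool.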
As for the transcription conventions, the discussion and development have mostly concerned glosses and tiers. New knowledge has emerged during the annotation work, e.g. concerning lexical and stylistic variation. The films recorded with cameras placed in the ceiling above each signer are valuable aids, giving visual information on how the hands move in the space in front of the signer. The aim is to be able to use the transcription conventions in further annotation work, for example when looking for a specific sign or finding frequency information for different signs and sign combinations, which requires a large resource of sign language material. We are gaining new knowledge of the structure of sign language, the Swedish Sign Language lexicon, and how signs are used in different contexts, e.g. dialogues and elicited stories.
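The kind of frequency information mentioned here can be sketched with a simple count over a sequence of glosses. The gloss sequence below is invented for illustration, and the function names are hypothetical; the sketch assumes that consistent glossing conventions already map each sign to a single gloss, which is exactly what the transcription conventions are meant to ensure.

```python
from collections import Counter

# Invented gloss sequence standing in for one annotated gloss tier.
GLOSSES = ["PRO1", "HUS", "STOR", "PRO1", "HUS", "KÖPA"]


def sign_frequencies(glosses: list[str]) -> Counter:
    """Count how often each gloss occurs."""
    return Counter(glosses)


def sign_pair_frequencies(glosses: list[str]) -> Counter:
    """Count how often each pair of adjacent glosses occurs."""
    return Counter(zip(glosses, glosses[1:]))


if __name__ == "__main__":
    for gloss, count in sorted(sign_frequencies(GLOSSES).items()):
        print(gloss, count)
    for pair, count in sorted(sign_pair_frequencies(GLOSSES).items()):
        print(pair, count)
```

With only a handful of annotated minutes such counts say very little, which is why reliable frequency information depends on a large annotated resource.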
There is worldwide interest in the Swedish Sign Language Corpus work, as was evident at conferences and workshops during 2009-2010. Valuable contacts have been made with Radboud University Nijmegen, DCAL Research Centre, University of London, Macquarie University and Hamburg University. Researchers in Norway and Finland have contacted us for advice and support with project applications. New research ideas have arisen thanks to the annotation work and the search possibilities of the annotation tool ELAN. During the project, Johanna Mesch was a member of the steering committee of the European network Sign Linguistics Corpora Network, SLCN (2008-2010), financed by the Netherlands Organisation for Scientific Research, NWO. She was also a member of the organising and programme committee of the 5th Workshop on the Representation and Processing of Sign Languages, a satellite workshop of the Language Resources and Evaluation Conference, LREC, in Istanbul, May 2012.
New research questions
New research questions have arisen within the project. These will be pursued in the ongoing research on Swedish Sign Language. There are also ideas about how to use the Swedish Sign Language Corpus as a language resource in language teaching and in the development of the Swedish Sign Language Lexicon (2008-, online, available at http://www.ling.su.se/teckensprakslexikon).
Part of the corpus material was used by Carl Börstell in his master's thesis "Revisiting reduplication. Toward a description of reduplication in predicative signs in Swedish Sign Language" (2011), and by Unn Thofelt in her bachelor's thesis "Något om den konstruerade dialogen i svenskt teckenspråk" [Something about constructed dialogue in Swedish Sign Language] (2011). Following their talk at the workshop in Göttingen on February 24, 2011, Johanna Mesch, Anna-Lena Nilsson and Lars Wallin submitted an article about manual feedback in eight conversations in sign language. More corpus-based studies are planned, e.g. studies on mouth and hand in collaboration, which will be presented by Lars Wallin at a plenary session at the TISLR conference in London, July 2013.
Results from the project, in the form of video files and annotation files, will gradually be made available to other researchers and teachers. A further aim of the project is to publish the Swedish Sign Language Corpus via a web portal with a user-friendly interface.
The project has also been presented at a number of places:
- Web pages of the Department of Linguistics, Stockholm University: http://www.ling.su.se/forskning/forskningsprojekt/teckensprak/teckensprakskorpus/korpus-for-det-svenska-teckenspraket-1.67853
- Stockholm University Study Visit Day, March 26, 2009
- Congress of Sveriges Dövas Riksförbund in Leksand, June 12-14, 2009
- Dövas Dag in Örebro, September 18-19, 2009; in Jönköping, September 2010; and in Malmö, September 17, 2011
- Theme Day on Swedish Sign Language Research 40 Years, March 31, 2012, at Stockholm University, http://www.ling.su.se/om-oss/evenemang/webbfilmer/teckenspraksforskning-40-ar/svensk-teckenspraksforskning-40-ar-1.83493
- Courses in Swedish Sign Language, Corpus Linguistics, and General Linguistics at the Department of Linguistics
- International seminars and conferences, see List of Publications and Conference Papers