Kirk Sullivan

Identification of "imitated" voices: a research project with legal and security applications



Internationally, the importance of cooperation between lawyers and linguists, including phoneticians, is increasingly being realised. This cooperation has resulted in international associations that arrange conferences and publish journals. In Sweden this development has not progressed at the same pace as elsewhere in the world, where the expert skills of linguists and phoneticians have been used in courts of law to make judgements about, for example, tape-recordings of voices. Such expert witness statements can determine whether or not an individual is convicted. Our voices are an integral part of our personality, and there are good reasons to believe that it is impossible for an individual to alter their voice to such an extent that they cannot be identified. The aim of this project is to define and classify the acoustic correlates of a perceptually successful voice imitation, and to uncover the individual features of a voice that would make it possible to identify even voices that have been disguised or imitated. The results of this study may be important not only in the legal context but also in areas of security where automatic voice identification systems are used. A range of methods will be used in this project, including auditory and acoustic voice analysis and perception tests. An international network of experts specialising in forensic research in England, Germany, Australia and the USA is linked to the project.
Final report

Kirk Sullivan, Umeå University

During the project period, the National Police Board published Report RPS 2005:2. This report specifies when a voice line-up may be used for identification. It aims to improve this form of evidence, recommending that phoneticians be consulted and that line-ups be conducted as directed by the National Police Board. The findings of this project should inform the police in their use of voice line-ups and earwitness evidence.

The goal of the project was to define and classify the acoustic correlates of a perceptually successful voice imitation, and to find individual features in the voice that would make it possible to identify disguised and imitated voices. As the project progressed, we focused increasingly on features of the listener that lead to poorer detection of imitation and disguise, and on features of the speaker that lead to poorer identification of the speaker. This approach was taken as a way of delimiting the set of features that could be used to identify disguised and imitated voices.

The project's three main outcomes are: (1) dialect is a core feature of speaker identification, and one that can easily be used to mislead a human listener; (2) expectation increases a listener's acceptance of an imitation; and (3) isochunks and formant dynamics can be used successfully in machine speaker identification using only short speech segments.

(1) Using both a native bi-dialectal speaker and high-quality dialect imitations, we found that listeners were unable to recognize the speakers when they switched to their other native dialect or reverted to their natural dialect. This suggests that it is possible to disguise oneself by using a dialect that one does not usually speak.

The new research questions that arise from this finding are: (a) how much of the target dialect needs to be imitated correctly for the speaker to be perceived as speaking the target dialect, rather than their own dialect or a dialect that is difficult to place; (b) are there specific features that create a good dialect imitation and allow other aspects to be ignored; (c) can people be recognized if they change the language they are speaking; and (d) how important is it that the listeners know the language or are familiar with the dialect being imitated?

(2) An imitation is more successful if the listener's expectations are met. These expectations concern the topic of the imitation as well as the idiosyncratic features of the target's pronunciation. Listeners accept a voice imitation less often if its topic is not something they associate with the person whose voice is being imitated. A listener has expectations and expects them to be fulfilled; deviation from the expected makes the listener begin to question the authenticity of what is being heard. This finding was replicated for imitated emotions in our study of the acoustic impact on the decoding of semantic emotion.

The new research questions arising from this finding are: (a) can expectation be manipulated experimentally; (b) can a perfect voice imitation overcome an unexpected topic; and (c) how do listeners categorize an unexpected topic spoken by the target speaker?

(3) The identification and separation of speakers and imitations using minimal amounts of data and specific features of speech were investigated using isochunks and spectral moments. An isochunk is a short speech segment that recurs several times within each recording and is long enough for spectral moment analysis. Our study produced promising discrimination results even under voice imitation and showed that the method was insensitive to imitation. The use of formant dynamics to discriminate between speakers was also shown to be a useful technique.
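To make the spectral-moment idea more concrete, the sketch below treats the power spectrum of a short frame as a distribution over frequency and computes its first four moments (centroid, spread, skewness and excess kurtosis). It then averages these over every frame of every occurrence of one isochunk to give a per-chunk profile for a speaker. This is a generic illustration under stated assumptions, not the project's implementation: the Hann window, the non-overlapping 256-sample frames, the plain DFT and the helper names `power_spectrum`, `spectral_moments` and `chunk_profile` are all hypothetical choices made for the example.

```python
import cmath
import math


def power_spectrum(frame, sample_rate):
    """Return (frequencies_hz, powers) for DFT bins 0..N/2 of a Hann-windowed frame.

    A plain O(N^2) DFT keeps this sketch dependency-free; a real system
    would use an FFT library. Requires len(frame) >= 2.
    """
    n = len(frame)
    # Hann window to reduce spectral leakage at the frame edges.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        coeff = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def spectral_moments(frame, sample_rate):
    """Centroid (Hz), standard deviation (Hz), skewness and excess kurtosis
    of the power spectrum, treated as a distribution over frequency."""
    freqs, powers = power_spectrum(frame, sample_rate)
    total = sum(powers)
    if total == 0:
        raise ValueError("silent frame: spectral moments are undefined")
    centroid = sum(f * p for f, p in zip(freqs, powers)) / total
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, powers)) / total
    if variance == 0:
        raise ValueError("zero spectral spread: higher moments are undefined")
    sd = math.sqrt(variance)
    skewness = sum((f - centroid) ** 3 * p
                   for f, p in zip(freqs, powers)) / (total * sd ** 3)
    kurtosis = sum((f - centroid) ** 4 * p
                   for f, p in zip(freqs, powers)) / (total * variance ** 2) - 3.0
    return centroid, sd, skewness, kurtosis


def chunk_profile(occurrences, sample_rate, frame_len=256):
    """Mean moment vector over all non-overlapping frames of all occurrences
    of one isochunk (each occurrence is a list of audio samples)."""
    vectors = []
    for samples in occurrences:
        for start in range(0, len(samples) - frame_len + 1, frame_len):
            frame = samples[start:start + frame_len]
            vectors.append(spectral_moments(frame, sample_rate))
    if not vectors:
        raise ValueError("every occurrence is shorter than frame_len")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(4)]


if __name__ == "__main__":
    sr = 8000
    # Synthetic stand-in for one isochunk occurrence: a 1 kHz tone at 8 kHz.
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(512)]
    print(chunk_profile([tone], sr))
```

Profiles computed in this way for the same isochunk in a known and a questioned recording could then be compared, for example with a distance measure. Each moment would first need normalizing, since the centroid and spread are in Hz while skewness and kurtosis are dimensionless.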

The new research questions arising from this finding are: (a) can the method be refined to improve identification in legal cases where voice imitation is suspected; and (b) can this approach be combined with approaches using formant dynamics?

The project has two journal articles under peer review. These papers summarize and synthesize many of the project's results and will therefore be its major publications. The first, Eriksson, E. J., Schaeffler, F., Sjöström, M., Sullivan, K. P. H., & Zetterholm, E., "On the perceptual dominance of dialect", has been submitted to the journal Perception & Psychophysics. This paper examines whether it is possible to ignore perceived dialect when trying to identify a previously heard voice. It shows that domain-specific expertise (i.e. being a speaker of the dialect) did not help listeners to identify a speaker who had altered their voice to another dialect. This finding is unexpected and is not the one predicted by the change detection literature. It was also found that listeners could not be instructed to ignore dialect. This, too, points to the importance of dialect and to its potential use as a form of disguise.

The second paper, Eriksson, E. J., Sullivan, K. P. H., Zetterholm, E., Czigler, P. E., Skagerstrand, Å., Green, J., & van Doorn, J., "Detection of imitated voices, or who are reliable earwitnesses", has been submitted to the International Journal of Speech, Language and the Law. This paper summarizes many of the factors the project has investigated: the effects of expectation, dialect area, gender, age and age-related hearing loss on the detection of imitated voices. It shows that expectation is the most important factor and that age is a secondary factor. Corrected hearing loss does not make someone a less competent witness, so a court should not dismiss an earwitness simply because they wear a hearing aid. If the hearing aid results in near-normal hearing, their evidence is as valid as that of a person without a hearing aid. We plan to investigate the interaction between hearing loss and the quality of earwitness evidence in more detail. The paper again shows, unexpectedly, that dialect background does not affect performance.

The two most important papers already published are Zetterholm, E. (2007), "Detection of speaker characteristics using voice imitation", and Farrús, M., Eriksson, E. J., Sullivan, K. P. H., & Hernando, J. (2008), "Dialect imitations in speaker recognition". These papers illustrate core aspects of the project's goals.

As well as a number of popular science presentations of the project in the national and local media and at university open days, the project's progress and outcomes have been presented at seminars and in lectures given by members of the research team at North Carolina State University, USA; Chulalongkorn University, Bangkok, Thailand; the Institute of Linguistics, National Center for Social Sciences and Humanities, Hanoi, Vietnam; the School of Languages, International Studies, and Tourism, University of Canberra, Australia; the Phonetics Lab, Department of Linguistics, The University of Melbourne, Australia; Haskins Laboratories, Yale University, New Haven, USA; the Department of Linguistics, Cambridge University, England; the Department of Philosophy and Linguistics, Umeå University; the Department of Linguistics and Phonetics/Centre for Languages and Literature, Lund University; the Department of Linguistics, Göteborg University; and Malmö Högskola.

Because the project team was based at two universities, an internal website was used to communicate ideas, share documents and plan work. This proved a good approach to cooperation, as no one was excluded from the development of the project or from the decisions made within it. The internal website was also a good way of keeping track of multi-authored documents and papers. The project team has functioned well, both internally and with the international support team. For example, Elisabeth Zetterholm spent a total of six months at Haskins Laboratories, Yale University, New Haven, USA, and Erik Eriksson spent six months at North Carolina State University, Raleigh, USA, working with Robert Rodman and his research team. These visits added competence to the project. The expertise that Elisabeth gained and brought to the project through her visits came from the project "Imitation: A Tool for Studying Speech Perception", which has been running at Haskins since 1999. The expertise that Erik brought to the project related to the use of isochunks in speaker identification, and to emotion and speaker identification. The success of the cooperation with the (inter)national support team is also reflected in how often its members are co-authors of the book chapters, journal articles and conference papers published by the project team.

Submitted and under peer review (2008-03-27)

Eriksson, E. J., Schaeffler, F., Sjöström, M., Sullivan, K. P. H., & Zetterholm, E. (submitted). On the perceptual dominance of dialect. Perception & Psychophysics.

Eriksson, E. J., Sullivan, K. P. H., Zetterholm, E., Czigler, P. E., Skagerstrand, Å., Green, J., & van Doorn, J. (submitted). Detection of imitated voices, or who are reliable earwitnesses. International Journal of Speech, Language and the Law.

In Press (2008-03-27)

Eriksson, E., & Sullivan, K. P. H. (in press, July 2008). An investigation of the effectiveness of a Swedish glide + vowel segment for speaker discrimination. International Journal of Speech, Language and the Law.

Sjöström, M., Eriksson, E., Zetterholm, E., & Sullivan, K. (in press, 2008). A Bidialectal Experiment on Voice Identification. Working Papers 53, Centre for Languages and Literature, Lund University.

2008

Farrús, M., Eriksson, E. J., Sullivan, K. P. H., & Hernando, J. (2008). Dialect imitations in speaker recognition. In M. T. Turell, J. Cicres, & M. Spassova (Eds.), Proceedings of the 2nd European IAFL Conference on Forensic Linguistics / Language and the Law 2006 (pp. 347-353). Barcelona, Spain: IULA, Documenta Universitaria.

2007

Eriksson, E. J. (2007). That voice sounds familiar: Factors in speaker recognition. PhD thesis, Umeå Studies in Cognitive Science 1. Umeå: Department of Philosophy and Linguistics, Umeå University, Sweden.

Eriksson, E. J., Rodman, R. D., & Hubal, R. C. (2007). Emotions in Speech: Juristic Implications. In J. G. Carbonell & J. Siekmann (Series Eds.) & C. Müller (Vol. Ed.), Lecture Notes in Computer Science / Artificial Intelligence: Vol. 4343. Speaker Classification Volume I: Fundamentals, Features, and Methods (pp. 152-173). Berlin, Germany: Springer.

Eriksson, E. J., Schaeffler, F., & Sullivan, K. P. H. (2007). Acoustic Impact on Decoding of Semantic Emotion. In J. G. Carbonell & J. Siekmann (Series Eds.) & C. Müller (Vol. Ed.), Lecture Notes in Computer Science / Artificial Intelligence: Vol. 4441. Speaker Classification Volume II: Selected Projects (pp. 57-69). Berlin, Germany: Springer.

Eriksson, E. J., & Sullivan, K. P. H. (2007). Dialect recognition in a noisy environment: Preliminary data. Fonetik 2007, TMH-QPSR, Dept. of Speech, Music and Hearing, KTH, Stockholm, 50, 101-104. (Proceedings from Fonetik 2007, Stockholm, Sweden.)

Eriksson, E. J., Sullivan, K. P. H., van Doorn, J., & Zetterholm, E. (2007, December). Voice imitation and forensic speaker recognition. Paper presented at FSI not CSI: Perspectives in State-of-the-Art Forensic Speaker Recognition, 6-7 December 2007, Sydney, Australia.

Farrús, M., Eriksson, E. J., Sullivan, K. P. H., & Hernando, J. (2007, April). Speaker recognition and accents. Paper presented at the Femte Svenska Lingvistikkonferensen [The Fifth Swedish Linguistics Conference], 26-27 April 2007, Umeå, Sweden.

Zetterholm, E. (2007). Detection of speaker characteristics using voice imitation. In J. G. Carbonell & J. Siekmann (Series Eds.) & C. Müller (Vol. Ed.), Lecture Notes in Computer Science / Artificial Intelligence: Vol. 4441. Speaker Classification Volume II: Selected Projects (pp. 192-205). Berlin, Germany: Springer.

2006

Clermont, F., & Zetterholm, E. (2006). F-pattern Analysis of Professional Imitations of “hallå” in three Swedish Dialects. Working Papers 52, General Linguistics/Phonetics, Centre for Languages and Literature, Lund University, 25-28. (Proceedings from Fonetik 2006, Lund, Sweden, June 7-9, 2006.)

Sjöström, M., Eriksson, E. J., Zetterholm, E., & Sullivan, K. P. H. (2006). A Switch of Dialect as Disguise. Working Papers 52, General Linguistics/Phonetics, Centre for Languages and Literature, Lund University, 113-116. (Proceedings from Fonetik 2006, Lund, Sweden, June 7-9, 2006.)

Zetterholm, E. (2006). Same speaker - different voices: A study of one impersonator and some of his different imitations. In Proceedings of SST2006, Auckland, New Zealand, December 6-8, 2006 (pp. 70-76).

Zetterholm, E., Eriksson, E. J., & Sullivan, K. P. H. (2006, July). On sentence content, speaker familiarity and dialect. Paper presented at the Annual Conference of the International Association for Forensic Phonetics and Acoustics, July 23-25, 2006, Gothenburg, Sweden.

2005

Rodman, R. D., Eriksson, E. J., & Hubal, R. (2005, July). Deducing emotions from speech: Forensic implications. Paper presented at the International Association of Forensic Linguists 7th Biennial Conference on Forensic Linguistics/Language and Law, 1-4 July 2005, Cardiff University, UK.

Zetterholm, E. (2005). PhD Abstract: Voice Imitation: A Phonetic Study of Perceptual Illusions and Acoustic Success. The International Journal of Speech, Language and the Law, 12(1), 131-135.

Zetterholm, E., Elenius, D., & Blomberg, M. (2005). A case study of impersonation from a security systems point of view. Working Papers 51, 239-255. Lund: Department of Linguistics, Lund University.

2004

Blomberg, M., Elenius, D., & Zetterholm, E. (2004). Relating acoustic features of a professional impersonator with the score of a speaker verification system. In P. Branderud, & H. Traunmüller (Eds.), Proceedings of FONETIK 2004: The XVIIth Swedish Phonetics Conference, May 26-28, 2004, Stockholm, Sweden (pp. 84-87). Stockholm, Sweden: Department of Linguistics, Stockholm University, Stockholm, Sweden.

Czigler, P. E., Schaeffler, F., Sullivan, K. P. H., & Zetterholm, E. (2004). Imitation und Reduktion. Eine Fallstudie zu den Fähigkeiten eines professionellen Imitators. In M. Jenis, A. Malmqvist, & I. Valfridsson (Eds.), Norden und Süden: Festschrift für Kjell-Åke Forsgren zum 65. Geburtstag (pp. 51-58). Umeå, Sweden: Umeå universitet.

Eriksson, E. J., Cepeda, L. F., Rodman, R. D., McAllister, D. F., Bitzer, D., & Arroway, P. (2004). Cross-language speaker identification using spectral moments. In P. Branderud, & H. Traunmüller (Eds.), Proceedings of FONETIK 2004: The XVIIth Swedish Phonetics Conference, May 26-28, 2004, Stockholm, Sweden (pp. 76-79). Stockholm, Sweden: Department of Linguistics, Stockholm University, Stockholm, Sweden.

Eriksson, E., Cepeda, L., Rodman, R. D., McAllister, D., Bitzer, D., Arroway, P., Sullivan, K. P. H., Sjöström, M., Landgren, T., & Zetterholm, E. (2004, July). Can spectral moments have perceptual significance? Paper presented at the International Association for Forensic Phonetics and Acoustics Annual Conference, 28-31 July 2004, Helsinki, Finland.

Eriksson, E. J., Cepeda, L. F., Rodman, R. D., Sullivan, K. P. H., McAllister, D., Bitzer, D., & Arroway, P. (2004). Robustness of Spectral Moments: a Study using Voice Imitations. In S. Cassidy, F. Cox, R. Mannell, & S. Palethorpe (Eds.), Proceedings of the 10th Australian International Conference on Speech Science and Technology, Macquarie University, Sydney, Australia, December 8-10 (pp. 259-264). Canberra, Australia: Australian Speech Science and Technology Association Inc.

Eriksson, E., Green, J., Sjöström, M., Sullivan, K. P. H., & Zetterholm, E. (2004). Perceived age: A distracter for voice disguise and speaker identification? In P. Branderud, & H. Traunmüller (Eds.), Proceedings of FONETIK 2004: The XVIIth Swedish Phonetics Conference, May 26-28, 2004, Stockholm, Sweden (pp. 80-84). Stockholm, Sweden: Department of Linguistics, Stockholm University, Stockholm, Sweden.

Karlsson, F., Zetterholm, E., & Sullivan, K. P. H. (2004). Development of a gender difference in voice onset time. In S. Cassidy, F. Cox, R. Mannell, & S. Palethorpe (Eds.), Proceedings of the 10th Australian International Conference on Speech Science and Technology, Macquarie University, Sydney, Australia, December 8-10 (pp. 316-321). Canberra, Australia: Australian Speech Science and Technology Association Inc.

Torstensson, N., Eriksson, E. J., & Sullivan, K. P. H. (2004). Mimicked accents – Do speakers have similar cognitive prototypes? In S. Cassidy, F. Cox, R. Mannell, & S. Palethorpe (Eds.), Proceedings of the 10th Australian International Conference on Speech Science and Technology, Macquarie University, Sydney, Australia, December 8-10 (pp. 271-276). Canberra, Australia: Australian Speech Science and Technology Association Inc.

Zetterholm, E., Elenius, D., & Blomberg, M. (2004). A comparison between human perception and a speaker verification system score of a voice imitation. In S. Cassidy, F. Cox, R. Mannell, & S. Palethorpe (Eds.), Proceedings of the 10th Australian International Conference on Speech Science and Technology, Macquarie University, Sydney, Australia, December 8-10 (pp. 393-397). Canberra, Australia: Australian Speech Science and Technology Association Inc.

Zetterholm, E., & Sullivan, K. P. H. (2004, July). One speaker: Two voices – One imitator: Two voices. Paper presented at the Annual Conference of the International Association for Forensic Phonetics and Acoustics, July 28-31, 2004, Helsinki, Finland.

2003

Eriksson, E., Kügler, F., Sullivan, K. P. H., van Doorn, J., & Zetterholm, E. (2003, June-July). Imitation, line-up selection and semantics. Paper presented at the International Association for Forensic Phonetics Annual Conference, 29 June-2 July 2003, Vienna, Austria.

Eriksson, E., Kügler, F., Sullivan, K. P. H., van Doorn, J., & Zetterholm, E. (2003). Why foil 4? A first look. Reports in Phonetics, Umeå University, PHONUM, 9, 161-164. (Proceedings of FONETIK 2003: The Swedish Phonetics Conference, June 2-4, 2003, Lövånger, Sweden.)

Zetterholm, E., Sullivan, K. P. H., Green, J., van Doorn, J., & Czigler, P. E. (2003). Who knows Carl Bildt? – and what if you don’t? In H. Bourlard (Ed.), Proceedings of the 8th European Conference on Speech Communication and Technology (EUROSPEECH 2003 - INTERSPEECH 2003), Geneva, Switzerland, September 1-4, 2003, Volume 4 (pp. 2633-2636). Bonn: International Speech Communication Association (ISCA).

Zetterholm, E., Sullivan, K. P. H., Green, J., Eriksson, E., & Czigler, P. E. (2003). Imitation, expectation and acceptance: The role of age and first language in a Nordic setting. In M.-J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, August 3-9, 2003 (pp. 683-686). Adelaide, Australia: Casual Publications.

Zetterholm, E. (2003). Voice Imitation: A phonetic study of perceptual illusions and acoustic success. Doctoral dissertation. Travaux de l’Institut de Linguistique de Lund 44. Department of Linguistics and Phonetics, Lund University.

Zetterholm, E. (2003). The same but different – three impersonators imitate the same target voices. In Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, August 3-9, 2003 (pp. 2205-2208).









 

Grant administrator: Umeå University
Reference number: K2002-1121:1
Amount: SEK 2,000,000
Funding: Humanities and Social Sciences Donation
Subject: Other Social Sciences
Year: 2002