The mirror to our soul: Comparisons of spontaneous and posed vocal expressions of emotion
Practically every day of our lives, we make inferences about other people’s emotions, based on how their voices sound. But can spontaneous vocal expressions convey specific emotions in real life or is it only when actors pose emotions that they are given a unique voice profile?
The aim of the project was to explore similarities and differences between spontaneous and posed expressions in the voice, by means of collection of voice samples, listening tests, and acoustic analyses. We have addressed questions 1-6 in the project plan in seven studies that are currently being published.
Study 1: Evaluation of a new database
Only a few studies have compared spontaneous and posed expression, and the findings have been inconsistent. This can at least partly be explained by the small voice samples used. The initial work in the project was thus devoted to collecting a more representative voice sample. By collaborating with European researchers, we were able to gain access to voice recordings from 22 separate datasets. This resulted in a large database featuring 1746 recordings in five different languages that could be used in our studies.
To evaluate the database, we conducted a study in which all voice clips were rated by speech experts and lay people regarding various aspects. Comparisons showed that spontaneous clips overall had (a) lower emotion intensity, (b) more positive affect, (c) more verbal cues (which may reveal an emotion), and (d) worse sound quality than posed clips. Results from the study were used to select suitable stimuli for the following studies.
Study 2: Can listeners discriminate between spontaneous and posed expressions?
Can a listener really hear the difference between genuine and "fake" emotions in the voice? Previous studies have not offered an unequivocal answer to this question (question 4 in the project plan) because they have not controlled for differences in emotion intensity. (Studies of posed expression have mostly focused on high intensity, whereas studies of spontaneous expression have focused on low intensity.)
What are the consequences of this for an attempt to find a unique voice profile for emotions? Plutchik (1994) has proposed a model of the structure of emotions in the form of a cone turned upside down. This model implies that different emotions of low intensity are more similar to each other than are different emotions of high intensity. Thus, to make a fair comparison of spontaneous and posed expressions, we need to control for emotion intensity.
This was done in an experimental study. First, we randomly sampled spontaneous and posed expressions with three levels of intensity (low-medium-high), with equal numbers of clips in each category. (All clips were taken from the previously mentioned database.) Then listeners were asked to rate the extent to which each expression was spontaneous ("genuine"). Results were rather clear: Regardless of the intensity, spontaneous clips were generally rated as more "genuine" than were posed clips.
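The balanced sampling step described above can be sketched as follows. This is only an illustrative sketch, not the procedure actually used in the project: the function name `balanced_sample`, the dictionary keys `type` and `intensity`, and the toy clip list are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balanced_sample(clips, per_cell, seed=0):
    """Randomly draw the same number of clips from every
    (expression type, intensity level) cell."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    cells = defaultdict(list)
    for clip in clips:
        cells[(clip["type"], clip["intensity"])].append(clip)

    sample = []
    for key in sorted(cells):
        if len(cells[key]) < per_cell:
            raise ValueError(f"cell {key} has fewer than {per_cell} clips")
        sample.extend(rng.sample(cells[key], per_cell))
    return sample


# Hypothetical toy database: 5 clips in each of the 2 x 3 cells.
cell_labels = [
    (expr_type, level)
    for expr_type in ("spontaneous", "posed")
    for level in ("low", "medium", "high")
    for _ in range(5)
]
clips = [
    {"id": i, "type": expr_type, "intensity": level}
    for i, (expr_type, level) in enumerate(cell_labels)
]

selected = balanced_sample(clips, per_cell=3)
```

Drawing within each cell, rather than from the pooled database, is what keeps expression type and intensity from being confounded in the listening test.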
Study 3: Are there differences in acoustic patterns?
Because listeners were able to distinguish between spontaneous and posed expressions, we assumed that these had different acoustic patterns (question 3). Thus, we analyzed the clips with regard to 88 parameters. To avoid a statistical fishing expedition, the parameters were reduced to 13 distinct voice measures via factor analysis. The effect of greatest interest was the interaction between type of expression (spontaneous/posed) and type of emotion (e.g. sadness): this interaction shows whether spontaneous and posed clips express emotions by different patterns.
Results showed that the interaction was significant for ca. 20-40% of the acoustic measures (depending on intensity). Differences between spontaneous and posed clips could mainly be found in measures of pitch, voice intensity and speech rate. Analyses also revealed a greater variability in measures between emotions for high-intensity clips than for low-intensity clips (question 2).
To explore which voice features are perceived as spontaneous by listeners (question 6), we computed correlations among acoustic parameters and listener ratings in Study 2. Listeners’ judgments of expressions as "genuine" were correlated with (high) voice intensity and (fast) speech rate. We also found that a tense voice quality with much high-frequency energy was perceived as more "genuinely" emotional.
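As a rough illustration of the correlational analysis described above, the sketch below computes a Pearson correlation between one acoustic measure and mean "genuineness" ratings. The report does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the numbers are invented toy values, not project data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five clips (not project data):
voice_intensity_db = [60, 62, 65, 70, 72]     # acoustic measure per clip
genuineness = [2.1, 2.5, 3.0, 3.8, 4.0]       # mean listener rating per clip

r = pearson_r(voice_intensity_db, genuineness)
```

A positive `r` in such an analysis would correspond to the pattern reported in the text, where louder clips tended to be judged as more "genuine".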
Study 4: Can the discrimination occur implicitly?
The above studies showed that spontaneous and posed expressions could be distinguished by perceptual and acoustic measures. We wanted to explore whether such discrimination can be manifested at the physiological level also (a follow-up question, not in the project plan). We selected voice clips featuring negative emotions from the database and created two blocks of clips that were matched with respect to intensity, emotions, etc. The only difference was that one block consisted of spontaneous clips, the other of posed clips.
Listeners were required to rate the gender of the speaker in each voice clip. Meanwhile, we measured the listeners’ skin conductance, continuous blood pressure, and pulse rate. Results revealed significantly higher arousal levels in the participants when exposed to spontaneous clips than when exposed to posed clips, even though the listeners had not been instructed to focus on the expression.
Study 5: Can spontaneous expressions convey specific emotions?
Previous studies of spontaneous expression have failed to find distinct categories of emotion in perceptual and acoustic measures, perhaps because they have used clips with low intensity. Can spontaneous clips with high intensity convey specific emotions better?
We tested this (question 1) in a study where listeners rated spontaneous expressions with three levels of emotion intensity. Their task was to indicate which emotion (from a list of eight) was expressed by each voice clip. Based on Plutchik’s model, we expected to find a dose-response relationship between the emotion intensity and the discreteness of the perceived emotion. This tendency should be evident as better agreement in ratings for high-intensity clips than for low-intensity clips.
As predicted, our results indicated the highest agreement about the expressed emotion for the high-intensity clips. On average, 72% of the listeners chose to describe the conveyed emotion with the most common response alternative. (This is about six times higher than the level that could be expected by chance with nine response alternatives, 11%). Agreement was lower for expressions with medium (58%) or low intensity (46%).
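The agreement figures above can be read as the share of listeners who picked the most common response for a clip, compared against a simple chance level of one divided by the number of response alternatives. The sketch below works through that arithmetic under these assumptions; the function names and the ratings are hypothetical and chosen only to reproduce a 72% agreement.

```python
from collections import Counter


def modal_agreement(responses):
    """Share of listeners who chose the most common response for one clip."""
    if not responses:
        raise ValueError("no responses given")
    counts = Counter(responses)
    return counts.most_common(1)[0][1] / len(responses)


def chance_level(n_alternatives):
    """Probability of picking any one alternative by pure guessing."""
    return 1 / n_alternatives


# Hypothetical ratings of a single high-intensity clip by 25 listeners.
ratings = ["anger"] * 18 + ["fear"] * 4 + ["sadness"] * 3

agreement = modal_agreement(ratings)   # 18 of 25 listeners
chance = chance_level(9)               # nine response alternatives
ratio = agreement / chance             # how many times the chance level
```

With these toy numbers the agreement is 0.72 and the chance level is about 0.11, giving a ratio of roughly 6.5, in line with the "about six times" stated in the text.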
Study 6: Do mixed emotions occur in the voice?
One further factor that may be important for our comparison is that spontaneous expressions in everyday life might contain "mixed emotions" (e.g., both joy and sadness), whereas posed expressions recorded in laboratories usually contain "pure" emotions (e.g., only joy).
To explore occurrences of "mixed emotions" in vocal expressions (question 5), we conducted a unique field study. The goal was to capture expressions as they occur naturally in everyday life. Participants were given a small digital recorder with a discreetly placed microphone that they were asked to carry with them for two weeks. By means of "repeated measurement", we were able to collect a representative sample of expressions.
After the recording period, researchers reviewed this material to select episodes that appeared to include expressions of emotions. Participants were invited back to the lab to listen to the selected recordings and report which emotion they had experienced in each clip. They could also report "mixed emotions" or that they felt no emotion at all. (To protect the privacy of the participants, they were asked to approve all clips before these were used in listening tests.)
Results from the speakers’ own ratings suggested that the most frequently occurring emotions in the recorded material were joy, irritation, sorrow, surprise, contentment and anxiety. About 37% of the clips included "mixed emotions", which indicates that "mixed emotions" are fairly common. Listening tests with naïve listeners revealed that they perceived "mixed emotions" to a roughly similar extent (34%) as the speakers. Moreover, the listeners could for the most part recognize the emotions in accordance with what the speakers had reported. It would seem that spontaneous vocal expressions can convey specific emotions to listeners in a veridical manner - contrary to what many authors have claimed (question 1).
Conclusions from the project
Results from our studies show unequivocally that spontaneous expressions differ from posed expressions, though not in the ways often assumed by researchers. Posed expressions are not more stereotyped than spontaneous ones, nor do spontaneous expressions convey specific emotions to a lesser extent. The two types of expression differ with regard to far more subtle nuances in the voice. Our results have implications for the training of psychotherapists, interrogators and actors, for whom the distinction between genuine and feigned emotion is crucial.
Our results also raise a critical follow-up question about methods: Are posed vocal expressions sufficiently similar to spontaneous expressions to justify generalizing findings from studies using the former to real-world situations that involve the latter? The project has stimulated further follow-up questions as well: Is it possible to teach a speaker to simulate acoustic features that give an impression of spontaneous emotion? Do our physiological reactions to expressions function as a "somatic marker", which helps us to determine whether other persons are trustworthy? How can the project’s findings be used to enhance future databases? We look forward to discussing such questions and our data with researchers and broader society, once our articles have been published.