Conference paper, Year: 2022

Textless Speech Emotion Conversion using Discrete & Decomposed Representations

Abstract

Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while preserving the lexical content and speaker identity. In this study, we cast the problem of emotion conversion as a spoken language translation task. We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion. First, we modify the speech content by translating the phonetic-content units to a target emotion, and then predict the prosodic features based on these units. Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder. Such a paradigm allows us to go beyond spectral and parametric changes of the signal, and model non-verbal vocalizations, such as laughter insertion, yawning removal, etc. We demonstrate objectively and subjectively that the proposed method is vastly superior to current approaches and even beats text-based systems in terms of perceived emotion and audio quality. We rigorously evaluate all components of such a complex system and conclude with an extensive model analysis and ablation study to better emphasize the architectural choices, strengths and weaknesses of the proposed method. Samples are available under the following link: https://speechbot.github.io/emotion.
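The three stages described in the abstract (translate the content units, predict prosody from the translated units, synthesize with a vocoder) can be illustrated with a toy sketch. This is not the authors' implementation: `DecomposedSpeech`, `convert_emotion`, the `toy_*` functions, the `LAUGH` unit id, and the emotion labels are all hypothetical names chosen for illustration, and the rule-based stand-ins replace what would be learned models in the real system.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class DecomposedSpeech:
    """Hypothetical container for the four streams named in the abstract."""
    content_units: List[int]  # discrete phonetic-content units
    prosody: List[float]      # one prosodic value per unit (stand-in for pitch/duration)
    speaker: str
    emotion: str


def convert_emotion(
    utt: DecomposedSpeech,
    target_emotion: str,
    translate_units: Callable[[List[int], str, str], List[int]],
    predict_prosody: Callable[[List[int], str, str], List[float]],
) -> DecomposedSpeech:
    """Translate content units first, then predict prosody from the new units."""
    new_units = translate_units(utt.content_units, utt.emotion, target_emotion)
    new_prosody = predict_prosody(new_units, utt.speaker, target_emotion)
    # The speaker field is carried over unchanged.
    return replace(utt, content_units=new_units, prosody=new_prosody,
                   emotion=target_emotion)


LAUGH = 999  # hypothetical unit id standing in for a laughter segment


def toy_translate(units: List[int], src: str, tgt: str) -> List[int]:
    """Rule-based placeholder for a learned unit-to-unit translation model."""
    if tgt == "amused" and src != "amused":
        return list(units) + [LAUGH]               # laughter insertion
    if src == "amused" and tgt != "amused":
        return [u for u in units if u != LAUGH]    # laughter removal
    return list(units)


def toy_prosody(units: List[int], speaker: str, emotion: str) -> List[float]:
    """Placeholder predictor: one constant value per unit, scaled by emotion."""
    scale = 1.2 if emotion == "amused" else 1.0
    return [scale] * len(units)


def toy_vocoder(utt: DecomposedSpeech) -> List[float]:
    """Placeholder vocoder: emits 10 silent samples per content unit."""
    return [0.0] * (10 * len(utt.content_units))


source = DecomposedSpeech([12, 7, 7, 45], [1.0] * 4, "spk0", "neutral")
converted = convert_emotion(source, "amused", toy_translate, toy_prosody)
waveform = toy_vocoder(converted)
```

Because translation operates on the unit sequence itself, the output can differ in length from the input, which is what lets this kind of pipeline insert or remove non-verbal segments instead of only reshaping the existing signal.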
Main file
2111.07402.pdf (596.59 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03831801 , version 1 (27-10-2022)

Cite

Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, et al. Textless Speech Emotion Conversion using Discrete & Decomposed Representations. EMNLP 2022, Dec 2022, Abu Dhabi (online), United Arab Emirates. ⟨hal-03831801⟩