Speech Emotion Recognition
Abstract
The goal of speech emotion recognition (SER) is to recognise and categorise the emotional states expressed in speech signals, improving applications in healthcare, education, customer service, and human-computer interaction. This review surveys recent developments in SER, with particular emphasis on deep learning techniques that combine linguistic and acoustic information. While traditional methods based on prosody and hand-crafted features such as Mel-Frequency Cepstral Coefficients (MFCCs) achieved moderate success, deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks greatly improve the capacity to capture intricate, emotion-specific speech patterns. Well-known datasets such as IEMOCAP and RAVDESS are frequently used to assess the performance of SER systems, and accuracy metrics are employed to compare the efficacy of different approaches.
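To make the hand-crafted-feature baseline concrete, the classic MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT) can be sketched in a few lines of numpy/scipy. This is a minimal illustration, not the exact implementation used by any system surveyed here; all parameter defaults (16 kHz sample rate, 25 ms frames, 26 filters, 13 coefficients) are common conventions, not values taken from the article.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, n_fft=512, frame_len=0.025, frame_step=0.010,
         n_filters=26, n_ceps=13):
    """Frame-level MFCCs from a mono waveform (minimal sketch)."""
    # Pre-emphasis boosts high frequencies that carry prosodic detail
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping Hamming-windowed frames
    flen, fstep = int(sr * frame_len), int(sr * frame_step)
    n_frames = 1 + (len(signal) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum of each frame
    pspec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 Hz and Nyquist
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(hz2mel(0), hz2mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT to decorrelate; keep first n_ceps
    logfb = np.log(pspec @ fbank.T + 1e-10)
    return dct(logfb, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

In a traditional SER pipeline, the resulting (frames x 13) matrix would be summarised with utterance-level statistics and fed to a conventional classifier; deep models instead consume the frame sequence directly.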
Two of the main hurdles in SER are accurately identifying spontaneous emotional speech, which lacks the structured cues found in acted emotions, and maintaining robustness across varied acoustic conditions. To address these problems, recent research has incorporated sophisticated neural architectures, including hybrid CNN-LSTM models and attention mechanisms. In addition, self-supervised learning techniques have become popular for scenarios with limited labelled data and for low-resource languages. By exploiting both labelled and unlabelled data, these models increase the accuracy of emotion categorisation and improve generalisation across languages and speakers.
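The attention mechanisms mentioned above typically pool a sequence of frame embeddings (e.g. CNN-LSTM outputs) into one utterance vector, weighting frames by emotional salience rather than averaging them uniformly. The following numpy sketch shows the pooling step only, with an illustrative query vector standing in for learned parameters; it is a conceptual example, not any specific published architecture.

```python
import numpy as np

def attention_pool(H, w):
    """Attention pooling: score T frame embeddings H (T, d) against a
    query vector w (d,), softmax the scores, and return the weighted sum."""
    scores = H @ w                         # one salience score per frame
    a = np.exp(scores - scores.max())      # numerically stable softmax
    a = a / a.sum()
    return a @ H, a                        # utterance embedding, frame weights
```

In a trained model, `w` (and often a small projection before it) is learned jointly with the classifier, so frames carrying stronger emotional cues receive larger weights.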
Another promising direction for future SER research is the integration of multimodal data, such as combining audio with textual or visual information. Models that employ context-aware techniques such as sentiment-weighted attention may enable more nuanced emotion recognition. As SER technology matures, it is expected to contribute substantially to emotionally aware AI systems, offering responsive, adaptive interactions grounded in a deeper understanding of human emotion.
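One simple way to combine audio and text predictions, as a baseline for the multimodal direction described above, is late fusion of per-emotion scores. The sketch below is purely illustrative: the function name, the blending weight `alpha`, and the idea of fixed linear blending are assumptions for demonstration, whereas the sentiment-weighted attention mentioned in the text would learn such weights from context.

```python
import numpy as np

def fuse_modalities(audio_logits, text_logits, alpha=0.6):
    """Hypothetical late fusion: blend per-emotion scores from an audio
    model and a text model, then normalise with a softmax."""
    z = alpha * np.asarray(audio_logits) + (1 - alpha) * np.asarray(text_logits)
    e = np.exp(z - z.max())                # numerically stable softmax
    return e / e.sum()
```

Here `alpha` controls how much the fused decision trusts the acoustic model over the text model; learned, context-dependent weighting is what distinguishes attention-based fusion from this fixed blend.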
How to Cite This Article
Kondaparthy Vaishnavi, Myana Vaishnavi, Dr K. Vaidehi (2025). Speech Emotion Recognition. International Journal of Multidisciplinary Research and Growth Evaluation (IJMRGE), 6(2), 1173-1192.