Kh. Aghajani
Abstract
Emotion recognition has applications in various fields, including human-computer interaction. In recent years, various methods have been proposed to recognize emotion from facial or speech information, but the fusion of these two modalities has received less attention. In this paper, emotion recognition using only facial or only speech information is examined first. For emotion recognition from speech, a pre-trained network called YAMNet is used to extract features. The extracted features are passed through a convolutional neural network (CNN) and then fed into a bi-LSTM with an attention mechanism to perform the recognition. For emotion recognition from facial information, a deep CNN-based model is proposed. Finally, after examining these two approaches, an emotion recognition framework based on the fusion of the two models is proposed. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), containing videos of 24 actors (12 male and 12 female) across 8 emotion categories, is used to evaluate the proposed model. The implementation results show that combining facial and speech information improves the performance of the emotion recognizer.
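The speech branch described above (YAMNet features, a CNN, then a bi-LSTM with attention) can be sketched in TensorFlow/Keras using the public YAMNet model from TensorFlow Hub. This is a minimal illustration only, not the paper's implementation: the layer sizes (64 filters, 128 LSTM units), the frame count, and the simple soft-attention pooling layer are assumptions chosen for clarity.

```python
# Minimal sketch of the speech branch: YAMNet embeddings -> CNN -> bi-LSTM
# with attention pooling -> 8-way softmax. Hyperparameters are illustrative,
# not taken from the paper.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 8  # RAVDESS has 8 emotion categories

# YAMNet expects mono 16 kHz audio in [-1, 1] and returns per-frame
# (scores, embeddings, spectrogram); the embeddings are 1024-dimensional.
yamnet = hub.load('https://tfhub.dev/google/yamnet/1')

def extract_embeddings(waveform_16k: np.ndarray) -> np.ndarray:
    """Return YAMNet frame embeddings, shape (num_frames, 1024)."""
    _, embeddings, _ = yamnet(waveform_16k)
    return embeddings.numpy()

class AttentionPooling(tf.keras.layers.Layer):
    """Soft attention over time: score each frame, softmax the scores,
    and return the attention-weighted sum of the frame features."""
    def build(self, input_shape):
        self.score = tf.keras.layers.Dense(1)
    def call(self, x):                                   # x: (batch, T, d)
        weights = tf.nn.softmax(self.score(x), axis=1)   # (batch, T, 1)
        return tf.reduce_sum(weights * x, axis=1)        # (batch, d)

def build_speech_model(max_frames: int) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(max_frames, 1024))
    x = tf.keras.layers.Conv1D(64, 3, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(x)
    x = AttentionPooling()(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)

model = build_speech_model(max_frames=96)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

In this sketch, fusion with the face branch would amount to concatenating the pooled speech representation with the face model's penultimate features before the final classifier; the abstract does not specify the fusion mechanism, so that detail is left open here.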