H.6.5.13. Signal processing
Zeynab Mohammadpoory; Mahda Nasrollahzadeh; Sakineh Asadi
Abstract
Emotion recognition from speech signals has gained popularity because of its many applications in fields such as medicine, online marketing, online search engines, education systems, criminal investigations, traffic collisions, and more. Researchers have adopted various methodologies to improve the accuracy of speech-based emotion classification. This study presents a novel time-series-to-graph transformation framework for speech emotion recognition. Speech signals were segmented into overlapping windows, and each window was converted into a graph from which 16 structural features were extracted. The most significant features were then selected with Minimum Redundancy Maximum Relevance (mRMR) and used to train four classifiers: random forest (RF), linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbors (KNN). Finally, a soft-voting ensemble combined their predictions, yielding improved classification performance. On the SAVEE database, the proposed method achieved its highest sensitivity, specificity, and accuracy of 83.57%, 98.93%, and 98.16%, respectively; on the EmoDB database, the corresponding highest values were 94.47%, 99.09%, and 98.40%. A comparison with other methods showed that the proposed method outperformed state-of-the-art emotion classification techniques.
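The abstract does not name the time-series-to-graph mapping or list the 16 structural features. As a minimal sketch of the windowing-and-graph step, the code below assumes a natural visibility graph, one common way to turn a signal window into a graph, and computes only four illustrative features (mean degree, maximum degree, density, average clustering coefficient). The helper names `sliding_windows`, `visibility_graph`, and `graph_features`, as well as the window length and hop, are hypothetical choices, not the authors' settings.

```python
from itertools import combinations


def sliding_windows(signal, win_len, hop):
    """Split a 1-D signal into overlapping windows; overlap = win_len - hop samples."""
    return [signal[start:start + win_len]
            for start in range(0, len(signal) - win_len + 1, hop)]


def visibility_graph(window):
    """Natural visibility graph (ASSUMED mapping): samples a < b are linked when
    every intermediate sample c lies strictly below the straight line joining them."""
    n = len(window)
    adj = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        ya, yb = window[a], window[b]
        if all(window[c] < yb + (ya - yb) * (b - c) / (b - a)
               for c in range(a + 1, b)):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def graph_features(adj):
    """Four ILLUSTRATIVE structural features; the paper's 16 features are not listed."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    n_edges = sum(degrees) // 2
    density = 2 * n_edges / (n * (n - 1)) if n > 1 else 0.0
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)  # undefined for degree < 2; 0 by convention
            continue
        # Count edges among this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        clustering.append(2 * links / (k * (k - 1)))
    return {
        "mean_degree": sum(degrees) / n,
        "max_degree": max(degrees),
        "density": density,
        "avg_clustering": sum(clustering) / n,
    }


# Hypothetical usage: toy 8-sample signal, 4-sample windows, hop 2 (50% overlap).
signal = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
window_features = [graph_features(visibility_graph(w))
                   for w in sliding_windows(signal, win_len=4, hop=2)]
```

Each window yields one feature vector, so a whole utterance becomes a sequence of graph-feature vectors that the later selection and classification stages can consume.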
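mRMR is usually applied greedily: first take the feature with the highest mutual information with the class labels, then repeatedly add the feature whose relevance minus its mean mutual information with the already-selected features is largest. The minimal sketch below implements that difference form in plain Python, assuming the features have already been discretized; the abstract does not say how features were discretized or which mRMR variant was used. `mutual_information` and `mrmr` are hypothetical helpers, and the toy columns are made up.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """I(X; Y) in bits for two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy mRMR, difference form (relevance minus mean redundancy).

    `features` maps a feature name to its ASSUMED already-discretized column.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        remaining = [name for name in features if name not in selected]
        scores = {
            name: relevance[name]
            - sum(mutual_information(features[name], features[s]) for s in selected)
            / len(selected)
            for name in remaining
        }
        selected.append(max(scores, key=scores.get))
    return selected


# Made-up discretized columns for 8 samples of two classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f1_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # as relevant as f1, but fully redundant
    "f2": [0, 0, 0, 0, 1, 1, 0, 0],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
chosen = mrmr(columns, labels, k=2)  # ['f1', 'f2']: the redundant copy is skipped
```

The redundancy term is what separates mRMR from ranking by relevance alone: `f1_copy` ties `f1` on relevance, yet it loses to the weaker but complementary `f2`.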
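Soft voting averages the class-probability estimates of the individual classifiers and predicts the class with the highest average, so one confident model can outweigh several uncertain ones. The sketch below assumes equal weights by default because the abstract does not mention weighting. `soft_vote` is a hypothetical helper, and the probability values are invented for illustration. In scikit-learn, `VotingClassifier(voting="soft")` applies the same combination rule to fitted estimators.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-classifier class probabilities.

    Returns (index of the winning class, averaged probability vector).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)  # ASSUMED: equal weights
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Invented probabilities for one utterance over three emotion classes.
rf = [0.9, 0.05, 0.05]
lda = [0.3, 0.4, 0.3]
svm = [0.35, 0.4, 0.25]
knn = [0.3, 0.45, 0.25]
predicted, averaged = soft_vote([rf, lda, svm, knn])
# predicted == 0: RF's confident vote outweighs three hesitant votes for class 1,
# whereas a hard majority vote would have chosen class 1.
```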
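The abstract does not state how sensitivity, specificity, and accuracy were computed for these multiclass problems. One convention that would produce accuracy well above sensitivity computes each metric one-vs-rest for every emotion class and then averages over classes. Under this scheme each class contributes many true negatives, which raises per-class accuracy and specificity. The sketch below illustrates only that assumed convention; `one_vs_rest_metrics` is a hypothetical helper and the labels are toy data.

```python
def one_vs_rest_metrics(y_true, y_pred, classes):
    """Macro-averaged one-vs-rest sensitivity, specificity and accuracy (ASSUMED convention)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    sens, spec, acc = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = n - tp - fn - fp  # everything else is a true negative for class c
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        acc.append((tp + tn) / n)
    k = len(classes)
    return sum(sens) / k, sum(spec) / k, sum(acc) / k


# Toy labels: only 4 of 6 predictions are correct.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
sensitivity, specificity, accuracy = one_vs_rest_metrics(y_true, y_pred, classes=[0, 1, 2])
```

Here the macro sensitivity is 2/3 while the macro one-vs-rest accuracy is 7/9, even though the plain multiclass accuracy is 4/6.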