Document Type: Original/Review Paper

Authors

1 Faculty of Electrical Engineering, Shahrood University of Technology, Shahrood, Iran.

2 Department of Electrical Engineering, University of Torbat Heydarieh, Torbat Heydarieh, Iran.

3 Department of Computer Engineering, University of Mazandaran, Babolsar, Iran.

Abstract

Speech emotion recognition has gained popularity owing to its wide range of applications in fields such as medicine, online marketing, search engines, education systems, criminal investigations, and traffic-collision analysis. Many researchers have adopted different methodologies to improve the accuracy of emotion classification from speech signals. This study presents a novel time-series-to-graph transformation framework for speech emotion recognition. Speech signals were segmented into overlapping windows, each of which was converted into a graph, and 16 structural features were extracted from each graph. Significant features were then selected via minimum redundancy maximum relevance (mRMR) and used to train four classifiers: random forest (RF), linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbors (KNN). Finally, a soft-voting ensemble strategy integrated their predictions, yielding improved classification performance. The proposed method achieved the highest sensitivity, specificity, and accuracy on the SAVEE database (83.57%, 98.93%, and 98.16%, respectively) and on the EmoDB database (94.47%, 99.09%, and 98.40%, respectively). A comparison with other methods showed that the proposed approach outperforms state-of-the-art techniques in emotion classification.
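To illustrate the pipeline stages the abstract describes, the sketch below shows one plausible realization in plain Python: a windowed signal is converted into a graph, a few structural features are read off, and classifier probabilities are combined by soft voting. The helper names, the use of the natural visibility graph as the time-series-to-graph conversion, and the three example features are assumptions for illustration only; the paper's exact graph construction and its full set of 16 features are not reproduced here.

```python
def visibility_graph(series):
    """Natural visibility graph: each sample is a node; samples i and j are
    linked when every intermediate sample lies strictly below the straight
    line joining (i, y_i) and (j, y_j)."""
    n = len(series)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if all(series[k] < series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                   for k in range(i + 1, j)):
                edges.add((i, j))
    return edges

def graph_features(edges, n):
    """A few illustrative structural features (the paper extracts 16):
    mean degree, maximum degree, and edge density."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    density = 2 * len(edges) / (n * (n - 1))
    return [sum(deg) / n, max(deg), density]

def soft_vote(prob_lists):
    """Soft voting: average the per-class probabilities produced by the
    individual classifiers and return the class with the highest mean."""
    n_classes = len(prob_lists[0])
    means = [sum(p[c] for p in prob_lists) / len(prob_lists)
             for c in range(n_classes)]
    return means.index(max(means))

# One window of a toy signal -> graph -> features.
window = [1, 3, 2, 4]
feats = graph_features(visibility_graph(window), len(window))

# Hypothetical per-class probabilities from four trained classifiers
# (e.g. RF, LDA, SVM, KNN) for a two-emotion problem.
winner = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.5, 0.5], [0.4, 0.6]])
```

In practice the per-window feature vectors would be ranked by mRMR before training the four classifiers; the soft-voting step is what fuses their outputs into the final label.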
