[1] M. El Ayadi, M. S. Kamel, and F. Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognition, vol. 44, no. 3, pp. 572-587, 2011.
[2] E. H. Kim, K. H. Hyun, S. H. Kim, and Y. K. Kwak, "Improved emotion recognition with a novel speaker-independent feature," IEEE/ASME Transactions on Mechatronics, vol. 14, no. 3, pp. 317-325, 2009.
[3] E. Bozkurt, E. Erzin, C. E. Erdem, and A. T. Erdem, "Formant position based weighted spectral features for emotion recognition," Speech Communication, vol. 53, no. 9-10, pp. 1186-1197, 2011.
[4] C.-N. Anagnostopoulos, T. Iliou, and I. Giannoukos, "Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011," Artificial Intelligence Review, vol. 43, no. 2, pp. 155-177, 2015.
[5] A. Harimi, A. AhmadyFard, A. Shahzadi, and K. Yaghmaie, "Anger or joy? Emotion recognition using nonlinear dynamics of speech," Applied Artificial Intelligence, vol. 29, no. 7, pp. 675-696, 2015.
[6] A. Shahzadi, A. Ahmadyfard, A. Harimi, and K. Yaghmaie, "Speech emotion recognition using nonlinear dynamics features," Turkish Journal of Electrical Engineering & Computer Sciences, vol. 23, 2015.
[7] A. Harimi, H. S. Fakhr, and A. Bakhshi, "Recognition of emotion using reconstructed phase space of speech," Malaysian Journal of Computer Science, vol. 29, no. 4, pp. 262-271, 2016.
[8] A. Bakhshi, A. Harimi, and S. Chalup, "CyTex: Transforming speech to textured images for speech emotion recognition," Speech Communication, vol. 139, pp. 62-75, 2022, doi: 10.1016/j.specom.2022.02.007.
[9] H. Marvi, Z. Esmaileyan, and A. Harimi, "Estimation of LPC coefficients using Evolutionary Algorithms," Journal of AI and Data Mining, vol. 1, no. 2, pp. 111-118, 2013, doi: 10.22044/jadm.2013.115.
[10] A. Harimi, A. Shahzadi, A. Ahmadyfard, and K. Yaghmaie, "Classification of emotional speech using spectral pattern features," Journal of AI and Data Mining, vol. 2, no. 1, pp. 53-61, 2014, doi: 10.22044/jadm.2014.150.
[11] E. Kalhor and B. Bakhtiari, "Multi-Task Feature Selection for Speech Emotion Recognition: Common Speaker-Independent Features Among Emotions," Journal of AI and Data Mining, vol. 9, no. 3, pp. 269-282, 2021, doi: 10.22044/jadm.2021.9800.2118.
[12] B. Schuller, S. Steidl, and A. Batliner, "The INTERSPEECH 2009 Emotion Challenge," in Proc. Interspeech, 2009, pp. 312-315.
[13] B. Schuller, A. Batliner, S. Steidl, F. Schiel, and J. Krajewski, "The INTERSPEECH 2011 Speaker State Challenge," in Proc. Interspeech, 2011, pp. 3201-3204.
[14] J.-C. Lin, C.-H. Wu, and W.-L. Wei, "Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition," IEEE Transactions on Multimedia, vol. 14, no. 1, pp. 142-156, 2011.
[15] B. Schuller, G. Rigoll, and M. Lang, "Hidden Markov model-based speech emotion recognition," in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'03), 2003, vol. 2: IEEE, pp. II-1.
[16] M. Bejani, D. Gharavian, and N. M. Charkari, "Audiovisual emotion recognition using ANOVA feature selection method and multi-classifier neural networks," Neural Computing and Applications, vol. 24, no. 2, pp. 399-412, 2014.
[17] J. Nicholson, K. Takahashi, and R. Nakatsu, "Emotion recognition in speech using neural networks," Neural Computing & Applications, vol. 9, no. 4, pp. 290-296, 2000.
[18] A. Bhavan, P. Chauhan, and R. R. Shah, "Bagged support vector machines for emotion recognition from speech," Knowledge-Based Systems, vol. 184, p. 104886, 2019.
[19] B. Schuller, G. Rigoll, and M. Lang, "Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture," in 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, vol. 1: IEEE, pp. I-577.
[20] Y. Chavhan, M. Dhore, and P. Yesaware, "Speech emotion recognition using support vector machine," International Journal of Computer Applications, vol. 1, no. 20, pp. 6-9, 2010.
[21] T. Zhang, W. Zheng, Z. Cui, Y. Zong, J. Yan, and K. Yan, "A deep neural network-driven feature learning method for multi-view facial expression recognition," IEEE Transactions on Multimedia, vol. 18, no. 12, pp. 2528-2536, 2016.
[22] Z. Huang, M. Dong, Q. Mao, and Y. Zhan, "Speech Emotion Recognition Using CNN," in Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, Florida, USA, 2014, doi: 10.1145/2647868.2654984.
[23] Q. Mao, M. Dong, Z. Huang, and Y. Zhan, "Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks," IEEE Transactions on Multimedia, vol. 16, no. 8, pp. 2203-2213, 2014, doi: 10.1109/TMM.2014.2360798.
[24] G. Trigeorgis et al., "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 5200-5204, doi: 10.1109/ICASSP.2016.7472669.
[25] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997, doi: 10.1162/neco.1997.9.8.1735.
[26] D. Guiming, W. Xia, W. Guangyan, Z. Yan, and L. Dan, "Speech recognition based on convolutional neural networks," in 2016 IEEE International Conference on Signal and Image Processing (ICSIP), 2016, pp. 708-711, doi: 10.1109/SIPROCESS.2016.7888355.
[27] Z. Huang, M. Dong, Q. Mao, and Y. Zhan, "Speech emotion recognition using CNN," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 801-804.
[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017, doi: 10.1145/3065386.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014, pp. 346-361.
[30] F. Chollet, Deep Learning with Python. New York: Manning, 2018.
[31] S. Zhang, S. Zhang, T. Huang, and W. Gao, "Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching," IEEE Transactions on Multimedia, vol. 20, no. 6, pp. 1576-1590, 2017.
[32] M. Falahzadeh, F. Farokhi, A. Harimi, and R. Sabbaghi, "Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition," Circuits, Systems, and Signal Processing, pp. 1-44, 2022, doi: 10.1007/s00034-022-02130-3.
[33] S. Jothimani and K. Premalatha, "MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network," Chaos, Solitons & Fractals, vol. 162, p. 112512, 2022, doi: 10.1016/j.chaos.2022.112512.
[34] X. Xu, D. Li, Y. Zhou, and Z. Wang, "Multi-type features separating fusion learning for Speech Emotion Recognition," Applied Soft Computing, vol. 130, p. 109648, 2022, doi: 10.1016/j.asoc.2022.109648.
[35] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[36] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss, "A database of German emotional speech," in Ninth European Conference on Speech Communication and Technology, 2005.
[37] S. M S, A. Elampulakkadu, T. Deepa, C. Shameema, and S. Rajan, "Emotion recognition from audio signals using Support Vector Machine," 2015, pp. 139-144.
[38] S. Kanwal and S. Asghar, "Speech Emotion Recognition Using Clustering Based GA-Optimized Feature Set," IEEE Access, vol. 9, pp. 125830-125842, 2021, doi: 10.1109/ACCESS.2021.3111659.
[39] L. Zão, D. Cavalcante, and R. Coelho, "Time-Frequency Feature and AMS-GMM Mask for Acoustic Emotion Classification," IEEE Signal Processing Letters, vol. 21, no. 5, pp. 620-624, 2014, doi: 10.1109/LSP.2014.2311435.
[40] H. Tao, R. Liang, C. Zha, X. Zhang, and L. Zhao, "Spectral Features Based on Local Hu Moments of Gabor Spectrograms for Speech Emotion Recognition," IEICE Transactions on Information and Systems, vol. E99.D, no. 8, pp. 2186-2189, 2016, doi: 10.1587/transinf.2015EDL8258.
[41] M. Lech, M. N. Stolar, C. Best, and R. S. Bolia, "Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding," Frontiers in Computer Science, 2020.
[42] S. Sekkate, M. Khalil, A. Abdellah, and S. Jebara, "An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition," Computers, vol. 8, p. 91, 2019, doi: 10.3390/computers8040091.
[43] L. Kerkeni, Y. Serrestou, M. Mbarki, K. Raoof, and M. A. Mahjoub, "Speech Emotion Recognition: Methods and Cases Study," in ICAART, 2018.
[44] F. Daneshfar and S. J. Kabudian, "Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm," Multimedia Tools Appl., vol. 79, no. 1–2, pp. 1261–1289, 2020, doi: 10.1007/s11042-019-08222-8.
[45] D. Issa, M. F. Demirci, and A. Yazıcı, "Speech emotion recognition with deep convolutional neural networks," Biomed. Signal Process. Control., vol. 59, p. 101894, 2020.
[46] A. Shirani and A. R. N. Nilchi, "Speech Emotion Recognition based on SVM as Both Feature Selector and Classifier," International Journal of Image, Graphics and Signal Processing, vol. 8, pp. 39-45, 2016.
[47] Y. Ü. Sönmez and A. Varol, "A Speech Emotion Recognition Model Based on Multi-Level Local Binary and Local Ternary Patterns," IEEE Access, vol. 8, pp. 190784-190796, 2020, doi: 10.1109/ACCESS.2020.3031763.
[48] M. B. Er, "A Novel Approach for Classification of Speech Emotions Based on Deep and Acoustic Features," IEEE Access, vol. 8, pp. 221640-221653, 2020, doi: 10.1109/ACCESS.2020.3043201.
[49] Z. Zhao et al., "Exploring deep spectrum representations via attention-based recurrent and convolutional neural networks for speech emotion recognition," IEEE Access, vol. 7, pp. 97515-97525, 2019.