D. A. Reynolds, “Speaker identification and verification using Gaussian mixture speaker models,” Speech Commun., vol. 17, no. 1, pp. 91–108, Aug. 1995.
 Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre, and H. Li, “Spoofing and countermeasures for speaker verification: A survey,” Speech Commun., vol. 66, pp. 130–153, Feb. 2015.
Y. W. Lau, M. Wagner, and D. Tran, “Vulnerability of speaker verification to voice mimicking,” in Proceedings of the 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, Oct. 2004, pp. 145–148.
 Z. Wu and H. Li, “Voice conversion and spoofing attack on speaker verification systems,” in 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Nov. 2013, pp. 1–9.
 Z. Wu, S. Gao, E. S. Cling, and H. Li, “A study on replay attack and anti-spoofing for text-dependent speaker verification,” in Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, Dec. 2014, pp. 1–5.
M. Todisco et al., “ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection,” arXiv preprint arXiv:1904.05441, Apr. 2019.
Z. Wu et al., “ASVspoof 2015: The first automatic speaker verification spoofing and countermeasures challenge,” presented at the Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), 2015.
J. Yamagishi et al., “ASVspoof 2019: The 3rd automatic speaker verification spoofing and countermeasures challenge database,” 2019.
J. Yamagishi et al., “ASVspoof 2021: Accelerating progress in spoofed and deepfake speech detection,” arXiv preprint arXiv:2109.00537, Sep. 2021.
 P. A. Ziabary and H. Veisi, “A Countermeasure Based on CQT Spectrogram for Deepfake Speech Detection,” in 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), Dec. 2021, pp. 1–5.
 M. Alzantot, Z. Wang, and M. B. Srivastava, “Deep Residual Neural Networks for Audio Spoofing Detection,” in Interspeech 2019, Sep. 2019, pp. 1078–1082. doi: 10.21437/Interspeech.2019-3174.
 A. Gomez-Alanis, A. M. Peinado, J. A. Gonzalez, and A. M. Gomez, “A Light Convolutional GRU-RNN Deep Feature Extractor for ASV Spoofing Detection,” in Interspeech 2019, Sep. 2019, pp. 1068–1072.
 C.-I. Lai, N. Chen, J. Villalba, and N. Dehak, “ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks,” in Interspeech 2019, Sep. 2019, pp. 1013–1017.
 Z. Wu, R. K. Das, J. Yang, and H. Li, “Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks,” in Interspeech 2020, Oct. 2020, pp. 1101–1105.
K. Aghajani, “Audio-visual emotion recognition based on a deep convolutional neural network,” J. AI Data Min., vol. 10, no. 4, pp. 529–537, Nov. 2022.
 B. Z. Mansouri, H. R. Ghaffary, and A. Harimi, “Speech Emotion Recognition using Enriched Spectrogram and Deep Convolutional Neural Network Transfer Learning,” J. AI Data Min., vol. 10, no. 4, pp. 539–547, Nov. 2022.
 J. C. Brown, “Calculation of a constant Q spectral transform,” J. Acoust. Soc. Am., vol. 89, no. 1, pp. 425–434, Jan. 1991.
 M. Todisco, H. Delgado, and N. Evans, “A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients,” in The Speaker and Language Recognition Workshop (Odyssey 2016), Jun. 2016, pp. 283–290.
X. Li, X. Wu, H. Lu, X. Liu, and H. Meng, “Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks,” arXiv preprint arXiv:2107.08803, Jul. 2021. Accessed: May 02, 2022. [Online]. Available: http://arxiv.org/abs/2107.08803
 Y. Zhang, F. Jiang, and Z. Duan, “One-Class Learning Towards Synthetic Voice Spoofing Detection,” IEEE Signal Process. Lett., vol. 28, pp. 937–941, 2021.
 J. Monteiro, J. Alam, and T. H. Falk, “Generalized end-to-end detection of spoofing attacks to automatic speaker recognizers,” Comput. Speech Lang., vol. 63, p. 101096, Sep. 2020.
H. Tak, J. Jung, J. Patino, M. Todisco, and N. Evans, “Graph attention networks for anti-spoofing,” arXiv preprint arXiv:2104.03654, 2021.
Z. Huang, S. Wang, and K. Yu, “Angular Softmax for Short-Duration Text-independent Speaker Verification,” presented at Interspeech 2018, 2018, pp. 3623–3627.
 M. Sahidullah et al., “UIAI System for Short-Duration Speaker Verification Challenge 2020,” in 2021 IEEE Spoken Language Technology Workshop (SLT), Jan. 2021, pp. 323–329.
S. Wang, Z. Huang, Y. Qian, and K. Yu, “Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 27, no. 11, pp. 1686–1696, Nov. 2019.
 Y. Jung, Y. Choi, H. Lim, and H. Kim, “A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments,” IEEE Access, vol. 8, pp. 175448–175466, 2020.
 M. R. Kamble, H. B. Sailor, H. A. Patil, and H. Li, “Advances in anti-spoofing: from the perspective of ASVspoof challenges,” APSIPA Trans. Signal Inf. Process., vol. 9, p. e2, 2020.
 Z. Wu, X. Xiao, E. S. Chng, and H. Li, “Synthetic speech detection using temporal modulation feature,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 7234–7238.
 M. Sahidullah, T. Kinnunen, and C. Hanilci, “A Comparison of Features for Synthetic Speech Detection,” 2015, Accessed: May 02, 2022. [Online]. Available: https://erepo.uef.fi/handle/123456789/4371
 I. Saratxaga, J. Sanchez, Z. Wu, I. Hernaez, and E. Navas, “Synthetic speech detection using phase information,” Phase-Aware Signal Process. Speech Commun., vol. 81, pp. 30–41, Jul. 2016.
M. Todisco, H. Delgado, and N. W. Evans, “A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients,” presented at Odyssey 2016, 2016, vol. 2016, pp. 283–290.
 B. Chettri, D. Stoller, V. Morfi, M. A. M. Ramírez, E. Benetos, and B. L. Sturm, “Ensemble Models for Spoofing Detection in Automatic Speaker Verification,” in Interspeech 2019, Sep. 2019, pp. 1018–1022.
 X. Fang, H. Du, T. Gao, L. Zou, and Z. Ling, “Voice Spoofing Detection with Raw Waveform Based on Dual Path Res2net,” in 5th International Conference on Crowd Science and Engineering, New York, NY, USA, 2021, pp. 160–165.
M. Pal, A. Raikar, A. Panda, and S. K. Kopparapu, “Synthetic speech detection using meta-learning with prototypical loss,” arXiv preprint arXiv:2201.09470, Jan. 2022.
H. Ma, J. Yi, J. Tao, Y. Bai, Z. Tian, and C. Wang, “Continual Learning for Fake Audio Detection,” arXiv preprint arXiv:2104.07286, Apr. 2021.
R. Jaiswal, D. Fitzgerald, E. Coyle, and S. Rickard, “Towards Shifted NMF for Improved Monaural Separation,” IET Conf. Proc., pp. 19–19(1), Jan. 2013.
Z. Weiping, Y. Jiantao, X. Xiaotao, L. Xiangtao, and P. Shaohu, “Acoustic scene classification using deep convolutional neural network and multiple spectrograms fusion,” in Proc. Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, 2017.