P. Abdzadeh; H. Veisi
Abstract
Automatic Speaker Verification (ASV) systems have proven to be vulnerable to various types of presentation attacks, among which Logical Access (LA) attacks are generated using voice conversion and text-to-speech methods. In recent years, a great deal of work has concentrated on synthetic speech detection, and with the success of deep learning-based methods in various fields of computer science, they have become a prevailing tool for this task as well. Most deep neural network-based techniques for synthetic speech detection employ acoustic features based on the Short-Time Fourier Transform (STFT), extracted from the raw audio signal. Recently, however, it has been found that using the Constant Q Transform (CQT) spectrogram can both improve performance and reduce the processing power and time required by a deep learning-based synthetic speech detector. In this work, we compare the CQT spectrogram with some of the most widely used STFT-based acoustic features. As secondary objectives, we seek to improve the model's performance as much as possible using methods such as self-attention and one-class learning, and we also address short-duration synthetic speech detection. We find that the CQT spectrogram-based model not only outperforms the STFT-based acoustic feature extraction methods but also reduces the processing time and resources needed to distinguish genuine speech from fake. The CQT spectrogram-based model also ranks well among the best works on the LA subset of the ASVspoof 2019 dataset, especially in terms of Equal Error Rate.
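For readers unfamiliar with the Equal Error Rate mentioned above, the following is a minimal sketch of how it can be computed from detector scores. It is not the authors' evaluation code. The function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the sample scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: a higher score means the detector considers the utterance
    more likely to be genuine (bona fide).
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance rate: spoofed utterances scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine utterances scored below t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores, for illustration only.
bonafide = [0.9, 0.6, 0.4, 0.8]
spoof = [0.1, 0.5, 0.7, 0.2]
print(equal_error_rate(bonafide, spoof))
```

A lower EER indicates a better detector. Evaluation toolkits typically interpolate between thresholds rather than picking the nearest observed one, so values from this sketch may differ slightly from official scores.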
N. Taghvaei; B. Masoumi; M. R. Keyvanpour
Abstract
Humans are highly complex organisms, and research into their various dimensions and aspects, including personality, has therefore become an attractive subject. With the advance of technology, social networks have given rise to a new form of social communication, and recognizing and categorizing people in this new space has become a hot research topic tackled by many researchers. In this paper, considering the Big Five personality characteristics of individuals, we first propose a categorization of related work and then propose a hybrid framework based on Fuzzy Neural Networks (FNN) together with Deep Neural Networks (DNN), which improves the accuracy of personality recognition by combining different FNN classifiers with a DNN classifier in a two-stage decision fusion scheme. Finally, the proposed approach is simulated and compared with previous similar research. The approach uses the structural features of Social Network Analysis (SNA) along with a linguistic analysis (LA) feature extracted from the descriptions of individuals' activities. The results clearly illustrate the performance improvement of the proposed framework, which reaches an average accuracy of up to 83.2% on the myPersonality dataset.