H.3. Artificial Intelligence
Ali Nasr-Esfahani; Mehdi Bekrani; Roozbeh Rajabi
Abstract
Artificial intelligence (AI) has significantly advanced speech recognition applications. However, many existing neural network-based methods are sensitive to noise, which reduces their accuracy in real-world environments. This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions, with particular attention to phonetically similar numbers. A hybrid model combining residual convolutional neural networks and bidirectional gated recurrent units (BiGRU) is proposed; it uses word units instead of phoneme units to achieve speaker-independent recognition. The FARSDIGIT1 dataset, augmented using several techniques, is processed with Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. Experimental results demonstrate the model's effectiveness, with accuracies of 98.53%, 96.10%, and 95.92% on the training, validation, and test sets, respectively. In noisy conditions, the proposed approach improves recognition accuracy by 26.88% over phoneme-unit-based LSTM models and by 7.61% over Mel-scale Two Dimension Root Cepstrum Coefficients (MTDRCC) features combined with an MLP classifier (MTDRCC+MLP).
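The MFCC feature extraction step mentioned above can be illustrated with a minimal, dependency-free sketch for a single audio frame: Hamming window, power spectrum, triangular mel filterbank, log compression, and a DCT-II. This is not the authors' pipeline. The frame length (256 samples), sampling rate (8 kHz), number of filters (26), number of coefficients (13), and all function names are assumptions chosen for illustration. A naive DFT is used in place of an FFT so that every step stays visible.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (assumed variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one frame, bins 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    filters = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            weights[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            weights[k] = (right - k) / max(right - center, 1)
        filters.append(weights)
    return filters

def dct_ii(values, n_out):
    """Unnormalized DCT-II; keeps the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc_frame(frame, sample_rate, n_filters=26, n_ceps=13):
    n = len(frame)
    # Hamming window to reduce spectral leakage at the frame edges
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(f, spec)) for f in mel_filterbank(n_filters, n, sample_rate)]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    return dct_ii(log_energies, n_ceps)

sample_rate = 8000  # assumed sampling rate
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))  # 13
```

In a full recognizer, this function would run over overlapping frames of each utterance. The resulting coefficient sequence (often extended with delta features) would then be passed to the sequence model, which in this study is the residual CNN + BiGRU.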
V. Fazel Asl; B. Karasfi; B. Masoumi
Abstract
This article addresses abnormal behavior detection in highly crowded environments. One of the main challenges in abnormal behavior detection is the complexity of the structural patterns across frames. In this paper, social force and optical flow patterns are used to prepare the system to learn these complex structural patterns, and a cycle generative adversarial network (cycle GAN) is used to learn the behavioral patterns. Two models, one trained on normal and one trained on abnormal behavioral patterns, are used to evaluate detection accuracy. When abnormal patterns are used for training, geometric techniques are applied to augment them, because such patterns are scarce, which is another challenge in detecting abnormal behaviors. When normal behavioral patterns are used for training, no augmentation is needed because enough normal patterns are available. The cycle GAN is then trained separately on normal and abnormal behaviors. The system produces the social force and optical flow patterns for normal and abnormal behaviors on its first and second sides. The cycle GAN is thus used both to learn behavioral patterns and to assess the accuracy of abnormal behavior detection. In the testing phase, if normal behavioral patterns were used for training, the cycle GAN should not be able to reconstruct abnormal behavioral patterns with high accuracy.
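The test-phase rule above (train on normal patterns only, then treat poorly reconstructed inputs as abnormal) can be sketched as a reconstruction-error threshold. This is a minimal sketch, not the authors' implementation. The cycle GAN itself is not shown, and its reconstruction is assumed to be given as a flattened list of values. The mean + k * std threshold, the value k = 3, the helper names, and the toy numbers are all hypothetical choices.

```python
import statistics

def reconstruction_error(pattern, reconstruction):
    """Mean squared error between a flattened pattern map and its reconstruction."""
    if len(pattern) != len(reconstruction):
        raise ValueError("pattern and reconstruction must have the same length")
    return sum((p - r) ** 2 for p, r in zip(pattern, reconstruction)) / len(pattern)

def fit_threshold(normal_errors, k=3.0):
    """Hypothetical rule: mean + k * std of errors measured on held-out normal patterns."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)

def is_abnormal(pattern, reconstruction, threshold):
    # A model trained only on normal behavior is expected to reconstruct
    # abnormal patterns poorly, so a large error flags an abnormality.
    return reconstruction_error(pattern, reconstruction) > threshold

# Toy values for illustration only; these are not results from the paper.
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(normal_errors)
pattern = [0.2, 0.4, 0.6, 0.8]
print(is_abnormal(pattern, [0.21, 0.39, 0.61, 0.79], threshold))  # False
print(is_abnormal(pattern, [0.5, 0.1, 0.9, 0.4], threshold))      # True
```

Fitting the threshold on normal data alone matches the setting in which abnormal examples are scarce. In practice, the threshold would be tuned on a validation split.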