Technical Paper
H.3. Artificial Intelligence
Ali Nasr-Esfahani; Mehdi Bekrani; Roozbeh Rajabi
Abstract
Artificial intelligence (AI) has significantly advanced speech recognition applications. However, many existing neural network-based methods struggle with noise, reducing accuracy in real-world environments. This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions, particularly for phonetically similar numbers. A hybrid model combining residual convolutional neural networks and bidirectional gated recurrent units (BiGRU) is proposed, utilizing word units instead of phoneme units for speaker-independent recognition. The FARSDIGIT1 dataset, augmented using several data augmentation techniques, is processed with Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. Experimental results demonstrate the model's effectiveness, achieving 98.53%, 96.10%, and 95.92% accuracy on the training, validation, and test sets, respectively. In noisy conditions, the proposed approach improves recognition by 26.88% over phoneme unit-based LSTM models and surpasses the Mel-scale Two Dimension Root Cepstrum Coefficients (MTDRCC) feature extraction technique combined with an MLP classifier (MTDRCC+MLP) by 7.61%.
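A minimal sketch of the kind of pipeline the abstract describes: MFCC features feeding a residual convolutional front end followed by a BiGRU and a 10-way softmax over the digit classes. The layer sizes, block counts, frame length, and sampling rate below are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: residual-CNN + BiGRU classifier over MFCC features for
# 10 digit classes. Hyperparameters are illustrative, not the paper's.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

def extract_mfcc(path, sr=16000, n_mfcc=13, max_frames=100):
    """Load a WAV file and return a fixed-size (max_frames, n_mfcc) MFCC matrix."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    mfcc = mfcc[:max_frames]
    pad = max_frames - mfcc.shape[0]
    return np.pad(mfcc, ((0, pad), (0, 0)))

def residual_block(x, filters):
    """Two Conv1D layers with a 1x1 projection shortcut (standard residual block)."""
    shortcut = layers.Conv1D(filters, 1, padding="same")(x)
    h = layers.Conv1D(filters, 3, padding="same", activation="relu")(x)
    h = layers.Conv1D(filters, 3, padding="same")(h)
    h = layers.Add()([shortcut, h])
    return layers.Activation("relu")(h)

def build_model(frames=100, n_mfcc=13, n_classes=10):
    inp = layers.Input(shape=(frames, n_mfcc))
    x = residual_block(inp, 64)
    x = residual_block(x, 128)
    x = layers.Bidirectional(layers.GRU(64))(x)   # BiGRU over the time axis
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```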
Original/Review Paper
H.3. Artificial Intelligence
Vahideh Monemizadeh; Kourosh Kiani
Abstract
Anomaly detection is becoming increasingly crucial across various fields, including cybersecurity, financial risk management, and health monitoring. However, it faces significant challenges when dealing with large-scale, high-dimensional, and unlabeled datasets. This study focuses on decision tree-based methods for anomaly detection due to their scalability, interpretability, and effectiveness in managing high-dimensional data. Although Isolation Forest (iForest) and its extended variant, Extended Isolation Forest (EIF), are widely used, they exhibit limitations in identifying anomalies, particularly in handling normal data distributions and preventing the formation of ghost clusters. The Rotated Isolation Forest (RIF) was developed to address these challenges, enhancing the model's ability to distinguish true anomalies from normal variations by employing randomized rotations in feature space. Building on this approach, we propose the Discrete Rotated Isolation Forest (DRIF) model, which integrates an Autoencoder for dimensionality reduction. The use of a discrete probability distribution together with an Autoencoder improves computational efficiency. Experimental evaluations on synthetic and real-world datasets demonstrate that the proposed model outperforms iForest, EIF, and RIF, achieving higher Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) scores and significantly faster execution times. These findings establish the proposed model as a robust, scalable, and efficient approach for unsupervised anomaly detection in high-dimensional datasets.
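A minimal sketch of the rotation-plus-isolation idea behind RIF/DRIF, under stated assumptions: a small reconstruction network stands in for the Autoencoder, and a single global random rotation is applied before scoring with scikit-learn's IsolationForest. The paper's DRIF draws rotations from a discrete distribution (typically per tree); collapsing that to one rotation here is a simplification for illustration only.

```python
# Hedged sketch of Autoencoder reduction + rotation + Isolation Forest.
# This is NOT the authors' DRIF implementation; it illustrates the idea only.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPRegressor
from scipy.stats import ortho_group

rng = np.random.default_rng(0)

# Toy data: 1000 normal points plus 50 scattered anomalies in 20 dimensions.
X_normal = rng.normal(0, 1, size=(1000, 20))
X_anom = rng.uniform(-6, 6, size=(50, 20))
X = np.vstack([X_normal, X_anom])
y = np.r_[np.zeros(1000), np.ones(50)]            # 1 = anomaly

# Stand-in "autoencoder": an MLP trained to reconstruct its input; the hidden
# layer activations serve as the reduced representation (bottleneck size 8).
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
ae.fit(X, X)
Z = np.maximum(X @ ae.coefs_[0] + ae.intercepts_[0], 0)   # ReLU bottleneck codes

# Random rotation of the reduced space, then an Isolation Forest.
R = ortho_group.rvs(dim=Z.shape[1], random_state=0)
scores = -IsolationForest(random_state=0).fit(Z @ R).score_samples(Z @ R)
print("ROC-AUC:", roc_auc_score(y, scores))
```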
Review Article
H.3. Artificial Intelligence
Rasoul Hosseinzadeh; Mahdi Sadeghzadeh
Abstract
Attention mechanisms have significantly advanced machine learning and deep learning across various domains, including natural language processing, computer vision, and multimodal systems. This paper presents a comprehensive survey of attention mechanisms in Transformer architectures, emphasizing their evolution, design variants, and domain-specific applications in NLP, computer vision, and multimodal learning. We categorize attention types by their goals, such as efficiency, scalability, and interpretability, and provide a comparative analysis of their strengths, limitations, and suitable use cases. This survey also addresses the lack of visual intuitions, offering a clearer taxonomy and a discussion of hybrid approaches, such as sparse-hierarchical combinations. In addition to foundational mechanisms, we highlight hybrid approaches, theoretical underpinnings, and practical trade-offs. The paper identifies current challenges in computation, robustness, and transparency, offering a structured classification and proposing future directions. By comparing state-of-the-art techniques, this survey aims to guide researchers in selecting and designing attention mechanisms best suited for specific AI applications, ultimately fostering the development of more efficient, interpretable, and adaptable Transformer-based models.
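For context on the mechanism the surveyed variants build on, a minimal sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, follows. The shapes and sizes are illustrative assumptions only.

```python
# Hedged sketch of scaled dot-product attention, the core operation that the
# surveyed Transformer variants modify for efficiency, scalability, etc.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output of shape (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                   # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 16))
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 16)
```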
Original/Review Paper
H.6.5.13. Signal processing
Samira Moghani; Hossein Marvi; Zeynab Mohammadpoory
Abstract
This study introduces a novel classification framework based on Deep Orthogonal Non-Negative Matrix Factorization (Deep ONMF), which leverages scalogram representations of phonocardiogram (PCG) signals to hierarchically extract structural features crucial for detecting valvular heart diseases (VHDs). Scalograms, generated via the Continuous Wavelet Transform (CWT), serve as the foundational input to the proposed feature extraction pipeline, which integrates them with Deep ONMF in a unified and segmentation-free architecture. The resulting scalogram–Deep ONMF framework is designed to hierarchically extract features through two complementary perspectives: Scale-Domain Analysis (SDA) and Temporal-Domain Analysis (TDA). These extracted features are then classified using shallow classifiers, with Random Forest (RF) achieving the best results, particularly when paired with SDA features based on the Bump wavelet. Experimental evaluations on two public PCG datasets—one with five heart sound classes and another with binary classification—demonstrate the effectiveness of the proposed method, achieving high classification accuracies of up to 98.40% and 97.23%, respectively, thereby confirming its competitiveness with state-of-the-art techniques. The results suggest that the proposed approach offers a practical and powerful solution for automated heart sound analysis, with potential applications beyond VHD detection.
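A minimal sketch of the scalogram-to-hierarchical-factorization-to-shallow-classifier pipeline the abstract describes, under stated assumptions: PyWavelets' Morlet CWT stands in for the paper's Bump wavelet, two stacked scikit-learn NMF layers stand in for Deep Orthogonal NMF (no orthogonality constraint here), the scale-domain features are time-averaged scalogram rows, and the data are synthetic toy signals rather than PCG recordings.

```python
# Hedged sketch: CWT scalogram -> two-layer NMF stack -> Random Forest.
# Stand-ins only; not the authors' Deep ONMF or their Bump-wavelet setup.
import numpy as np
import pywt
from sklearn.decomposition import NMF
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def scalogram(signal, scales=np.arange(1, 33)):
    """Absolute CWT coefficients: rows = scales, columns = time samples."""
    coefs, _ = pywt.cwt(signal, scales, "morl")
    return np.abs(coefs)

# Toy dataset: two classes of synthetic 1-second signals (not real PCG data).
def make_signal(freq):
    t = np.linspace(0, 1, 1000)
    return np.sin(2 * np.pi * freq * t) + 0.3 * rng.normal(size=t.size)

X_raw = [make_signal(f) for f in ([5] * 30 + [12] * 30)]
y = np.array([0] * 30 + [1] * 30)

# Scale-domain features: average the scalogram over time, per scale.
S = np.array([scalogram(x).mean(axis=1) for x in X_raw])       # shape (60, 32)

# Two-layer NMF stack: factor the features, then factor the first layer's codes.
H1 = NMF(n_components=16, init="nndsvda", max_iter=500, random_state=0).fit_transform(S)
H2 = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0).fit_transform(H1)

clf = RandomForestClassifier(random_state=0).fit(H2, y)
print("training accuracy:", clf.score(H2, y))
```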