Technical Paper
H.3. Artificial Intelligence
Naeimeh Mohammad Karimi; Mehdi Rezaeian
Abstract
In the era of massive data, analyzing bioinformatics data and discovering the functions of biological sequences are very important. The rate of sequence generation using sequencing techniques is increasing rapidly, and researchers are faced with many sequences of unknown function. One of the essential operations in bioinformatics is the classification of sequences to discover unknown proteins. There are two approaches to classifying sequences: traditional and modern. Traditional methods use sequence alignment, which has a high computational cost. Modern methods use feature extraction to classify proteins; DeepFam is one such method. This research improves the DeepFam model, with a special focus on extracting appropriate features to differentiate the sequences of different categories. As the model improved, the features tended to be more generic. The Grad-CAM method has been used to analyze the extracted features and interpret the improved network layers. Then, we used the fitting vector from the transformer model to check the performance of Grad-CAM. The COG database, a massive database of protein sequences, was used to check the accuracy of the presented method. We have shown that by extracting more efficient features, the conserved regions in the sequences can be discovered more accurately, which helps to classify the proteins better. One of the critical advantages of the presented method is that it remains flexible as the number of categories increases, and its classification accuracy in three tests is higher than that of other methods.
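As an illustration of the Grad-CAM step mentioned in this abstract, the following is a minimal sketch of the standard Grad-CAM computation for a 1D (sequence) convolutional layer. It is not the authors' implementation: the function name `grad_cam_1d`, the list-of-lists input layout, and the final normalization to a peak of 1 are assumptions made for the example.

```python
def grad_cam_1d(activations, gradients):
    """Standard Grad-CAM for one 1D conv layer (illustrative sketch).

    activations: K channels, each a list of T activation values.
    gradients:   K channels, each a list of T gradients of the class
                 score with respect to those activations.
    Returns a length-T relevance map scaled so its peak is 1.
    """
    # Channel weight = gradient averaged over all sequence positions.
    weights = [sum(g) / len(g) for g in gradients]
    length = len(activations[0])
    cam = []
    for t in range(length):
        value = sum(w * a[t] for w, a in zip(weights, activations))
        cam.append(max(0.0, value))  # ReLU keeps positive evidence only
    peak = max(cam)
    # Normalization is a presentation choice (assumption), not part of the definition.
    return [v / peak for v in cam] if peak > 0 else cam
```

Positions with high values in the returned map are the ones that most support the predicted class, which is how such a map can point at candidate conserved regions in a sequence.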
Original/Review Paper
H.5. Image Processing and Computer Vision
Fateme Namazi; Mehdi Ezoji; Ebadat Ghanbari Parmehr
Abstract
Paddy fields in the north of Iran are highly fragmented, leading to challenges in accurately mapping them using remote sensing techniques. Cloudy weather often degrades image quality or renders images unusable, further complicating monitoring efforts. This paper presents a novel paddy rice mapping method based on phenology, addressing these challenges. The method utilizes time series data from Sentinel-1 and 2 satellites to derive a rice phenology curve. This curve is constructed using the cross ratio (CR) index from Sentinel-1, and the normalized difference vegetation index (NDVI) and land surface water index (LSWI) from Sentinel-2. Unlike existing methods, which often rely on analyzing single-point indices at specific times, this approach examines the entire time series behavior of each pixel. This robust strategy significantly mitigates the impact of cloud cover on classification accuracy. The time series behavior of each pixel is then correlated with this rice phenology curve. The maximum correlation, typically achieved around the 50-day period in the middle of the cultivation season, helps identify potential rice fields. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is then employed, utilizing the maximum correlation values from all three indices to classify pixels as rice paddy or other land cover types. The implementation results validate the accuracy of this method, achieving an overall accuracy of 99%. All processes were carried out on the Google Earth Engine (GEE) platform.
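The index and correlation steps described in this abstract can be sketched as follows. NDVI and LSWI use their standard normalized-difference definitions. The sliding-window search for the maximum Pearson correlation, and the names `max_window_correlation`, `pixel_series`, and `reference_curve`, are assumptions for illustration; the abstract does not specify how the correlation is computed on Google Earth Engine.

```python
import math


def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def lswi(nir, swir):
    """Land surface water index."""
    return normalized_difference(nir, swir)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no shape information
    return cov / (sd_x * sd_y)


def max_window_correlation(pixel_series, reference_curve):
    """Slide the reference curve along the pixel series (assumed scheme)
    and return the highest Pearson correlation over all windows."""
    width = len(reference_curve)
    if len(pixel_series) < width:
        raise ValueError("pixel series is shorter than the reference curve")
    return max(
        pearson(pixel_series[i:i + width], reference_curve)
        for i in range(len(pixel_series) - width + 1)
    )
```

In a pipeline like the one described, one such maximum would be computed per index (CR, NDVI, LSWI) and the three values passed to the classifier as features.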
Applied Article
H.5.9. Scene Analysis
Navid Raisi; Mahdi Rezaei; Behrooz Masoumi
Abstract
Human Activity Recognition (HAR) using computer vision is an expanding field with diverse applications, including healthcare, transportation, and human-computer interaction. While classical approaches such as Support Vector Machines (SVM), Histogram of Oriented Gradients (HOG), and Hidden Markov Models (HMM) rely on manually extracted features and struggle with complex motion patterns, deep learning-based models (e.g., Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Transformer-based models) have improved performance but still face challenges in handling occlusions, noisy environments, and computational efficiency. This paper introduces Attention-HAR, a novel deep neural network model designed to enhance HAR performance through three key innovations: Conv3DTranspose for spatial upsampling, ConvLSTM2D for capturing spatiotemporal patterns, and a custom attention mechanism that prioritizes critical frames within sequences. Unlike conventional attention mechanisms, our approach dynamically assigns weights to key frames, reducing the impact of redundant frames and enhancing interpretability and computational efficiency. Experimental results on the UCF-101 dataset demonstrate that Attention-HAR outperforms state-of-the-art models, achieving an accuracy of 97.61%, a precision of 97.95%, a recall of 97.49%, an F1-score of 97.64%, and an AUC of 99.9%. With only 1.26 million parameters, the model is computationally efficient and well-suited for deployment on lightweight platforms. These findings suggest that integrating temporal-spatial feature learning with attention mechanisms can significantly improve HAR in dynamic and complex environments.
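To make the idea of weighting frames concrete, here is a minimal sketch of generic softmax attention pooling over per-frame feature vectors. It is not the custom mechanism of Attention-HAR, whose details the abstract does not give; `attention_pool` and the scoring vector `score_vector` (standing in for learned parameters) are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, score_vector):
    """Weight each frame by a softmax over dot-product scores.

    frame_features: T feature vectors, one per frame.
    score_vector:   hypothetical learned vector of the same dimension.
    Returns (weights, pooled), where pooled is the weighted frame average.
    """
    scores = [
        sum(f * w for f, w in zip(frame, score_vector))
        for frame in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(weights[t] * frame_features[t][d] for t in range(len(frame_features)))
        for d in range(dim)
    ]
    return weights, pooled
```

Frames whose features align with the scoring vector receive larger weights, so redundant frames contribute less to the pooled representation, and the weights themselves can be inspected to see which frames mattered.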
Technical Paper
B.3. Communication/Networking and Information Technology
Roya Morshedi; S. Mojtaba Matinkhah
Abstract
The Internet of Things (IoT) is a rapidly growing domain essential for modern smart services. However, resource limitations in IoT nodes create significant security vulnerabilities, making them prone to cyberattacks. Deep learning models have emerged as effective tools for detecting anomalies in IoT traffic, yet Gaussian noise remains a major challenge, impacting detection accuracy. This study proposes an intrusion detection system based on a simple LSTM architecture with 128 memory units, optimized for deployment on edge servers and trained on the CIC-IDS2017 dataset. The model achieves outstanding performance, with a detection rate of 99.90%, accuracy of 99.90%, and an F1 score of 98.93%. A key innovation is integrating the Hurst parameter with the model, improving resilience against Gaussian noise and enhancing detection of attacks like DoS and DDoS. This research highlights the value of advanced statistical features and robust noise-resistant models in securing IoT networks. The system’s precision, rapid response, and innovative approach mark a significant advance in IoT cybersecurity.
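The Hurst parameter mentioned in this abstract can be estimated in several ways. The sketch below uses classical rescaled-range (R/S) analysis, which is an assumption: the abstract does not say which estimator the authors use or how the value is fed to the LSTM. The function names and the `min_window` parameter are illustrative.

```python
import math


def rescaled_range(chunk):
    """R/S statistic of one chunk, or None if the chunk is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    if std == 0:
        return None
    cumulative, lowest, highest = 0.0, 0.0, 0.0
    for x in chunk:
        cumulative += x - mean  # running sum of deviations from the mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    return (highest - lowest) / std


def hurst_exponent(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    log_sizes, log_rs = [], []
    size = min_window
    while size <= len(series) // 2:
        values = []
        for start in range(0, len(series) - size + 1, size):
            rs = rescaled_range(series[start:start + size])
            if rs is not None:
                values.append(rs)
        if values:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(values) / len(values)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for a slope estimate")
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_rs))
    denominator = sum((x - mean_x) ** 2 for x in log_sizes)
    return numerator / denominator  # ordinary least-squares slope
```

Uncorrelated noise gives an estimate near 0.5 (R/S tends to read slightly high on short series), while persistent, self-similar traffic gives values closer to 1, which is what makes the parameter a candidate feature for separating flooding-style attacks from background noise.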
Original/Review Paper
H.3.8. Natural Language Processing
Mozhgan Akaberi; Maryam Khodabakhsh; Seyedehfatemeh Karimi; Hoda Mashayekhi
Abstract
The exponential growth of digital information has increased the demand for robust and efficient Information Retrieval (IR) systems. Query Performance Prediction (QPP) is a critical task for identifying difficult queries and enhancing retrieval strategies. However, existing QPP methods suffer from several limitations: (1) score-based approaches fail to capture the structural relationships among retrieved documents, (2) supervised methods require labeled training data, making them costly and impractical for new domains, and (3) unsupervised post-retrieval predictors often rely solely on retrieval score dispersion, neglecting document clustering effects. To address these challenges, we propose a novel clustering-based post-retrieval QPP method. Specifically, we introduce three unsupervised predictors: Clustered Distinction, which measures query-specific separability of retrieved clusters; Clustered Query Drift, which estimates the deviation of top-ranked documents from query intent; and a hybrid approach combining both. By analyzing the clustering structure of retrieved documents, our method improves interpretability while eliminating the need for labeled data. We evaluate our approach on three standard datasets: the large-scale MS MARCO Passage Ranking dataset, TREC DL 2019, and TREC DL 2020. Experimental results demonstrate that our method significantly outperforms state-of-the-art score-based QPP models. These findings highlight the potential of cluster-aware QPP for enhancing IR systems and reducing the impact of difficult queries.
Technical Paper
H.3. Artificial Intelligence
Ali Nasr-Esfahani; Mehdi Bekrani; Roozbeh Rajabi
Abstract
Artificial intelligence (AI) has significantly advanced speech recognition applications. However, many existing neural network-based methods struggle with noise, reducing accuracy in real-world environments. This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions, particularly for phonetically similar numbers. A hybrid model combining residual convolutional neural networks and bidirectional gated recurrent units (BiGRU) is proposed, utilizing word units instead of phoneme units for speaker-independent recognition. The FARSDIGIT1 dataset, augmented with various approaches, is processed using Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. Experimental results demonstrate the model’s effectiveness, achieving 98.53%, 96.10%, and 95.92% accuracy on training, validation, and test sets, respectively. In noisy conditions, the proposed approach improves recognition by 26.88% over phoneme unit-based LSTM models and surpasses the Mel-scale Two Dimension Root Cepstrum Coefficients (MTDRCC) feature extraction technique combined with an MLP model (MTDRCC+MLP) by 7.61%.
Original/Review Paper
H.3. Artificial Intelligence
Vahideh Monemizadeh; Kourosh Kiani
Abstract
Anomaly detection is becoming increasingly crucial across various fields, including cybersecurity, financial risk management, and health monitoring. However, it faces significant challenges when dealing with large-scale, high-dimensional, and unlabeled datasets. This study focuses on decision tree-based methods for anomaly detection due to their scalability, interpretability, and effectiveness in managing high-dimensional data. Although Isolation Forest (iForest) and its extended variant, Extended Isolation Forest (EIF), are widely used, they exhibit limitations in identifying anomalies, particularly in handling normal data distributions and preventing the formation of ghost clusters. The Rotated Isolation Forest (RIF) was developed to address these challenges, enhancing the model's ability to discern true anomalies from normal variations by employing randomized rotations in feature space. Building on this approach, we propose the Discrete Rotated Isolation Forest (DRIF) model, which integrates an Autoencoder for dimensionality reduction. Using a discrete probability distribution together with the Autoencoder enhances computational efficiency. Experimental evaluations on synthetic and real-world datasets demonstrate that the proposed model outperforms iForest, EIF, and RIF, achieving higher Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) scores and significantly faster execution times. These findings establish the proposed model as a robust, scalable, and efficient approach for unsupervised anomaly detection in high-dimensional datasets.
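For readers unfamiliar with how isolation-based methods turn tree depth into a score, here is a minimal sketch of the anomaly score from the original Isolation Forest formulation, s(x, n) = 2^(-E[h(x)] / c(n)). It covers only this baseline scoring rule, not the rotations, discrete distribution, or Autoencoder of the proposed DRIF model; the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """iForest score in (0, 1]; values near 1 indicate likely anomalies.

    mean_path_length: E[h(x)], path length of x averaged over all trees.
    n: subsample size used to build each tree (must be at least 2).
    """
    if n < 2:
        raise ValueError("subsample size must be at least 2")
    return 2.0 ** (-mean_path_length / average_path_length(n))
```

A point that is isolated after very few splits has a short mean path and a score close to 1, while a point whose mean path equals c(n) scores exactly 0.5.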
Review Article
H.3. Artificial Intelligence
Rasoul Hosseinzadeh; Mahdi Sadeghzadeh
Abstract
Attention mechanisms have significantly advanced machine learning and deep learning across various domains, including natural language processing, computer vision, and multimodal systems. This paper presents a comprehensive survey of attention mechanisms in Transformer architectures, emphasizing their evolution, design variants, and domain-specific applications in NLP, computer vision, and multimodal learning. We categorize attention types by goals such as efficiency, scalability, and interpretability, and provide a comparative analysis of their strengths, limitations, and suitable use cases. This survey also addresses the lack of visual intuitions, offering a clearer taxonomy and discussion of hybrid approaches, such as sparse-hierarchical combinations. In addition to foundational mechanisms, we highlight theoretical underpinnings and practical trade-offs. The paper identifies current challenges in computation, robustness, and transparency, offering a structured classification and proposing future directions. By comparing state-of-the-art techniques, this survey aims to guide researchers in selecting and designing attention mechanisms best suited for specific AI applications, ultimately fostering the development of more efficient, interpretable, and adaptable Transformer-based models.
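The foundational mechanism that such surveys build on is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The following is a minimal single-head sketch in plain Python, without masking, batching, or learned projections; the function name and list-based layout are choices made for the example.

```python
import math


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors.

    queries: list of query vectors of dimension d_k.
    keys:    list of key vectors of dimension d_k.
    values:  list of value vectors, one per key.
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs
```

The cost of the score computation grows with the product of the number of queries and keys, which is the quadratic bottleneck that efficiency-oriented variants such as sparse attention aim to reduce.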
Original/Review Paper
H.6.5.13. Signal processing
Samira Moghani; Hossein Marvi; Zeynab Mohammadpoory
Abstract
This study introduces a novel classification framework based on Deep Orthogonal Non-Negative Matrix Factorization (Deep ONMF), which leverages scalogram representations of phonocardiogram (PCG) signals to hierarchically extract structural features crucial for detecting valvular heart diseases (VHDs). Scalograms, generated via the Continuous Wavelet Transform (CWT), serve as the foundational input to the proposed feature extraction pipeline, which integrates them with Deep ONMF in a unified and segmentation-free architecture. The resulting scalogram–Deep ONMF framework is designed to hierarchically extract features through two complementary perspectives: Scale-Domain Analysis (SDA) and Temporal-Domain Analysis (TDA). These extracted features are then classified using shallow classifiers, with Random Forest (RF) achieving the best results, particularly when paired with SDA features based on the Bump wavelet. Experimental evaluations on two public PCG datasets—one with five heart sound classes and another with binary classification—demonstrate the effectiveness of the proposed method, achieving high classification accuracies of up to 98.40% and 97.23%, respectively, thereby confirming its competitiveness with state-of-the-art techniques. The results suggest that the proposed approach offers a practical and powerful solution for automated heart sound analysis, with potential applications beyond VHD detection.
Original/Review Paper
H.6.5.2. Computer vision
Kourosh Kiani; Razieh Rastgoo; Alireza Chaji; Sergio Escalera
Abstract
Image inpainting, the process of restoring missing or corrupted regions of an image by reconstructing pixel information, has recently seen considerable advancements through deep learning-based approaches. Aiming to tackle the complex spatial relationships within an image, in this paper, we introduce a novel deep learning-based pre-processing methodology for image inpainting utilizing the Vision Transformer (ViT). Unlike CNN-based methods, our approach leverages the self-attention mechanism of ViT to model global contextual dependencies, improving the quality of inpainted regions. Specifically, we replace masked pixel values with those generated by the ViT, utilizing the attention mechanism to extract diverse visual patches and capture discriminative spatial features. To the best of our knowledge, this is the first instance of such a pre-processing model being proposed for image inpainting tasks. Furthermore, we demonstrate that our methodology can be effectively applied using a pre-trained ViT model with a pre-defined patch size, reducing computational overhead while maintaining high reconstruction fidelity. To assess the generalization capability of the proposed methodology, we conduct extensive experiments comparing our approach with four standard inpainting models across four public datasets. The results validate the efficacy of our pre-processing technique in enhancing inpainting performance, particularly in scenarios involving complex textures and large missing regions.