Original/Review Paper
H.5.7. Segmentation
Ali Fahmi Jafargholkhanloo; Mousa Shamsi; Mahdi Bashiri Bawil
Abstract
Magnetic Resonance Imaging (MRI) often suffers from noise and Intensity Non-Uniformity (INU), making segmentation a challenging task. The Fuzzy C-Means (FCM) algorithm, a widely used clustering method for image segmentation, is highly sensitive to noise and its convergence rate depends on data distribution. FCM employs the Euclidean distance metric, which fails to adapt to variations in data point distributions within compact and similarly shaped clusters. Additionally, this metric is not locally adaptive to different cluster shapes. This paper introduces a Conditional Spatial Gustafson-Kessel Clustering Algorithm based on Information Theory (CSGKIT) to address these challenges. First, information theory is incorporated to enhance the algorithm's robustness against noise and improve segmentation accuracy. Second, the Mahalanobis distance replaces the Euclidean distance to better accommodate cluster shapes during the clustering process. Finally, a conditional spatial approach uses a fuzzy-weighted membership matrix to incorporate local spatial interactions between neighboring pixels. The proposed CSGKIT algorithm is evaluated on two datasets: the BrainWeb simulated dataset and the Open Access Series of Imaging Studies (OASIS) dataset. Experimental results indicate that CSGKIT outperforms other FCM-based algorithms in segmentation accuracy across various tissue types.
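The key substitution described above, replacing the Euclidean metric with a covariance-adapted Mahalanobis distance so clusters of different shapes are measured fairly, can be illustrated with a minimal numpy sketch. This is not the authors' CSGKIT implementation; the toy covariance and points are illustrative assumptions.

```python
import numpy as np

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance of a point from a cluster center,
    using the cluster covariance to adapt to the cluster's shape."""
    diff = x - center
    return float(diff @ np.linalg.inv(cov) @ diff)

# An elongated cluster: large variance along the first axis.
center = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 0.25]])

a = np.array([2.0, 0.0])  # lies along the elongated axis
b = np.array([0.0, 1.0])  # lies across the narrow axis

# Euclidean distance treats a as farther (2.0 vs 1.0), but the
# Mahalanobis metric recognizes that a sits well inside the cluster's
# spread while b does not.
print(mahalanobis_sq(a, center, cov))  # 1.0
print(mahalanobis_sq(b, center, cov))  # 4.0
```

In Gustafson-Kessel-style clustering, each cluster carries its own (fuzzy) covariance matrix, so this shape-adaptive distance is computed per cluster rather than with one global metric.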
Original/Review Paper
H.6.5.14. Text processing
Abolfazl Adressi; Amirhossein Amiri
Abstract
Identifying and classifying anomalies in textual data from social networks is challenging due to the linguistic complexity and diverse user expressions. While deep learning and machine learning techniques offer promise in tackling this problem, their effectiveness is limited by insufficient data. The effect of Generative Adversarial Networks (GANs) on anomaly detection and classification is assessed in this paper, along with their relevance for generating synthetic text data. Combining synthetic and real data enhances classification accuracy, especially in limited-data settings. In this paper, Lasso and Ridge regression techniques are used for anomaly detection and classification. Experimental results reveal the superior performance of the proposed model in identifying and classifying anomalies on new datasets generated by the GAN. By combining statistical methods with generative techniques, the solution becomes not only more interpretable and scalable but also better suited for advanced text analysis in fast-changing environments like social media platforms.
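The core idea above, augmenting a small labeled set with synthetic samples and fitting a regularized (Ridge) model, can be sketched with closed-form ridge regression in numpy. The Gaussian "features" here are hypothetical stand-ins for text embeddings; a real pipeline would use GAN-generated text encoded the same way as the real data.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^(-1) X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
# Small "real" labeled set (stand-in for scarce annotated posts).
X_real = rng.normal(0, 1, (20, 5))
y_real = np.sign(X_real[:, 0])          # anomaly label driven by feature 0
# Larger "synthetic" set standing in for GAN-generated examples.
X_syn = rng.normal(0, 1, (200, 5))
y_syn = np.sign(X_syn[:, 0])

# Train on the combined pool, classify by the sign of the score.
w = ridge_fit(np.vstack([X_real, X_syn]), np.concatenate([y_real, y_syn]))
X_test = rng.normal(0, 1, (100, 5))
acc = np.mean(np.sign(X_test @ w) == np.sign(X_test[:, 0]))
print(f"test accuracy: {acc:.2f}")
```

The regularization term is what keeps the linear model interpretable and stable when synthetic and real distributions do not match perfectly; Lasso would additionally drive uninformative feature weights to zero.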
Original/Review Paper
H.6.5.2. Computer vision
Rozhin Mohammadizand; Razieh Rastgoo
Abstract
Sign language is a structured, non-vocal form of communication primarily used by individuals who are deaf or hard of hearing, who often face challenges interacting with non-signers. To address this, translation systems between sign and spoken language are essential, encompassing sign language recognition and production. In this work, we focus on sign language production and propose a deep learning framework for generating skeleton-based video representations of sign language at the word level. Our approach employs a conditional Generative Adversarial Network (cGAN) with transformer embeddings in both generator and discriminator, augmented with bone-length and joint-angle constraints and a classifier-guided loss to ensure anatomically plausible and semantically consistent gestures. We further introduce a novel loss function to improve human keypoint generation for sign representation. Extensive experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art approaches according to statistical (MMD) and perceptual (FID) metrics, while qualitative analyses confirm that the generated gestures are temporally smooth, anatomically accurate, and semantically meaningful. These results highlight the effectiveness of our model in advancing word-level sign language synthesis.
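The bone-length constraint mentioned above can be sketched as a simple penalty comparing predicted bone lengths against a reference skeleton, so generated poses stay anatomically plausible even when joints move. This is an illustrative sketch with a toy 2D skeleton, not the paper's actual loss formulation.

```python
import numpy as np

def bone_length_loss(pred, ref, bones):
    """Mean squared deviation of predicted bone lengths from the
    reference skeleton's lengths; rotation-invariant, so natural
    motion is not penalized, only stretching or shrinking of bones."""
    loss = 0.0
    for i, j in bones:
        lp = np.linalg.norm(pred[i] - pred[j])
        lr = np.linalg.norm(ref[i] - ref[j])
        loss += (lp - lr) ** 2
    return loss / len(bones)

# Toy 3-joint arm: shoulder -> elbow -> wrist.
bones = [(0, 1), (1, 2)]
ref  = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
good = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # rotated, same lengths
bad  = np.array([[0.0, 0.0], [3.0, 0.0], [3.5, 0.0]])  # stretched bones

print(bone_length_loss(good, ref, bones))  # 0.0
print(bone_length_loss(bad, ref, bones))   # 2.125
```

In a generator's training objective, a term like this would be added (with some weight) to the adversarial and classifier-guided losses.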
Original/Review Paper
H.6.5.2. Computer vision
Mahdi Davari; Razieh Rastgoo
Abstract
Detecting driver distraction is critically important, as it remains a major contributor to road accidents and traffic-related injuries worldwide. This study introduces a novel hybrid deep learning model that integrates Spatio-Temporal Graph Convolutional Networks (ST-GCN) with a Transformer Encoder and Attention mechanisms to effectively detect distracted driving behaviors. The ST-GCN component captures spatial and temporal dependencies in 3D skeletal motion data, modeling the dynamic body movements of the driver. Following this, a Transformer Encoder is employed to further refine temporal representations by leveraging global attention, allowing the model to understand long-range dependencies and subtle behavioral patterns over time. In addition, an Attention mechanism is applied to emphasize the most informative joints and time frames. To address class imbalance in the dataset, the model uses a focal loss function, which helps focus training on more difficult-to-classify examples. The proposed approach is validated on the 3D skeletal Drive&Act dataset, where it achieves a high accuracy of 97.47%, outperforming existing models, particularly under challenging conditions such as poor lighting and complex driving environments. The system demonstrates strong potential for real-time driver monitoring, offering an intelligent solution to enhance road safety and reduce accident risks through early detection of driver distraction.
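The focal loss used above to handle class imbalance can be shown in a minimal binary form: it scales the cross-entropy term by (1 - p_t)^gamma, so confidently-correct (easy) examples contribute almost nothing and training concentrates on hard, often minority-class, samples. A numpy sketch, with standard default hyperparameters assumed:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss. p: predicted probability of the positive
    class; y: true label in {0, 1}. The (1-p_t)^gamma factor
    down-weights easy examples."""
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy, confidently-correct positive vs. a hard, misclassified one:
easy = focal_loss(np.array([0.95]), np.array([1]))[0]
hard = focal_loss(np.array([0.10]), np.array([1]))[0]
print(easy, hard)  # the hard example dominates by orders of magnitude
```

With gamma = 0 this reduces to alpha-weighted cross-entropy; increasing gamma sharpens the focus on misclassified examples.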
Original/Review Paper
H.3.2.2. Computer vision
Mohammad Jadidi; Kourosh Kiani; Razieh Rastgoo
Abstract
In recent years, the application of deep learning techniques has revolutionized various domains, including the realm of sports analytics. The analysis of ball tracking and trajectory in sports has become an increasingly vital area of research, driven by advancements in technology and the growing demand for data-driven insights in athletic performance. In volleyball, a sport characterized by rapid movements and strategic play, the ability to accurately track the trajectory of the ball is crucial for both training and competitive analysis. This paper proposes novel deep learning models for accurate volleyball ball detection and tracking. By incorporating attention mechanisms into the YOLOv8 and YOLOv10 architectures, our models significantly improve performance, particularly in challenging situations involving occlusions and fast movements. The proposed models outperform baseline and competing models across several metrics, achieving precision of 94.2% and 94.7% and recall of 88.1% and 87.6%, respectively, along with real-time processing speeds, making them suitable for various sports analytics applications.
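One common way to inject attention into a detector backbone of this kind is a squeeze-and-excitation style channel-attention block: pool each feature channel globally, pass the result through a small bottleneck MLP, and rescale the channels by the learned weights. The sketch below is a generic illustration in numpy with randomly initialized weights, not the paper's specific attention design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    """Squeeze-and-excitation style attention: global-average-pool
    each channel, run a small bottleneck MLP, and rescale channels
    by the resulting (0, 1) weights."""
    squeeze = fmap.mean(axis=(1, 2))                        # (C,)
    weights = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # (C,)
    return fmap * weights[:, None, None]

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
fmap = rng.normal(0, 1, (C, H, W))   # a toy feature map
w1 = rng.normal(0, 0.5, (2, C))      # reduction to 2 hidden units
w2 = rng.normal(0, 0.5, (C, 2))     # expansion back to C channels

out = channel_attention(fmap, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the attention weights lie in (0, 1), the block can only suppress less informative channels, which is what helps under occlusion and motion blur: channels carrying ball-relevant features pass through relatively amplified.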
Review Article
H.6.5.7. Industry
Hossein Ghayoumi Zadeh; Ali Fayazi; Khosro Rezaee; Afsaneh Aminaee; Hadi Halavati; Mehdi Tahernejad; Hadi Memarzadeh; Ali Masoumi; Mohammad Sadegh Jafari
Abstract
In this study, an intelligent deep learning–based system is proposed for automated detection of surface defects in copper cathode blanks used in the electrorefining process. The proposed pipeline combines a YOLOv8-based segmentation model with an EfficientNetV2-S classifier to localize and analyze defect-relevant regions of each blank. The segmentation module identifies the main copper regions, edge strips, and defect-prone areas associated with surface anomalies such as scratches, dents, misalignment, and discoloration, effectively reducing background interference and improving classification reliability. The dataset includes 5,266 labeled images with a significant class imbalance, addressed using focal loss and class weighting during training. Experimental results on the test set demonstrate strong performance, achieving 98.32% accuracy, 96.71% precision, 95.67% recall, an F1-score of 96.19%, and an AUC of 0.9953. Grad-CAM visualizations and error analysis further confirm that the model consistently focuses on meaningful defect regions while remaining robust to background and illumination variations. These results highlight the effectiveness of the proposed approach for reliable quality control in industrial copper electrorefining lines.
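The class-weighting remedy mentioned above for the imbalanced defect dataset is typically computed as inverse class frequency, so rare defect classes contribute proportionally more to the training loss. A small sketch with hypothetical toy labels:

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Per-class loss weights inversely proportional to class
    frequency: weight_c = N / (n_classes * count_c). Rare classes
    get large weights, common classes small ones."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * counts)

# Imbalanced toy labels: 90 "no defect" samples, 10 "scratch" samples.
labels = np.array([0] * 90 + [1] * 10)
print(inverse_frequency_weights(labels, 2))  # [0.5555... 5.]
```

These weights would be passed to the classifier's loss function, often in combination with focal loss as the abstract describes, so the two mechanisms jointly counteract the dominance of defect-free samples.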
Original/Review Paper
H.3.8. Natural Language Processing
Saedeh Tahery; Saeed Farzi
Abstract
Dialogue understanding for low-resource languages like Persian remains challenging due to limited annotated data, which constrains supervised training at scale. We propose a simple yet effective training-free method that combines machine translation, retrieval-based example selection, and prompting with a large language model (GPT-4o) to improve zero-shot cross-lingual performance. Given a Persian utterance translated into English, our method retrieves semantically and lexically similar English examples using a hybrid similarity function, translates them back into Persian, and constructs a few-shot prompt tailored to the input. This input-sensitive strategy enhances the quality of the examples, helping the model align more effectively with each instance. Experimental results on the Persian-ATIS dataset show that our approach improves intent detection and achieves competitive slot filling performance, outperforming state-of-the-art baselines without requiring any supervision in the target language. The modular pipeline is easy to reproduce and, in future work, can be extended to other low-resource languages, tasks, or retrieval configurations. The repository of our work is available at https://anonymous.4open.science/r/Persian_Language_Understanding-FDF4.
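The hybrid similarity function described above, blending semantic and lexical signals to pick few-shot examples, can be sketched as a weighted combination of embedding cosine similarity and token-set (Jaccard) overlap. The embeddings and the 0.5 mixing weight below are illustrative assumptions; the paper's actual scoring function and encoder may differ.

```python
import numpy as np

def jaccard(a, b):
    """Lexical similarity: overlap of token sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def hybrid_score(query, cand, q_emb, c_emb, alpha=0.5):
    """Blend semantic (embedding cosine) and lexical (Jaccard)
    similarity for retrieval-based example selection."""
    return alpha * cosine(q_emb, c_emb) + (1 - alpha) * jaccard(query, cand)

query = "show me flights from boston to denver"
cands = ["list flights from boston to denver tomorrow",
         "what is the cheapest fare to atlanta"]
# Stand-in embeddings; a real pipeline would use a sentence encoder.
q_emb = np.array([1.0, 0.2, 0.1])
c_embs = [np.array([0.9, 0.3, 0.1]), np.array([0.1, 0.9, 0.8])]

scores = [hybrid_score(query, c, q_emb, e) for c, e in zip(cands, c_embs)]
best = cands[int(np.argmax(scores))]
print(best)  # the flight-listing utterance wins on both signals
```

The top-scoring retrieved examples would then be translated into the target language and packed into the few-shot prompt, making the prompt sensitive to each individual input utterance.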