Original/Review Paper
H.5.7. Segmentation
Ali Fahmi Jafargholkhanloo; Mousa Shamsi; Mahdi Bashiri Bawil
Abstract
Magnetic Resonance Imaging (MRI) often suffers from noise and Intensity Non-Uniformity (INU), making segmentation a challenging task. The Fuzzy C-Means (FCM) algorithm, a widely used clustering method for image segmentation, is highly sensitive to noise and its convergence rate depends on data distribution. FCM employs the Euclidean distance metric, which fails to adapt to variations in data point distributions within compact and similarly shaped clusters. Additionally, this metric is not locally adaptive to different cluster shapes. This paper introduces a Conditional Spatial Gustafson-Kessel Clustering Algorithm based on Information Theory (CSGKIT) to address these challenges. First, information theory is incorporated to enhance the algorithm's robustness against noise and improve segmentation accuracy. Second, the Mahalanobis distance replaces the Euclidean distance to better accommodate cluster shapes during the clustering process. Finally, a conditional spatial approach uses a fuzzy-weighted membership matrix to incorporate local spatial interactions between neighboring pixels. The proposed CSGKIT algorithm is evaluated on two datasets: the BrainWeb simulated dataset and the Open Access Series of Imaging Studies (OASIS) dataset. Experimental results indicate that CSGKIT outperforms other FCM-based algorithms in segmentation accuracy across various tissue types.
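The cluster-adaptive (Mahalanobis-type) norm that replaces the Euclidean distance can be sketched in a few lines. The snippet below (Python/NumPy; the toy covariance and values are illustrative, not from the paper) computes the Gustafson-Kessel distance induced by a cluster's fuzzy covariance matrix and shows why it suits elongated clusters:

```python
import numpy as np

def gk_distance_sq(x, v, F, rho=1.0):
    """Squared Gustafson-Kessel (Mahalanobis-type) distance of point x
    to cluster centre v with fuzzy covariance matrix F.

    The norm-inducing matrix A = (rho * det(F))^(1/n) * F^(-1) lets each
    cluster adapt to its own shape, unlike the fixed Euclidean norm."""
    n = len(v)
    A = (rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)
    d = x - v
    return float(d @ A @ d)

# Toy check: an elongated cluster (large variance along the first axis).
F = np.array([[4.0, 0.0],
              [0.0, 0.25]])   # fuzzy covariance of the cluster
v = np.zeros(2)
# Two points at equal Euclidean distance from the centre ...
d_along = gk_distance_sq(np.array([2.0, 0.0]), v, F)   # along the long axis
d_across = gk_distance_sq(np.array([0.0, 2.0]), v, F)  # across the short axis
# ... but the adaptive norm rates the on-axis point as much closer.
print(d_along < d_across)  # True
```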
Original/Review Paper
H.3.2.2. Computer vision
Mohammad Jadidi; Kourosh Kiani; Razieh Rastgoo
Abstract
In recent years, the application of deep learning techniques has revolutionized various domains, including sports analytics. The analysis of ball tracking and trajectory in sports has become an increasingly vital area of research, driven by advancements in technology and the growing demand for data-driven insights into athletic performance. In volleyball, a sport characterized by rapid movements and strategic play, the ability to accurately track the trajectory of the ball is crucial for both training and competitive analysis. This paper proposes novel deep learning models for accurate volleyball ball detection and tracking. By incorporating attention mechanisms into the YOLOv8 and YOLOv10 architectures, our models significantly improve performance, particularly in challenging situations involving occlusions and fast movements. The proposed models outperform the baseline and other models across several metrics. Specifically, they achieved precision of 94.2% and 94.7% and recall of 88.1% and 87.6%, respectively, at real-time processing speeds, making them suitable for various sports analytics applications.
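The abstract does not specify which attention mechanism is inserted into the detectors, but a channel-attention (squeeze-and-excitation) block is a common choice for this kind of backbone augmentation. A minimal NumPy sketch, with random stand-ins for the learned weights:

```python
import numpy as np

def squeeze_excite(feat, w1, w2):
    """Channel-attention (squeeze-and-excitation) block of the kind that
    can be inserted into a detector backbone; w1/w2 are random stand-ins
    for learned fully-connected weights."""
    z = feat.mean(axis=(1, 2))               # squeeze: global average pool
    h = np.maximum(z @ w1, 0.0)              # excitation: FC -> ReLU
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))      # FC -> sigmoid gates in (0, 1)
    return feat * s[:, None, None]           # reweight channels

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))      # C x H x W feature map
w1 = rng.standard_normal((8, 2))
w2 = rng.standard_normal((2, 8))
out = squeeze_excite(feat, w1, w2)
print(out.shape)  # (8, 16, 16): same shape, channels rescaled by attention
```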
Original/Review Paper
H.6.5.14. Text processing
Abolfazl Adressi; Amirhossein Amiri
Abstract
Identifying and classifying anomalies in textual data from social networks is challenging due to the linguistic complexity and diverse user expressions. While deep learning and machine learning techniques offer promise in tackling this problem, their effectiveness is limited by insufficient data. This paper assesses the effect of Generative Adversarial Networks (GANs) on anomaly detection and classification, along with their relevance for generating synthetic text data. Combining synthetic and real data enhances classification accuracy, especially in settings with limited data. Lasso and Ridge regression techniques are used for anomaly detection and classification. Experimental results reveal the superior performance of the proposed model in identifying and classifying anomalies on new datasets generated by the GAN. By combining statistical methods with generative techniques, the solution becomes not only more interpretable and scalable but also better suited for advanced text analysis in fast-changing environments such as social media platforms.
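The core idea of augmenting scarce real data with synthetic samples before fitting a regularized linear classifier can be sketched as follows (NumPy only; the Gaussian "synthetic" samples stand in for GAN output, and the closed-form ridge solution stands in for the paper's Lasso/Ridge machinery):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors (e.g. text embeddings): a small pool of
# real labelled samples augmented with many synthetic, GAN-style samples.
def make_samples(n, centre):
    return np.asarray(centre) + 0.5 * rng.standard_normal((n, 2))

real_X = np.vstack([make_samples(10, [0, 0]), make_samples(10, [2, 2])])
real_y = np.array([-1.0] * 10 + [1.0] * 10)          # -1 normal, +1 anomaly
synth_X = np.vstack([make_samples(40, [0, 0]), make_samples(40, [2, 2])])
synth_y = np.array([-1.0] * 40 + [1.0] * 40)

X = np.vstack([real_X, synth_X])
y = np.concatenate([real_y, synth_y])

# Ridge-regularised least squares used as a linear classifier:
# w = (X^T X + lam*I)^(-1) X^T y, then predict sign(x^T w).
lam = 1.0
Xb = np.hstack([X, np.ones((len(X), 1))])            # add bias column
w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(3), Xb.T @ y)

test_X = np.array([[0.1, -0.2], [2.1, 1.9]])
test_Xb = np.hstack([test_X, np.ones((2, 1))])
pred = np.sign(test_Xb @ w)
print(pred)  # first sample classed as normal (-1), second as anomalous (+1)
```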
Review Article
H.6.5.7. Industry
Hossein Ghayoumi Zadeh; Ali Fayazi; Khosro Rezaee; Afsaneh Aminaee; Hadi Halavati; Mehdi Tahernejad; Hadi Memarzadeh; Ali Masoumi; Mohammad Sadegh Jafari
Abstract
In this study, an intelligent deep learning–based system is proposed for automated detection of surface defects in copper cathode blanks used in the electrorefining process. The proposed pipeline combines a YOLOv8-based segmentation model with an EfficientNetV2-S classifier to localize and analyze defect-relevant regions of each blank. The segmentation module identifies the main copper regions, edge strips, and defect-prone areas associated with surface anomalies such as scratches, dents, misalignment, and discoloration, effectively reducing background interference and improving classification reliability. The dataset includes 5,266 labeled images with a significant class imbalance, addressed using focal loss and class weighting during training. Experimental results on the test set demonstrate strong performance, achieving 98.32% accuracy, 96.71% precision, 95.67% recall, an F1-score of 96.19%, and an AUC of 0.9953. Grad-CAM visualizations and error analysis further confirm that the model consistently focuses on meaningful defect regions while remaining robust to background and illumination variations. These results highlight the effectiveness of the proposed approach for reliable quality control in industrial copper electrorefining lines.
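The focal loss named here as the remedy for class imbalance has a standard form; a minimal NumPy version is shown below (the toy probabilities are illustrative, not from the paper's dataset):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, class_weights=None):
    """Multi-class focal loss with optional per-class weights: the
    (1 - p_t)^gamma factor shrinks the loss on already-confident
    predictions so rare defect classes dominate the gradient."""
    eps = 1e-7
    p_t = np.clip(probs[np.arange(len(targets)), targets], eps, 1.0)
    w = 1.0 if class_weights is None else class_weights[targets]
    return float(np.mean(-w * (1.0 - p_t) ** gamma * np.log(p_t)))

# One easy sample (p_t = 0.95) and one hard one (p_t = 0.30), both class 0.
probs = np.array([[0.95, 0.05], [0.30, 0.70]])
targets = np.array([0, 0])
ce = float(np.mean(-np.log(probs[:, 0])))   # plain cross-entropy baseline
fl = focal_loss(probs, targets)
print(fl < ce)  # True: the easy sample's contribution is almost erased
```

With `gamma=0` and no class weights the expression reduces exactly to plain cross-entropy, which is a convenient sanity check.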
Original/Review Paper
H.3.8. Natural Language Processing
Saedeh Tahery; Saeed Farzi
Abstract
Dialogue understanding for low-resource languages like Persian remains challenging due to limited annotated data, which constrains supervised training at scale. We propose a simple yet effective training-free method that combines machine translation, retrieval-based example selection, and prompting with a large language model (GPT-4o) to improve zero-shot cross-lingual performance. Given a Persian utterance translated into English, our method retrieves semantically and lexically similar English examples using a hybrid similarity function, translates them back into Persian, and constructs a few-shot prompt tailored to the input. This input-sensitive strategy enhances the quality of the examples, helping the model align more effectively with each instance. Experimental results on the Persian-ATIS dataset show that our approach improves intent detection and achieves competitive slot filling performance, outperforming state-of-the-art baselines without requiring any supervision in the target language. The modular pipeline is easy to reproduce and, in future work, can be extended to other low-resource languages, tasks, or retrieval configurations. The repository of our work is available at https://anonymous.4open.science/r/Persian_Language_Understanding-FDF4.
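The hybrid similarity retrieval step can be sketched compactly. The toy below blends a semantic score (cosine over stand-in embeddings) with a lexical score (Jaccard token overlap) to pick few-shot examples; the blending weight `alpha` and the 2-d "embeddings" are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def jaccard(a, b):
    """Lexical overlap between two utterances' token sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def hybrid_retrieve(query, query_vec, pool, pool_vecs, alpha=0.5, k=2):
    """Blend semantic (cosine) and lexical (Jaccard) similarity and return
    the top-k pool utterances for the few-shot prompt."""
    scores = [alpha * cosine(query_vec, v) + (1 - alpha) * jaccard(query, s)
              for s, v in zip(pool, pool_vecs)]
    order = np.argsort(scores)[::-1][:k]
    return [pool[i] for i in order]

# Toy English pool with stand-in 2-d "embeddings".
pool = ["show flights from boston to denver",
        "what is the cheapest fare to denver",
        "book a hotel room in paris"]
pool_vecs = np.array([[1.0, 0.1], [0.9, 0.3], [0.0, 1.0]])

examples = hybrid_retrieve("list flights from boston to denver",
                           np.array([1.0, 0.0]), pool, pool_vecs)
prompt = "\n".join(f"Example: {e}" for e in examples) + \
         "\nInput: list flights from boston to denver\nIntent:"
print(examples[0])  # the most similar utterance leads the few-shot prompt
```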
Original/Review Paper
H.6.5.13. Signal processing
Zeynab Mohammadpoory; Mahda Nasrollahzadeh; Sakineh Asadi
Abstract
Nowadays, the recognition of emotions using speech signals has gained popularity because of its vast number of applications in different fields such as medicine, online marketing, online search engines, education systems, criminal investigations, traffic collisions, and more. Many researchers have adopted different methodologies to improve emotion classification accuracy using speech signals. This study presents a novel time-series-to-graph transformation framework for speech emotion recognition. Speech signals were segmented into overlapping windows, each converted into graphs, from which 16 structural features were extracted. Significant features were then selected via Minimum Redundancy Maximum Relevance (mRMR) and used to train four classifiers: random forest (RF), linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbors (KNN). Finally, a soft-voting ensemble strategy was employed to integrate their predictions, yielding improved classification performance. The proposed method achieved the highest sensitivity, specificity, and accuracy for the SAVEE database: 83.57%, 98.93%, and 98.16%, respectively. Similarly, for the EmoDB database, the highest values were 94.47%, 99.09%, and 98.40%, respectively. We also compared our results with other methods and found that our method outperformed state-of-the-art techniques in emotion classification.
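The soft-voting step that fuses the four classifiers is simple to state: average the class-probability outputs and take the arg-max. A NumPy sketch with made-up probabilities (not the paper's outputs):

```python
import numpy as np

def soft_vote(prob_list):
    """Average the class-probability outputs of several classifiers and
    pick the class with the highest mean probability per sample."""
    mean_probs = np.mean(prob_list, axis=0)
    return np.argmax(mean_probs, axis=1), mean_probs

# Hypothetical probabilities from four classifiers (RF, LDA, SVM, KNN)
# for two utterances over three emotion classes.
rf  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
lda = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])
svm = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
knn = np.array([[0.4, 0.4, 0.2], [0.2, 0.6, 0.2]])

labels, probs = soft_vote([rf, lda, svm, knn])
print(labels)  # [0 1]: class 0 wins the first sample, class 1 the second
```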
Technical Paper
H.3.7. Learning
Malihe Danesh; Zahra Ahmadi
Abstract
In recent years, sign language recognition has emerged as a major challenge in the fields of image processing and machine learning. People with hearing impairments use sign language to communicate, but the lack of automated tools to translate it has created significant communication barriers. This study presents a hybrid model based on convolutional neural networks (CNNs), transformers, and hidden Markov models (HMMs) to accurately recognize sign language gestures using the MNIST sign language dataset. The model first extracts features from the input gesture images using CNNs and then feeds these features into the Transformer model to process complex, long-term dependencies in the feature sequence. In the next step, to smooth the predictions and improve accuracy, a hidden Markov model is employed, which adjusts the final predictions based on previous sequences. The results show that the proposed model, utilizing the HMM, achieves an accuracy of 99% and a sign error rate of 0.0098, demonstrating its high efficiency in recognizing hand gestures. This research represents an important step toward developing assistive devices for the deaf and enhancing human interaction.
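The HMM smoothing step, where sequence structure corrects isolated frame-level mistakes, can be illustrated with Viterbi decoding over per-frame class probabilities. The "sticky" transition matrix below is a hand-set stand-in for the learned HMM parameters:

```python
import numpy as np

def viterbi_smooth(emissions, trans, init):
    """Smooth per-frame class probabilities with an HMM: Viterbi decoding
    returns the most likely label sequence, correcting isolated flips
    that contradict the temporal context."""
    T, C = emissions.shape
    logd = np.log(init) + np.log(emissions[0])
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        cand = logd[:, None] + np.log(trans)        # score of i -> j moves
        back[t] = np.argmax(cand, axis=0)           # best predecessor per j
        logd = cand[back[t], np.arange(C)] + np.log(emissions[t])
    path = [int(np.argmax(logd))]
    for t in range(T - 1, 0, -1):                   # trace the path back
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Noisy per-frame probabilities for 2 gesture classes: frame 3 flips.
em = np.array([[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.9, 0.1]])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])  # sticky: labels rarely switch
print(viterbi_smooth(em, trans, np.array([0.5, 0.5])))  # [0, 0, 0, 0]
```

Note that a frame-by-frame arg-max would label frame 3 as class 1; the transition prior overrides the weak contrary evidence.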
Original/Review Paper
H.5. Image Processing and Computer Vision
Ali Shabani Badi; Kambiz Rahbar; Ziaeddin Beheshtifard; Maryam Khademi
Abstract
This paper introduces a novel approach to enhance the quality of images captured under low-light conditions. The method optimizes the parameters of the established Li method by employing the evolutionary Particle Swarm Optimization (PSO) algorithm. A key contribution of this research is the formulation of a comprehensive loss function for the PSO algorithm, derived from the integration of entropy loss, edge pixel loss, and average desired image brightness loss. The objective of this optimization process is to determine the optimal parameter set for the base method, thereby improving the preservation of image structure, increasing brightness while maintaining edge details, and ensuring the overall brightness of the resulting image remains within a desirable range. An iterative optimization strategy is employed to address the resulting optimization problem. The performance of the proposed method is evaluated through quantitative and qualitative analyses on the SICE dataset and benchmarked against several state-of-the-art low-light image enhancement techniques. Quantitative evaluation, utilizing metrics such as PSNR, SSIM, PIQE, NIQE, BRISQUE, and NIMA, demonstrates that the proposed parameter tuning of the base method, guided by the PSO algorithm and our comprehensive loss function, achieves competitive or superior performance in preserving image structure and details, generating images with natural visual quality, and suppressing noise in comparison to numerous existing methods. This research highlights the efficacy of the evolutionary PSO algorithm in identifying optimal configurations for a physical model-based method aimed at enhancing the quality of low-light imagery.
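The composite fitness function driving the PSO search can be sketched as a weighted sum of the three stated terms. The exact formulas and weights are the paper's; the version below is an illustrative reconstruction of the idea in NumPy, with assumed weights and target brightness:

```python
import numpy as np

def composite_loss(img, target_mean=0.55, w=(1.0, 1.0, 1.0)):
    """Composite fitness for a PSO particle (illustrative; the terms'
    exact forms and the weights w are assumptions, not the paper's):
      - entropy term: rewards a rich intensity histogram,
      - edge term: rewards preserved gradient magnitude,
      - brightness term: penalises deviation from a desired mean level."""
    hist, _ = np.histogram(img, bins=32, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    gy, gx = np.gradient(img)
    edge_strength = np.mean(np.hypot(gx, gy))
    brightness_err = (img.mean() - target_mean) ** 2
    # Lower is better: negate the terms we want to maximise.
    return -w[0] * entropy - w[1] * edge_strength + w[2] * brightness_err

rng = np.random.default_rng(1)
dark = 0.1 * rng.random((64, 64))           # synthetic low-light image
enhanced = np.clip(dark * 5.0, 0.0, 1.0)    # crude brightness/contrast boost
print(composite_loss(enhanced) < composite_loss(dark))  # True
```

In the full method, PSO would evaluate this fitness for each candidate parameter set of the base Li method rather than for a fixed gain as above.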
Technical Paper
H.3.2.2. Computer vision
Mahdi Zarrin; Haniyeh Nikkhah
Abstract
Medical image analysis, crucial for disease diagnosis and treatment, often suffers from the challenge of class imbalance, where the area of normal tissue significantly outweighs that of abnormal regions. Furthermore, the varying class ratios across different images within a dataset complicate the application of uniform loss adjustments. To address these issues and advance automated segmentation, this study proposes a novel deep learning model integrating the strengths of YOLO Version 8's efficient feature extraction modules (SPPF and C2F) within a U-shaped architecture enhanced by a Receptive Field Enhancement (RFE) module. The RFE module, acting as an advanced skip connection, strategically fuses multi-scale features from corresponding and subsequent encoder layers processed through SPPF and C2F to enrich feature transfer and improve receptive field. To specifically tackle the class imbalance and the diversity of class distributions across images, we introduce a novel Adapt Exponential Loss function. This pixel-level loss dynamically adjusts class weights for each image based on its individual lesion-to-total-pixel ratio (k). We evaluated our proposed model and loss function on challenging skin lesion datasets: ISIC 2018, ISIC 2017, and PH2. Our method achieved significant segmentation performance with IoU scores of 86.47%, 85.67%, and 93.13%, and Dice scores of 91.63%, 90.19%, and 96.02% on ISIC 2018, ISIC 2017, and PH2, respectively, demonstrating its effectiveness in accurately delineating skin lesions despite class imbalance and varying lesion proportions. This work contributes a robust framework for medical image segmentation, facilitating more reliable diagnostic tools in dermatology.
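The per-image adaptive weighting idea can be illustrated with a class-weighted binary cross-entropy whose weights depend on each image's own lesion ratio k. The exponential weighting below is an illustrative choice, not the paper's exact Adapt Exponential Loss:

```python
import numpy as np

def adaptive_weighted_bce(pred, target, eps=1e-7):
    """Per-image class-weighted binary cross-entropy, sketching the
    adaptive-loss idea: weights are derived from this image's own
    lesion-to-total-pixel ratio k, so small lesions get larger weight.
    The exp() weighting is an assumption, not the paper's formula."""
    k = target.mean()                      # lesion-to-total-pixel ratio
    w_fg = np.exp(1.0 - k)                 # foreground weight grows as k -> 0
    w_bg = np.exp(k)                       # background weight grows as k -> 1
    pred = np.clip(pred, eps, 1.0 - eps)
    per_pixel = -(w_fg * target * np.log(pred)
                  + w_bg * (1.0 - target) * np.log(1.0 - pred))
    return float(per_pixel.mean())

# A 1% lesion: under these weights, foreground errors count roughly
# e^0.99 / e^0.01 (about 2.7x) more than background errors.
target = np.zeros((100, 100))
target[:10, :10] = 1.0                     # k = 0.01
all_bg = np.full_like(target, 0.01)        # predicts "background everywhere"
decent = np.where(target == 1.0, 0.9, 0.01)
print(adaptive_weighted_bce(decent, target)
      < adaptive_weighted_bce(all_bg, target))  # True
```

Because k is recomputed per image, the same loss adapts automatically across a dataset whose lesion proportions vary widely.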
Conceptual Paper
H.3.2.10. Medicine and science
Ali Ghanbari; Mohaddeseh Keyhanian; Jamshid Pirgazi
Abstract
Accurate prediction of drug–target interactions is essential for advancing drug discovery and repositioning efforts. This study introduces a comprehensive framework that effectively addresses key challenges in DTI prediction, including dataset imbalance and high-dimensional feature representations. The approach integrates multiple protein descriptors—specifically, nine statistical and sequence-based features—and drug molecular fingerprints encoded via the Morgan algorithm, with optimal feature combinations selected through validation to capture diverse biological and chemical information. To mitigate dataset imbalance, a one-class SVM-based undersampling method (One-SVM-US) models the distribution of positive interactions to guide the selective reduction of the majority class, thereby effectively balancing positive and negative samples. Furthermore, a supervised, classification-oriented variational autoencoder is employed to compress the high-dimensional features into a lower-dimensional space while preserving class-discriminative information relevant to interaction prediction. The refined features are then classified using machine learning models to predict potential drug–target pairs. Experimental evaluations on benchmark datasets demonstrate the effectiveness of the proposed framework, with results showing perfect AUC-ROC scores of 1.00 on the EN, GPCR, and NR datasets, and a score of 0.9731 on the IC dataset, indicating performance improvements over existing methods. These findings confirm the robustness and potential of the approach as a reliable tool for drug–target interaction prediction.
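The one-class-SVM-guided undersampling step can be sketched with scikit-learn: fit a one-class SVM on the positives, score every negative against that model, and keep only as many negatives as positives. Which negatives to keep (clear negatives, as below, versus hard positive-like ones) is a design choice; the data here are synthetic stand-ins:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Imbalanced toy DTI-style data: few positive interactions, many negatives.
pos = rng.normal(loc=2.0, scale=0.5, size=(30, 4))
neg = rng.normal(loc=0.0, scale=1.0, size=(300, 4))

# Model the positive-interaction distribution with a one-class SVM, then
# score every negative against it (a sketch of the One-SVM-US idea).
oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(pos)
scores = oc.decision_function(neg)          # higher = more positive-like

# Keep the negatives least similar to the positive region, one per positive.
keep = np.argsort(scores)[: len(pos)]
neg_balanced = neg[keep]
print(len(neg_balanced) == len(pos))  # True: classes are now balanced
```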