H.5. Image Processing and Computer Vision
Z. Mehrnahad; A.M. Latif; J. Zarepour Ahmadabadi
Abstract
In this paper, a novel scheme for lossless meaningful visual secret sharing using XOR properties is presented. In the first step, a genetic algorithm with an appropriately designed objective function creates noisy share images. These images do not contain any information about the input secret image, yet the secret image is fully recovered by stacking them together. To withstand attacks during image transmission, a new approach for constructing meaningful shares based on the properties of XOR is proposed. In the recovery scheme, the input secret image is fully recovered by an efficient XOR operation. The proposed method is evaluated using the PSNR, MSE, and BCR criteria. The experimental results show good outcomes compared with other methods in terms of both the quality of the share images and the recovered image.
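As an illustration of the lossless recovery step described above, the following is a minimal NumPy sketch of XOR-based reconstruction, assuming the shares are equally sized 8-bit arrays; the genetic-algorithm share construction and the meaningful-share embedding from the paper are not reproduced here.

```python
import numpy as np

def recover_secret_xor(shares):
    """Losslessly recover the secret image by XOR-ing all share images.

    `shares` is an iterable of equally sized uint8 arrays; XOR is its own
    inverse, so combining every share reproduces the hidden image exactly.
    """
    shares = [np.asarray(s, dtype=np.uint8) for s in shares]
    secret = shares[0].copy()
    for share in shares[1:]:
        secret ^= share
    return secret

# Toy usage: hide a random "secret" in two shares and recover it.
rng = np.random.default_rng(0)
secret = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
share1 = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
share2 = secret ^ share1          # the second share completes the XOR pair
assert np.array_equal(recover_secret_xor([share1, share2]), secret)
```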
M. Molaei; D. Mohamadpur
Abstract
Performing sentiment analysis on social network big data can be helpful for various research and business projects to extract useful insights from text-oriented content. In this paper, we propose a general pre-processing framework for sentiment analysis, which is devoted to adopting FastText with Recurrent Neural Network variants to prepare textual data efficiently. This framework consists of three stages: data cleansing, tweet padding, and extraction of word embeddings from FastText with conversion of tweets to these vectors, all implemented using the DataFrame data structure in Apache Spark. Its main objective is to enhance the performance of online sentiment analysis in terms of pre-processing time and to handle large-scale data volumes. In addition, we propose a distributed intelligent system for online social big data analytics. It is designed to store, process, and classify a huge amount of information online. The proposed system can adopt any word embedding library, such as FastText, with different distributed deep learning models, such as LSTM or GRU. The results of the evaluations show that the proposed framework can significantly improve the performance of previous RDD-based methods in terms of processing time and data volume.
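The three pre-processing stages can be pictured with the plain-Python sketch below; this is not the authors' Apache Spark DataFrame implementation, and the small embedding table is a hypothetical stand-in for a trained FastText model.

```python
import re
import numpy as np

MAX_LEN = 20   # fixed tweet length after padding (assumed value)
EMB_DIM = 8    # embedding dimensionality (FastText commonly uses 100-300)

def cleanse(tweet: str) -> list[str]:
    """Stage 1: lowercase, strip URLs/mentions/punctuation, tokenize."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tweet = re.sub(r"[^a-z0-9\s]", " ", tweet)
    return tweet.split()

def pad(tokens: list[str], max_len: int = MAX_LEN) -> list[str]:
    """Stage 2: truncate/pad every tweet to the same length."""
    return (tokens + ["<pad>"] * max_len)[:max_len]

def embed(tokens: list[str], table: dict) -> np.ndarray:
    """Stage 3: map tokens to vectors; unknown words fall back to zeros
    (real FastText would build them from character n-grams)."""
    zero = np.zeros(EMB_DIM)
    return np.stack([table.get(t, zero) for t in tokens])

# Hypothetical embedding table standing in for a trained FastText model.
table = {w: np.random.rand(EMB_DIM) for w in ["great", "movie", "bad", "<pad>"]}
matrix = embed(pad(cleanse("Great movie!!! http://t.co/xyz @user")), table)
print(matrix.shape)   # (MAX_LEN, EMB_DIM) -> ready for an LSTM/GRU classifier
```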
J. Hamidzadeh; M. Moradi
Abstract
Recommender systems extract unseen information to predict the next preferences. Most of these systems use additional information such as demographic data and previous users' ratings to predict users' preferences, but rarely use sequential information. In streaming recommender systems, the emergence of new patterns or the disappearance of existing ones leads to inconsistencies. However, such changes are common because users' preferences for items vary over time. Recommender systems that do not account for these inconsistencies suffer from poor performance. Therefore, the present paper is devoted to a new fuzzy rough set-based method for managing these inconsistencies in a flexible and adaptable way. Evaluations have been conducted on twelve real-world datasets using the leave-one-out cross-validation method. The experimental results have been compared with five other methods and show the superiority of the proposed method in terms of accuracy, precision, and recall.
Kh. Aghajani
Abstract
Deep-learning-based approaches have been extensively used for detecting pulmonary nodules in Computed Tomography (CT) scans. In this study, an automated end-to-end framework with a convolutional network (Conv-net) is proposed to detect lung nodules in CT images. Here, boundary regression is performed by a direct regression method, in which the offset is predicted from a given point. The proposed framework has two outputs: a pixel-wise classification between nodule and normal tissue, and a direct regression that determines the four coordinates of the nodule's bounding box. The loss function includes two terms, one for classification and the other for regression. The performance of the proposed method is compared with YOLOv2. The evaluation has been performed on the Lung-PET-CT-Dx dataset. The experimental results show that the proposed framework outperforms YOLOv2 and achieves high accuracy in nodule localization and boundary estimation.
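A minimal PyTorch sketch of such a two-term loss is shown below, assuming a per-pixel nodule/normal target map and four per-pixel box offsets; the weighting factor and tensor layout are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nodule_detection_loss(cls_logits, cls_target, box_pred, box_target, lam=1.0):
    """Two-term loss: pixel-wise nodule/normal classification plus direct
    regression of the four bounding-box offsets, counted only at nodule
    pixels. The weighting `lam` is an assumed hyper-parameter.

    Shapes: cls_logits/cls_target (N, 1, H, W) with float targets in {0, 1};
            box_pred/box_target   (N, 4, H, W) offsets from each pixel.
    """
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits, cls_target)
    pos = cls_target.bool().expand_as(box_pred)   # regress only where a nodule exists
    if pos.any():
        reg_loss = F.smooth_l1_loss(box_pred[pos], box_target[pos])
    else:
        reg_loss = box_pred.sum() * 0.0           # keeps the graph valid with no positives
    return cls_loss + lam * reg_loss

# Toy shapes only; a real run would use the Conv-net's two output heads.
cls_logits, box_pred = torch.randn(2, 1, 64, 64), torch.randn(2, 4, 64, 64)
cls_target = (torch.rand(2, 1, 64, 64) > 0.95).float()
box_target = torch.rand(2, 4, 64, 64)
print(nodule_detection_loss(cls_logits, cls_target, box_pred, box_target))
```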
H. Momeni; N. Mabhoot
Abstract
Interest in cloud computing has grown considerably over recent years, primarily due to its scalable virtualized resources. Cloud computing has thus contributed to the advancement of real-time applications such as signal processing, environment surveillance, and weather forecasting, where time and energy considerations for performing the tasks are critical. In real-time applications, missing task deadlines causes catastrophic consequences; thus, real-time task scheduling in the cloud computing environment is an important and essential issue. Furthermore, energy consumption in cloud data centers is an important concern that has been considered in recent years, given benefits such as reduced system operating costs and environmental protection, and it can be reduced through appropriate task scheduling. In this paper, we present an energy-aware task scheduling approach, namely EaRTs, for real-time applications. We employ virtualization and consolidation techniques to minimize energy consumption, improve resource utilization, and meet task deadlines. In the consolidation technique, scaling virtualized resources up and down can improve the performance of task execution. The proposed approach comprises four algorithms, namely Energy-aware Task Scheduling in Cloud Computing (ETC), Vertical VM Scale Up (V2S), Horizontal VM Scale Up (HVS), and Physical Machine Scale Down (PSD). We present a formal model of the proposed approach using Timed Automata to precisely prove the schedulability and correctness of EaRTs. We show that our proposed approach is more efficient in terms of deadline hit ratio, resource utilization, and energy consumption compared to other energy-aware real-time task scheduling algorithms.
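The following is a rough, hypothetical sketch of the underlying idea of deadline- and energy-aware placement (earliest-deadline tasks first, lowest estimated energy among feasible VMs); it does not reproduce the ETC/V2S/HVS/PSD algorithms or the Timed Automata model, and the workload and power figures are invented.

```python
from dataclasses import dataclass

@dataclass
class Task:
    length: float        # workload, e.g. million instructions (assumed unit)
    deadline: float      # relative deadline in seconds

@dataclass
class VM:
    mips: float          # processing speed
    power: float         # average active power in watts (assumed linear model)
    busy_until: float = 0.0

def schedule(tasks, vms):
    """Greedy sketch: earliest-deadline tasks first, each placed on the
    feasible VM with the lowest estimated energy for that task."""
    plan = []
    for t in sorted(tasks, key=lambda t: t.deadline):
        feasible = [v for v in vms if v.busy_until + t.length / v.mips <= t.deadline]
        if not feasible:
            plan.append((t, None))   # deadline would be missed -> scale-up decision point
            continue
        vm = min(feasible, key=lambda v: t.length / v.mips * v.power)
        vm.busy_until += t.length / vm.mips
        plan.append((t, vm))
    return plan

tasks = [Task(400, 2.0), Task(900, 3.0), Task(200, 1.0)]
vms = [VM(mips=500, power=80), VM(mips=1000, power=150)]
for task, vm in schedule(tasks, vms):
    print(task, "->", "scale up needed" if vm is None else f"VM({vm.mips} MIPS)")
```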
Amin Rahmati; Foad Ghaderi
Abstract
Every facial expression involves one or more facial action units appearing on the face. Therefore, action unit recognition is commonly used to enhance facial expression detection performance. It is important to identify subtle changes in the face when particular action units occur. In this paper, we propose an architecture that employs local features extracted from specific regions of the face while also using global features taken from the whole face. To this end, we combine the SPPNet and FPN modules to build an end-to-end network for facial action unit recognition. First, different predefined regions of the face are detected. Next, the SPPNet module captures deformations in the detected regions. The SPPNet module focuses on each region separately and cannot take into account possible changes in other areas of the face. In parallel, the FPN module finds global features related to each of the facial regions. By combining the two modules, the proposed architecture is able to capture both local and global facial features and enhance the performance of the action unit recognition task. Experimental results on the DISFA dataset demonstrate the effectiveness of our method.
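A minimal PyTorch sketch of the spatial pyramid pooling idea used by the SPPNet module is given below; the pyramid levels and channel sizes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    """Pools a feature map at several grid sizes and concatenates the
    results into a fixed-length vector, so facial regions of different
    sizes can feed the same fully connected head. The (1x1, 2x2, 4x4)
    levels here are illustrative."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(k) for k in levels])

    def forward(self, x):                      # x: (N, C, H, W)
        feats = [p(x).flatten(start_dim=1) for p in self.pools]
        return torch.cat(feats, dim=1)         # (N, C * sum(k*k))

spp = SpatialPyramidPooling()
region = torch.randn(2, 64, 17, 23)            # arbitrary-size region features
print(spp(region).shape)                       # torch.Size([2, 64 * (1 + 4 + 16)])
```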
V. Torkzadeh; S. Toosizadeh
Abstract
In this study, an automatic system based on image processing methods and convolutional neural network features is proposed to detect the degree of possible dipping and buckling on the sandwich panel surface using a colour camera. The proposed method, by receiving an image of the sandwich panel, can detect the dipping and buckling of its surface with acceptable accuracy. After a panel is fully processed by the system, an image output is generated to show the surface status of the sandwich panel so that the supervisor of the production line can better detect any potential defects on the surface of the produced panels. An accurate solution is also provided to measure the amount of distortion (depth or height of dipping and buckling) on the sandwich panels without the need for expensive and complex equipment and hardware.
H.5.7. Segmentation
Ehsan Ehsaeyan
Abstract
This paper presents a novel approach to image segmentation through multilevel thresholding, leveraging the speed and precision of the technique. The proposed algorithm, based on the Grey Wolf Optimizer (GWO), integrates Darwinian principles to address the common stagnation issue in metaheuristic algorithms, which often results in local optima and premature convergence. The search agents are efficiently steered across the search space by a dual mechanism of encouragement and punishment employed by our strategy, thereby curtailing computational time. This is implemented by segmenting the population into distinct groups, each tasked with discovering superior solutions. To validate the algorithm’s efficacy, 9 test images from the Pascal VOC dataset were selected, and the renowned energy curve method was employed for verification. Additionally, Kapur entropy was utilized to gauge the algorithm’s performance. The method was benchmarked against four disparate search algorithms, and its dominance was underscored by achieving the best outcomes in 20 out of 27 cases for image segmentation. The experimental findings collectively affirm that the Darwinian Grey Wolf Optimizer (DGWO) stands as a formidable instrument for multilevel thresholding.
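For reference, a minimal NumPy implementation of the Kapur entropy objective that such a threshold search maximizes might look as follows; the histogram and threshold values below are illustrative only.

```python
import numpy as np

def kapur_entropy(hist, thresholds):
    """Kapur's objective for multilevel thresholding: the sum of the
    entropies of the gray-level classes delimited by the thresholds.
    `hist` is a 256-bin histogram; higher is better."""
    p = hist / (hist.sum() + 1e-12)
    bounds = [0] + sorted(int(t) for t in thresholds) + [256]
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = p[lo:hi].sum()
        if w <= 0:
            continue
        q = p[lo:hi] / w
        q = q[q > 0]
        total += -np.sum(q * np.log(q))
    return total

# Toy usage on a synthetic bimodal histogram; a GWO/DGWO-style search would
# maximize this function over candidate threshold vectors.
samples = np.concatenate([np.random.normal(60, 10, 5000),
                          np.random.normal(180, 15, 5000)])
hist = np.histogram(samples, bins=256, range=(0, 255))[0].astype(float)
print(kapur_entropy(hist, [120]))
```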
Mohammad Nazari; Hossein Rahmani; Dadfar Momeni; Motahare Nasiri
Abstract
Graph representation of data can better define relationships among data components and thus provide better and richer analysis. So far, movies have been represented as graphs many times using different features for clustering, genre prediction, and even for use in recommender systems. In constructing movie graphs, little attention has been paid to textual features such as subtitles, even though they contain the entire content of the movie and carry a lot of hidden information. So, in this paper, we propose a method called MoGaL to construct a movie graph using LDA on subtitles. In this method, each node is a movie and each edge represents a novel relationship discovered by MoGaL between two associated movies. First, we extracted the important topics of the movies using LDA on their subtitles. Then, we visualized the relationships between the movies in a graph, using cosine similarity. Finally, we evaluated the proposed method with respect to the genre homophily and genre entropy measures. MoGaL significantly outperforms the baseline method on these measures. Accordingly, our empirical results indicate that movie subtitles can be considered a rich source of information for various movie analysis tasks.
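A compact sketch of this kind of pipeline, LDA topic vectors from subtitles followed by cosine-similarity edges, could look like the following (using scikit-learn and networkx); the subtitle snippets and the 0.8 edge threshold are hypothetical, not values from the paper.

```python
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in subtitle texts; real input would be the full subtitle file per movie.
subtitles = {
    "MovieA": "space ship captain mission launch orbit",
    "MovieB": "love wedding family dinner letter",
    "MovieC": "ship orbit alien signal rescue mission",
}
titles = list(subtitles)

counts = CountVectorizer(stop_words="english").fit_transform(subtitles.values())
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

# Connect movies whose topic distributions are similar enough.
sim = cosine_similarity(topics)
G = nx.Graph()
G.add_nodes_from(titles)
for i in range(len(titles)):
    for j in range(i + 1, len(titles)):
        if sim[i, j] > 0.8:
            G.add_edge(titles[i], titles[j], weight=float(sim[i, j]))
print(G.edges(data=True))
```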
M. Mohammadzadeh; H. Khosravi
Abstract
Today, video games have a special place in entertainment. In this article, we have developed an interactive video game for mobile devices in which the user controls the game's character with face and hand gestures. Cascading classifiers along with Haar-like features and local binary patterns are used for hand gesture recognition and face detection. The game's character moves according to the current hand and face state received from the front camera. Various ideas are used to achieve appropriate accuracy and speed. Unity 3D and OpenCV for Unity are employed to design and implement the video game. The game is written in C# and developed for both the Windows and Android operating systems. Experiments show an accuracy of 86.4% in the detection of five gestures. The game also has an acceptable frame rate, running at 11 fps on Windows and 8 fps on Android.
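The cascade-based detection step can be sketched in a few lines; note that the game itself uses OpenCV for Unity in C#, so the plain-Python equivalent below is only illustrative of the same idea.

```python
import cv2

# Haar cascade shipped with OpenCV; the game performs the analogous call
# through OpenCV for Unity.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                 # front camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # The detected position would steer the game character here.
    cv2.imshow("control", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```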
H.3.9. Problem Solving, Control Methods, and Search
Heydar Toossian Shandiz; Mohsen Erfan Hajipour; Amir Ali Bagheri
Abstract
The aim of this paper is to create an efficient controller that can precisely track the position of autonomous surface vessels by utilizing the dynamic inversion control technique. One of the key objectives of this controller is to mitigate or eliminate the effects of environmental disturbances such as wind, waves, and water flow. In addition, intelligent methods are used to reject disturbances and correct modeling errors. These methods include fuzzy methods that adjust the parameters of the linear controller used within the dynamic inversion controller, and a perceptron neural network used alongside the dynamic inversion controller. The effectiveness of the proposed methods is evaluated not only on the step response but also on their ability to track a complex path. Finally, the proposed methods are compared with one of the classic methods, namely PID control. This evaluation provides insights into how the proposed methods fare in terms of both step response and trajectory tracking compared to the traditional PID control approach.
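For context, the classical PID baseline mentioned above can be sketched as follows; the gains, time step, and surrogate dynamics are illustrative, not the tuned values or vessel model from the paper.

```python
class PID:
    """Discrete PID controller used as the classical baseline; gains and
    time step below are illustrative only."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy first-order "position" model driven toward a step setpoint.
pid, pos, dt = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1), 0.0, 0.1
for _ in range(50):
    u = pid.step(setpoint=1.0, measurement=pos)
    pos += dt * (-0.5 * pos + u)     # crude surrogate dynamics, not the vessel model
print(round(pos, 3))                 # approaches 1.0
```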
Fatemeh Alinezhad; Kourosh Kiani; Razieh Rastgoo
Abstract
Gender recognition has been an attractive research area in recent years. To make a user-friendly application for gender recognition, an accurate, fast, and lightweight model applicable to mobile devices is necessary. Although successful results have been obtained using Convolutional Neural Networks (CNNs), these models need high computational resources that are not appropriate for mobile and embedded applications. To overcome this challenge, and considering the recent advances in deep learning, in this paper we propose a deep learning-based model for gender recognition on mobile devices using lightweight CNN models. In this approach, a pretrained CNN model, the Multi-Task Convolutional Neural Network (MTCNN), is used for face detection. Furthermore, the MobileFaceNet model is modified and trained using the Margin Distillation cost function. To boost the model performance, Dense Blocks and depthwise separable convolutions are used in the model. The proposed model outperforms the MobileFaceNet model on six datasets, with relative accuracy improvements of 0.02%, 1.39%, 2.18%, 1.34%, 7.51%, and 7.93% on LFW, CPLFW, CFP-FP, VGG2-FP, UTKFace, and our own data, respectively. In addition, we collected a dataset including a total of 100,000 face images from both males and females in different age categories; the images of women include faces with and without headgear.
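A minimal PyTorch sketch of the depthwise separable convolution block that keeps such models lightweight is shown below; the channel sizes are illustrative, not the MobileFaceNet configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution, the
    building block behind MobileFaceNet-style efficiency on mobile devices."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)   # torch.Size([1, 64, 56, 56])
```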
A. Torkaman; K. Badie; A. Salajegheh; M. H. Bokaei; Seyed F. Fatemi
Abstract
Recently, network representation has attracted many research works, mostly concentrating on representing nodes in a dense low-dimensional vector. Some network embedding methods focus only on the node structure, while others also consider the content information within the nodes. In this paper, we propose HDNR, a hybrid deep network representation model that uses a triplet deep neural network architecture considering both the node structure and content information for network representation. In addition, the author's writing style is considered as a significant feature in the node content information. Inspired by the application of deep learning in natural language processing, our model utilizes a deep random walk method to exploit inter-node structures and two deep sequence prediction methods to extract nodes' content information. The embedding vectors generated in this manner were shown to boost each other in learning optimal node representations, detecting more informative features, and ultimately achieving better community detection. The experimental results confirm the effectiveness of this model for network representation compared to other baseline methods.
S. Mavaddati; S. Mavaddati
Abstract
The development of an automatic system to classify the type of rice grains is an interesting research area in the scientific fields associated with modern agriculture. In recent years, different techniques have been employed to identify the types of various agricultural products, and different color-based and texture-based features have been used to yield the desired results in the classification procedure. This paper proposes a classification algorithm to detect different rice types by extracting features from bulk samples. The feature space in this algorithm includes fractal-based features of the coefficients extracted from the wavelet packet transform analysis. This feature vector is combined with other texture-based features and used to learn a model for each rice type using a Gaussian mixture model classifier. Also, a sparse structured principal component analysis algorithm is applied to reduce the dimension of the feature vector, leading to a precise classification rate with less computational time. The results of the proposed classifier are compared with the results obtained from other classification procedures presented in this context. The simulation results, along with a meaningful statistical test, show that the proposed algorithm based on the combined features is able to detect the type of rice grains precisely, with more than 99% accuracy. Also, the proposed algorithm can detect the rice quality for different percentages of combination with other rice grains with 99.75% average accuracy.
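The per-class Gaussian mixture modelling step can be sketched as follows with scikit-learn; the synthetic vectors below stand in for the wavelet-packet/fractal feature vectors described above, and the number of mixture components is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_per_class(X, y, n_components=3):
    """Fit one Gaussian mixture per rice type on its training features."""
    return {c: GaussianMixture(n_components, random_state=0).fit(X[y == c])
            for c in np.unique(y)}

def predict(models, X):
    """Assign each sample to the class whose mixture gives the highest
    log-likelihood."""
    classes = sorted(models)
    scores = np.column_stack([models[c].score_samples(X) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Synthetic stand-in features for two rice types.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 6)), rng.normal(2, 1, (200, 6))])
y = np.array([0] * 200 + [1] * 200)
models = fit_gmm_per_class(X, y)
print((predict(models, X) == y).mean())   # training accuracy on the toy data
```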
Maryam Khazaei; Nosratali Ashrafi-Payaman
Abstract
Nowadays, as the use of social networks and computer networks increases, the amount of associated complex data with graph structure and their applications, such as classification, clustering, link prediction, and recommender systems, has risen significantly. Because of security problems and societal concerns, anomaly detection is becoming a vital problem in most fields. Applications that use a heterogeneous graph are confronted with many issues, such as different kinds of neighbors, different feature types, and differences in the type and number of links. So, in this research, we employ the HetGNN model, with some changes in loss functions and parameters, for heterogeneous graph embedding to capture the whole graph's features (structure and content) for anomaly detection, and then pass the embeddings to a VAE to discover anomalous nodes based on reconstruction error. Our experiments on the AMiner dataset with several baselines illustrate that our model outperforms state-of-the-art methods on heterogeneous graphs while considering all types of attributes.
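A small sketch of the reconstruction-error scoring stage is given below, assuming the HetGNN embeddings have already been computed; the VAE dimensions, KL weight, and training length are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingVAE(nn.Module):
    """Small VAE over pre-computed node embeddings; nodes whose embeddings
    reconstruct poorly get a high anomaly score."""
    def __init__(self, dim, hidden=32, latent=8):
        super().__init__()
        self.enc = nn.Linear(dim, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

emb = torch.randn(500, 64)                    # stand-in for HetGNN node embeddings
model = EmbeddingVAE(64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                          # short illustrative training loop
    recon, mu, logvar = model(emb)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = F.mse_loss(recon, emb) + 1e-3 * kld
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    err = ((model(emb)[0] - emb) ** 2).mean(dim=1)
print(torch.topk(err, k=10).indices)          # nodes with the largest reconstruction error
```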
H.3.2.2. Computer vision
Mobina Talebian; Kourosh Kiani; Razieh Rastgoo
Abstract
Fingerprint verification has emerged as a cornerstone of personal identity authentication. This research introduces a deep learning-based framework for enhancing the accuracy of this critical process. By integrating a pre-trained Inception model with a custom-designed architecture, we propose a model that effectively extracts discriminative features from fingerprint images. To this end, the input fingerprint image is aligned to a base fingerprint through minutiae vector comparison. The aligned input fingerprint is then subtracted from the base fingerprint to generate a residual image. This residual image, along with the aligned input fingerprint and the base fingerprint, constitutes the three input channels for a pre-trained Inception model. Our main contribution lies in the alignment of fingerprint minutiae, followed by the construction of a color fingerprint representation. Moreover, we collected a dataset, including 200 fingerprint images corresponding to 20 persons, for fingerprint verification. The proposed method is evaluated on two distinct datasets, demonstrating its superiority over existing state-of-the-art techniques. With a verification accuracy of 99.40% on the public Hong Kong Dataset, our approach establishes a new benchmark in fingerprint verification. This research holds the potential for applications in various domains, including law enforcement, border control, and secure access systems.
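The construction of the three-channel (color) input can be sketched as follows; the absolute-difference residual and the 299x299 resize are assumptions, and the minutiae-based alignment is taken as already done.

```python
import numpy as np
import cv2

def build_color_input(aligned, base):
    """Stack the residual, the aligned input, and the base fingerprint as
    the three channels fed to a pre-trained Inception model. The absolute
    difference is used here for the residual (an assumption)."""
    a = aligned.astype(np.int16)
    b = base.astype(np.int16)
    residual = np.clip(np.abs(a - b), 0, 255).astype(np.uint8)
    three_channel = np.dstack([residual, aligned, base])
    # Inception-style models commonly expect 299x299 inputs (assumed here).
    return cv2.resize(three_channel, (299, 299))

aligned = np.random.randint(0, 256, (320, 240), dtype=np.uint8)   # stand-in images
base = np.random.randint(0, 256, (320, 240), dtype=np.uint8)
print(build_color_input(aligned, base).shape)                     # (299, 299, 3)
```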
F. Jafarinejad; R. Farzbood
Abstract
Image retrieval is a basic task in many content-based image systems. Achieving high precision while maintaining low computation time is very important in relevance feedback-based image retrieval systems. This paper establishes an analogy between image retrieval and the task of image classification: in the image retrieval problem, we obtain an optimized decision surface that separates dataset images into two categories of relevant/irrelevant images with respect to the query image. This problem is viewed and solved as an optimization problem using the particle swarm optimization (PSO) algorithm. Although PSO is widely used in the field of image retrieval, it has rarely been used directly for feature weighting. Information extracted from user feedback guides the particles toward the optimal weights of the various image features (color-, shape-, or texture-based features). Fusing these highly non-homogeneous features requires a feature weighting algorithm, which is carried out with the help of PSO. Accordingly, an innovative fitness function is proposed to evaluate each particle's position. Experimental results on the Wang and Corel-10k datasets indicate that the average precision of the proposed method is higher than that of other semi-automatic and automatic approaches. Moreover, the proposed method reduces computational complexity compared to other PSO-based image retrieval methods.
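One way to picture a particle's fitness evaluation is the simplified stand-in below: the particle position is a feature-weight vector, and fitness is the precision@k of a weighted-distance ranking against the relevance feedback. This is not the paper's fitness function, and the data are synthetic.

```python
import numpy as np

def weighted_retrieval_fitness(weights, query, dataset, relevant_mask, k=10):
    """Fitness of one particle: rank images by weighted Euclidean distance
    to the query and score precision@k against the user's feedback."""
    w = np.abs(weights) / (np.abs(weights).sum() + 1e-12)
    d = np.sqrt(((dataset - query) ** 2 * w).sum(axis=1))
    top_k = np.argsort(d)[:k]
    return relevant_mask[top_k].mean()

rng = np.random.default_rng(0)
dataset = rng.random((200, 30))              # fused color/shape/texture features
query = dataset[0]
relevant = np.zeros(200, bool)
relevant[:20] = True                         # hypothetical relevance feedback
particle = rng.random(30)                    # one PSO particle position
print(weighted_retrieval_fitness(particle, query, dataset, relevant))
```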
R. Serajeh; A. Mousavinia; F. Safaei
Abstract
Classical SFM (Structure From Motion) algorithms are widely used to estimate the three-dimensional structure of a stationary scene with a moving camera. However, when there are moving objects in the scene whose motion equations are unknown, the approach fails. This paper first demonstrates that when the frame rate is high enough and the object movement is continuous in time, meaning that acceleration is limited, a simple linear model can be effectively used to estimate the motion. This theory is first mathematically proven in a closed-form expression and then optimized by a nonlinear function applicable to our problem. The algorithm is evaluated on both synthesized and real data from the Hopkins dataset.
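The constant-velocity idea behind the linear model can be illustrated in a few lines: over a short, high-frame-rate window, a first-order polynomial fit per coordinate recovers the motion. The data below are synthetic, and this is only a sketch of the general principle, not the paper's formulation.

```python
import numpy as np

# 10 frames at roughly 30 fps: each tracked point's trajectory is nearly
# linear in time when acceleration is limited.
t = np.arange(10) / 30.0
true_velocity = np.array([1.5, -0.8])          # px/s, synthetic ground truth
points = np.outer(t, true_velocity) + np.random.normal(0, 0.01, (10, 2))

# Fit x(t) and y(t) with degree-1 polynomials; the slopes estimate velocity.
vel_est = [np.polyfit(t, points[:, i], deg=1)[0] for i in range(2)]
print(np.round(vel_est, 2))                    # close to [1.5, -0.8]
```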
H. Aghabarar; K. Kiani; P. Keshavarzi
Abstract
Nowadays, given the rapid progress in pattern recognition, new ideas from theoretical mathematics can be exploited to improve the efficiency of these tasks. In this paper, the Discrete Wavelet Transform (DWT) is used as a mathematical framework to demonstrate handwritten digit recognition in spiking neural networks (SNNs). The motivation behind this method is that the wavelet transform can separate the spike information and noise into distinct frequency subbands while also preserving the time information. The simulation results show that DWT is an effective and worthy choice and brings the network to an efficiency comparable to previous networks in the spiking field. Initially, DWT is applied to the MNIST images at the network input. Subsequently, a type of time encoding called constant-current Leaky Integrate-and-Fire (LIF) encoding is applied to the transformed data. Following this, the encoded images are fed into a multilayer convolutional spiking network. In this architecture, various wavelets have been investigated, and the highest classification accuracy of 99.25% is achieved.
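A minimal sketch of the DWT step followed by constant-current LIF (time-to-first-spike) encoding is given below using PyWavelets; the neuron parameters and coefficient scaling are assumptions, not the paper's settings.

```python
import numpy as np
import pywt

def constant_current_lif_times(values, tau=20.0, threshold=1.0, r=1.0):
    """Constant-current LIF (time-to-first-spike) encoding: each coefficient
    drives a LIF neuron with a constant current, so stronger inputs fire
    earlier. Values at or below threshold never spike (returned as inf)."""
    i = np.asarray(values, dtype=float)
    times = np.full(i.shape, np.inf)
    active = i * r > threshold
    times[active] = -tau * np.log(1.0 - threshold / (i[active] * r))
    return times

image = np.random.rand(28, 28)                  # stand-in for an MNIST digit
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")     # one-level DWT of the input
coeff = np.abs(cA) / np.abs(cA).max() * 3.0     # scale so most neurons can fire
spike_times = constant_current_lif_times(coeff)
print(spike_times.shape)                        # (14, 14) firing times for the SNN
```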
H.3.2.2. Computer vision
Zobeir Raisi; Valimohammad Nazarzehi; Rasoul Damani; Esmaeil Sarani
Abstract
This paper explores the performance of various object detection techniques for autonomous vehicle perception by analyzing classical machine learning and recent deep learning models. We evaluate three classical methods based on PCA and HOG features alongside different versions of the SVM classifier, and five deep-learning models, Faster-RCNN, SSD, YOLOv3, YOLOv5, and YOLOv9, using the benchmark INRIA dataset. The experimental results show that although classical methods such as HOG + Gaussian SVM outperform the other classical approaches, they are surpassed by the deep learning techniques. Furthermore, classical methods have limitations in detecting partially occluded and distant objects and in handling complex clothing, while recent deep-learning models handle these challenges more efficiently and provide better performance, with YOLOv9 performing best.
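The classical HOG + Gaussian SVM pipeline can be sketched as follows with scikit-image and scikit-learn; the random arrays below stand in for cropped INRIA pedestrian windows.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(windows):
    """Extract HOG descriptors from 128x64 grayscale windows
    (the standard INRIA pedestrian window size)."""
    return np.array([hog(w, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for w in windows])

# Synthetic stand-ins for cropped positive/negative INRIA windows.
rng = np.random.default_rng(0)
pos = rng.random((20, 128, 64))
neg = rng.random((20, 128, 64))
X = hog_features(np.concatenate([pos, neg]))
y = np.array([1] * 20 + [0] * 20)

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)   # the "Gaussian SVM" variant
print(clf.score(X, y))                             # training accuracy on the toy data
```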
M. Rezaei; H. Nezamabadi-pour
Abstract
The present study aims to overcome some defects of the K-nearest neighbor (K-NN) rule. Two important data preprocessing methods for improving the K-NN rule are prototype selection (PS) and prototype generation (PG) techniques. Often, the advantages of these techniques are investigated separately. In this paper, using the gravitational search algorithm (GSA), two hybrid schemes are proposed in which the PG and PS problems are considered together. To evaluate the classification performance of these hybrid models, we have performed a comparative experimental study on several benchmark datasets, comparing our proposals with approaches previously studied in the literature. The experimental results demonstrate that our hybrid approaches outperform most of the competitive methods.
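A sketch of how a candidate prototype subset might be scored during such a search is shown below; the GSA update rules themselves are not reproduced, and the accuracy-only fitness is a simplification of what a real PS/PG objective would use.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

def prototype_fitness(mask, X_train, y_train, X_val, y_val, k=1):
    """Fitness of one search agent for prototype selection: the binary mask
    chooses which training samples survive as prototypes, and the score is
    the K-NN accuracy on held-out data (a reduction term could be added)."""
    idx = np.flatnonzero(mask)
    if len(idx) < k:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train[idx], y_train[idx])
    return knn.score(X_val, y_val)

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
split = rng.permutation(len(X))
train, val = split[:100], split[100:]
agent = rng.random(100) > 0.7            # a random candidate prototype subset
print(prototype_fitness(agent, X[train], y[train], X[val], y[val]))
```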
M. R. Fallahzadeh; F. Farokhi; A. Harimi; R. Sabbaghi-Nadooshan
Abstract
Facial Expression Recognition (FER) is one of the basic ways of interacting with machines and has been receiving more attention in recent years. In this paper, a novel FER system based on a deep convolutional neural network (DCNN) is presented. Motivated by the powerful ability of DCNNs to learn features and classify images, the goal of this research is to design a compatible and discriminative input for the pre-trained AlexNet-DCNN. The proposed method consists of four steps. First, three channels are extracted from the image: the original gray-level image together with its horizontal and vertical gradients, analogous to the red, green, and blue channels of an RGB image, to form the DCNN input. Second, data augmentation, including scaling, rotation, width shift, height shift, zoom, and horizontal and vertical flips of the images, is prepared in addition to the original images for training the DCNN. Then, the AlexNet-DCNN model is applied to learn high-level features corresponding to the different emotion classes. Finally, transfer learning is implemented and the presented model is fine-tuned on the target datasets. Average recognition accuracies of 92.41% and 93.66% were achieved for the JAFFE and CK+ datasets, respectively. Experimental results on two benchmark emotional datasets show the promising performance of the proposed model, which can improve the performance of current FER systems.
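The first step, building the gray-plus-gradients input, can be sketched as follows; using Sobel as the gradient operator and the 227x227 resize are assumptions, not details confirmed by the paper.

```python
import numpy as np
import cv2

def three_channel_input(gray):
    """Build a DCNN input from a gray-level face image: the original image
    plus its horizontal and vertical gradients stacked like R, G, B."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    norm = lambda a: cv2.normalize(a, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    stacked = np.dstack([gray, norm(gx), norm(gy)])
    return cv2.resize(stacked, (227, 227))       # common AlexNet input size (assumed)

face = np.random.randint(0, 256, (96, 96), dtype=np.uint8)   # stand-in face crop
print(three_channel_input(face).shape)                        # (227, 227, 3)
```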
H.3. Artificial Intelligence
Mohammad Hossein Shayesteh; Behrooz Shahrokhzadeh; Behrooz Masoumi
Abstract
This paper provides a comprehensive review of the potential of game theory as a solution for sensor-based human activity recognition (HAR) challenges. Game theory is a mathematical framework that models interactions between multiple entities in various fields, including economics, political science, and computer science. In recent years, game theory has been increasingly applied to machine learning challenges, including HAR, as a potential solution to improve recognition performance and efficiency of recognition algorithms. The review covers the shared challenges between HAR and machine learning, compares previous work on traditional approaches to HAR, and discusses the potential advantages of using game theory. It discusses different game theory approaches, including non-cooperative and cooperative games, and provides insights into how they can improve the HAR systems. The authors propose new game theory-based approaches and evaluate their effectiveness compared to traditional approaches. Overall, this review paper contributes to expanding the scope of research in HAR by introducing game-theoretic concepts and solutions to the field and provides valuable insights for researchers interested in applying game-theoretic approaches to HAR.
H.5. Image Processing and Computer Vision
Jalaluddin Zarei; Mohammad Hossein Khosravi
Abstract
Agricultural experts try to detect leaf diseases in the shortest possible time. However, limitations such as a lack of manpower, poor eyesight, insufficient knowledge, and quarantine restrictions on transferring diseased samples to the laboratory are acceptable reasons to use digital technology to detect pests and diseases and ultimately dispose of them. One of the available solutions in this field is the use of convolutional neural networks. On the other hand, the performance of CNNs depends on large amounts of data, and since there is no suitable dataset for the native trees of South Khorasan province, this motivated us to create one with a large volume of data. In this article, we introduce a new dataset with 9 classes of images: Healthy Barberry leaves, Barberry Rust disease, Barberry Pandemis ribeana Tortricidae pest, Healthy Jujube leaves, Jujube Ziziphus Tingid disease, Jujube Parenchyma-Eating Butterfly pest, Healthy Pomegranate leaves, Pomegranate Aphis punicae pest, and Pomegranate Leaf-Cutting Bees pest, and we also evaluate the performance of several well-known convolutional neural networks with all gradient descent optimizer algorithms on this dataset. Our most important achievement is the creation of a dataset with a high volume of data on pests and diseases in different classes. In addition, our experiments show that common CNN architectures, along with gradient descent optimizers, achieve acceptable performance on the proposed dataset. We call the proposed dataset the "Birjand Native Plant Leaves (BNPL) Dataset". It is available at https://kaggle.com/datasets/ec17162ca01825fb362419503cbc84c73d162bffe936952253ed522705228e06.
E. Kalhor; B. Bakhtiari
Abstract
Feature selection is one of the most important steps in designing speech emotion recognition systems. Because there is uncertainty as to which speech feature is related to which emotion, many features must be taken into account and, for this purpose, identifying the most discriminative features is necessary. In the interest of selecting appropriate emotion-related speech features, the current paper focuses on a multi-task approach. To this end, the study considers each speaker as a task and proposes a multi-task objective function to select features. As a result, the proposed method chooses a single set of speaker-independent features that are discriminative for all emotion classes. Correspondingly, multi-class classifiers are utilized directly, or binary classifiers are combined to perform multi-class classification. In addition, the present work employs two well-known datasets, Berlin and Enterface. The experiments applied the openSMILE toolkit to extract more than 6500 features. After the feature selection phase, the results illustrate that the proposed method selects features that are common across different runs. Also, the runtime of the proposed method is the lowest compared to the other methods. Finally, 7 classifiers are employed, and the best achieved performance in the face of a new speaker is 73.76% for the Berlin dataset and 72.17% for the Enterface dataset. These experimental results show that the proposed method is superior to existing state-of-the-art methods.