H.3.2.2. Computer vision
Mobina Talebian; Kourosh Kiani; Razieh Rastgoo
Abstract
Fingerprint verification has emerged as a cornerstone of personal identity authentication. This research introduces a deep learning-based framework for enhancing the accuracy of this critical process. By integrating a pre-trained Inception model with a custom-designed architecture, we propose a model that effectively extracts discriminative features from fingerprint images. To this end, the input fingerprint image is aligned to a base fingerprint through minutiae vector comparison. The aligned input fingerprint is then subtracted from the base fingerprint to generate a residual image. This residual image, along with the aligned input fingerprint and the base fingerprint, constitutes the three input channels for a pre-trained Inception model. Our main contribution lies in the alignment of fingerprint minutiae, followed by the construction of a color fingerprint representation. Moreover, we collected a dataset, including 200 fingerprint images corresponding to 20 persons, for fingerprint verification. The proposed method is evaluated on two distinct datasets, demonstrating its superiority over existing state-of-the-art techniques. With a verification accuracy of 99.40% on the public Hong Kong Dataset, our approach establishes a new benchmark in fingerprint verification. This research holds the potential for applications in various domains, including law enforcement, border control, and secure access systems.
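The three-channel construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_input`, the channel order (residual, aligned, base), and the use of an absolute difference to keep the residual non-negative are all assumptions, and the alignment step is taken as already done.

```python
def build_input(aligned, base):
    """Stack (residual, aligned, base) into one H x W x 3 image.

    `aligned` and `base` are grayscale images given as lists of rows.
    The residual is base minus aligned; taking the absolute value is an
    assumption made here so that the channel stays in the pixel range.
    """
    same_height = len(aligned) == len(base)
    same_width = all(len(a) == len(b) for a, b in zip(aligned, base))
    if not (same_height and same_width):
        raise ValueError("aligned and base images must have the same shape")

    image = []
    for row_a, row_b in zip(aligned, base):
        image.append([(abs(b - a), a, b) for a, b in zip(row_a, row_b)])
    return image


aligned = [[10, 200], [0, 50]]
base = [[12, 180], [0, 50]]
color_image = build_input(aligned, base)
```

Identical regions of the two prints produce a zero residual, so the first channel highlights only the places where the aligned input and the base fingerprint disagree.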
H.3.2.2. Computer vision
Zobeir Raisi; Valimohammad Nazarzehi; Rasoul Damani; Esmaeil Sarani
Abstract
This paper explores the performance of various object detection techniques for autonomous vehicle perception by analyzing classical machine learning and recent deep learning models. We evaluate three classical methods (PCA, HOG, and HOG alongside different versions of the SVM classifier) and five deep learning models (Faster-RCNN, SSD, YOLOv3, YOLOv5, and YOLOv9) on the benchmark INRIA dataset. The experimental results show that although HOG + Gaussian SVM outperforms the other classical approaches, it is in turn outperformed by the deep learning techniques. Furthermore, classical methods have limitations in detecting partially occluded and distant objects and in handling complex clothing, while recent deep learning models, notably YOLOv9, are more efficient and perform better on these challenges.
H.5. Image Processing and Computer Vision
Sekine Asadi Amiri; Fatemeh Mohammady
Abstract
Fungal infections, capable of establishing in various tissues and organs, are responsible for many human diseases that can lead to serious complications. The initial step in diagnosing fungal infections typically involves the examination of microscopic images. Direct microscopic examination using potassium hydroxide is commonly employed as a screening method for diagnosing superficial fungal infections. Although this type of examination is quicker than other diagnostic methods, the evaluation of a complete sample can be time-consuming. Moreover, the diagnostic accuracy of these methods may vary depending on the skill of the practitioner and does not guarantee full reliability. This paper introduces a novel approach for diagnosing fungal infections using a modified VGG19 deep learning architecture. The method incorporates two significant changes. First, the Flatten layer is replaced with Global Average Pooling (GAP) to reduce the feature count and model complexity, thereby enhancing the extraction of significant features from images. Second, a Dense layer with 1024 neurons is added after the GAP layer, enabling the model to better learn and integrate these features. The Defungi microscopic dataset was used for training and evaluating the model. The proposed method identifies fungal diseases with an accuracy of 97%, significantly outperforming the best existing method, which achieved an accuracy of 92.49%. Given its high accuracy, the method is valuable in the field of diagnosing fungal infections. This work demonstrates that the use of deep learning in diagnosing fungal diseases can lead to a substantial improvement in the quality of health services.
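To illustrate why replacing Flatten with GAP reduces the feature count, the sketch below applies both operations to a tiny feature map and counts the weights of the Dense layer that follows. The 7 x 7 x 512 shape is an assumption: it is what the standard VGG19 convolutional base produces for a 224 x 224 input, and the abstract does not state the input size used in the paper.

```python
def flatten(feature_map):
    """Concatenate every value of an H x W x C feature map into one vector."""
    return [v for row in feature_map for pixel in row for v in pixel]


def global_average_pool(feature_map):
    """Average each channel over all spatial positions, giving C values."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for k, value in enumerate(pixel):
                sums[k] += value
    return [s / (height * width) for s in sums]


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


tiny_map = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 x 2 spatial, 2 channels
print(flatten(tiny_map))               # 8 features
print(global_average_pool(tiny_map))   # 2 features, one per channel

# Assumed 7 x 7 x 512 feature map feeding the 1024-unit Dense layer.
print(dense_params(7 * 7 * 512, 1024))  # after Flatten
print(dense_params(512, 1024))          # after GAP
```

Under these assumptions the Dense layer shrinks from roughly 25.7 million parameters to about 0.5 million, which is the kind of complexity reduction the abstract refers to.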
H.6.2. Models
Simon Kawuma; Elias Kumbakumba; Vicent Mabirizi; Deborah Nanjebe; Kenneth Mworozi; Adolf Oyesigye Mukama; Lydia Kyasimire
Abstract
Tuberculosis (TB) is an underestimated cause of death in children, with only 45% of cases correctly diagnosed and reported. It is estimated that 1.12 million TB cases occurred among newborns, children, and adolescents aged 14 years or younger. In Uganda, TB prevalence is 8.5% in children and 16.7% in adolescents. Diagnosing and treating TB is difficult, and its high mortality rate is due to many gaps in the diagnosis of this illness, especially among children. As a strategy to curb the TB mortality rate in children, there is a need to improve and expedite TB screening among children. Chest X-rays (CXR) are commonly used in high-TB-burden countries like Uganda to diagnose TB patients, but interpreting the patients’ radiographs requires skilled radiologists, who are few. To this end, this research aims to close the TB mortality gap in children by applying AI, primarily deep learning techniques, to detect TB in children. The study created five models, one trained from scratch and four based on transfer learning, which were trained and verified using digital CXR images of children who visit the TB clinic at Mbarara Regional Referral Hospital. Each model classifies clinical images of patients as normal or tuberculosis. The transfer learning models VGG16, VGG19, Inception V3, and ResNet50 outperformed the model trained from scratch, with validation accuracies of 79.91%, 69.21%, 53.0%, 51.09%, and 50.01%, respectively. We hope that once the deep learning models are implemented and adopted by radiologists, they will reduce the time radiologists spend analyzing CXR images.
H.5. Image Processing and Computer Vision
Farima Fakouri; Mohsen Nikpour; Abbas Soleymani Amiri
Abstract
Due to the increased mortality caused by brain tumors, accurate and fast diagnosis of brain tumors is necessary for treating this disease. In this research, brain tumor classification in MRI images is performed using a network based on the ResNet architecture. The MRI images, available in the Cancer Imaging Archive database, cover 159 patients. First, median and Gaussian filters were used to improve the quality of the images, and an edge detection operator was used to identify the edges in each image. Second, the proposed network was trained with the original images of the database, then with the Gaussian-filtered and median-filtered images. Finally, accuracy, specificity, and sensitivity were used to evaluate the results. The proposed method achieved accuracies of 87.21%, 90.35%, and 93.86% for the original, Gaussian-filtered, and median-filtered images, respectively. The sensitivity and specificity were 82.3% and 84.3%, respectively, for the original images. For the Gaussian-filtered and median-filtered images, sensitivity was 90.8% and 91.57%, respectively, and specificity was 93.01% and 93.36%, respectively. In conclusion, image processing approaches in the preprocessing stage should be investigated to improve the performance of deep learning networks.
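For reference, the three evaluation criteria named in this abstract follow directly from the counts of a binary confusion matrix. The sketch below uses made-up counts for illustration only; they are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn count tumor cases that were detected/missed,
    tn/fp count non-tumor cases that were cleared/wrongly flagged.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters here because accuracy alone hides whether errors fall on missed tumors or on false alarms.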
H.5. Image Processing and Computer Vision
Mohammad Mahdi Nakhaie; Sasan Karamizadeh; Mohammad Ebrahim Shiri; Kambiz Badie
Abstract
Lung cancer is a highly serious illness, and detecting cancer cells early significantly enhances patients' chances of recovery. Doctors regularly examine a large number of CT scan images, which can lead to fatigue and errors. Therefore, there is a need to create a tool that can automatically detect and classify lung nodules in their early stages. Computer-aided diagnosis systems, often employing image processing and machine learning techniques, assist radiologists in identifying and categorizing these nodules. Previous studies have often used complex models or pre-trained networks that demand significant computational power and a long time to execute. Our goal is to achieve accurate diagnosis without the need for extensive computational resources. We introduce a simple convolutional neural network with only two convolution layers, capable of accurately classifying nodules without requiring advanced computing capabilities. We conducted training and validation on two datasets, LIDC-IDRI and LUNA16, achieving impressive accuracies of 99.7% and 97.52%, respectively. These results demonstrate the superior accuracy of our proposed model compared to state-of-the-art research papers.
H.3. Artificial Intelligence
Mahdi Rasouli; Vahid Kiani
Abstract
The identification of emotions in short texts of low-resource languages poses a significant challenge, requiring specialized frameworks and computational intelligence techniques. This paper presents a comprehensive exploration of shallow and deep learning methods for emotion detection in short Persian texts. Shallow learning methods employ feature extraction and dimension reduction to enhance classification accuracy. On the other hand, deep learning methods utilize transfer learning and word embedding, particularly BERT, to achieve high classification accuracy. A Persian dataset called "ShortPersianEmo" is introduced to evaluate the proposed methods, comprising 5472 diverse short Persian texts labeled in five main emotion classes. The evaluation results demonstrate that transfer learning and BERT-based text embedding perform better in accurately classifying short Persian texts than alternative approaches. The dataset of this study ShortPersianEmo will be publicly available online at https://github.com/vkiani/ShortPersianEmo.
Document and Text Processing
Mina Tabatabaei; Hossein Rahmani; Motahareh Nasiri
Abstract
The search for effective treatments for complex diseases, while minimizing toxicity and side effects, has become crucial. However, identifying synergistic combinations of drugs is often a time-consuming and expensive process, relying on trial and error due to the vast search space involved. Addressing this issue, we present a deep learning framework in this study. Our framework utilizes a diverse set of features, including chemical structure, biomedical literature embedding, and biological network interaction data, to predict potential synergistic combinations. Additionally, we employ autoencoders and principal component analysis (PCA) for dimension reduction in sparse data. Through 10-fold cross-validation, we achieved an impressive 98 percent area under the curve (AUC), surpassing the performance of seven previous state-of-the-art approaches by an average of 8%.
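The evaluation protocol mentioned above, 10-fold cross-validation scored by AUC, can be sketched with the standard library alone. The helper names and the interleaved fold assignment are assumptions made for illustration; the abstract does not say how the folds were drawn, and a real pipeline would train a model inside each fold.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; sample i goes to fold i % k."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, folds[held_out]


example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
splits = list(k_fold_indices(20, k=10))
```

In the small example, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.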
Amin Rahmati; Foad Ghaderi
Abstract
Every facial expression involves one or more facial action units appearing on the face. Therefore, action unit recognition is commonly used to enhance facial expression detection performance. It is important to identify subtle changes in the face when particular action units occur. In this paper, we propose an architecture that employs local features extracted from specific regions of the face alongside global features taken from the whole face. To this end, we combine the SPPNet and FPN modules into an end-to-end network for facial action unit recognition. First, different predefined regions of the face are detected. Next, the SPPNet module captures deformations in the detected regions. The SPPNet module focuses on each region separately and cannot take into account possible changes in the other areas of the face. In parallel, the FPN module finds global features related to each of the facial regions. By combining the two modules, the proposed architecture is able to capture both local and global facial features and enhance the performance of the action unit recognition task. Experimental results on the DISFA dataset demonstrate the effectiveness of our method.
Fatemeh Alinezhad; Kourosh Kiani; Razieh Rastgoo
Abstract
Gender recognition has been an attractive research area in recent years. A user-friendly gender recognition application requires an accurate, fast, and lightweight model that can run on a mobile device. Although successful results have been obtained using the Convolutional Neural Network (CNN), this model needs high computational resources that are not appropriate for mobile and embedded applications. To overcome this challenge, and considering the recent advances in deep learning, we propose a deep learning-based model for gender recognition on mobile devices using lightweight CNN models. A pretrained CNN model, the Multi-Task Convolutional Neural Network (MTCNN), is used for face detection. Furthermore, the MobileFaceNet model is modified and trained using the Margin Distillation cost function. To boost the model performance, Dense Blocks and depthwise separable convolutions are used in the model. Results confirm that the proposed model outperforms the MobileFaceNet model on six datasets, with relative accuracy improvements of 0.02%, 1.39%, 2.18%, 1.34%, 7.51%, and 7.93% on the LFW, CPLFW, CFP-FP, VGG2-FP, UTKFace, and our own data, respectively. In addition, we collected a dataset of 100,000 face images of both males and females in different age categories. The images of women include examples with and without headgear.
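A short calculation shows why depthwise separable convolutions suit a lightweight mobile model. The layer sizes below (3 x 3 kernel, 64 input channels, 128 output channels) are hypothetical and biases are ignored; they are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard, separable, round(standard / separable, 2))
```

For this hypothetical layer the separable version needs roughly eight times fewer weights, which is the usual motivation for using it on mobile and embedded devices.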
H.3. Artificial Intelligence
Mohammad Hossein Shayesteh; Behrooz Shahrokhzadeh; Behrooz Masoumi
Abstract
This paper provides a comprehensive review of the potential of game theory as a solution for sensor-based human activity recognition (HAR) challenges. Game theory is a mathematical framework that models interactions between multiple entities in various fields, including economics, political science, and computer science. In recent years, game theory has been increasingly applied to machine learning challenges, including HAR, as a potential solution to improve recognition performance and efficiency of recognition algorithms. The review covers the shared challenges between HAR and machine learning, compares previous work on traditional approaches to HAR, and discusses the potential advantages of using game theory. It discusses different game theory approaches, including non-cooperative and cooperative games, and provides insights into how they can improve the HAR systems. The authors propose new game theory-based approaches and evaluate their effectiveness compared to traditional approaches. Overall, this review paper contributes to expanding the scope of research in HAR by introducing game-theoretic concepts and solutions to the field and provides valuable insights for researchers interested in applying game-theoretic approaches to HAR.
Z. MohammadHosseini; A. Jalaly Bidgoly
Abstract
Social media is an inseparable part of human life, although information published through social media is not always true. Rumors may spread easily and quickly on social media; hence, it is vital to have a tool for rumor veracity detection. Previous papers have shown that users’ stance is an important tool for this goal. To the best of the authors' knowledge, no work so far has studied the ordering of users’ stances to achieve the best possible accuracy. In this work, we investigate the importance of stance ordering for the efficiency of rumor veracity detection. This paper introduces a concept called trust for stance sequence ordering and shows that a proper definition of this function can significantly improve veracity detection. The paper examines and compares different definitions of trust. Then, by choosing the best possible definition, it outperforms state-of-the-art results on a well-known dataset in this field, namely SemEval 2019.
H. Gholamalinejad; H. Khosravi
Abstract
Optimizers are vital components of deep neural networks that perform weight updates. This paper introduces a new updating method for optimizers based on gradient descent, called whitened gradient descent (WGD). This method is easy to implement and can be used in every optimizer based on the gradient descent algorithm. It does not increase the training time of the network significantly. This method smooths the training curve and improves classification metrics. To evaluate the proposed algorithm, we performed 48 different tests on two datasets, Cifar100 and Animals-10, using three network structures: densenet121, resnet18, and resnet50. The experiments show that using the WGD method in gradient-descent-based optimizers improves the classification results significantly. For example, integrating WGD into the RAdam optimizer increased the accuracy of DenseNet from 87.69% to 90.02% on the Animals-10 dataset.
Kh. Aghajani
Abstract
Emotion recognition has several applications in various fields, including human-computer interaction. In recent years, various methods have been proposed to recognize emotion using facial or speech information. However, the fusion of these two has received less attention in emotion recognition. In this paper, the use of only face or only speech information in emotion recognition is examined first. For emotion recognition through speech, a pre-trained network called YAMNet is used to extract features. After passing through a convolutional neural network (CNN), the extracted features are fed into a bi-LSTM with an attention mechanism to perform the recognition. For emotion recognition through facial information, a deep CNN-based model is proposed. Finally, after reviewing these two approaches, an emotion detection framework based on the fusion of these two models is proposed. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), containing videos of 24 actors (12 men and 12 women) in 8 categories, is used to evaluate the proposed model. The results show that combining face and speech information improves the performance of the emotion recognizer.
H.3.8. Natural Language Processing
P. Kavehzadeh; M. M. Abdollah Pour; S. Momtazi
Abstract
Over the last few years, text chunking has played a significant part in sequence labeling tasks. Although a large variety of methods have been proposed for shallow parsing in English, most proposed approaches for text chunking in the Persian language are based on simple and traditional concepts. In this paper, we propose using state-of-the-art transformer-based contextualized models, namely BERT and XLM-RoBERTa, as the major structure of our models. Conditional Random Field (CRF), the combination of Bidirectional Long Short-Term Memory (BiLSTM) and CRF, and a simple dense layer are employed after the transformer-based models to enhance the model's performance in predicting chunk labels. Moreover, we provide a new dataset for noun phrase chunking in Persian, which includes annotated data from Persian news text. Our experiments reveal that XLM-RoBERTa achieves the best performance among all the architectures tried on the proposed dataset. The results also show that using a single CRF layer yields better results than a dense layer and even the combination of BiLSTM and CRF.
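Noun phrase chunking is usually framed as sequence labeling, with one tag per token. The sketch below shows how predicted tags can be turned back into phrases. It assumes the common BIO scheme with `B-NP`, `I-NP`, and `O` tags; the abstract does not state which tagging scheme the dataset uses, and the English tokens are an illustrative stand-in for Persian text.

```python
def extract_noun_phrases(tokens, tags):
    """Group tokens into noun phrases from BIO tags (assumed scheme).

    B-NP opens a phrase, I-NP extends the open phrase, anything else
    closes it. An I-NP with no open phrase is treated as outside.
    """
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-NP":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-NP" and current:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = ["The", "new", "dataset", "covers", "Persian", "news", "text"]
tags = ["B-NP", "I-NP", "I-NP", "O", "B-NP", "I-NP", "I-NP"]
print(extract_noun_phrases(tokens, tags))
```

A CRF output layer, as compared in the abstract, scores whole tag sequences rather than each tag independently, so it can learn that an `I-NP` should not directly follow an `O`.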
N. Shayanfar; V. Derhami; M. Rezaeian
Abstract
In video prediction, the goal is to predict the next frame of a video given a sequence of input frames. Although numerous studies tackle frame prediction, suitable performance has not yet been achieved, and the task therefore remains an open problem. In this article, multiscale processing is studied for video prediction, and a new network architecture for multiscale processing is presented. This architecture belongs to the broad family of autoencoders and comprises an encoder and a decoder. A pretrained VGG is used as the encoder, which processes a pyramid of input frames at multiple scales simultaneously. The decoder is based on 3D convolutional neurons. The presented architecture is studied using three different datasets with varying degrees of difficulty. In addition, the proposed approach is compared to two conventional autoencoders. It is observed that using the pretrained network together with multiscale processing results in a performant approach.
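The pyramid of input frames mentioned in this abstract can be illustrated with a simple multiscale construction. This sketch assumes each level halves the resolution by averaging 2 x 2 blocks; the abstract does not specify the downsampling method or the number of scales, so both are illustrative choices.

```python
def downsample_2x(frame):
    """Halve a grayscale frame by averaging non-overlapping 2 x 2 blocks."""
    height = len(frame) // 2 * 2      # drop a trailing odd row or column
    width = len(frame[0]) // 2 * 2
    return [
        [(frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4
         for c in range(0, width, 2)]
        for r in range(0, height, 2)
    ]


def build_pyramid(frame, levels=3):
    """Return [full resolution, 1/2, 1/4, ...] with at most `levels` entries."""
    pyramid = [frame]
    for _ in range(levels - 1):
        last = pyramid[-1]
        if len(last) < 2 or len(last[0]) < 2:
            break                      # too small to halve again
        pyramid.append(downsample_2x(last))
    return pyramid


frame = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 test frame
pyramid = build_pyramid(frame, levels=3)
print([len(level) for level in pyramid])
```

Coarse levels summarize large-scale motion while the full-resolution level keeps fine detail, which is the general motivation for feeding several scales to the encoder at once.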
Kh. Aghajani
Abstract
Deep-learning-based approaches have been extensively used in detecting pulmonary nodules from computed tomography (CT) scans. In this study, an automated end-to-end framework with a convolution network (Conv-net) is proposed to detect lung nodules from CT images. Boundary regression is performed by a direct regression method, in which the offset is predicted from a given point. The proposed framework has two outputs: a pixel-wise classification between nodule and normal, and a direct regression that determines the four coordinates of the nodule's bounding box. The loss function includes two terms, one for classification and the other for regression. The performance of the proposed method is compared with YOLOv2, with the evaluation performed on the Lung-Pet-CT-DX dataset. The experimental results show that the proposed framework outperforms the YOLOv2 method and achieves high accuracy in nodule localization and boundary estimation.
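The two-output design described above can be sketched as a box decoder plus a two-term loss. Everything specific here is an assumption, since the abstract gives no formulas: offsets are taken as distances from the point to the left, top, right, and bottom box edges, the classification term is binary cross-entropy, the regression term is a mean absolute error applied only at nodule pixels, and `reg_weight` is a hypothetical balancing factor.

```python
import math


def decode_box(x, y, offsets):
    """Turn a point and four edge distances into (x_min, y_min, x_max, y_max)."""
    left, top, right, bottom = offsets
    return (x - left, y - top, x + right, y + bottom)


def combined_loss(p_nodule, is_nodule, pred_offsets, true_offsets, reg_weight=1.0):
    """Classification term plus weighted regression term for one pixel."""
    eps = 1e-7
    p = min(max(p_nodule, eps), 1 - eps)            # avoid log(0)
    classification = -math.log(p) if is_nodule else -math.log(1 - p)
    if is_nodule:
        regression = sum(abs(a - b) for a, b in zip(pred_offsets, true_offsets)) / 4
    else:
        regression = 0.0                            # no box to regress for normal pixels
    return classification + reg_weight * regression


box = decode_box(50, 40, (5, 6, 7, 8))
loss = combined_loss(0.5, True, (3, 3, 3, 3), (1, 1, 1, 1))
```

Predicting offsets from each point lets every nodule pixel vote for a box directly, without the predefined anchor boxes that YOLOv2 relies on.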
A. Torkaman; K. Badie; A. Salajegheh; M. H. Bokaei; Seyed F. Fatemi
Abstract
Recently, network representation has attracted many research works, mostly concentrating on representing nodes as dense low-dimensional vectors. Some network embedding methods focus only on the node structure, while others consider the content information within the nodes. In this paper, we propose HDNR, a hybrid deep network representation model that uses a triplet deep neural network architecture and considers both the node structure and content information for network representation. In addition, the author's writing style is considered a significant feature of the node content information. Inspired by the application of deep learning in natural language processing, our model utilizes a deep random walk method to exploit inter-node structures and two deep sequence prediction methods to extract nodes' content information. The embedding vectors generated in this manner are shown to boost each other in learning optimal node representations, detecting more informative features, and ultimately achieving better community detection. The experimental results confirm the effectiveness of this model for network representation compared to other baseline methods.
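Random-walk-based structure modeling typically starts by sampling node sequences from the graph, which are then treated like sentences. The sketch below shows plain uniform random walks in the style of DeepWalk; it is an assumption that HDNR's "deep random walk" begins from something similar, and the toy graph is hypothetical.

```python
import random


def random_walk(graph, start, length, rng):
    """Sample a walk of up to `length` nodes, moving to a uniform random neighbor."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break                     # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def sample_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, using a seeded generator."""
    rng = random.Random(seed)
    return [random_walk(graph, node, length, rng)
            for _ in range(walks_per_node) for node in graph]


# Hypothetical undirected graph stored as adjacency lists.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
walks = sample_walks(graph, walks_per_node=2, length=5)
```

Nodes that co-occur often in such walks end up with similar embeddings once the sequences are fed to a sequence model.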
F. Baratzadeh; Seyed M. H. Hasheminejad
Abstract
With the advancement of technology, the daily use of bank credit cards has been increasing exponentially. As a result, the fraudulent use of credit cards by others, as one of the new crimes, is also growing fast. For this reason, detecting and preventing these attacks has become an active area of study. This article discusses the challenges of detecting fraudulent banking transactions and presents solutions based on deep learning, which are examined and compared with traditional fraud detection models. According to the results obtained, the best performance is achieved by the combined model of deep convolutional networks and long short-term memory, trained using the aggregated data received from the generative adversarial network. This paper generates realistic data to address the unequal class distribution problem, which is far more effective than traditional methods. It also uses the strengths of the two approaches by combining a deep convolutional network and a Long Short-Term Memory network to improve performance. Because evaluation criteria such as accuracy are inefficient in this application, the distance score and the equal error rate have been used to evaluate the models more transparently and precisely. Traditional methods were compared to the proposed approach to evaluate its efficiency.
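The equal error rate used for evaluation is the operating point where the false positive rate (legitimate transactions flagged as fraud) equals the false negative rate (fraud that is missed). A minimal sketch follows, assuming higher scores mean "more likely fraud" and approximating the crossing point by the threshold with the smallest gap between the two rates; the scores are invented for illustration.

```python
def equal_error_rate(labels, scores):
    """Return (eer, threshold) where false positive and false negative rates meet.

    labels: 1 for fraud, 0 for legitimate. A transaction is flagged
    when its score is greater than or equal to the threshold.
    """
    fraud = [s for y, s in zip(labels, scores) if y == 1]
    legit = [s for y, s in zip(labels, scores) if y == 0]
    if not fraud or not legit:
        raise ValueError("both classes are required")

    best = None
    for threshold in sorted(set(scores)):
        fpr = sum(s >= threshold for s in legit) / len(legit)
        fnr = sum(s < threshold for s in fraud) / len(fraud)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2, threshold)
    return best[1], best[2]


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(labels, scores)
```

Unlike accuracy, this measure is not inflated by the large majority of legitimate transactions, which is why it suits heavily imbalanced fraud data.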
A. Lakizadeh; E. Moradizadeh
Abstract
Aspect-level text sentiment classification is one of the hottest research topics in the field of natural language processing. The purpose of aspect-level sentiment analysis is to determine the polarity of the text with respect to a particular aspect. Recently, various methods have been developed to determine the sentiment polarity of text at the aspect level; however, these studies have not yet been able to model well the complementary effects of the context and aspect in the polarity detection process. Here, we present ACTSC, a method for determining the sentiment polarity of the text based on separate embedding of aspects and context. In the first step, ACTSC models the aspects and context separately to extract new representation vectors. Next, by combining generative representations of aspect and context, it determines the polarity corresponding to each particular aspect using a short-term memory network and a self-attention mechanism. Experimental results on the SemEval2014 dataset, in both the restaurant and laptop categories, show that ACTSC improves the accuracy of aspect-based sentiment classification compared to the latest proposed methods.
M. Nasiri; H. Rahmani
Abstract
Determining the personality dimensions of individuals is very important in psychological research. The best-known model of personality dimensions is the Five-Factor Model (FFM). There are two approaches to determining these dimensions: manual and automatic. In the manual approach, psychologists identify the dimensions through personality questionnaires. In the automatic approach, various kinds of personal input (text, image, or video) are gathered and analyzed for this purpose. In this paper, we propose a method called DENOVA (DEep learning based on the ANOVA), which predicts the FFM using deep learning based on the analysis of variance (ANOVA) of words. DENOVA first applies ANOVA to select the most informative terms. It then employs Word2Vec to extract document embeddings. Finally, it uses Support Vector Machine (SVM), Logistic Regression, XGBoost, and Multilayer Perceptron (MLP) classifiers to predict the FFM. The experimental results show that DENOVA outperforms the state-of-the-art methods in predicting the FFM by 6.91% on average in terms of accuracy.
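The ANOVA-based term selection step can be sketched as follows. This is a minimal illustration that assumes each term is scored with a one-way ANOVA F statistic over per-document term counts grouped by class label; the helper names `anova_f` and `select_top_terms` and the toy counts are hypothetical, and the paper's actual selection procedure may differ.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("ANOVA needs at least two classes")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_terms(term_counts, labels, top_k):
    """Rank terms by F statistic and keep the top_k most informative."""
    ranked = sorted(term_counts,
                    key=lambda term: anova_f(term_counts[term], labels),
                    reverse=True)
    return ranked[:top_k]


# Toy per-document counts for two terms over six documents in two classes.
labels = [0, 0, 0, 1, 1, 1]
term_counts = {"happy": [1, 2, 3, 7, 8, 9], "the": [5, 5, 6, 5, 6, 5]}
print(select_top_terms(term_counts, labels, top_k=1))
```

A term whose counts differ strongly between classes gets a large F value, while a term distributed similarly in every class scores near zero and is dropped.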
E. Pejhan; M. Ghasemzadeh
Abstract
This research concerns automatic text-to-image generation. Two main goals are pursued: first, the generated image should look as real as possible; second, the generated image should be a meaningful depiction of the input text. Our proposed method is a Multi Sentences Hierarchical GAN (MSH-GAN) for text-to-image generation. We consider two main strategies: 1) produce a higher-quality image in the first step, and 2) use two additional descriptions to improve the original image in the subsequent steps. Our goal is to use more information, in the form of input text with more than one sentence, to generate images with higher resolution. We propose different models based on GANs and Memory Networks. We also use a more challenging dataset called ids-ade; this is the first time this dataset has been used in this area. We evaluate our models using the IS, FID, and R-precision metrics. Experimental results demonstrate that our best model performs favorably against basic state-of-the-art approaches such as StackGAN and AttGAN.
K. Kiani; R. Hematpour; R. Rastgoo
Abstract
Image colorization is an interesting yet challenging task because a natural-looking color image must be inferred from any grayscale image. To tackle this challenge with a fully automatic procedure, we propose a deep learning-based model for grayscale image colorization that benefits from the strong performance of Convolutional Neural Networks (CNNs) in image processing tasks. Building on pre-trained convolutional models, we fuse three of them, VGG16, ResNet50, and Inception-v2, to improve performance. The average of the three model outputs is used to obtain richer features. The fused features are fed to an encoder-decoder network that produces a color image from a grayscale input image. We perform a step-by-step analysis of different pre-trained models and fusion methodologies to select the most accurate combination for the proposed model. Results on the LFW and ImageNet datasets confirm the effectiveness of our model compared to state-of-the-art alternatives in the field.
H. Sadr; Mir M. Pedram; M. Teshnehlab
Abstract
With the rapid growth of textual information on the web, sentiment analysis is becoming an essential analytic tool rather than an academic endeavor, and numerous studies have addressed it in recent years. With the emergence of deep learning, deep neural networks have attracted a lot of attention and become mainstream in this field. Despite the remarkable success of deep learning models for sentiment analysis of text, they are still at an early stage of development and their potential is yet to be fully explored. The convolutional neural network is one deep learning method that has performed strongly in sentiment analysis but faces some limitations. First, it requires a large amount of training data. Second, it assumes that all words in a sentence contribute equally to the polarity of the sentence. To address these gaps, this paper proposes a convolutional neural network equipped with an attention mechanism, which not only takes advantage of attention but also uses transfer learning to boost sentiment analysis performance. According to the empirical results, the proposed model achieves comparable or even better classification accuracy than the state-of-the-art methods.
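To make the second limitation concrete, the sketch below shows one common form of attention pooling, in which each word vector receives a softmax weight so that words no longer contribute equally to the sentence representation. The dot-product scoring, the `query` vector (standing in for learned parameters), and the toy vectors are assumptions for illustration; the abstract does not specify the attention formulation used in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, query):
    """Weight each word vector by its softmax-normalized relevance.

    Assumption: relevance is the dot product between a word vector and
    a query vector that would normally be learned during training.
    """
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the word vectors, one dimension at a time.
    pooled = [sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
              for i in range(dim)]
    return weights, pooled


# Two toy word vectors; the query favors the first dimension.
words = [[1.0, 0.0], [0.0, 1.0]]
weights, sentence_vector = attention_pool(words, query=[10.0, 0.0])
print(weights, sentence_vector)
```

With an all-zero query every word gets the same weight and the result reduces to plain average pooling, which corresponds to the equal-contribution assumption described above.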
A. Alijamaat; A. Reza NikravanShalmani; P. Bayat
Abstract
Multiple Sclerosis (MS) is a disease in which the immune system damages the protective sheaths of nerve cells in the central nervous system, causing lesions. Examination and diagnosis of lesions is usually done manually by specialists on Magnetic Resonance Imaging (MRI) images of the brain. Factors such as the small size of lesions, their dispersion in the brain, their similarity to lesions of some other diseases, and their overlap can lead to misdiagnosis. Automatic image detection methods can serve as auxiliary tools to increase diagnostic accuracy. To this end, traditional image processing methods and deep learning approaches have been used. The deep Convolutional Neural Network is a common deep learning method for detecting lesions in images. In this network, the convolution layers extract features, and the pooling layers reduce the size of the feature maps. The present research uses wavelet-transform-based pooling. In addition to decomposing the input image and reducing its size, the wavelet transform highlights sharp changes in the image and better describes local features. Therefore, using this transform can improve the diagnosis. The proposed network consists of six convolutional layers, two wavelet pooling layers, and a fully connected layer, and it achieves higher accuracy than the methods studied for comparison. An accuracy of 98.92%, precision of 99.20%, and specificity of 98.33% are obtained on image data from 38 patients and 20 healthy individuals.
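The idea of wavelet pooling can be illustrated with a one-level 2D Haar transform, shown below as a minimal sketch. The choice of the Haar wavelet, the function name `haar_pool`, and the sample image are assumptions for this example; the abstract does not state which wavelet the authors use. The approximation band halves each spatial dimension, as a 2x2 pooling layer would, while the detail bands capture the sharp changes within each block.

```python
def haar_pool(image):
    """One-level 2D Haar transform over non-overlapping 2x2 blocks.

    Returns (approx, row_detail, col_detail, diag_detail), each half the
    height and width of the input. Only `approx` would be passed on as
    the pooled feature map in this sketch.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        a_row, rd_row, cd_row, dd_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2)   # smoothed, downsampled value
            rd_row.append((tl + tr - bl - br) / 2)  # change between the two rows
            cd_row.append((tl - tr + bl - br) / 2)  # change between the two columns
            dd_row.append((tl - tr - bl + br) / 2)  # diagonal change
        approx.append(a_row)
        row_detail.append(rd_row)
        col_detail.append(cd_row)
        diag_detail.append(dd_row)
    return approx, row_detail, col_detail, diag_detail


# A 2x4 toy image: a flat block next to a block with a vertical edge.
image = [[1, 1, 5, 9],
         [1, 1, 5, 9]]
approx, row_detail, col_detail, diag_detail = haar_pool(image)
print(approx, col_detail)
```

In the toy image the flat block produces zero detail coefficients, while the block containing an intensity jump between columns produces a nonzero column detail, which is the kind of sharp local change the abstract refers to.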