Amin Rahmati; Foad Ghaderi
Abstract
Every facial expression involves one or more facial action units appearing on the face. Therefore, action unit recognition is commonly used to enhance facial expression detection performance. It is important to identify subtle changes in the face when particular action units occur. In this paper, we propose an architecture that employs local features extracted from specific regions of the face while also using global features taken from the whole face. To this end, we combine the SPPNet and FPN modules to build an end-to-end network for facial action unit recognition. First, different predefined regions of the face are detected. Next, the SPPNet module captures deformations in the detected regions. The SPPNet module focuses on each region separately and cannot take into account possible changes in the other areas of the face. In parallel, the FPN module finds global features related to each of the facial regions. By combining the two modules, the proposed architecture is able to capture both local and global facial features and enhance the performance of the action unit recognition task. Experimental results on the DISFA dataset demonstrate the effectiveness of our method.
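To make the local/global fusion concrete, here is a minimal PyTorch sketch of the idea: spatial pyramid pooling over a cropped facial region supplies local features, which are concatenated with a pooled global (FPN-style) vector before a per-AU classifier. The module names, channel sizes, and number of action units are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SPPBranch(nn.Module):
    """Spatial pyramid pooling over a cropped facial region (local features)."""
    def __init__(self, in_ch, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(l) for l in levels])
        self.out_dim = in_ch * sum(l * l for l in levels)

    def forward(self, region_fmap):                  # (B, C, H, W)
        parts = [p(region_fmap).flatten(1) for p in self.pools]
        return torch.cat(parts, dim=1)               # (B, out_dim)

class AUHead(nn.Module):
    """Fuses local SPP features with a global FPN-style feature vector."""
    def __init__(self, local_dim, global_dim, n_aus=12):   # n_aus is a placeholder
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(local_dim + global_dim, 256), nn.ReLU(),
            nn.Linear(256, n_aus))

    def forward(self, local_feat, global_feat):
        return self.fc(torch.cat([local_feat, global_feat], dim=1))  # AU logits

# toy usage: one cropped region feature map + one pooled global vector
spp = SPPBranch(in_ch=64)
head = AUHead(local_dim=spp.out_dim, global_dim=128)
region = torch.randn(2, 64, 24, 24)      # features of a detected face region
global_vec = torch.randn(2, 128)         # pooled FPN pyramid feature
logits = head(spp(region), global_vec)   # (2, 12) action-unit logits
```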
Fatemeh Alinezhad; Kourosh Kiani; Razieh Rastgoo
Abstract
Gender recognition has been an attractive research area in recent years. To make a user-friendly application for gender recognition, an accurate, fast, and lightweight model applicable on mobile devices is necessary. Although successful results have been obtained using Convolutional Neural Networks (CNNs), such models need high computational resources that are not appropriate for mobile and embedded applications. To overcome this challenge, and considering the recent advances in deep learning, in this paper we propose a deep learning-based model for gender recognition on mobile devices using lightweight CNN models. A pretrained CNN model, the Multi-Task Convolutional Neural Network (MTCNN), is used for face detection. Furthermore, the MobileFaceNet model is modified and trained using the Margin Distillation cost function. To boost the model performance, Dense Blocks and depthwise separable convolutions are used in the model. Results confirm that the proposed model outperforms the MobileFaceNet model on six datasets, with relative accuracy improvements of 0.02%, 1.39%, 2.18%, 1.34%, 7.51%, and 7.93% on the LFW, CPLFW, CFP-FP, VGG2-FP, UTKFace, and our own data, respectively. In addition, we collected a dataset including a total of 100,000 face images from both males and females in different age categories. Images of the women appear both with and without headgear.
Z. MohammadHosseini; A. Jalaly Bidgoly
Abstract
Social media is an inseparable part of human life, although information published through social media is not always true. Rumors may spread easily and quickly on social media; hence, it is vital to have a tool for rumor veracity detection. Previous work has shown that users' stances are an important tool for this goal. To the best of the authors' knowledge, no work has so far studied the ordering of users' stances to achieve the best possible accuracy. In this work, we investigate the importance of stance ordering for the efficiency of rumor veracity detection. This paper introduces a concept called trust for stance sequence ordering and shows that a proper definition of this function can significantly improve veracity detection. The paper examines and compares different definitions of trust. Then, by choosing the best possible definition, it is able to outperform state-of-the-art results on a well-known dataset in this field, namely SemEval 2019.
H. Gholamalinejad; H. Khosravi
Abstract
Optimizers are vital components of deep neural networks that perform weight updates. This paper introduces a new updating method for optimizers based on gradient descent, called whitened gradient descent (WGD). This method is easy to implement and can be used in every optimizer based on the gradient descent algorithm. It does not significantly increase the training time of the network. This method smooths the training curve and improves classification metrics. To evaluate the proposed algorithm, we performed 48 different tests on two datasets, Cifar100 and Animals-10, using three network structures: DenseNet121, ResNet18, and ResNet50. The experiments show that using the WGD method in gradient descent based optimizers improves the classification results significantly. For example, integrating WGD into the RAdam optimizer increased the accuracy of DenseNet from 87.69% to 90.02% on the Animals-10 dataset.
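The abstract does not spell out the whitening transform, so the sketch below assumes the simplest reading: each parameter's gradient is standardized to zero mean and unit variance before the base optimizer (Adam here; RAdam would be used the same way) applies its usual update.

```python
import torch
import torch.nn.functional as F

def whiten_gradients(model, eps=1e-8):
    """Standardize every parameter's gradient to zero mean / unit variance.

    The abstract does not define the exact whitening transform, so this
    per-tensor standardization is only an illustrative stand-in.
    """
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad = (p.grad - p.grad.mean()) / (p.grad.std() + eps)

# usage inside an ordinary training step, with any gradient-based optimizer
model = torch.nn.Linear(10, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(4, 10), torch.randint(0, 2, (4,))
F.cross_entropy(model(x), y).backward()
whiten_gradients(model)   # whiten before the optimizer consumes the gradients
opt.step()
opt.zero_grad()
```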
Kh. Aghajani
Abstract
Emotion recognition has several applications in various fields, including human-computer interaction. In recent years, various methods have been proposed to recognize emotion using facial or speech information, while the fusion of the two has received less attention. In this paper, the use of only face or only speech information for emotion recognition is examined first. For emotion recognition through speech, a pre-trained network called YAMNet is used to extract features. After passing through a convolutional neural network (CNN), the extracted features are fed into a bi-LSTM with an attention mechanism to perform the recognition. For emotion recognition through facial information, a deep CNN-based model is proposed. Finally, after reviewing these two approaches, an emotion detection framework based on the fusion of the two models is proposed. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), containing videos of 24 actors (12 men and 12 women) in 8 categories, is used to evaluate the proposed model. The implementation results show that combining face and speech information improves the performance of the emotion recognizer.
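A minimal sketch of the speech branch as described: a sequence of frame-level embeddings (1024-dimensional, the size YAMNet produces) is passed through a bi-LSTM, and an additive attention layer weights the time steps before classification into the 8 RAVDESS categories. The hidden size and the omission of the intermediate CNN are simplifications.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Bi-LSTM with additive attention over a sequence of audio embeddings."""
    def __init__(self, in_dim=1024, hidden=128, n_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # one score per time step
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                           # x: (B, T, in_dim)
        h, _ = self.lstm(x)                         # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)      # (B, T, 1) attention weights
        context = (w * h).sum(dim=1)                # weighted sum over time
        return self.cls(context)                    # (B, n_classes) emotion logits

# toy usage: 50 frames of 1024-d embeddings
model = BiLSTMAttention()
logits = model(torch.randn(2, 50, 1024))            # (2, 8)
```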
H.3.8. Natural Language Processing
P. Kavehzadeh; M. M. Abdollah Pour; S. Momtazi
Abstract
Over the last few years, text chunking has taken a significant part in sequence labeling tasks. Although a large variety of methods have been proposed for shallow parsing in English, most approaches proposed for text chunking in the Persian language are based on simple and traditional concepts. In this paper, we propose using state-of-the-art transformer-based contextualized models, namely BERT and XLM-RoBERTa, as the major structure of our models. Conditional Random Field (CRF), the combination of Bidirectional Long Short-Term Memory (BiLSTM) and CRF, and a simple dense layer are employed after the transformer-based models to enhance performance in predicting chunk labels. Moreover, we provide a new dataset for noun phrase chunking in Persian, which includes annotated Persian news text. Our experiments reveal that XLM-RoBERTa achieves the best performance among all the architectures tried on the proposed dataset. The results also show that a single CRF layer yields better results than a dense layer and even the combination of BiLSTM and CRF.
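A hedged sketch of the transformer-plus-CRF variant, using the Hugging Face transformers library and the pytorch-crf package; the tag set (plain B/I/O here), the base checkpoint, and the omitted subword/label alignment are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF            # from the `pytorch-crf` package

class TransformerCRFChunker(nn.Module):
    """Contextual encoder + linear emission layer + CRF for chunk tagging."""
    def __init__(self, model_name="xlm-roberta-base", n_tags=3):   # e.g. B/I/O
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.emission = nn.Linear(self.encoder.config.hidden_size, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        # NOTE: aligning chunk labels to subword tokens is omitted for brevity
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emission(h)                       # (B, T, n_tags)
        mask = attention_mask.bool()
        if tags is not None:                               # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)       # inference: best tag paths
```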
N. Shayanfar; V. Derhami; M. Rezaeian
Abstract
In video prediction, the goal is to predict the next frame of a video given a sequence of input frames. Although numerous studies have tackled frame prediction, satisfactory performance has not yet been achieved, and the application therefore remains an open problem. In this article, multiscale processing is studied for video prediction, and a new network architecture for multiscale processing is presented. This architecture belongs to the broad family of autoencoders and comprises an encoder and a decoder. A pretrained VGG is used as the encoder, processing a pyramid of input frames at multiple scales simultaneously. The decoder is based on 3D convolutional neurons. The presented architecture is studied on three different datasets with varying degrees of difficulty. In addition, the proposed approach is compared to two conventional autoencoders. It is observed that using the pretrained network together with multiscale processing results in a performant approach.
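The sketch below illustrates the encoder/decoder split under stated assumptions: a frozen VGG16 encodes each frame of a two-level input pyramid, the multiscale features are merged, and 3D convolutions fuse the input time steps; for simplicity the prediction is made at feature resolution (1/4 scale) rather than full resolution, and the kernel depths assume exactly four input frames.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class MultiScaleVGGPredictor(nn.Module):
    """Pretrained-VGG encoder over an input pyramid + 3D-conv decoder (sketch)."""
    def __init__(self):
        super().__init__()
        self.encoder = vgg16(weights="IMAGENET1K_V1").features[:16]  # up to conv3_3
        for p in self.encoder.parameters():
            p.requires_grad = False                  # keep the encoder frozen
        self.decoder = nn.Sequential(                # fuses T == 4 time steps
            nn.Conv3d(256, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1)), nn.ReLU(),
            nn.Conv3d(64, 3, kernel_size=(2, 3, 3), padding=(0, 1, 1)))

    def encode(self, frame):                          # frame: (B, 3, H, W)
        scales = [frame, F.interpolate(frame, scale_factor=0.5)]
        feats = [self.encoder(s) for s in scales]     # pyramid of feature maps
        up = [F.interpolate(f, size=feats[0].shape[-2:]) for f in feats]
        return torch.stack(up, 0).mean(0)             # merged multiscale features

    def forward(self, frames):                        # frames: (B, T=4, 3, H, W)
        f = torch.stack([self.encode(frames[:, t]) for t in range(frames.size(1))], dim=2)
        return self.decoder(f).squeeze(2)             # next frame at 1/4 resolution

pred = MultiScaleVGGPredictor()(torch.randn(1, 4, 3, 64, 64))   # (1, 3, 16, 16)
```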
Kh. Aghajani
Abstract
Deep-learning-based approaches have been extensively used for detecting pulmonary nodules in Computed Tomography (CT) scans. In this study, an automated end-to-end framework with a convolutional network (Conv-net) is proposed to detect lung nodules in CT images. Here, boundary regression is performed by a direct regression method, in which the offset is predicted from a given point. The proposed framework has two outputs: a pixel-wise classification between nodule and normal, and a direct regression that determines the four coordinates of the nodule's bounding box. The loss function includes two terms: one for classification and the other for regression. The performance of the proposed method is compared with YOLOv2. The evaluation is performed on the Lung-PET-CT-Dx dataset. The experimental results show that the proposed framework outperforms YOLOv2 and achieves high accuracy in both nodule localization and boundary estimation.
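The two-term loss is easy to make concrete. The sketch below assumes the simplest shapes: a one-channel pixel-wise nodule/normal map trained with binary cross-entropy, plus a four-channel offset map trained with a smooth-L1 term on nodule pixels only; the exact weighting used in the paper is not given.

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, box_offsets, cls_target, box_target):
    """Two-term loss: pixel-wise nodule/normal classification plus direct
    regression of four bounding-box offsets at each positive pixel.

    Assumed shapes: cls_logits (B, 1, H, W), box_offsets (B, 4, H, W),
    cls_target (B, 1, H, W) in {0, 1}, box_target (B, 4, H, W).
    """
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits, cls_target)
    pos = cls_target.bool().expand_as(box_offsets)   # regress only on nodule pixels
    if pos.any():
        reg_loss = F.smooth_l1_loss(box_offsets[pos], box_target[pos])
    else:
        reg_loss = box_offsets.sum() * 0.0           # no positives in this batch
    return cls_loss + reg_loss

# toy usage
B, H, W = 2, 32, 32
loss = detection_loss(torch.randn(B, 1, H, W), torch.randn(B, 4, H, W),
                      torch.randint(0, 2, (B, 1, H, W)).float(),
                      torch.randn(B, 4, H, W))
```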
A. Torkaman; K. Badie; A. Salajegheh; M. H. Bokaei; Seyed F. Fatemi
Abstract
Recently, network representation has attracted many research works, mostly concentrating on representing nodes as dense low-dimensional vectors. Some network embedding methods focus only on the node structure, while others also consider the content information within the nodes. In this paper, we propose HDNR, a hybrid deep network representation model that uses a triplet deep neural network architecture considering both node structure and content information for network representation. In addition, the author's writing style is considered as a significant feature in the node content information. Inspired by the application of deep learning in natural language processing, our model utilizes a deep random walk method to exploit inter-node structures and two deep sequence prediction methods to extract the nodes' content information. The embedding vectors generated in this manner boost each other in learning optimal node representations, detecting more informative features, and ultimately achieving better community detection. The experimental results confirm the effectiveness of this model for network representation compared to other baseline methods.
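A minimal sketch of the triplet objective a model like HDNR builds on: a node's structure view (from random-walk features) is pulled toward its own content view and pushed away from another node's content view. The projection heads, feature sizes, and batch construction are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

embed_dim = 128
struct_proj = nn.Linear(128, embed_dim)    # head over random-walk features (assumed dim)
content_proj = nn.Linear(300, embed_dim)   # head over text/content features (assumed dim)
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = struct_proj(torch.randn(32, 128))       # structure view of node i
positive = content_proj(torch.randn(32, 300))    # content view of the same node i
negative = content_proj(torch.randn(32, 300))    # content view of a different node j
loss = triplet(anchor, positive, negative)       # pulls the two views of i together
loss.backward()
```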
F. Baratzadeh; Seyed M. H. Hasheminejad
Abstract
With the advancement of technology, the daily use of bank credit cards has been increasing exponentially. Therefore, the fraudulent use of credit cards by others, as one of the new crimes, is also growing fast. For this reason, detecting and preventing these attacks has become an active area of study. This article discusses the challenges of detecting fraudulent banking transactions and presents solutions based on deep learning, which are examined and compared with traditional fraud detection models. According to the results obtained, the best performance belongs to the combined model of deep convolutional networks and Long Short-Term Memory, trained on data augmented with samples from a generative adversarial network. This paper generates synthetic data to address the unequal class distribution problem, which is far more effective than traditional methods. It also exploits the strengths of the two approaches by combining a deep convolutional network and a Long Short-Term Memory network to improve performance. Due to the inefficiency of evaluation criteria such as accuracy in this application, the distance score and the equal error rate are used to evaluate the models more transparently and precisely. Traditional methods were compared to the proposed approach to evaluate its efficiency.
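A hedged sketch of the combined classifier: a 1D CNN extracts local patterns along a card's transaction history and an LSTM summarizes the sequence into a fraud score; in the paper's setting the training set would first be balanced with GAN-generated minority-class samples. The feature count and sequence length here are arbitrary.

```python
import torch
import torch.nn as nn

class CNNLSTMFraudNet(nn.Module):
    """1D-CNN feature extractor followed by an LSTM over a card's transaction
    history; the minority (fraud) class is assumed to be topped up with
    GAN-generated transactions before training."""
    def __init__(self, n_features=30, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (B, T, n_features)
        h = self.conv(x.transpose(1, 2))      # (B, 64, T): convolve along time
        h, _ = self.lstm(h.transpose(1, 2))   # (B, T, hidden)
        return self.cls(h[:, -1])             # fraud logit from the last time step

model = CNNLSTMFraudNet()
logit = model(torch.randn(8, 20, 30))         # 8 cards x 20 transactions x 30 features
```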
A. Lakizadeh; E. Moradizadeh
Abstract
Aspect-level text sentiment classification is one of the hottest research topics in the field of natural language processing. The purpose of aspect-level sentiment analysis is to determine the polarity of a text with respect to a particular aspect. Recently, various methods have been developed to determine the sentiment polarity of text at the aspect level; however, these studies have not yet been able to properly model the complementary effects of the context and the aspect in the polarity detection process. Here, we present ACTSC, a method for determining the sentiment polarity of text based on separate embeddings of aspects and context. In the first step, ACTSC models the aspects and context separately to extract new representation vectors. Next, by combining the generated representations of aspect and context, it determines the polarity corresponding to each particular aspect using a long short-term memory network and a self-attention mechanism. Experimental results on the SemEval2014 dataset, in both the restaurant and laptop categories, show that ACTSC improves the accuracy of aspect-based sentiment classification compared to the latest proposed methods.
M. Nasiri; H. Rahmani
Abstract
Determining the personality dimensions of individuals is very important in psychological research. The most well-known model of personality dimensions is the Five-Factor Model (FFM). There are two approaches for determining personality dimensions: manual and automatic. In the manual approach, psychologists discover these dimensions through personality questionnaires. In the automatic approach, various types of personal input (text/image/video) are gathered and analyzed for this purpose. In this paper, we propose a method called DENOVA (DEep learning based on the ANOVA), which predicts the FFM using deep learning based on the analysis of variance (ANOVA) of words. DENOVA first applies ANOVA to select the most informative terms. Then, it employs Word2Vec to extract document embeddings. Finally, it uses Support Vector Machine (SVM), Logistic Regression, XGBoost, and Multilayer Perceptron (MLP) classifiers to predict the FFM. The experimental results show that DENOVA outperforms the state-of-the-art methods in predicting the FFM, with a 6.91% improvement in accuracy on average.
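A toy, end-to-end sketch of a DENOVA-like pipeline with scikit-learn and gensim: ANOVA F-scores select informative terms, Word2Vec embeds words, each document becomes the mean vector of its selected words, and an SVM (one of the four classifier options) predicts the trait. The corpus and labels are fabricated purely for illustration, and the exact feature/embedding choices are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from gensim.models import Word2Vec

docs = ["i love meeting new people", "parties are great fun",
        "i prefer quiet evenings alone", "reading at home suits me"] * 5
labels = np.array([1, 1, 0, 0] * 5)            # toy extraversion labels

# 1) ANOVA (F-test) picks the most label-informative terms
counts = CountVectorizer().fit(docs)
X = counts.transform(docs).toarray()
keep = SelectKBest(f_classif, k=5).fit(X, labels).get_support()
vocab = np.array(counts.get_feature_names_out())[keep]

# 2) Word2Vec embeds words; a document is the mean of its selected words
w2v = Word2Vec([d.split() for d in docs], vector_size=50, min_count=1, seed=0)
def embed(doc):
    vecs = [w2v.wv[w] for w in doc.split() if w in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

X_emb = np.stack([embed(d) for d in docs])

# 3) an SVM predicts the personality trait from the document embedding
clf = SVC().fit(X_emb, labels)
print(clf.predict(X_emb[:2]))
```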
E. Pejhan; M. Ghasemzadeh
Abstract
This research relates to the development of automatic text-to-image generation. In this regard, two main goals are pursued: first, the generated image should look as real as possible; and second, the generated image should be a meaningful depiction of the input text. Our proposed method is a Multi Sentences Hierarchical GAN (MSH-GAN) for text-to-image generation. In this research project, we consider two main strategies: 1) produce a higher-quality image in the first step, and 2) use two additional descriptions to improve the original image in the next steps. Our goal is to use the extra information in a multi-sentence input text to generate images with higher resolution. We propose different models based on GANs and Memory Networks. We also use a more challenging dataset called ids-ade; this is the first time this dataset has been used in this area. We evaluate our models using the IS, FID, and R-precision evaluation metrics. Experimental results demonstrate that our best model performs favorably against basic state-of-the-art approaches like StackGAN and AttGAN.
K. Kiani; R. Hematpour; R. Rastgoo
Abstract
Image colorization is an interesting yet challenging task due to the ambiguity of producing a natural-looking color image from any grayscale image. To tackle this challenge, and to have a fully automatic procedure, we propose a Convolutional Neural Network (CNN)-based model that benefits from the impressive ability of CNNs in image processing tasks. To this end, we propose a deep model for automatic grayscale image colorization. Harnessing pre-trained convolutional models, we fuse three of them, VGG16, ResNet50, and Inception-v2, to improve performance. The average of the three model outputs is used to obtain richer features in the model. The fused features are fed to an encoder-decoder network to obtain a color image from a grayscale input image. We perform a step-by-step analysis of different pre-trained models and fusion methodologies to find the most accurate combination of these models for the proposed model. Results on the LFW and ImageNet datasets confirm the effectiveness of our model compared to state-of-the-art alternatives in the field.
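The fusion step can be sketched as follows. Since torchvision ships Inception-v3 rather than Inception-v2, only the VGG16 and ResNet50 branches are shown; a third branch would be projected and averaged in the same way. Projecting each backbone to a common width with 1x1 convolutions before averaging is our assumption about how outputs of different depths are made compatible.

```python
import torch
import torch.nn as nn
from torchvision import models

class FusedBackbones(nn.Module):
    """Projects two pretrained backbones to a common width and averages them;
    a colorization decoder would consume the fused feature map."""
    def __init__(self, dim=256):
        super().__init__()
        self.vgg = models.vgg16(weights="IMAGENET1K_V1").features
        res = models.resnet50(weights="IMAGENET1K_V1")
        self.res = nn.Sequential(*list(res.children())[:-2])   # drop pool/fc
        self.proj_v = nn.Conv2d(512, dim, 1)    # 1x1 conv to a shared width
        self.proj_r = nn.Conv2d(2048, dim, 1)

    def forward(self, x):                       # x: (B, 3, H, W), H,W divisible by 32
        fv = self.proj_v(self.vgg(x))           # (B, dim, H/32, W/32)
        fr = self.proj_r(self.res(x))           # (B, dim, H/32, W/32)
        return (fv + fr) / 2                    # averaged fused features

fused = FusedBackbones()(torch.randn(1, 3, 224, 224))   # (1, 256, 7, 7)
```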
H. Sadr; Mir M. Pedram; M. Teshnehlab
Abstract
With the rapid growth of textual information on the web, sentiment analysis is turning into an essential analytic tool rather than a mere academic endeavor, and numerous studies have been carried out in recent years to address this issue. With the emergence of deep learning, deep neural networks have attracted a lot of attention and have become mainstream in this field. Despite the remarkable success of deep learning models for sentiment analysis of text, they are in the early stages of development and their potential is yet to be fully explored. The convolutional neural network is one of the deep learning methods widely employed for sentiment analysis, but it is confronted with some limitations. Firstly, it requires a large amount of training data. Secondly, it assumes that all words in a sentence contribute equally to the polarity of the sentence. To fill these lacunae, this paper proposes a convolutional neural network equipped with an attention mechanism that not only takes advantage of attention but also utilizes transfer learning to boost the performance of sentiment analysis. According to the empirical results, our proposed model achieved comparable or even better classification accuracy than state-of-the-art methods.
A. Alijamaat; A. Reza NikravanShalmani; P. Bayat
Abstract
Multiple Sclerosis (MS) is a disease in which the immune system attacks the protective sheaths (myelin) of nerve cells in the central nervous system, causing lesions. Examination and diagnosis of lesions by specialists is usually done manually on Magnetic Resonance Imaging (MRI) images of the brain. Factors such as the small size of lesions, their dispersion in the brain, the similarity of lesions to those of some other diseases, and their overlap can lead to misdiagnosis. Automatic image detection methods, as auxiliary tools, can increase the diagnosis accuracy. To this end, traditional image processing methods and deep learning approaches have been used. The deep convolutional neural network is a common deep learning method for detecting lesions in images. In this network, the convolution layers extract features and the pooling layers decrease the feature map size. The present research uses wavelet-transform-based pooling. In addition to decomposing the input image and reducing its size, the wavelet transform highlights sharp changes in the image and better describes local features, so using this transform can improve the diagnosis. The proposed method is based on six convolutional layers, two wavelet pooling layers, and a fully connected layer, and achieved better accuracy than the studied methods. An accuracy of 98.92%, precision of 99.20%, and specificity of 98.33% were obtained by testing image data from 38 patients and 20 healthy individuals.
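Wavelet pooling can be realized as a strided convolution with fixed Haar filters; this is one common implementation, and the paper does not state which wavelet it uses. The sketch below keeps all four sub-bands as extra channels, so the map is halved spatially while the detail (edge) bands that highlight sharp changes are preserved; keeping only the approximation (LL) band would instead leave the channel count unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarWaveletPool2d(nn.Module):
    """Pooling by a one-level Haar wavelet transform: a stride-2 convolution
    with fixed Haar filters halves the feature map while retaining the
    approximation (LL) and detail (LH, HL, HH) sub-bands as channels."""
    def __init__(self):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        self.register_buffer("filters", torch.stack([ll, lh, hl, hh]).unsqueeze(1))

    def forward(self, x):                          # x: (B, C, H, W), H and W even
        B, C, H, W = x.shape
        y = F.conv2d(x.reshape(B * C, 1, H, W), self.filters, stride=2)
        return y.reshape(B, C * 4, H // 2, W // 2)  # 4 sub-bands per input channel

pool = HaarWaveletPool2d()
out = pool(torch.randn(1, 8, 32, 32))              # (1, 32, 16, 16)
```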
Seyedeh S. Sadeghi; H. Khotanlou; M. Rasekh Mahand
Abstract
In the modern age, written sources are rapidly increasing, and a growing share of these data are texts containing the feelings and opinions of their authors. Thus, reviewing and analyzing emotional texts has received particular attention in recent years. This paper proposes a system based on the combination of cognitive features and a deep neural network, the Gated Recurrent Unit (GRU). The five basic emotions used in this approach are anger, happiness, sadness, surprise, and fear. A total of 23,000 Persian documents, with an average length of 24 words, were labeled for this research. Emotional constructions, emotional keywords, and emotional parts of speech (POS) are the basic cognitive features used in this approach. In parallel, after preprocessing the texts, the words of the normalized text were embedded using the Word2Vec technique, and a deep learning approach was applied to this embedded data. Finally, classification algorithms such as Naïve Bayes, decision trees, and support vector machines were used to classify emotions based on the concatenation of the defined cognitive features and the deep learning features. 10-fold cross-validation was used to evaluate the performance of the proposed system. Experimental results show that the proposed system achieved an accuracy of 97%, an improvement of several percent over the results achieved by GRU and cognitive features in isolation. Finally, studying other statistical features and refining these cognitive features in more detail could further improve the results.
H. Gholamalinejad; H. Khosravi
Abstract
In recent years, vehicle classification has been one of the most important research topics. However, due to the lack of a proper dataset, this field has not been as well developed as other fields of intelligent traffic management. Therefore, the preparation of large-scale vehicle datasets for each country is of great interest. In this paper, we introduce a new standard dataset of popular Iranian vehicles. This dataset, which consists of images of moving vehicles on urban streets and highways, can be used for vehicle classification and license plate recognition. It contains a large collection of vehicle images in different dimensions, viewing angles, weather, and lighting conditions. It took more than a year to construct this dataset. Images were taken by various types of mounted cameras, with different resolutions and at different altitudes. To estimate the complexity of the dataset, some classic methods alongside popular deep neural networks were trained and evaluated on it. Furthermore, two lightweight CNN structures are also proposed: one with 3 convolutional layers and another with 5. The 5-conv model, with 152K parameters, reached a recognition rate of 99.09% and can process 48 frames per second on a CPU, which is suitable for real-time applications.
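For a sense of scale, here is a 5-conv network in the same spirit; the exact layer widths of the paper's model are not given, so these are tuned only to land near its 152K-parameter budget (this sketch has about 142K), and the class count is a placeholder.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(), nn.MaxPool2d(2))

class FiveConvNet(nn.Module):
    """A lightweight 5-conv classifier in the spirit of the proposed model;
    channel widths and class count are assumptions."""
    def __init__(self, n_classes=28):
        super().__init__()
        self.features = nn.Sequential(block(3, 16), block(16, 32), block(32, 64),
                                      block(64, 80), block(80, 96))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(96, n_classes))

    def forward(self, x):
        return self.head(self.features(x))

net = FiveConvNet()
print(sum(p.numel() for p in net.parameters()))   # ~142K; the paper reports 152K
logits = net(torch.randn(1, 3, 128, 128))
```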
A. Lakizadeh; Z. Zinaty
Abstract
Aspect-level sentiment classification is an essential issue in sentiment analysis that aims to resolve the sentiment polarity of a specific aspect mentioned in the input text. Recent methods have discovered the role of aspects in sentiment polarity classification and developed various techniques to assess the sentiment polarity of each aspect in the text. However, these studies do not pay enough attention to the need for the vectors to be optimal for the aspect. To address this issue, in the present study we suggest a Hierarchical Attention-based Method (HAM) for aspect-based polarity classification of text. HAM works in a hierarchical manner: firstly, it extracts an embedding vector for the aspects; next, it employs these information-rich aspect vectors to determine the sentiment of the text. The experimental findings on the SemEval2014 dataset show that HAM can improve accuracy by up to 6.74% compared to state-of-the-art methods in the aspect-based sentiment classification task.
N. Majidi; K. Kiani; R. Rastgoo
Abstract
This study presents a method to reconstruct a high-resolution image using a deep convolutional neural network. We propose a deep model, entitled Deep Block Super Resolution (DBSR), that fuses the output features of a deep convolutional network and a shallow convolutional network. In this way, our model benefits from the high-frequency and low-frequency features extracted by the deep and shallow networks simultaneously. We use residual layers in our model to build repetitive blocks, increase the depth of the model, and make it end-to-end. Furthermore, we employ a deep network in the up-sampling step instead of the bicubic interpolation used in most previous works. Since image resolution plays an important role in obtaining rich information from medical images and helps with more accurate and faster diagnosis, we use medical images for resolution enhancement. Our model is capable of reconstructing a high-resolution image from a low-resolution one in both medical and general images. Evaluation results on the TSA and TZDE datasets, consisting of MRI images, and the Set5, Set14, B100, and Urban100 datasets, consisting of general images, demonstrate that our model outperforms state-of-the-art alternatives in both medical and general single-image super-resolution.
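A compact sketch of the deep/shallow fusion plus learned up-sampling: a stack of residual blocks forms the deep branch, a single conv layer the shallow branch, their features are summed, and a sub-pixel (PixelShuffle) layer replaces bicubic interpolation. Depth, width, and scale factor are assumptions, not DBSR's exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)      # residual connection

class DeepShallowSR(nn.Module):
    """Fuses a deep residual branch (high-frequency detail) with a shallow
    branch (low-frequency content), then upsamples with a learned
    sub-pixel layer instead of bicubic interpolation."""
    def __init__(self, ch=64, n_res=8, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.deep = nn.Sequential(*[ResBlock(ch) for _ in range(n_res)])
        self.shallow = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.Conv2d(ch, 3 * scale ** 2, 3, padding=1),
                                nn.PixelShuffle(scale))   # learned up-sampling

    def forward(self, lr):
        f = self.head(lr)
        fused = self.deep(f) + self.shallow(f)    # deep + shallow feature fusion
        return self.up(fused)                     # (B, 3, H*scale, W*scale)

hr = DeepShallowSR()(torch.randn(1, 3, 32, 32))   # (1, 3, 64, 64)
```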
H.3. Artificial Intelligence
M. Kurmanji; F. Ghaderi
Abstract
Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in classifying hand gestures in videos. The latter comes with additional challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing all the frames of a video. So far, both 2D and 3D convolutional neural networks have been used to capture the temporal dynamics of video frames. 3D CNNs can extract the changes in consecutive frames and tend to be more suitable for the video classification task; however, they usually require more computation. On the other hand, by using techniques like tiling, it is possible to aggregate all the frames into a single matrix while preserving the temporal and spatial features. This way, 2D CNNs, which are inherently simpler than 3D CNNs, can be used to classify the video instances. In this paper, we compare the application of 2D and 3D CNNs for representing temporal features and classifying hand gesture sequences. Additionally, with a two-stage, two-stream architecture, we efficiently combine color and depth modalities and the 2D and 3D CNN predictions. The effect of different types of augmentation techniques is also investigated. Our results confirm that appropriate usage of 2D CNNs outperforms a 3D CNN implementation in this task.
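The tiling trick mentioned above is simple to state in code: T frames are laid out on a grid so that one ordinary 2D convolutional pass sees the whole sequence at once. The 4x4 grid shape is an assumption; the abstract does not specify the layout.

```python
import torch

def tile_frames(clip, grid=(4, 4)):
    """Tile T video frames into one large image so a plain 2D CNN can see
    the whole sequence at once.

    clip: (T, C, H, W) with T == grid[0] * grid[1]  ->  (C, gh*H, gw*W)
    """
    gh, gw = grid
    T, C, H, W = clip.shape
    assert T == gh * gw, "frame count must fill the grid"
    rows = [torch.cat(list(clip[r * gw:(r + 1) * gw]), dim=-1)  # concat along width
            for r in range(gh)]
    return torch.cat(rows, dim=-2)                              # stack rows along height

clip = torch.randn(16, 3, 56, 56)      # 16 RGB frames of a gesture
img = tile_frames(clip)                # (3, 224, 224): ready for a 2D CNN
```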