H.3.8. Natural Language Processing
Nura Esfandiari; Kourosh Kiani; Razieh Rastgoo
Abstract
A chatbot is a computer program designed to simulate human-like conversations and interact with users. It is a form of conversational agent that uses Natural Language Processing (NLP) and sequential models to understand user input, interpret the user's intent, and generate an appropriate answer. This approach aims to generate word sequences in the form of coherent phrases. A notable challenge associated with previous models lies in their sequential training process, which can result in less accurate outcomes. To address this limitation, a novel generative chatbot is proposed that combines Reinforcement Learning (RL) with transformer models. The proposed approach employs a Double Deep Q-Network (DDQN) architecture with a transformer model as the agent. This agent takes the human question as the input state and generates the bot answer as the action. To the best of our knowledge, this is the first generative chatbot to use a DDQN architecture with an embedded transformer as the agent. Results on two public datasets, Daily Dialog and Chit-Chat, measured with several evaluation metrics, validate the superiority of the proposed approach over state-of-the-art models.
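As a rough illustration of the DDQN idea mentioned in the abstract, the sketch below computes the standard Double DQN training target for a single transition: the online network picks the best next action and the target network scores it. The reward, the discount factor, and the idea of a small enumerated set of candidate answers are assumptions made for the example; the abstract does not describe how the authors define them.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double DQN target for one transition (illustrative sketch).

    next_q_online / next_q_target hold Q-values for every candidate action in
    the next state, from the online and target networks respectively.
    """
    if done:
        # No bootstrapping past the end of an episode.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates that action.
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values over three candidate answers (not from the paper).
online_q = [0.2, 0.9, 0.4]
target_q = [0.5, 0.3, 0.8]
y = double_dqn_target(reward=1.0, next_q_online=online_q,
                      next_q_target=target_q, gamma=0.5)
```

Decoupling action selection from action evaluation is what distinguishes Double DQN from plain DQN, which would take the maximum of the target network's own values.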
H.3.2.2. Computer vision
Mobina Talebian; Kourosh Kiani; Razieh Rastgoo
Abstract
Fingerprint verification has emerged as a cornerstone of personal identity authentication. This research introduces a deep learning-based framework for enhancing the accuracy of this critical process. By integrating a pre-trained Inception model with a custom-designed architecture, we propose a model that effectively extracts discriminative features from fingerprint images. To this end, the input fingerprint image is aligned to a base fingerprint through minutiae vector comparison. The aligned input fingerprint is then subtracted from the base fingerprint to generate a residual image. This residual image, along with the aligned input fingerprint and the base fingerprint, constitutes the three input channels for a pre-trained Inception model. Our main contribution lies in the alignment of fingerprint minutiae, followed by the construction of a color fingerprint representation. We also collected a fingerprint verification dataset of 200 fingerprint images from 20 persons. The proposed method is evaluated on two distinct datasets and outperforms existing state-of-the-art techniques. With a verification accuracy of 99.40% on the public Hong Kong Dataset, our approach establishes a new benchmark in fingerprint verification. This research has potential applications in various domains, including law enforcement, border control, and secure access systems.
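The three-channel construction described above can be sketched as follows, assuming the input fingerprint has already been aligned to the base fingerprint and both are grayscale images of the same size. The channel order and the use of an absolute difference for the residual are assumptions for illustration; the abstract only states that the aligned input is subtracted from the base.

```python
def build_three_channel(aligned, base):
    """Stack residual, aligned input, and base images into an H x W x 3 nested list."""
    same_shape = len(aligned) == len(base) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(aligned, base)
    )
    if not same_shape:
        raise ValueError("aligned and base images must have the same shape")
    image = []
    for row_a, row_b in zip(aligned, base):
        # Residual = |base - aligned|; the absolute value is an assumption
        # that keeps the channel in the same non-negative range as the inputs.
        image.append([[abs(b - a), a, b] for a, b in zip(row_a, row_b)])
    return image


# Tiny hypothetical 2 x 2 grayscale images.
aligned_img = [[10, 200], [30, 40]]
base_img = [[12, 180], [30, 90]]
stacked = build_three_channel(aligned_img, base_img)
```

The resulting array has the same layout as an RGB image, which is what allows a network pre-trained on color images to consume it.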
H.3.2.2. Computer vision
Masoumeh Esmaeiili; Kourosh Kiani
Abstract
The classification of emotions using electroencephalography (EEG) signals is inherently challenging due to the intricate nature of brain activity. Overcoming inconsistencies in EEG signals and establishing a universally applicable sentiment analysis model are essential objectives. This study introduces an innovative approach to cross-subject emotion recognition, employing a genetic algorithm (GA) to eliminate non-informative frames. The optimal frames identified by the GA then undergo spatial feature extraction using common spatial patterns (CSP) and the logarithm of variance. Subsequently, these features are input into a Transformer network to capture spatial-temporal features, and the emotion classification is executed using a fully connected (FC) layer with a Softmax activation function. The innovations of this paper include using a limited number of channels for emotion classification without sacrificing accuracy, selecting optimal signal segments using the GA, and employing the Transformer network for high-accuracy and high-speed classification. The proposed method is evaluated on two publicly accessible datasets, SEED and SEED-V, in two distinct scenarios. It attains mean accuracy rates of 99.96% and 99.51% in the cross-subject scenario, and 99.93% and 99.43% in the multi-subject scenario, for the SEED and SEED-V datasets, respectively. The proposed method outperforms the state-of-the-art (SOTA) in both scenarios on both datasets. Additionally, a comparison of per-subject accuracy with previous works in the cross-subject scenario further confirms the superiority of the proposed method on both datasets.
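The CSP plus log-variance step can be illustrated with a minimal sketch: each spatial filter projects the multi-channel frame onto one component, and the feature is the logarithm of that component's variance. The identity filters below are placeholders; real CSP filters come from a generalized eigendecomposition of class covariance matrices, which is omitted here, and the function names are hypothetical.

```python
import math


def project(filters, frame):
    """Apply spatial filters (n_filters x n_channels) to a frame (n_channels x n_samples)."""
    n_samples = len(frame[0])
    return [
        [sum(w * frame[c][t] for c, w in enumerate(f)) for t in range(n_samples)]
        for f in filters
    ]


def log_variance(signals):
    """Return log(var) of each projected signal (population variance)."""
    feats = []
    for s in signals:
        mean = sum(s) / len(s)
        var = sum((x - mean) ** 2 for x in s) / len(s)
        feats.append(math.log(var))
    return feats


# Hypothetical 2-channel frame with 4 samples and placeholder identity filters.
frame = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
filters = [[1.0, 0.0], [0.0, 1.0]]
features = log_variance(project(filters, frame))
```

In the pipeline described by the abstract, feature vectors of this kind, computed for the GA-selected frames, would form the input sequence to the Transformer.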
Fatemeh Alinezhad; Kourosh Kiani; Razieh Rastgoo
Abstract
Gender recognition has been an attractive research area in recent years. A user-friendly gender recognition application requires an accurate, fast, and lightweight model that can run on a mobile device. Although successful results have been obtained using the Convolutional Neural Network (CNN), this model needs high computational resources that are not appropriate for mobile and embedded applications. To overcome this challenge, and considering the recent advances in Deep Learning, we propose a deep learning-based model for gender recognition on mobile devices using lightweight CNN models. A pretrained CNN model, the Multi-Task Cascaded Convolutional Networks (MTCNN), is used for face detection. Furthermore, the MobileFaceNet model is modified and trained using the Margin Distillation cost function. To boost the model performance, Dense Blocks and depthwise separable convolutions are used in the model. Results confirm that the proposed model outperforms the MobileFaceNet model on six datasets, with relative accuracy improvements of 0.02%, 1.39%, 2.18%, 1.34%, 7.51%, and 7.93% on the LFW, CPLFW, CFP-FP, VGG2-FP, UTKFace, and our own data, respectively. In addition, we collected a dataset of 100,000 face images of both males and females in different age categories. The images of women include faces with and without headgear.
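To make concrete why depthwise separable convolutions suit a lightweight mobile model, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart. Bias terms are ignored and the layer sizes are hypothetical, chosen only for the example.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
reduction = std / sep
```

For this layer the separable form needs roughly one eighth of the weights, and the saving grows with the number of output channels.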
H. Aghabarar; K. Kiani; P. Keshavarzi
Abstract
Given the rapid progress in pattern recognition, ideas from theoretical mathematics can be exploited to improve the efficiency of these tasks. In this paper, the Discrete Wavelet Transform (DWT) is used as a mathematical framework for handwritten digit recognition in spiking neural networks (SNNs). The motivation behind this method is that the wavelet transform can separate spike information and noise into different frequency subbands while preserving time information. The simulation results show that DWT is an effective choice and brings the network to an efficiency comparable to previous networks in the spiking field. Initially, DWT is applied to the MNIST images at the network input. Subsequently, a type of time encoding called constant-current Leaky Integrate-and-Fire (LIF) encoding is applied to the transformed data. The encoded images are then input to a multilayer convolutional spiking network. Various wavelets have been investigated in this architecture, and the highest classification accuracy achieved is 99.25%.
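The two preprocessing steps can be sketched in a few lines: a single-level Haar DWT on a 1-D signal, and a constant-current LIF encoder that turns a scalar value into a spike train. The Haar wavelet, the Euler update, and all parameter values (tau, threshold, number of steps) are assumptions chosen for illustration; the abstract does not name the best-performing wavelet or the encoder settings.

```python
import math


def haar_dwt_1d(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def lif_encode(current, steps=20, tau=10.0, dt=1.0, threshold=1.0):
    """Drive a leaky integrate-and-fire neuron with a constant current.

    Returns a list of 0/1 spikes; the membrane potential resets to 0 after a spike.
    """
    v = 0.0
    spikes = []
    for _ in range(steps):
        v += (dt / tau) * (current - v)  # Euler step of dv/dt = (I - v) / tau
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes


approx, detail = haar_dwt_1d([4.0, 2.0, 6.0, 6.0])
strong = lif_encode(2.0)  # input above threshold: fires periodically
weak = lif_encode(0.5)    # input below threshold: never fires
```

Larger input values reach the threshold sooner and therefore produce more spikes, which is how the encoder converts coefficient magnitude into spike timing and rate.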
K. Kiani; R. Hematpour; R. Rastgoo
Abstract
Image colorization is an interesting yet challenging task, since a natural-looking color image must be obtained from any grayscale image. To tackle this challenge with a fully automatic procedure, we propose a Convolutional Neural Network (CNN)-based model for grayscale image colorization that benefits from the strong performance of CNNs in image processing tasks. Drawing on pre-trained convolutional models, we fuse three of them, VGG16, ResNet50, and Inception-v2, to improve the model performance. The average of the three model outputs is used to obtain richer features. The fused features are fed to an encoder-decoder network to obtain a color image from a grayscale input image. We perform a step-by-step analysis of different pre-trained models and fusion methodologies to select the most accurate combination for the proposed model. Results on the LFW and ImageNet datasets confirm the effectiveness of our model compared to state-of-the-art alternatives in the field.
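The averaging fusion can be sketched as an element-wise mean over the three backbone outputs. The sketch assumes the feature vectors have already been brought to a common length, which in practice would need a projection or resizing step that the abstract does not describe; the values are made up.

```python
def average_features(*feature_vectors):
    """Element-wise mean of equally sized feature vectors."""
    if len({len(v) for v in feature_vectors}) != 1:
        raise ValueError("all feature vectors must have the same length")
    n = len(feature_vectors)
    return [sum(values) / n for values in zip(*feature_vectors)]


# Hypothetical features from the three pre-trained backbones.
vgg_feat = [1.0, 4.0]
resnet_feat = [2.0, 5.0]
inception_feat = [3.0, 9.0]
fused = average_features(vgg_feat, resnet_feat, inception_feat)
```

Averaging keeps the fused feature the same size as each individual output, so the encoder-decoder that follows does not grow with the number of backbones.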
A. Fakhari; K. Kiani
Abstract
Image restoration and its variations are important topics in low-level image processing. One of the main challenges in image restoration is the dependence of current methods on the corruption characteristics. In this paper, we propose an image restoration architecture that addresses different kinds of corruption, regardless of type, amount, and location. The main intuition behind our approach is restoring original images from abstracted perceptual features. Using an encoder-decoder architecture, image restoration can be defined as an image transformation task. Abstraction of perceptual features is done in the encoder part of the model and determines the sampling point within the Probability Density Function (PDF) of the original images. The PDF of the original images is learned in the decoder section by a Generative Adversarial Network (GAN) that receives the sampling point from the encoder part. Concretely, sampling from the learned PDF restores the original image from its corrupted version. In the encoder section, a pretrained network extracts perceptual features and a Restricted Boltzmann Machine (RBM) abstracts them. A new algorithm for training the RBM refines the features of the corrupted images. In the decoder, the Generator network restores original images from the abstracted perceptual features, while the Discriminator determines how good the restoration result is. The proposed approach has been compared with traditional approaches such as BM3D and with modern deep models such as IRCNN and NCSR. We have also considered three restoration tasks: denoising, inpainting, and deblurring. Experimental results confirm the performance of the model.
N. Majidi; K. Kiani; R. Rastgoo
Abstract
This study presents a method to reconstruct a high-resolution image using a deep convolutional neural network. We propose a deep model, entitled Deep Block Super Resolution (DBSR), that fuses the output features of a deep convolutional network and a shallow convolutional network. In this way, our model benefits simultaneously from the high-frequency and low-frequency features extracted by the deep and shallow networks. We use residual layers in our model to build repeated layers, increase the depth of the model, and make the model end-to-end. Furthermore, we employ a deep network in the up-sampling step instead of the bicubic interpolation used in most previous works. Since image resolution plays an important role in obtaining rich information from medical images and supports accurate and faster diagnosis, we apply resolution enhancement to medical images. Our model is capable of reconstructing a high-resolution image from a low-resolution one for both medical and general images. Evaluation results on the TSA and TZDE datasets, which contain MRI images, and on the Set5, Set14, B100, and Urban100 datasets, which contain general images, demonstrate that our model outperforms state-of-the-art alternatives in both medical and general single-image super-resolution.