H.3.2.2. Computer vision
Razieh Rastgoo
Abstract
Sign language (SL) is the primary mode of communication within the Deaf community. Recent advances in deep learning have led to various applications and technologies aimed at facilitating bidirectional communication between the Deaf and hearing communities. However, the availability of suitable datasets for deep learning-based models remains a challenge. Only a few large-scale annotated public datasets exist for sign sentences, and none exist for Persian Sign Language sentences. To address this gap, we have collected a large-scale dataset comprising 10,000 sign sentence videos corresponding to 100 Persian sign sentences. The dataset includes comprehensive annotations such as the bounding box of the detected hand, class labels, hand pose parameters, and heatmaps. A notable feature of the proposed dataset is that it also contains the isolated signs corresponding to the sign sentences within the dataset. To analyze the complexity of the proposed dataset, we present extensive experiments and discuss the results. More concretely, we include and analyze the results of models in key sub-domains relevant to Sign Language Recognition (SLR), including hand detection, pose estimation, real-time tracking, and gesture recognition. Moreover, we discuss the results of seven deep learning-based models on the proposed dataset. Finally, we present the results of Sign Language Production (SLP) using deep generative models. Together, these experiments show how models from each of these sub-areas perform on the proposed dataset.
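The abstract lists heatmaps among the annotations but does not say how they are generated. Hand pose estimation commonly renders one 2D Gaussian per keypoint, and this sketch assumes that convention; it is not taken from the dataset's documentation. The helper names `gaussian_heatmap`, `hand_heatmaps`, and `decode_peak`, and the default `sigma`, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]  # H x W grid of responses


def gaussian_heatmap(height: int, width: int, cx: float, cy: float,
                     sigma: float = 2.0) -> Heatmap:
    """Render an H x W map equal to 1.0 at (cx, cy) and decaying as a 2D Gaussian.
    sigma is an illustrative assumption, not the dataset's documented setting."""
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / two_sigma_sq) for x in range(width)]
        for y in range(height)
    ]


def hand_heatmaps(keypoints: Sequence[Tuple[float, float]], height: int, width: int,
                  sigma: float = 2.0) -> List[Heatmap]:
    """One heatmap per (x, y) hand keypoint (hypothetical annotation layout)."""
    return [gaussian_heatmap(height, width, x, y, sigma) for x, y in keypoints]


def decode_peak(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (x, y) pixel with the highest response, i.e. the decoded keypoint."""
    _, x, y = max(
        (value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)
    )
    return x, y
```

For example, `decode_peak(hand_heatmaps([(10, 5)], 16, 16)[0])` recovers `(10, 5)`. A pose estimator trained on such targets regresses the maps, and its keypoints are then read back from the peaks.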
H.3.8. Natural Language Processing
Nura Esfandiari; Kourosh Kiani; Razieh Rastgoo
Abstract
Chatbots are computer programs designed to simulate human conversation. Powered by artificial intelligence (AI), and increasingly by large language models (LLMs), chatbots are widely used to provide customer service. Fine-tuning LLMs is a common way to personalize chatbot answers, but this process demands substantial high-quality data and computational resources. In this article, an innovative hybrid approach is proposed to overcome the computational hurdles of fine-tuning LLMs. The approach aims to enhance the answers generated by LLMs, specifically for Persian chatbots used in mobile customer services. A transformer-based evaluation model was developed to score the generated answers and select the most appropriate one. Additionally, a Persian-language dataset tailored to the domain of mobile phone sales was collected to support the personalization of the Persian chatbot and the training of the evaluation model. This approach is expected to increase customer interaction and boost sales within the Persian mobile phone market. Experiments conducted on four different LLMs demonstrated that the proposed approach generates more relevant and semantically accurate answers for users.
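The selection step described above can be sketched as follows: every LLM proposes an answer, an evaluation model scores each (question, answer) pair, and the top-scoring answer is returned. This is only an illustration. `Candidate`, `select_best_answer`, and `toy_overlap_score` are hypothetical names. The word-overlap scorer is a placeholder for the paper's transformer-based evaluation model, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    model_name: str  # which LLM produced the answer
    answer: str


def select_best_answer(question: str, candidates: Sequence[Candidate],
                       score_fn: Callable[[str, str], float]) -> Candidate:
    """Score each candidate answer with an evaluation model and keep the highest-scoring one."""
    if not candidates:
        raise ValueError("need at least one candidate answer")
    return max(candidates, key=lambda c: score_fn(question, c.answer))


def toy_overlap_score(question: str, answer: str) -> float:
    """Placeholder scorer (hypothetical): fraction of question words that reappear in the answer.
    It stands in for a learned transformer-based evaluation model."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / max(len(q_words), 1)
```

Because `score_fn` is a plain callable, a trained evaluation model can replace the toy scorer without changing the selection logic. This decoupling fits the goal of avoiding the cost of fine-tuning the generating LLMs themselves.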
H.3.8. Natural Language Processing
Nura Esfandiari; Kourosh Kiani; Razieh Rastgoo
Abstract
A chatbot is a computer program designed to simulate human-like conversations and interact with users. It is a form of conversational agent that uses Natural Language Processing (NLP) and sequential models to understand user input, interpret the user's intent, and generate an appropriate answer. This approach aims to generate word sequences in the form of coherent phrases. A notable challenge of previous models lies in their sequential training process, which can lead to less accurate outcomes. To address this limitation, a novel generative chatbot is proposed that combines Reinforcement Learning (RL) and transformer models to overcome the challenges of sequential training. The proposed approach employs a Double Deep Q-Network (DDQN) architecture with a transformer model as the agent. This agent takes the human question as the input state and generates the bot answer as an action. To the best of our knowledge, this is the first generative chatbot built on a DDQN architecture with an embedded transformer as the agent. Results on two public datasets, Daily Dialog and Chit-Chat, evaluated with various metrics, validate the superiority of the proposed approach over state-of-the-art models.
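The "double" in DDQN refers to how the learning target is built. The online network selects the best next action, and a separate target network evaluates it:

y = r + γ · Q_target(s', argmax_a Q_online(s', a))

This reduces the overestimation of plain DQN, which takes the max over the target network's own estimates. The sketch below computes both targets from plain lists of Q-values. These lists stand in for the transformer agent's outputs over a hypothetical discrete action set. The abstract does not say how answer generation is mapped to discrete actions.

```python
from typing import Sequence


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float = 0.99,
                      done: bool = False) -> float:
    """Double DQN target: the online network chooses the next action,
    and the separate target network evaluates that choice."""
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def vanilla_dqn_target(reward: float, next_q_target: Sequence[float],
                       gamma: float = 0.99, done: bool = False) -> float:
    """Standard DQN target for comparison: max over the target network's own estimates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def td_loss(q_online_taken: float, target: float) -> float:
    """Squared temporal-difference error used to update the online network."""
    return (q_online_taken - target) ** 2
```

Take reward 1.0, γ = 0.5, online Q-values `[0.2, 0.9, 0.5]`, and target Q-values `[0.8, 0.3, 0.6]`. The double target is about 1.15, because action 1 is chosen by the online network and valued at 0.3. The vanilla target is about 1.4, because it takes the target network's own max of 0.8.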
H.3.2.2. Computer vision
Mobina Talebian; Kourosh Kiani; Razieh Rastgoo
Abstract
Fingerprint verification has emerged as a cornerstone of personal identity authentication. This research introduces a deep learning-based framework for enhancing the accuracy of this critical process. By integrating a pre-trained Inception model with a custom-designed architecture, we propose a model that effectively extracts discriminative features from fingerprint images. To this end, the input fingerprint image is aligned to a base fingerprint through minutiae vector comparison. The aligned input fingerprint is then subtracted from the base fingerprint to generate a residual image. This residual image, together with the aligned input fingerprint and the base fingerprint, forms the three input channels of a pre-trained Inception model. Our main contribution lies in the alignment of fingerprint minutiae, followed by the construction of a color fingerprint representation. Moreover, we collected a fingerprint verification dataset comprising 200 fingerprint images from 20 persons. The proposed method is evaluated on two distinct datasets, demonstrating its superiority over existing state-of-the-art techniques. With a verification accuracy of 99.40% on the public Hong Kong Dataset, our approach establishes a new benchmark in fingerprint verification. This research has potential applications in various domains, including law enforcement, border control, and secure access systems.
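The "color fingerprint" construction can be sketched once alignment has been done. The minutiae-based alignment itself is assumed to happen upstream and is not shown. The residual is the pixel-wise difference base minus aligned input, and the three grayscale images are stacked into one H x W x 3 array. The function names are hypothetical. The channel order (residual, aligned input, base) is an assumption, since the abstract only names the three components.

```python
from typing import List

Image = List[List[float]]  # H x W grayscale image


def residual_image(base: Image, aligned: Image) -> Image:
    """Pixel-wise difference base - aligned, computed after the input has been aligned to the base."""
    if len(base) != len(aligned) or any(len(rb) != len(ra) for rb, ra in zip(base, aligned)):
        raise ValueError("base and aligned images must have the same shape")
    return [[b - a for b, a in zip(rb, ra)] for rb, ra in zip(base, aligned)]


def three_channel_input(base: Image, aligned: Image) -> List[List[List[float]]]:
    """Build an H x W x 3 'color' fingerprint holding (residual, aligned input, base) per pixel.
    The channel order is an assumption; the abstract only names the three components."""
    res = residual_image(base, aligned)
    return [
        [[res[y][x], aligned[y][x], base[y][x]] for x in range(len(base[y]))]
        for y in range(len(base))
    ]
```

The residual can be negative. Before the array goes to a pre-trained Inception backbone, a real pipeline would likely rescale each channel to the input range that backbone expects. That step is not described in the abstract.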
Fatemeh Alinezhad; Kourosh Kiani; Razieh Rastgoo
Abstract
Gender recognition has become an attractive research area in recent years. A user-friendly gender recognition application requires an accurate, fast, and lightweight model that can run on a mobile device. Although Convolutional Neural Networks (CNNs) have achieved successful results, they need high computational resources, which makes them unsuitable for mobile and embedded applications. To overcome this challenge, and building on recent advances in deep learning, we propose a deep learning-based model for gender recognition on mobile devices using lightweight CNN models. A pretrained CNN model, Multi-Task Cascaded Convolutional Networks (MTCNN), is used for face detection. The MobileFaceNet model is then modified and trained using the Margin Distillation cost function. To boost model performance, Dense Blocks and depthwise separable convolutions are used in the model. Results confirm that the proposed model outperforms MobileFaceNet on six datasets, with relative accuracy improvements of 0.02%, 1.39%, 2.18%, 1.34%, 7.51%, and 7.93% on LFW, CPLFW, CFP-FP, VGG2-FP, UTKFace, and our own dataset, respectively. In addition, we collected a dataset of 100,000 face images of males and females in different age categories. The images of women include faces both with and without headgear.
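Depthwise separable convolutions keep such models light by splitting a standard k x k convolution into two parts. The first is a per-channel k x k depthwise filter. The second is a 1 x 1 pointwise convolution that mixes channels. The arithmetic below counts weights for a single layer with biases ignored. The layer sizes in the example are illustrative and are not taken from the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in  # spatial filtering, channel by channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def cost_ratio(k: int, c_in: int, c_out: int) -> float:
    """Separable / standard parameter ratio; algebraically equal to 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)
```

Consider a 3 x 3 layer mapping 64 to 128 channels. The standard convolution needs 73,728 weights and the separable version needs 8,768, roughly 8.4 times fewer. Multiply-accumulate operations per output pixel shrink by the same ratio.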
K. Kiani; R. Hematpour; R. Rastgoo
Abstract
Image colorization is an interesting yet challenging task because it requires producing a natural-looking color image from an arbitrary grayscale image. To tackle this challenge with a fully automatic procedure, we propose a Convolutional Neural Network (CNN)-based model for automatic grayscale image colorization that benefits from the impressive ability of CNNs in image processing tasks. Harnessing convolutional pre-trained models, we fuse three of them, VGG16, ResNet50, and Inception-v2, to improve model performance. The outputs of the three models are averaged to obtain richer features. The fused features are fed to an encoder-decoder network that produces a color image from the grayscale input. We perform a step-by-step analysis of different pre-trained models and fusion methodologies to identify the most accurate combination of these models for the proposed model. Results on the LFW and ImageNet datasets confirm the effectiveness of our model compared to state-of-the-art alternatives in the field.
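The averaging fusion can be sketched as an element-wise mean over the backbones' feature maps. VGG16, ResNet50, and Inception-v2 produce feature maps with different spatial sizes and channel counts. The abstract does not say how they are brought to a common shape, so this sketch assumes that has already been done, for example by resizing and a 1 x 1 projection, and shows only the averaging. `average_fusion` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # H x W x C


def _shape(fm: FeatureMap) -> Tuple[int, int, int]:
    """(height, width, channels) of a non-empty feature map."""
    return len(fm), len(fm[0]), len(fm[0][0])


def average_fusion(feature_maps: Sequence[FeatureMap]) -> FeatureMap:
    """Element-wise mean of backbone feature maps that already share one H x W x C shape."""
    if not feature_maps:
        raise ValueError("need at least one feature map")
    h, w, c = _shape(feature_maps[0])
    if any(_shape(fm) != (h, w, c) for fm in feature_maps):
        raise ValueError("feature maps must be resized/projected to a common shape first")
    n = len(feature_maps)
    return [
        [[sum(fm[y][x][k] for fm in feature_maps) / n for k in range(c)] for x in range(w)]
        for y in range(h)
    ]
```

Averaging keeps the fused tensor the same size as each input, so the encoder-decoder that follows sees a fixed channel count. Concatenation would instead triple the channel count.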
N. Majidi; K. Kiani; R. Rastgoo
Abstract
This study presents a method to reconstruct a high-resolution image using a deep convolutional neural network. We propose a deep model, entitled Deep Block Super Resolution (DBSR), that fuses the output features of a deep convolutional network and a shallow convolutional network. In this way, our model benefits simultaneously from the high-frequency and low-frequency features extracted by the deep and shallow networks. We use residual layers to build repeated blocks, increase the depth of the model, and obtain an end-to-end model. Furthermore, we employ a deep network in the up-sampling step instead of the bicubic interpolation used in most previous works. Since image resolution plays an important role in obtaining rich information from medical images and supports faster and more accurate diagnosis, we apply our model to medical image resolution enhancement. Our model can reconstruct a high-resolution image from a low-resolution one for both medical and general images. Evaluation results on the TSA and TZDE datasets, which contain MRI images, and on the Set5, Set14, B100, and Urban100 datasets, which contain general images, demonstrate that our model outperforms state-of-the-art alternatives in both medical and general single-image super-resolution.
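The abstract says DBSR replaces bicubic interpolation with a deep network in the up-sampling step, but it does not name the layer used. One widely used learned up-sampling layer is sub-pixel convolution. A convolution outputs C·r² channels, and a pixel shuffle rearranges them into an image r times larger in each dimension. The sketch below shows only the pixel-shuffle rearrangement, using the same channel ordering as PyTorch's `torch.nn.PixelShuffle`. It is an assumed example of learned up-sampling, not necessarily DBSR's design.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # channels x height x width


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """Depth-to-space: rearrange (C*r*r) x H x W into C x (H*r) x (W*r).
    Input channel c*r*r + i*r + j fills sub-pixel offset (i, j) of every r x r output cell."""
    c_in = len(x)
    if r < 1 or c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):  # vertical sub-pixel offset
            for j in range(r):  # horizontal sub-pixel offset
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for col in range(w):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out
```

Because the preceding convolution is trained, the network learns how to fill each sub-pixel position instead of using a fixed interpolation kernel such as bicubic.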