H.3.2.2. Computer vision
Razieh Rastgoo
Abstract
Sign language (SL) is the primary mode of communication within the Deaf community. Recent advances in deep learning have led to the development of various applications and technologies aimed at facilitating bidirectional communication between the Deaf and hearing communities. However, the availability of suitable datasets for deep learning-based models remains a challenge: only a few large-scale annotated public datasets exist for sign sentences, and none exist for Persian Sign Language sentences. To address this gap, we have collected a large-scale dataset comprising 10,000 sign sentence videos corresponding to 100 Persian sign sentences. The dataset includes comprehensive annotations such as the bounding box of the detected hand, class labels, hand pose parameters, and heatmaps. A notable feature of the proposed dataset is that it also contains the isolated signs that compose its sign sentences. To analyze the complexity of the proposed dataset, we present extensive experiments and discuss the results. More concretely, we report and analyze the results of models in key sub-domains relevant to Sign Language Recognition (SLR), including hand detection, pose estimation, real-time tracking, and gesture recognition. Moreover, we discuss the results of seven deep learning-based models on the proposed dataset. Finally, we present Sign Language Production (SLP) results obtained with deep generative models, showcasing the performance of these models on the proposed dataset.
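
A minimal sketch of how a per-frame annotation record from such a dataset might be represented and consumed is given below. The abstract lists the annotation types (hand bounding box, class labels, hand pose parameters, heatmaps) but not a schema, so every field name, dimensionality, and value here is a hypothetical placeholder, not the dataset's actual format.

import numpy as np

# Hypothetical per-frame annotation record for one sign sentence video.
# Field names and shapes are illustrative assumptions only.
frame_annotation = {
    "video_id": "sentence_042_signer_03",
    "sentence_label": 42,                        # one of the 100 Persian sign sentences
    "hand_bbox": np.array([118, 64, 212, 190]),  # (x_min, y_min, x_max, y_max) in pixels
    "hand_pose": np.zeros(21 * 3),               # e.g. 21 keypoints x (x, y, z); dimensionality assumed
    "heatmap": np.zeros((64, 64)),               # hand-location heatmap; resolution assumed
}

def crop_hand(frame: np.ndarray, bbox: np.ndarray) -> np.ndarray:
    """Crop the annotated hand region from a video frame, a typical first
    step for the hand detection and gesture recognition baselines
    mentioned in the abstract."""
    x_min, y_min, x_max, y_max = bbox
    return frame[y_min:y_max, x_min:x_max]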
H.5. Image Processing and Computer Vision
Pouria Maleki; Abbas Ramazani; Hassan Khotanlou; Sina Ojaghi
Abstract
Providing a dataset with suitable volume and high accuracy for training deep neural networks is a basic requirement, since a dataset that is adequate in the number and quality of its images and in its labeling accuracy can have a great impact on the output accuracy of the trained network. The dataset presented in this article contains 3000 images downloaded from online Iranian car sales platforms, including the Divar and Bama sites, manually labeled in three classes: car, truck, and bus. The labels consist of 5765 bounding boxes that localize the vehicles in the images with high accuracy, resulting in a unique dataset that is made available for public use. The YOLOv8s algorithm, trained on this dataset, achieves a final precision of 91.7% on the validation images, and the mean Average Precision at an IoU threshold of 0.5 (mAP@50) reaches 92.6%. This accuracy is considered suitable for city vehicle detection networks. Notably, compared to YOLOv8s trained on the COCO dataset, YOLOv8s trained on this dataset shows a remarkable 10% increase in mAP@50 and an approximately 22% improvement in mAP@50-95.
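
A minimal sketch of this kind of experiment, assuming the Ultralytics YOLO package, is shown below. The dataset YAML path, epoch count, and image size are illustrative placeholders rather than the settings used in the paper.

from ultralytics import YOLO

# Fine-tune YOLOv8s on a custom 3-class vehicle dataset (car, truck, bus).
# "iranian_vehicles.yaml" is a hypothetical dataset config listing the
# train/val image paths and the three class names.
model = YOLO("yolov8s.pt")                 # start from COCO-pretrained weights
model.train(data="iranian_vehicles.yaml",
            epochs=100,
            imgsz=640)

# Evaluate on the validation split and read the standard COCO-style metrics.
metrics = model.val()
print(f"mAP@50:    {metrics.box.map50:.3f}")   # reported as 92.6% in the abstract
print(f"mAP@50-95: {metrics.box.map:.3f}")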