H.3.2.2. Computer vision
Zobeir Raisi; Valimohammad Nazarzehi; Rasoul Damani; Esmaeil Sarani
Abstract
This paper explores the performance of various object detection techniques for autonomous vehicle perception by analyzing classical machine learning and recent deep learning models. We evaluate three classical methods, which pair PCA and HOG features with different versions of the SVM classifier, and five deep learning models, Faster R-CNN, SSD, YOLOv3, YOLOv5, and YOLOv9, on the benchmark INRIA dataset. The experimental results show that although classical methods such as HOG + Gaussian SVM outperform the other classical approaches, they are surpassed by the deep learning techniques. Furthermore, classical methods have limitations in detecting partially occluded objects, distant objects, and pedestrians in complex clothing, whereas recent deep learning models, YOLOv9 in particular, are more efficient and handle these challenges better.
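The HOG-based classical pipeline evaluated above can be illustrated with a minimal sketch. This is not the authors' implementation: it computes a simplified HOG descriptor in plain NumPy, with per-cell L2 normalization instead of the usual overlapping-block normalization, and omits the sliding-window SVM stage.

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9):
    """Simplified HOG: per-cell orientation histograms of gradient magnitude."""
    img = np.asarray(img, dtype=float)
    # gradients via central differences
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    H, W = img.shape
    feats = []
    for i in range(0, H - cell + 1, cell):
        for j in range(0, W - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            hist = hist / (np.linalg.norm(hist) + 1e-6)  # cell-wise L2 norm
            feats.append(hist)
    return np.concatenate(feats)
```

For the standard 64x128 pedestrian-detection window with 8x8 cells and 9 orientation bins, this yields a 1152-dimensional feature vector, which would then be fed to the SVM classifier.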
Mobina Talebian; Kourosh Kiani; Razieh Rastgoo
Abstract
Fingerprint verification has emerged as a cornerstone of personal identity authentication. This research introduces a deep learning-based framework for enhancing the accuracy of this critical process. By integrating a pre-trained Inception model with a custom-designed architecture, we propose a model that effectively extracts discriminative features from fingerprint images. To this end, the input fingerprint image is aligned to a base fingerprint through minutiae vector comparison. The aligned input fingerprint is then subtracted from the base fingerprint to generate a residual image. This residual image, along with the aligned input fingerprint and the base fingerprint, constitutes the three input channels for a pre-trained Inception model. Our main contribution lies in the alignment of fingerprint minutiae, followed by the construction of a color fingerprint representation. Moreover, we collected a dataset of 200 fingerprint images corresponding to 20 persons for fingerprint verification. The proposed method is evaluated on two distinct datasets, demonstrating its superiority over existing state-of-the-art techniques. With a verification accuracy of 99.40% on the public Hong Kong dataset, our approach establishes a new benchmark in fingerprint verification. This research holds the potential for applications in various domains, including law enforcement, border control, and secure access systems.
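The three-channel input construction described above can be sketched as follows. This is a simplified illustration that assumes the minutiae-based alignment has already been performed, and it takes the residual channel as the absolute pixel difference, which is one plausible reading of the subtraction step.

```python
import numpy as np

def make_inception_input(aligned, base):
    """Stack aligned input, base fingerprint, and their residual as 3 channels."""
    aligned16 = aligned.astype(np.int16)   # widen to avoid uint8 wrap-around
    base16 = base.astype(np.int16)
    residual = np.abs(aligned16 - base16)  # residual image (assumed abs-diff)
    # channel order (aligned, base, residual) is an illustrative choice
    return np.stack([aligned16, base16, residual], axis=-1).astype(np.uint8)
```

The resulting H x W x 3 array is the "color fingerprint representation" that would be resized and passed to the pre-trained Inception backbone.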
Masoumeh Esmaeiili; Kourosh Kiani
Abstract
The classification of emotions using electroencephalography (EEG) signals is inherently challenging due to the intricate nature of brain activity. Overcoming inconsistencies in EEG signals and establishing a universally applicable sentiment analysis model are essential objectives. This study introduces an innovative approach to cross-subject emotion recognition, employing a genetic algorithm (GA) to eliminate non-informative frames. The optimal frames identified by the GA then undergo spatial feature extraction using common spatial patterns (CSP) and the logarithm of variance. Subsequently, these features are input into a Transformer network to capture spatial-temporal features, and the emotion classification is executed using a fully connected (FC) layer with a Softmax activation function. The innovations of this paper therefore include using a limited number of channels for emotion classification without sacrificing accuracy, selecting optimal signal segments using the GA, and employing the Transformer network for high-accuracy and high-speed classification. The proposed method is evaluated on two publicly accessible datasets, SEED and SEED-V, across two distinct scenarios. Notably, it attains mean accuracy rates of 99.96% and 99.51% in the cross-subject scenario, and 99.93% and 99.43% in the multi-subject scenario, for the SEED and SEED-V datasets, respectively. The proposed method outperforms the state-of-the-art (SOTA) in both scenarios on both datasets, underscoring its superior efficacy. Additionally, comparing per-subject accuracy with previous works in the cross-subject scenario further confirms the superiority of the proposed method on both datasets.
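The CSP-plus-log-variance feature extraction step can be sketched as below. This is an illustrative NumPy implementation of standard two-class CSP, not the authors' code; the GA frame selection and the Transformer classifier are omitted.

```python
import numpy as np

def csp_filters(X1, X2, n_components=4):
    """CSP spatial filters for two classes of trials (trials x channels x samples)."""
    def avg_cov(X):
        # trace-normalized spatial covariance, averaged over trials
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)
    C1, C2 = avg_cov(X1), avg_cov(X2)
    # generalized eigenproblem C1 w = lambda (C1 + C2) w
    evals, evecs = np.linalg.eig(np.linalg.solve(C1 + C2, C1))
    order = np.argsort(evals.real)
    # keep filters with extreme eigenvalues (most discriminative directions)
    idx = np.concatenate([order[:n_components // 2], order[-n_components // 2:]])
    return evecs.real[:, idx].T  # n_components x channels

def log_var_features(W, trial):
    """Log of normalized variance of the spatially filtered trial."""
    Z = W @ trial
    v = np.var(Z, axis=1)
    return np.log(v / v.sum())
```

The resulting low-dimensional log-variance vectors are what would be fed, per frame, into the Transformer network.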
H. Hosseinpour; Seyed A. Moosavie nia; M. A. Pourmina
Abstract
Virtual view synthesis is an essential part of computer vision and 3D applications. Obtaining a high-quality depth map is the main problem in virtual view synthesis, because the resolution of the depth image is low compared to that of the corresponding color image. In this paper, an efficient and reliable method based on the gradual omission of outliers is proposed to compute reliable depth values. In the proposed method, depth values that are far from the mean of the depth values are omitted gradually. Simulation results show that, compared with other state-of-the-art methods, on average PSNR is 2.5 dB (8.1%) higher, SSIM is 0.028 (3%) higher, UNIQUE is 0.021 (2.4%) higher, the running time is 8.6 s (6.1%) shorter, and the percentage of wrong pixels is 1.97 (24.8%) lower.
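The gradual-omission idea can be sketched in a few lines. This is an illustration under the assumption that "far from the mean" means beyond k standard deviations; the paper's exact omission schedule and thresholds may differ.

```python
import numpy as np

def reliable_depth(values, k=2.0, max_iter=10):
    """Estimate a reliable depth by gradually omitting outlying depth values."""
    vals = np.asarray(values, dtype=float)
    for _ in range(max_iter):
        mu, sigma = vals.mean(), vals.std()
        keep = np.abs(vals - mu) <= k * sigma  # k-sigma rule is an assumption
        if keep.all() or keep.sum() < 2:
            break
        # omit only the current outliers, then re-estimate mean and spread
        vals = vals[keep]
    return vals.mean()
```

Because the mean and standard deviation are recomputed on each pass, outliers are removed gradually rather than in a single thresholding step, which keeps the estimate stable when several outliers skew the initial mean.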
M. H. Khosravi
Abstract
Image segmentation is an essential and critical process in image processing and pattern recognition. In this paper, we propose a texture-based method to segment an input image into regions. In our method, an entropy-based texture map of the image is extracted, followed by a histogram equalization step to discriminate between different regions. Then, with the aim of eliminating unnecessary details and achieving more robustness against unwanted noise, a low-pass filter is used to smooth the image. As the next step, appropriate pixons are extracted and delivered to a fuzzy c-means clustering stage to obtain the final image segments. The results of applying the proposed method to several different images indicate its better performance in image segmentation compared to other methods.
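The first two stages, an entropy-based texture map followed by histogram equalization, can be sketched as follows. This plain-NumPy version is illustrative only; the window size is an assumption, and the smoothing, pixon extraction, and fuzzy c-means stages are not shown.

```python
import numpy as np

def local_entropy(img8, win=5):
    """Shannon entropy of the gray-level histogram in a sliding window."""
    H, W = img8.shape
    pad = win // 2
    padded = np.pad(img8, pad, mode='edge')
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + win, j:j + win]
            hist = np.bincount(patch.ravel(), minlength=256).astype(float)
            p = hist / hist.sum()
            p = p[p > 0]                       # ignore empty bins
            out[i, j] = -(p * np.log2(p)).sum()
    return out

def equalize(img8):
    """Classic histogram equalization on an 8-bit image."""
    hist = np.bincount(img8.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min()) * 255
    return cdf[img8].astype(np.uint8)
```

Smooth, textureless regions produce entropy near zero while textured regions score high, so equalizing the entropy map stretches the contrast between region types before clustering.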
Seyyed A. Hoseini; P. Kabiri
Abstract
In this paper, a feature-based technique for camera pose estimation in a sequence of wide-baseline images is proposed. Camera pose estimation is an important issue in many computer vision and robotics applications, such as augmented reality and visual SLAM. The proposed method can track images captured by a hand-held camera in room-sized workspaces with a maximum scene depth of 3-4 meters. The system can be used in unknown environments with no additional information from the outside world, except for the first two images, which are used for initialization. Pose estimation is performed using only natural feature points extracted and matched in successive images. In wide-baseline images, unlike consecutive frames of a video stream, the displacement of feature points between consecutive images is considerable and hence cannot be traced easily using patch-based methods. To handle this problem, a hybrid strategy is employed to obtain accurate feature correspondences: first, initial correspondences are found using the similarity of the feature descriptors, and then outlier matches are removed by applying the RANSAC algorithm. Further, to provide the required set of feature matches, a mechanism based on a side result of the robust estimator is employed. The proposed method is applied to real indoor data with VGA-quality images (640×480 pixels); on average, the translation error of the camera pose is less than 2 cm, which indicates the effectiveness and accuracy of the proposed approach.
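The hybrid matching strategy, descriptor similarity followed by RANSAC outlier rejection, can be sketched as below. For brevity this illustration fits a pure 2D translation model inside RANSAC, whereas the paper estimates full camera pose; the ratio-test threshold and iteration count are assumptions.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Initial correspondences by descriptor similarity with Lowe's ratio test."""
    matches = []
    for i, d in enumerate(d1):
        dists = np.linalg.norm(d2 - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:  # best match clearly better than 2nd best
            matches.append((i, j))
    return matches

def ransac_translation(p1, p2, matches, thresh=3.0, iters=100, seed=0):
    """RANSAC over a toy translation model to reject outlier matches."""
    rng = np.random.default_rng(seed)
    best_inliers = []
    for _ in range(iters):
        i, j = matches[rng.integers(len(matches))]
        t = p2[j] - p1[i]  # hypothesis from a single correspondence
        inliers = [(a, b) for a, b in matches
                   if np.linalg.norm(p2[b] - (p1[a] + t)) < thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

The inlier set returned by the robust estimator is exactly the kind of "side result" the abstract mentions: it both cleans the correspondences and identifies which features are usable for the subsequent pose computation.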
M. Askari; M. Asadi; A. Asilian Bidgoli; H. Ebrahimpour
Abstract
For many years, researchers have studied high-accuracy methods for handwriting recognition and achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Considering computer hardware limitations, it is necessary for these methods to run at high speed. One way to increase the processing speed is to exploit the parallel processing power of computers. This paper introduces one of the best feature extraction methods for handwriting recognition, called DPP (Derivative Projection Profile), which is employed for isolated Persian handwriting recognition. In addition to achieving good results, this computationally light feature can be processed easily. Moreover, a Hamming neural network is used as the classifier in this system. To increase the speed, part of the recognition method is executed on GPU (graphics processing unit) cores, implemented on the CUDA platform. The HADAF database (the largest isolated Persian character database) is utilized to evaluate the system. The results show 94.5% accuracy, and a speed-up of about 5.5 times is achieved using the GPU.
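A minimal sketch of projection-profile-derivative features and Hamming-style nearest-prototype classification is shown below. This is an assumption-laden illustration, not the paper's DPP definition (which may differ in detail); the binarized feature vectors for the Hamming network are hypothetical, and the GPU/CUDA acceleration is omitted.

```python
import numpy as np

def dpp_features(binary_img):
    """Derivatives of horizontal/vertical projection profiles (illustrative DPP)."""
    h_prof = binary_img.sum(axis=1).astype(float)  # ink count per row
    v_prof = binary_img.sum(axis=0).astype(float)  # ink count per column
    # first derivatives emphasize stroke transitions along each axis
    return np.concatenate([np.diff(h_prof), np.diff(v_prof)])

def hamming_classify(bits, prototypes):
    """Hamming network in spirit: nearest prototype by Hamming distance."""
    dists = [int((bits != p).sum()) for p in prototypes]
    return int(np.argmin(dists))
```

The profile computation and the per-prototype distance evaluations are embarrassingly parallel, which is why this kind of feature-plus-Hamming pipeline maps well onto GPU cores.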