Methodologies
H.3.11. Vision and Scene Understanding
S. Bayatpour; M. Sharghi
Abstract
Digital images are being produced in massive numbers every day. A component that may exist in digital images is text. Textual information can be extracted and used in a variety of fields. Noise, blur, distortion, occlusion, font variation, alignment, and orientation are among the main challenges for text detection in natural images. Despite many advances in text detection algorithms, there is not yet a single algorithm that addresses all of the above problems successfully. Furthermore, most of the proposed algorithms can only detect horizontal text, and a very small fraction of them consider the Farsi language. In this paper, a method is proposed for detecting multi-oriented text in both Farsi and English. We define seven geometric features to distinguish text components from the background and propose a new contrast enhancement method for text detection algorithms. Our experimental results indicate that the proposed method achieves high performance in text detection on natural images.
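The abstract does not list the seven geometric features or describe the enhancement method, so the following is only an illustrative sketch: it enhances contrast with CLAHE (standing in for the paper's method), binarizes the image, and computes a few generic geometric descriptors of connected components with OpenCV. The file name "scene.jpg" is hypothetical.

```python
import cv2
import numpy as np

def component_geometric_features(binary_img):
    """Generic geometric descriptors per connected component; the paper's
    seven features are not given in the abstract, so these are stand-ins."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary_img, connectivity=8)
    feats = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        mask = (labels == i).astype(np.uint8)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        hull_area = cv2.contourArea(cv2.convexHull(contours[0])) + 1e-6
        feats.append({
            "aspect_ratio": w / h,
            "extent": area / (w * h),      # fill ratio of the bounding box
            "solidity": area / hull_area,  # area relative to the convex hull
        })
    return feats

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)        # hypothetical input
enhanced = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
_, binary = cv2.threshold(enhanced, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(component_geometric_features(binary)[:3])
```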
Original/Review Paper
H. Gholamalinejad; H. Khosravi
Abstract
Optimizers are vital components of deep neural networks that perform the weight updates. This paper introduces a new updating method for gradient-descent-based optimizers, called whitened gradient descent (WGD). This method is easy to implement and can be used in every optimizer based on the gradient descent algorithm, and it does not significantly increase the training time of the network. The method smooths the training curve and improves classification metrics. To evaluate the proposed algorithm, we performed 48 different tests on two datasets, CIFAR-100 and Animals-10, using three network structures: DenseNet121, ResNet18, and ResNet50. The experiments show that using the WGD method in gradient-descent-based optimizers improves the classification results significantly. For example, integrating WGD into the RAdam optimizer increased the accuracy of DenseNet from 87.69% to 90.02% on the Animals-10 dataset.
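The abstract does not define the whitening transform itself, so the sketch below is only a guess at the general shape of such an update rule: a PyTorch optimizer that standardizes each parameter's gradient (zero mean, unit variance) before a plain SGD step. It shows where a gradient-transforming step slots into an optimizer, not the paper's actual WGD formula.

```python
import torch
from torch.optim import Optimizer

class WhitenedSGD(Optimizer):
    """Toy SGD variant that standardizes each parameter's gradient before the
    update. This is an assumption about what "whitening" might mean here; the
    paper's exact transform is not given in the abstract."""

    def __init__(self, params, lr=0.01, eps=1e-8):
        super().__init__(params, dict(lr=lr, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                if g.numel() > 1:  # skip scalar parameters, whose std is undefined
                    g = (g - g.mean()) / (g.std() + group["eps"])
                p.add_(g, alpha=-group["lr"])

# usage: opt = WhitenedSGD(model.parameters(), lr=0.01)
```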
Original/Review Paper
H. Sarabi Sarvarani; F. Abdali-Mohammadi
Abstract
Bone age assessment is a method that is routinely used for investigating growth abnormalities, endocrine gland treatment, and pediatric syndromes. Since the advent of digital imaging, bone age assessment has for several decades been performed by visually examining the ossification of the left hand, usually using the G&P reference method. However, the subjective nature of manual methods, the large number of ossification centers in the hand, and the large variation across ossification stages make the evaluation of bone age difficult. Therefore, many efforts have been made to develop image processing methods that automatically extract the main features of the bone formation stages to assess the bone age more effectively and accurately. In this paper, a new fully automatic method is proposed to reduce the errors of subjective methods and improve the automatic methods of age estimation. The model was applied to 1400 radiographs of healthy children from 0 to 18 years of age, gathered from four continents. The method starts by extracting all regions of the hand, namely the five fingers and the wrist, and independently estimates the age of each region by examining its associated joints and growth regions with CNNs; it ends with a final age assessment through an ensemble of CNNs. The results indicate that the proposed method has an average assessment accuracy of 81% and performs better than the commercial system that is currently in use.
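As a rough illustration of the per-region-then-ensemble idea, the sketch below defines a placeholder CNN regressor per hand region and averages their age predictions. The region names, network architecture, and fusion rule are assumptions; the paper's region extraction and ensemble design are not specified in the abstract.

```python
import torch
import torch.nn as nn

class RegionAgeCNN(nn.Module):
    """Small placeholder CNN that regresses age (e.g. in months) from one
    hand-region crop; the real per-region networks are not described here."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

regions = ["thumb", "index", "middle", "ring", "little", "wrist"]  # assumed split
models = {r: RegionAgeCNN() for r in regions}

def ensemble_age(crops):
    """crops: dict region -> tensor of shape (1, 1, H, W); returns the mean of
    the per-region age estimates as a simple ensemble."""
    preds = [models[r](crops[r]) for r in regions]
    return torch.stack(preds).mean()

crops = {r: torch.randn(1, 1, 64, 64) for r in regions}   # stand-in crops
print(ensemble_age(crops).item())
```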
Original/Review Paper
Seyyed A. Hoseini; P. Kabiri
Abstract
When a camera moves in an unfamiliar environment, it is desirable for many computer vision and robotic applications to estimate the camera position and orientation. Camera tracking is perhaps the most challenging part of Visual Simultaneous Localization and Mapping (Visual SLAM) and Augmented Reality problems. This paper proposes a feature-based approach for tracking a hand-held camera that moves within an indoor place with a maximum depth of around 4-5 meters. In the first few frames, the camera observes a chessboard as a marker to bootstrap the system and construct the initial map. Thereafter, upon arrival of each new frame, the algorithm pursues the camera tracking procedure. This procedure is carried out in a framework that operates using only the extracted visible natural feature points and the initial map. The constructed initial map is extended as the camera explores new areas. In addition, the proposed system employs a hierarchical method based on the Lucas-Kanade registration technique to track FAST features. For each incoming frame, 6-DOF camera pose parameters are estimated using an Unscented Kalman Filter (UKF). The proposed algorithm is tested on real-world videos, and the performance of the UKF is compared against other camera tracking methods. Two evaluation criteria (i.e., relative pose error and absolute trajectory error) are used to assess the performance of the proposed algorithm. Accordingly, the reported experimental results show the accuracy and effectiveness of the presented approach. The conducted experiments also indicate that the type of extracted feature points does not have a significant effect on the precision of the proposed approach.
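The feature pipeline the abstract describes (FAST corners tracked with a pyramidal Lucas-Kanade registration) can be sketched directly with OpenCV, as below; the UKF pose update and map management are omitted, and the video file name is hypothetical.

```python
import cv2
import numpy as np

fast = cv2.FastFeatureDetector_create(threshold=25)
lk_params = dict(winSize=(21, 21), maxLevel=3,   # 3 pyramid levels = hierarchical LK
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

cap = cv2.VideoCapture("indoor_sequence.mp4")    # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.KeyPoint_convert(fast.detect(prev_gray)).reshape(-1, 1, 2).astype(np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None, **lk_params)
    pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)
    # here the tracked 2D points and the map would feed the UKF 6-DOF pose update
    prev_gray = gray
```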
Original/Review Paper
F. Jafarinejad
Abstract
In recent years, new word embedding methods have clearly improved the accuracy of NLP tasks. A review of the progress of these methods shows that the complexity of these models and the number of their training parameters grow steadily. Therefore, there is a need for methodological innovation in presenting new word embedding methods. Most current word embedding methods use a large corpus of unstructured data to train the semantic vectors of words. This paper addresses the basic idea of utilizing the structure of structured data to introduce embedding vectors. In this way, the need for high processing power, a large amount of memory, and long processing time is reduced by using such structures and the conceptual knowledge that lies in them. For this purpose, a new embedding vector, Word2Node, is proposed. It uses a well-known structured resource, WordNet, as a training corpus, under the hypothesis that the graph structure of WordNet contains valuable linguistic knowledge that should not be ignored and can yield cost-effective, small embedding vectors. The Node2Vec graph embedding method allows us to benefit from this powerful linguistic resource. Evaluation of this idea on two tasks, word similarity and text classification, has shown that this method performs the same as or better than the word embedding method embedded in it (Word2Vec). This result is achieved while the required training data is reduced by about 50,000,000%. These results provide a view of the capacity of structured data to improve the quality of existing embedding methods and the resulting vectors.
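A minimal sketch of the general idea, assuming a graph built only from WordNet hypernym links: build the synset graph with NLTK and networkx, then embed it with the node2vec package. The choice of relations, edge weights, and all hyperparameters are assumptions; the paper's construction may differ.

```python
import networkx as nx
from nltk.corpus import wordnet as wn       # requires nltk.download("wordnet")
from node2vec import Node2Vec               # pip install node2vec

# Build a graph over synsets using hypernym edges only; the paper may use
# richer WordNet relations, which the abstract does not enumerate.
G = nx.Graph()
for s in wn.all_synsets():
    for h in s.hypernyms():
        G.add_edge(s.name(), h.name())

n2v = Node2Vec(G, dimensions=100, walk_length=20, num_walks=10, workers=4)
model = n2v.fit(window=5, min_count=1)      # returns a gensim Word2Vec model

# Nearest synset nodes to "dog.n.01" in the learned embedding space
print(model.wv.most_similar("dog.n.01", topn=5))
```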
Original/Review Paper
E. Zarei; N. Barimani; G. Nazari Golpayegani
Abstract
Cardiac arrhythmias are known as one of the most dangerous cardiac diseases. Applying intelligent algorithms in this area reduces the ECG signal processing time required of the physician as well as the probable mistakes caused by the specialist's fatigue. The purpose of this study is to introduce an intelligent algorithm for separating three cardiac arrhythmias by using chaos features of the ECG signal and combining three of the most common classifiers in this signal-processing area. First, ECG signals related to the three cardiac arrhythmias of Atrial Fibrillation, Ventricular Tachycardia, and Post Supra Ventricular Tachycardia, along with the normal cardiac signal, were gathered from the MIT-BIH arrhythmia database. Then, chaos features describing the non-linear dynamics of the ECG signal were extracted by calculating the Lyapunov exponent values and the signal's fractal dimension. Finally, a compound classifier was used by combining a multilayer perceptron neural network, a support vector machine, and a K-Nearest Neighbor classifier. The obtained results were compared with a classification method based on time-domain and time-frequency-domain features, as a proof of the efficacy of the chaos features of the ECG signal. Likewise, to evaluate the efficacy of the compound classifier, each network was used both separately and in combination, and the results were compared. The obtained results showed that using the chaos features of the ECG signal and the compound classifier can classify cardiac arrhythmias with 99.1% ± 0.2 accuracy, 99.6% ± 0.1 sensitivity, and a specificity rate of 99.3% ± 0.1.
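The compound-classifier stage maps naturally onto a scikit-learn voting ensemble of the three classifier types named in the abstract, as in the sketch below. The chaos features (Lyapunov exponents, fractal dimension) are assumed to be already computed into X; random data stands in here, the classifier hyperparameters are assumptions, and soft voting is only one possible fusion rule.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X: chaos features per segment (e.g. Lyapunov exponents, fractal dimension),
# y: labels {normal, AF, VT, PSVT}. Random values stand in for real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = rng.integers(0, 4, size=400)

ensemble = VotingClassifier(
    estimators=[
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)),
        ("svm", SVC(kernel="rbf", probability=True)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",   # average predicted probabilities; the paper's fusion rule may differ
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```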
Other
Kh. Aghajani
Abstract
Emotion recognition has several applications in various fields, including human-computer interaction. In recent years, various methods have been proposed to recognize emotion using facial or speech information, while the fusion of the two has received less attention. In this paper, the use of face-only and speech-only information for emotion recognition is examined first. For emotion recognition from speech, a pre-trained network called YAMNet is used to extract features. After passing through a convolutional neural network (CNN), the extracted features are fed into a bi-LSTM with an attention mechanism to perform the recognition. For emotion recognition from facial information, a deep CNN-based model is proposed. Finally, after reviewing these two approaches, an emotion detection framework based on the fusion of the two models is proposed. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), containing videos taken from 24 actors (12 men and 12 women) in 8 categories, has been used to evaluate the proposed model. The results of the implementation show that combining the face and speech information improves the performance of the emotion recognizer.
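A minimal PyTorch sketch of the speech branch as described: a convolutional layer over the sequence of YAMNet frame embeddings, a bidirectional LSTM, and a simple soft-attention pooling before the 8-way classifier. All layer sizes and the attention formulation are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SpeechEmotionNet(nn.Module):
    """Speech-branch sketch: Conv1d over YAMNet frame embeddings, a bi-LSTM,
    and soft-attention pooling. Layer sizes are assumptions."""
    def __init__(self, emb_dim=1024, n_classes=8):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 128, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(128, 64, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(128, 1)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                 # x: (batch, frames, emb_dim)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(h)               # (batch, frames, 128)
        w = torch.softmax(self.attn(h), dim=1)
        context = (w * h).sum(dim=1)      # attention-weighted sum over time
        return self.fc(context)

logits = SpeechEmotionNet()(torch.randn(2, 96, 1024))
print(logits.shape)                       # torch.Size([2, 8])
```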
Original/Review Paper
B. Z. Mansouri; H.R. Ghaffary; A. Harimi
Abstract
Speech emotion recognition (SER) is a challenging field of research that has attracted attention during the last two decades. Feature extraction has been reported as the most challenging issue in SER systems. Deep neural networks could partially solve this problem in some other applications. In order to address this problem, we propose a novel enriched spectrogram calculated by fusing wide-band and narrow-band spectrograms. The proposed spectrogram benefits from both high temporal and high spectral resolution. We then apply the resulting spectrogram images to the pre-trained deep convolutional neural network ResNet152. Instead of the last layer of ResNet152, we add five additional layers to adapt the model to the present task. All experiments are performed on the popular EmoDB dataset using the leave-one-speaker-out technique, which guarantees that the test speaker is independent of the model. The model achieves an accuracy rate of 88.97%, which shows the efficiency of the proposed approach in contrast to other state-of-the-art methods.
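One way to picture the enriched spectrogram is sketched below: a wide-band spectrogram (short window, fine time resolution) and a narrow-band spectrogram (long window, fine frequency resolution) are computed with librosa, resized to a common grid, and stacked as image channels for a ResNet-style CNN. The window lengths, resizing, channel layout, and file name are assumptions; the paper's exact fusion is not described in the abstract.

```python
import numpy as np
import librosa
import cv2

y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical EmoDB-style file

def log_spec(y, n_fft):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=n_fft // 4)) ** 2
    return librosa.power_to_db(S, ref=np.max)

wide = log_spec(y, n_fft=128)     # short window -> fine time, coarse frequency
narrow = log_spec(y, n_fft=1024)  # long window  -> fine frequency, coarse time

# Resize both to a common image size and stack as channels (assumed fusion rule).
size = (224, 224)
wide_img = cv2.resize(wide, size)
narrow_img = cv2.resize(narrow, size)
enriched = np.stack([wide_img, narrow_img, (wide_img + narrow_img) / 2], axis=-1)
print(enriched.shape)             # (224, 224, 3), ready for a ResNet-style CNN
```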
Original/Review Paper
A. Rahati; K. Rahbar
Abstract
Performing sports movements correctly is very important for ensuring body health. In this article, an attempt has been made to correct the movements through a different approach based on the 2D position of the joints extracted from the image of a 3D scene. The input image shows a person performing in front of the camera with landmarks on his/her joints. The coordinates of the joints are measured in 2D space and matched against the 2D skeletons extracted from the reference sparse skeletal model of the corrected movements. The accuracy and precision of this approach are evaluated on the standard Adidas dataset. Its efficiency has also been studied under the influence of additive Gaussian and impulse noise. The average error of the model in detecting an incorrectly performed exercise in the set of sports movements is reported to be 5.69 pixels.
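As a rough illustration of how a per-joint pixel error against a reference skeleton could be measured, the sketch below aligns two 2D poses by a simple centroid/scale normalization and reports the mean joint error in pixels. The alignment step and the joint count are assumptions; the paper's matching to the sparse skeletal model is certainly more involved.

```python
import numpy as np

def mean_joint_error(detected, reference):
    """detected, reference: (n_joints, 2) arrays of pixel coordinates.
    Both poses are roughly aligned by centring on the joint centroid and
    scaling by the bounding-box diagonal before measuring the error."""
    def normalize(p):
        center = p.mean(axis=0)
        scale = np.linalg.norm(p.max(axis=0) - p.min(axis=0)) + 1e-6
        return (p - center) / scale
    d, r = normalize(detected), normalize(reference)
    # rescale back to the reference's pixel units so the error is in pixels
    ref_scale = np.linalg.norm(reference.max(axis=0) - reference.min(axis=0))
    return np.linalg.norm(d - r, axis=1).mean() * ref_scale

detected = np.random.rand(17, 2) * 480          # stand-in joint detections
reference = detected + np.random.randn(17, 2) * 4
print(mean_joint_error(detected, reference))
```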
Technical Paper
H.5. Image Processing and Computer Vision
S. Asadi Amiri; Z. Mohammadpoory; M. Nasrolahzadeh
Abstract
Content-based image retrieval (CBIR) systems compare a query image with the images in a dataset to find those most similar to the query. In this paper, a novel and efficient CBIR system is proposed using color and texture features. The color features are represented by color moments and color histograms of the RGB and HSV color spaces, and the texture features are represented by a localized Discrete Cosine Transform (DCT), a localized gray-level co-occurrence matrix, and local binary patterns (LBP). The DCT coefficients and the gray-level co-occurrence matrix of the blocks are examined to assess the block details. Also, LBP is used for rotation-invariant texture information of the image. After feature extraction, the Shannon entropy criterion is used to remove inefficient features. Finally, an improved version of the Canberra distance is employed to compare the similarity of feature vectors. Experimental analysis is carried out using precision and recall on the Corel-5K and Corel-10K datasets. The results demonstrate that the proposed method efficiently improves precision and recall and outperforms most existing methods.
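Two of the descriptor families named above (RGB color moments and a rotation-invariant LBP histogram) can be sketched with scikit-image and compared with the standard Canberra distance from SciPy, as below. The localized DCT/GLCM blocks, the entropy-based feature selection, and the paper's improved Canberra measure are omitted, and the image file names are hypothetical.

```python
import numpy as np
from skimage import io, color
from skimage.feature import local_binary_pattern
from scipy.spatial.distance import canberra
from scipy.stats import skew

def describe(path):
    img = io.imread(path)                      # hypothetical image path
    # Color moments: mean, std, skewness per RGB channel (9 values)
    ch = img.reshape(-1, 3).astype(float)
    moments = np.concatenate([ch.mean(0), ch.std(0), skew(ch, axis=0)])
    # Rotation-invariant uniform LBP histogram (10 bins for P=8, R=1)
    gray = (color.rgb2gray(img) * 255).astype(np.uint8)
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([moments, hist])

q = describe("query.jpg")
db = {"img1.jpg": describe("img1.jpg"), "img2.jpg": describe("img2.jpg")}
ranked = sorted(db, key=lambda k: canberra(q, db[k]))   # smaller = more similar
print(ranked)
```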
Original/Review Paper
M. Azimi hemat; F. Shamsezat Ezat; M. Kuchaki Rafsanjani
Abstract
In content-based image retrieval (CBIR), the visual features of the database images are extracted, and the visual features database is searched to find the images closest to the query image. Increasing the efficiency and decreasing both the time and the storage space of indexed images are the priorities in developing image retrieval systems. In this research, an efficient system is proposed for image retrieval by applying fuzzy techniques, which are advantageous for increasing efficiency and decreasing the length of the feature vector and the storage space. The effect of increasing the number of considered content features is assessed to enhance image retrieval efficiency. The fuzzy features consist of color, statistical information related to the spatial dependency of the pixels on each other, and the position of image edges. These features are indexed as fuzzy vectors of lengths 16, 3, and 16, respectively. The extracted vectors are compared through fuzzy similarity measures, and the most similar images are retrieved. To evaluate the proposed system's performance, this system and three other non-fuzzy systems that consider fewer features were implemented. These four systems were tested on a database containing 1000 images, and the results indicate an improvement in retrieval precision and storage space.
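A minimal sketch of the fuzzy-feature idea, under assumed choices: the hue channel is fuzzified into a length-16 vector with triangular membership functions, and two vectors are compared with a simple min/max fuzzy overlap. The paper's actual membership functions, the statistical and edge features, and its similarity measure are not given in the abstract.

```python
import numpy as np

def fuzzy_hue_histogram(hue, n_bins=16):
    """hue: array of hue values in [0, 1). Each pixel contributes to nearby
    bins with triangular membership, giving a length-16 fuzzy color vector."""
    centers = (np.arange(n_bins) + 0.5) / n_bins
    width = 1.0 / n_bins
    hist = np.zeros(n_bins)
    for c in range(n_bins):
        membership = np.clip(1 - np.abs(hue - centers[c]) / width, 0, 1)
        hist[c] = membership.sum()
    return hist / hist.sum()

def fuzzy_similarity(a, b):
    """Simple fuzzy overlap: sum of element-wise minima over sum of maxima."""
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()

h1 = fuzzy_hue_histogram(np.random.rand(10000))
h2 = fuzzy_hue_histogram(np.random.rand(10000))
print(fuzzy_similarity(h1, h2))   # 1.0 would mean identical fuzzy color content
```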
Research Note
Mojtaba Nasehi; Mohsen Ashourian; Hosein Emami
Abstract
Vehicle type recognition is widely used in practical applications such as traffic control, unmanned vehicle control, road taxation, and smuggling detection. In this paper, various techniques such as data augmentation and spatial filtering are first used to improve and enhance the data. Then, a developed algorithm that integrates a VGG neural network with the YOLO algorithm is used to detect and identify vehicles. The implementation on a Raspberry Pi hardware board is then described and tested in a practical scenario on real image datasets. The results show the good performance of the implemented algorithm in terms of detection performance (98%), processing speed, and robustness to environmental conditions, which indicates its capability for practical, low-cost applications.
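A hedged sketch of the two-stage detect-then-classify idea: a generic pretrained YOLO model (via the ultralytics package) localizes objects, and a pretrained torchvision VGG-16 classifies each crop. The weight files, the test image name, and the use of ImageNet classes are all assumptions standing in for the paper's own networks, training data, and vehicle-type head.

```python
import torch
from torchvision import models, transforms
from ultralytics import YOLO           # pip install ultralytics
from PIL import Image

detector = YOLO("yolov8n.pt")          # generic pretrained detector, not the paper's weights
classifier = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("road_scene.jpg")     # hypothetical test image
for box in detector(img)[0].boxes.xyxy:             # detected bounding boxes
    crop = img.crop(tuple(box.tolist()))
    with torch.no_grad():
        logits = classifier(prep(crop).unsqueeze(0))
    print(int(logits.argmax()))        # ImageNet class id; a vehicle-type head would replace this
```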