Shahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101A bilingual text detection in natural images using heuristic and unsupervised learning449466249110.22044/jadm.2022.11089.2260ENS. BayatpourFaculty of Engineering and Technology, Alzahra University, Tehran, Iran.0000-0001-9792-0433M. SharghiFaculty of Engineering and Technology, Alzahra University, Tehran, Iran.Journal Article20210827Digital images are being produced in massive numbers every day. One component that may exist in digital images is text. Textual information can be extracted and used in a variety of fields. Noise, blur, distortion, occlusion, font variation, alignment, and orientation are among the main challenges for text detection in natural images. Despite many advances in text detection algorithms, there is not yet a single algorithm that addresses all of the above problems successfully. Furthermore, most of the proposed algorithms can only detect horizontal texts, and a very small fraction of them consider the Farsi language. In this paper, a method is proposed for detecting multi-oriented texts in both the Farsi and English languages. We have defined seven geometric features to distinguish text components from the background and proposed a new contrast enhancement method for text detection algorithms. Our experimental results indicate that the proposed method achieves high performance in text detection on natural images.https://jad.shahroodut.ac.ir/article_2491_2393334bcf9bf32898974a728b47f465.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Whitened gradient descent, a new updating method for optimizers in deep neural networks467477249610.22044/jadm.2022.11325.2291ENH. GholamalinejadDepartment of Computer, Faculty of Engineering, Bozorgmehr University of Qaenat, Qaen, Iran.0000-0001-6912-6910H. 
KhosraviFaculty of Electrical Engineering, Shahrood University of Technology.Journal Article20211108Optimizers are vital components of deep neural networks that perform weight updates. This paper introduces a new updating method for optimizers based on gradient descent, called whitened gradient descent (WGD). This method is easy to implement and can be used in every optimizer based on the gradient descent algorithm, without significantly increasing the training time of the network. It smooths the training curve and improves classification metrics. To evaluate the proposed algorithm, we performed 48 different tests on two datasets, Cifar100 and Animals-10, using three network structures: densenet121, resnet18, and resnet50. The experiments show that using the WGD method in gradient descent based optimizers improves the classification results significantly. For example, integrating WGD into the RAdam optimizer increased the accuracy of DenseNet from 87.69% to 90.02% on the Animals-10 dataset.https://jad.shahroodut.ac.ir/article_2496_6374b89dbfd0e0729d4d21745d97b6c7.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101An Ensemble Convolutional Neural Networks for Detection of Growth Anomalies in Children with X-ray Images479492250410.22044/jadm.2022.11752.2323ENH. Sarabi SarvaraniDepartment of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.F. Abdali-MohammadiDepartment of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.Journal Article20220314Bone age assessment is a method that is constantly used for investigating growth abnormalities, endocrine gland treatment, and pediatric syndromes. Since the advent of digital imaging, bone age assessment has for several decades been performed by visually examining the ossification of the left hand, usually using the G&P reference method. 
However, the subjective nature of hand-crafted methods, the large number of ossification centers in the hand, and the large variation across ossification stages lead to difficulties in evaluating bone age. Therefore, many efforts have been made to develop image processing methods that automatically extract the main features of the bone formation stages to assess bone age more effectively and accurately. In this paper, a new fully automatic method is proposed to reduce the errors of subjective methods and improve automatic age estimation methods. This model was applied to 1400 radiographs of healthy children from 0 to 18 years of age, gathered from 4 continents. The method starts by extracting all regions of the hand, the five fingers and the wrist, and independently estimates the age of each region by examining the joints and growth regions associated with these regions using CNN networks; it ends with a final age assessment through an ensemble of CNNs. The results indicated that the proposed method has an average assessment accuracy of 81% and performs better than the commercial system currently in use.https://jad.shahroodut.ac.ir/article_2504_4c39fd64875d39c8f08c74278b0bf407.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101A UKF-based Approach for Indoor Camera Trajectory Estimation493503250510.22044/jadm.2022.11550.2315ENSeyyed A. HoseiniFaculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran.P. KabiriSchool of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.Journal Article20220105When a camera moves in an unfamiliar environment, it is desirable for many computer vision and robotic applications to estimate the camera's position and orientation. Camera tracking is perhaps the most challenging part of Visual Simultaneous Localization and Mapping (Visual SLAM) and Augmented Reality problems. 
This paper proposes a feature-based approach for tracking a hand-held camera that moves within an indoor space with a maximum depth of around 4-5 meters. In the first few frames, the camera observes a chessboard as a marker to bootstrap the system and construct the initial map. Thereafter, upon arrival of each new frame, the algorithm carries out the camera tracking procedure in a framework that operates using only the extracted visible natural feature points and the initial map. The constructed initial map is extended as the camera explores new areas. In addition, the proposed system employs a hierarchical method based on the Lucas-Kanade registration technique to track FAST features. For each incoming frame, the 6-DOF camera pose parameters are estimated using an Unscented Kalman Filter (UKF). The proposed algorithm is tested on real-world videos, and the performance of the UKF is compared against other camera tracking methods. Two evaluation criteria (relative pose error and absolute trajectory error) are used to assess the performance of the proposed algorithm. The reported experimental results show the accuracy and effectiveness of the presented approach. The conducted experiments also indicate that the type of extracted feature points has no significant effect on the precision of the proposed approach.https://jad.shahroodut.ac.ir/article_2505_ca406eb6896a61f817bba3bf8c413136.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Benefiting from Structured Resources to Present a Computationally Efficient Word Embedding Method505514252810.22044/jadm.2022.12113.2362ENF. JafarinejadFaculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran.Journal Article20220714In recent years, new word embedding methods have clearly improved the accuracy of NLP tasks. 
A review of the progress of these methods shows that the complexity of these models and the number of their training parameters keep growing, so there is a need for methodological innovation in word embedding. Most current word embedding methods use a large corpus of unstructured data to train the semantic vectors of words. This paper addresses the basic idea of utilizing the structure of structured data to build embedding vectors, so that the need for high processing power, a large amount of memory, and long processing time is reduced by exploiting these structures and the conceptual knowledge that lies in them. For this purpose, a new embedding vector, Word2Node, is proposed. It uses a well-known structured resource, WordNet, as a training corpus, under the hypothesis that the graph structure of WordNet contains valuable linguistic knowledge that should not be ignored and can provide cost-effective, small-sized embedding vectors. The Node2Vec graph embedding method allows us to benefit from this powerful linguistic resource. Evaluation of this idea on two tasks, word similarity and text classification, has shown that this method performs the same as or better than the word embedding method embedded in it (Word2Vec). This result is achieved while the required training data is reduced by about 50,000,000%. These results give a view of the capacity of structured data to improve the quality of existing embedding methods and the resulting vectors.https://jad.shahroodut.ac.ir/article_2528_372db8f9da0285b585d73b4783934e9f.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Cardiac Arrhythmia Diagnosis with an Intelligent Algorithm using Chaos Features of Electrocardiogram Signal and Compound Classifier515527253610.22044/jadm.2022.12165.2370ENE. ZareiDepartment of Electrical Engineering, Yadegar-e-Imam Khomeini (RAH) Shahre Rey Branch, Islamic Azad University, Tehran, Iran.N. 
BarimaniDepartment of Electrical Engineering, Yadegar-e-Imam Khomeini (RAH) Shahre Rey Branch, Islamic Azad University, Tehran, Iran.G. Nazari GolpayeganiDepartment of Electrical Engineering, Yadegar-e-Imam Khomeini (RAH) Shahre Rey Branch, Islamic Azad University, Tehran, Iran.Journal Article20220801Cardiac arrhythmias are known as one of the most dangerous cardiac diseases. Applying intelligent algorithms in this area reduces the ECG signal processing time required of the physician as well as the probable mistakes caused by specialist fatigue. The purpose of this study is to introduce an intelligent algorithm for separating three cardiac arrhythmias by using chaos features of the ECG signal and combining three of the most common classifiers in this signal processing area. First, ECG signals related to three cardiac arrhythmias, Atrial Fibrillation, Ventricular Tachycardia, and Post Supra Ventricular Tachycardia, along with the normal cardiac signal, were gathered from the MIT-BIH arrhythmia database. Then, chaos features describing the non-linear dynamics of the ECG signal were extracted by calculating the Lyapunov exponent values and the signal's fractal dimension. Finally, a compound classifier was built by combining a multilayer perceptron neural network, a support vector machine, and K-Nearest Neighbors. The obtained results were compared to classification based on time-domain and time-frequency-domain features, as proof of the efficacy of the chaos features of the ECG signal. Likewise, to evaluate the efficacy of the compound classifier, each network was used both separately and in combination, and the results were compared. 
The obtained results showed that, using the chaos features of the ECG signal and the compound classifier, cardiac arrhythmias can be classified with 99.1% ± 0.2% accuracy, 99.6% ± 0.1% sensitivity, and a specificity rate of 99.3% ± 0.1%.https://jad.shahroodut.ac.ir/article_2536_4d2916e06e3b8e5015f96d655c9b5fba.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Audio-visual emotion recognition based on a deep convolutional neural network529537256910.22044/jadm.2022.11809.2331ENKh. AghajaniDepartment of Computer Engineering, University of Mazandaran, Babolsar, Iran.Journal Article20220409Emotion recognition has several applications in various fields, including human-computer interaction. In recent years, various methods have been proposed to recognize emotion using facial or speech information, while the fusion of the two has received less attention. In this paper, first of all, the use of only face or speech information in emotion recognition is examined. For emotion recognition through speech, a pre-trained network called YAMNet is used to extract features. After passing through a convolutional neural network (CNN), the extracted features are fed into a bi-LSTM with an attention mechanism to perform the recognition. For emotion recognition through facial information, a deep CNN-based model has been proposed. Finally, after reviewing these two approaches, an emotion detection framework based on the fusion of these two models is proposed. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), containing videos of 24 actors (12 men and 12 women) in 8 categories, has been used to evaluate the proposed model. 
The results of the implementation show that the combination of face and speech information improves the performance of the emotion recognizer.https://jad.shahroodut.ac.ir/article_2569_8f90f16cc9c85e870b0d47187b4375ef.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Speech Emotion Recognition using Enriched Spectrogram and Deep Convolutional Neural Network Transfer Learning539547257610.22044/jadm.2022.12241.2372ENB. Z. MansouriElectrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.H.R. GhaffaryElectrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.A. HarimiElectrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.Electrical and Computer Engineering Department, Shahrood branch, Islamic Azad University, Shahrood, Iran.Journal Article20220828Speech emotion recognition (SER) is a challenging field of research that has attracted attention during the last two decades. Feature extraction has been reported as the most challenging issue in SER systems. Deep neural networks could partially solve this problem in some other applications. To address this problem, we proposed a novel enriched spectrogram calculated from the fusion of wide-band and narrow-band spectrograms. The proposed spectrogram benefits from both high temporal and high spectral resolution. We then applied the resulting spectrogram images to the pre-trained deep convolutional neural network ResNet152. Instead of the last layer of ResNet152, we added five additional layers to adapt the model to the present task. All the experiments performed on the popular EmoDB dataset are based on the leave-one-speaker-out technique, which guarantees the speaker's independence from the model. 
The model attains an accuracy rate of 88.97%, which shows the efficiency of the proposed approach compared with other state-of-the-art methods.https://jad.shahroodut.ac.ir/article_2576_9ab41e081a2486b986bdbc5e658ee3f4.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Sports movements modification based on 2D joint position using YOLO to 3D skeletal model adaptation549557257810.22044/jadm.2022.11975.2344ENA. RahatiDepartment of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.K. RahbarDepartment of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.0000-0003-2212-0479Journal Article20220606Performing sports movements correctly is very important for ensuring body health. In this article, an attempt has been made to correct movements using a different approach based on the 2D positions of the joints extracted from the image in 3D space. The input image shows a person performing in front of the camera with landmarks on his or her joints. The coordinates of the joints are measured in 2D space and adapted to the 2D skeletons extracted from the reference sparse skeletal model of the corrected movements. The accuracy and precision of this approach are evaluated on the standard Adidas dataset. Its efficiency has also been studied under the influence of cumulative Gaussian and impulse noise. The average error of the model in detecting the wrong exercise in the set of sports movements is reported to be 5.69 pixels.https://jad.shahroodut.ac.ir/article_2578_746a9149db4eaa7cec31690d56fb406f.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101A Novel Content-based Image Retrieval System using Fusing Color and Texture Features559568250610.22044/jadm.2022.12042.2353ENS. Asadi AmiriDepartment of Computer Engineering, University of Mazandaran, Babolsar, Iran.Z. 
MohammadpooryDepartment of Electronic and Biomedical Engineering, Shahrood University of Technology, Shahrood, Iran.M. NasrolahzadehDepartment of Biomedical Engineering, Hakim Sabzevari University, Sabzevar, Iran.Journal Article20220628Content-based image retrieval (CBIR) systems compare a query image with images in a dataset to find images similar to the query. In this paper, a novel and efficient CBIR system is proposed using color and texture features. The color features are represented by color moments and color histograms of the RGB and HSV color spaces, and the texture features are represented by the localized Discrete Cosine Transform (DCT), the localized gray level co-occurrence matrix, and local binary patterns (LBP). The DCT coefficients and the gray level co-occurrence matrix of the blocks are examined to assess the block details, and LBP is used for rotation-invariant texture information of the image. After feature extraction, the Shannon entropy criterion is used to discard inefficient features. Finally, an improved version of the Canberra distance is employed to compare the similarity of feature vectors. Experimental analysis is carried out using precision and recall on the Corel-5K and Corel-10K datasets. Results demonstrate that the proposed method efficiently improves precision and recall and outperforms most existing methods.https://jad.shahroodut.ac.ir/article_2506_7f0d2a86f3d838dcb90e1277ab0db515.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Image Retrieval based on Multi-features using Fuzzy Set569578260410.22044/jadm.2022.11876.2337ENM. Azimi HematDepartment of Computer Engineering and Information Technology, Payame Noor University, Iran.F. Shamsezat EzatDepartment of Computer Science, Faculty of Mathematics and Computer, Fasa University, Fasa, Iran.M. 
Kuchaki RafsanjaniDepartment of Computer Science, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran.0000-0002-3220-4839Journal Article20220516In content-based image retrieval (CBIR), the visual features of the database images are extracted, and the visual features database is searched to find the images closest to the query image. Increasing efficiency and decreasing both the time and the storage space of indexed images are the priorities in developing image retrieval systems. In this research, an efficient image retrieval system is proposed that applies fuzzy techniques, which are advantageous in increasing efficiency and decreasing the length of the feature vector and the storage space. The effect of increasing the number of content features considered is assessed to enhance image retrieval efficiency. The fuzzy features consist of color, statistical information related to the spatial dependency of the pixels on each other, and the positions of image edges. These features are indexed as fuzzy vectors of lengths 16, 3, and 16. The extracted vectors are compared through fuzzy similarity measures, and the most similar images are retrieved. To evaluate the proposed system's performance, this system and three other non-fuzzy systems that consider fewer features were implemented. 
These four systems were tested on a database containing 1000 images, and the results indicate an improvement in retrieval precision and storage space.https://jad.shahroodut.ac.ir/article_2604_b3368a96fc71d9f8ff3a7bc35915b1af.pdfShahrood University of TechnologyJournal of AI and Data Mining2322-521110420221101Vehicle Type, Color and Speed Detection Implementation by Integrating VGG Neural Network and YOLO algorithm utilizing Raspberry Pi Hardware579588262910.22044/jadm.2022.11915.2338ENMojtaba NasehiFaculty of Electrical Engineering, Islamic Azad University Majlisi Branch, Isfahan, Iran.Mohsen AshourianFaculty of Electrical Engineering, Islamic Azad University Majlisi Branch, Isfahan, Iran.Hosein EmamiFaculty of Electrical Engineering, Islamic Azad University Majlisi Branch, Isfahan, Iran.Journal Article20220512Vehicle type recognition has been widely used in practical applications such as traffic control, unmanned vehicle control, road taxation, and smuggling detection. In this paper, various techniques such as data augmentation and spatial filtering have been used to improve and enhance the data. A developed algorithm that integrates the VGG neural network and the YOLO algorithm has then been used to detect and identify vehicles. The algorithm was implemented on a Raspberry Pi hardware board and evaluated in a practical scenario on real image datasets. The results show the good performance of the implemented algorithm in terms of detection performance (98%), processing speed, and robustness to environmental conditions, which indicates its capability for practical applications at low cost.https://jad.shahroodut.ac.ir/article_2629_10a22265c99235cafe1d1013f6131f72.pdf
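One of the abstracts above (the CBIR system fusing color and texture features) ranks database images against a query by an improved version of the Canberra distance. The modifications of that improved variant are not described in the abstract, so the following is only a minimal sketch of the *standard* Canberra distance applied to CBIR-style feature vectors; the feature values and image names are illustrative, not taken from the paper.

```python
def canberra_distance(x, y):
    """Standard Canberra distance between two equal-length feature vectors.

    Each dimension contributes |x_i - y_i| / (|x_i| + |y_i|); dimensions
    where both components are zero are skipped to avoid division by zero.
    This is the textbook definition, not the improved variant used in the
    paper, whose details are not given in the abstract.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        denom = abs(xi) + abs(yi)
        if denom > 0:
            total += abs(xi - yi) / denom
    return total


# In a CBIR setting, the query's feature vector (e.g. color moments,
# DCT/GLCM statistics, LBP histograms) is compared against every indexed
# image; smaller distances mean more similar images.
query = [0.2, 0.5, 0.0, 1.0]
database = {
    "img1": [0.2, 0.5, 0.0, 1.0],  # identical to the query -> distance 0
    "img2": [0.9, 0.1, 0.3, 0.0],
}
ranked = sorted(database, key=lambda name: canberra_distance(query, database[name]))
```

Because each dimension is normalized by the magnitude of its two components, Canberra distance is sensitive to differences in small-valued features, which is one reason it is a common choice for comparing histogram-like image descriptors.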