H.3. Artificial Intelligence
Amir Mehrabinezhad; Mohammad Teshnelab; Arash Sharifi
Abstract
Due to the growing number of data-driven approaches, especially in artificial intelligence and machine learning, extracting appropriate information from the gathered data with the best performance is a remarkable challenge. The other important aspect of this issue is storage costs. Principal component analysis (PCA) and autoencoders (AEs) are typical feature extraction methods in data science and machine learning and are widely used in various approaches. The current work integrates the advantages of AEs and PCA to present an online supervised feature extraction-selection method. Accordingly, the desired labels for the final model are involved in the feature extraction procedure and are embedded in the PCA method as well. In addition, stacking the nonlinear autoencoder layers with the PCA algorithm eliminates the kernel selection step required by traditional kernel PCA methods. Besides the performance improvement shown by the experimental results, the main advantage of the proposed method is that, in contrast with traditional PCA approaches, the model does not require all samples to be available for feature extraction. Compared with previous works, the proposed method can outperform other state-of-the-art methods in terms of accuracy and authenticity for feature extraction.
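To make the "online" property concrete, the following minimal sketch estimates a first principal direction one sample at a time, without ever holding the full data set. It uses Oja's rule, a classical streaming PCA update, and is not the authors' autoencoder-PCA model. The function names, the learning rate, and the synthetic two-dimensional stream are assumptions made only for illustration.

```python
import math
import random


def oja_first_component(stream, dim, lr=0.01, seed=0):
    """Estimate the first principal direction from a stream of centered samples.

    Uses Oja's rule (a swapped-in classical technique, not the paper's method).
    Only the current sample and the current estimate are kept in memory.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in stream:
        # Projection of the sample onto the current direction estimate.
        y = sum(wi * xi for wi, xi in zip(w, x))
        # Oja update: w <- w + lr * y * (x - y * w), then renormalize.
        w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def make_stream(n, seed=1):
    """Hypothetical zero-mean 2-D stream with most variance along (1, 1)."""
    rng = random.Random(seed)
    for _ in range(n):
        t = rng.gauss(0.0, 3.0)  # large spread along (1, 1)
        e = rng.gauss(0.0, 0.3)  # small spread along (1, -1)
        yield [(t + e) / math.sqrt(2), (t - e) / math.sqrt(2)]


# The generator is consumed once; samples are never stored together.
w = oja_first_component(make_stream(5000), dim=2)
alignment = abs(w[0] + w[1]) / math.sqrt(2)  # |cosine| with the (1, 1) direction
```

The estimate should end up closely aligned with the high-variance direction (up to sign). A supervised, nonlinear variant such as the one described in the abstract would additionally involve the labels and the autoencoder layers, which this sketch deliberately leaves out.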
H.3.8. Natural Language Processing
A. Khazaei; M. Ghasemzadeh
Abstract
This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. This raises the question: can its results be extended to other languages? Since the goal of document clustering is to group documents based on their content, the answer is expected to be yes. On the other hand, the many differences between languages could make the answer no. This research focuses on k-means, which is one of the basic and popular document clustering methods. We want to know whether the clusters of aligned Persian and English texts obtained by k-means are similar. To answer this question, the Mizan English-Persian Parallel Corpus was used as the benchmark. After feature extraction using text mining techniques and application of the PCA dimension reduction method, k-means clustering was performed. The morphological differences between English and Persian resulted in a longer feature vector for Persian. As a result, in almost all experiments, the English results were slightly richer than the Persian ones. Aside from these differences, the overall behavior of the Persian and English clusters was similar. This similarity shows that the results of k-means research on English can be extended to Persian. Finally, there is hope that, despite the many differences between languages, clustering methods may be extendable to other languages.
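The comparison described above can be sketched as follows: cluster the English and the Persian representation of the same aligned documents separately with k-means, then measure how often the two clusterings agree. The feature vectors below are made-up toy values, not data from the Mizan corpus, and the text mining and PCA steps are skipped. The Rand index is used here only as one plausible agreement measure; the abstract does not state which measure the authors used. All function names are illustrative.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def rand_index(a, b):
    """Fraction of document pairs on which two clusterings agree (same/different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical aligned documents: row i in both lists is the same text.
# The Persian vectors are longer, mirroring the larger feature length noted above.
english = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
persian = [[1.0, 0.9, 0.1], [1.1, 1.0, 0.0], [0.9, 1.1, 0.2],
           [7.0, 6.8, 3.0], [7.1, 7.0, 3.1], [6.9, 7.2, 2.9]]

labels_en = kmeans(english, 2)
labels_fa = kmeans(persian, 2)
score = rand_index(labels_en, labels_fa)
```

Because the Rand index compares pairs of documents rather than raw label values, it is unaffected by the arbitrary numbering of clusters in each language and by the two feature spaces having different dimensions.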