Document Type: Applied Article

Authors

1 Department of Computer Engineering, University of Tehran, Kish International Campus, Kish, Iran.

2 Data Analysis & Processing Research Group, IT Research Faculty, ICT Research Institute, Tehran, Iran.

3 School of Engineering Science, College of Engineering, University of Tehran, Tehran, Iran.

Abstract

In an era of pervasive internet use and dominant social networks, researchers in Persian text mining face two significant challenges: a scarcity of adequate Persian datasets and the limited effectiveness of existing language models. This paper addresses both challenges, aiming to improve the performance of language models tailored to Persian. To make sentiment analysis more effective, our approach applies an aspect-based methodology built on the ParsBERT model and augmented with a relevant lexicon. The study analyzes the sentiment of user opinions collected from the Persian website Digikala. In the experiments, the proposed method reaches an accuracy of 88.2% and an F1 score of 61.7, results that highlight its semantic capabilities. Better language models matter in this setting because they are central to extracting nuanced sentiment from user-generated content, and improving them advances both the efficiency and the accuracy of sentiment analysis in Persian text mining.
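The abstract reports accuracy and F1 side by side, and the two can differ considerably. The sketch below is only an illustration of how both metrics are computed. It assumes macro-averaged F1 (the abstract does not state the averaging scheme), and the labels and the function names `accuracy` and `macro_f1` are hypothetical, not taken from the study or its Digikala data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy labels: 8 positive, 1 negative, 1 neutral.
y_true = ["pos"] * 8 + ["neg", "neu"]
# The single neutral review is misclassified as positive.
y_pred = ["pos"] * 8 + ["neg", "pos"]

acc = accuracy(y_true, y_pred)   # 0.9
f1 = macro_f1(y_true, y_pred)    # about 0.647
print(f"accuracy={acc:.3f}, macro-F1={f1:.3f}")
```

In this toy case one missed minority-class example leaves accuracy at 0.9 but pulls macro-F1 down to roughly 0.65, because each class contributes equally to the average regardless of its size.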
