Document Type: Original/Review Paper


Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran.


Coronavirus disease, a persistent epidemic of acute respiratory syndrome, has posed a challenge to healthcare systems worldwide. Unprecedented quarantine practices forced many people to stay at home. Because most people used social media during the epidemic, analyzing user-generated social content can provide new insights and help track changes over time. An active area in this space is predicting the number of new infected cases from Coronavirus-related social content. Identifying such content is challenging: a significant number of posts discuss Coronavirus without including hashtags or Corona-related words, while other posts contain the hashtag or the word Corona but are not really related to Coronavirus and are mostly promotional. In this paper, we propose a semantic approach based on word embedding techniques to model Corona, and we introduce a new feature, semantic similarity, that measures how similar a given post is to Corona in the semantic space. We further propose two other features, fear emotion and hope feeling, to identify Coronavirus-related posts. These features are used as statistical indicators in a regression model to estimate the number of new infected cases. We evaluate the features on a Persian dataset of Instagram posts collected during the first wave of Coronavirus and demonstrate that the proposed features improve the performance of Coronavirus incidence rate estimation.
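To make the semantic similarity feature concrete, the following is a minimal sketch rather than the method as implemented in the paper. It assumes that a post and the Corona concept are each represented by the mean of their word vectors and that similarity is measured by cosine; both choices are assumptions, as are the toy three-dimensional vectors and the function names, which are hypothetical. In practice the vectors would come from a trained word embedding model.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
# Real vectors would be taken from a trained word embedding model.
TOY_VECTORS = {
    "corona": [0.9, 0.1, 0.0],
    "quarantine": [0.8, 0.2, 0.1],
    "mask": [0.7, 0.3, 0.0],
    "discount": [0.0, 0.1, 0.9],
    "sale": [0.1, 0.0, 0.8],
}


def mean_vector(tokens, vectors):
    """Average the vectors of the known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity(post_tokens, concept_tokens, vectors):
    """Assumed feature: cosine between mean post vector and mean concept vector."""
    post_vec = mean_vector(post_tokens, vectors)
    concept_vec = mean_vector(concept_tokens, vectors)
    if post_vec is None or concept_vec is None:
        return 0.0
    return cosine(post_vec, concept_vec)


# A post about the epidemic that never uses the word "corona".
related = semantic_similarity(["quarantine", "mask"], ["corona"], TOY_VECTORS)
# A promotional post that uses the word "corona" only as a tag.
promo = semantic_similarity(["corona", "discount", "sale"], ["corona"], TOY_VECTORS)

print(f"related post: {related:.3f}")
print(f"promotional post: {promo:.3f}")
```

With these toy vectors, the post without the keyword scores higher than the promotional post that contains it, which is the behavior a keyword or hashtag filter cannot provide. A score of this kind could then be aggregated per day and passed, together with the fear and hope indicators, to a regression model.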


