Document Type: Technical Paper

Authors

1 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.

2 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.

3 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.

Abstract

Representing data as a graph makes the relationships among data components explicit and thus supports richer analysis. Movies have been represented as graphs many times, using different features, for clustering, genre prediction, and recommender systems. In constructing movie graphs, however, little attention has been paid to textual features such as subtitles, even though subtitles cover the entire content of a movie and carry a great deal of hidden information. In this paper, we therefore propose a method called MoGaL that constructs a movie graph by applying LDA to subtitles. In this graph, each node is a movie and each edge represents the novel relationship discovered by MoGaL between two associated movies. First, we extracted the important topics of the movies by applying LDA to their subtitles. Then, we represented the relationships between the movies in a graph using cosine similarity. Finally, we evaluated the proposed method with respect to two measures, genre homophily and genre entropy. MoGaL significantly outperforms the baseline method on both measures. Accordingly, our empirical results indicate that movie subtitles can be considered a rich source of information for various movie analysis tasks.
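The graph-construction step described in the abstract can be illustrated with a minimal sketch. It assumes that each movie already has a topic distribution (for example, the per-document output of an LDA model fitted on subtitles) and that an edge is drawn when the cosine similarity of two distributions reaches a cutoff. The function names, the threshold value, the toy vectors, and the shared-genre fraction used as a rough stand-in for genre homophily are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def build_movie_graph(topic_vectors, threshold=0.8):
    """Return {(movie_a, movie_b): similarity} for pairs at or above threshold.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    edges = {}
    for a, b in combinations(sorted(topic_vectors), 2):
        sim = cosine_similarity(topic_vectors[a], topic_vectors[b])
        if sim >= threshold:
            edges[(a, b)] = sim
    return edges


def shared_genre_fraction(edges, genres):
    """Fraction of edges whose two movies share at least one genre.

    An assumed, simplified proxy for genre homophily.
    """
    if not edges:
        return 0.0
    shared = sum(1 for a, b in edges if genres[a] & genres[b])
    return shared / len(edges)


# Toy topic distributions standing in for LDA output (illustrative only).
topic_vectors = {
    "movie_a": [0.7, 0.2, 0.1],
    "movie_b": [0.6, 0.3, 0.1],
    "movie_c": [0.1, 0.1, 0.8],
}
genres = {
    "movie_a": {"Action"},
    "movie_b": {"Action", "Thriller"},
    "movie_c": {"Romance"},
}

edges = build_movie_graph(topic_vectors, threshold=0.8)
print(sorted(edges))
print(shared_genre_fraction(edges, genres))
```

With these toy inputs, only the two movies with similar topic mixtures are connected, and they happen to share a genre. A higher threshold yields a sparser graph; a lower one connects movies with weaker topical overlap.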

Keywords
