Document Type: Original/Review Paper

Authors

Department of Industrial Engineering, Ferdowsi University of Mashhad, Mashhad, Iran.

Abstract

In time series clustering, features are typically extracted from the series and clustered in place of the raw data. However, a single fixed feature set may not be effective for every data set. To overcome this limitation, this study proposes a five-step algorithm that extracts a complete set of features, both direct and indirect, for each data set. A genetic algorithm guided by internal clustering criteria then selects the features essential for clustering, and the final clustering is performed on the selected features using a hierarchical clustering algorithm. Applied to 81 data sets, the algorithm achieves an average Rand index of 72.16%, selecting on average 38 of the 78 extracted features. Statistical tests comparing the algorithm with four others from the literature confirm its effectiveness.
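For readers unfamiliar with the evaluation measure quoted above, the following is a minimal sketch of the plain (unadjusted) Rand index, which scores a clustering by the fraction of item pairs on which it agrees with a reference partition. The abstract does not state which variant of the index was used, so the unadjusted form is an assumption here, and the function name `rand_index` and the toy labels are hypothetical illustrations rather than the authors' code or data.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Plain Rand index: share of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two items to form a pair")

    agreements = 0
    for i, j in pairs:
        same_in_truth = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        if same_in_truth == same_in_pred:
            agreements += 1
    return agreements / len(pairs)


# Hypothetical toy example: six series, two reference classes,
# and a clustering that misplaces the third series.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
score = rand_index(truth, found)
print(f"Rand index: {score:.4f}")
```

Because the measure only compares pair memberships, it is unaffected by how cluster labels are numbered, which is why it can be used to compare a clustering against class labels without first matching cluster IDs to classes.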
