A Novel Combination of Segmentation, Ensemble Clustering and Genetic Algorithm for Clustering Time Series

Ghorbani, Zahra; Ghorbanian, Ali

doi:10.22044/jadm.2024.14170.2526

Document Type: Original/Review Paper

Authors

Zahra Ghorbani ¹
Ali Ghorbanian ²

¹ Edinburgh Business School, Heriot-Watt University, Edinburgh, Scotland (UK).

² Department of Industrial Engineering, Esfarayen University of Technology, Esfarayen, Iran.

Abstract

Increasing clustering accuracy while reducing execution time is a primary challenge in time-series clustering. Researchers have recently addressed this challenge with approaches such as new distance metrics and dimensionality reduction. However, the use of segmentation and ensemble clustering for this purpose has received less attention in previous research. In this study, an algorithm based on selecting and combining the best segments of a time-series dataset was developed. In the first step, the dataset is divided into segments of equal length. In the second step, each segment is clustered using a hierarchical clustering algorithm. In the third step, a genetic algorithm selects different subsets of segments and combines their clusterings using ensemble clustering; an internal clustering criterion evaluates and ranks the solutions produced at this stage. The clustering obtained from the selected segments is taken as the final clustering of the dataset. The proposed algorithm was executed on 82 different datasets with 10 repetitions. The results indicated a 3.07% increase in clustering efficiency, reaching a value of 67.40. The results were analyzed by time-series length and dataset type. In addition, the algorithm was compared, using statistical tests, with six existing algorithms from the literature.
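The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the linkage, the ensemble method, the internal criterion, or the genetic operators, so the sketch assumes Euclidean distance, average linkage, a co-association matrix for the ensemble step, the silhouette coefficient as the internal criterion, and a simple elitist genetic algorithm with one-point crossover and bit-flip mutation. All function names and parameters (`split_segments`, `select_segments`, `pop`, `gens`, and so on) are hypothetical.

```python
import random
from itertools import combinations

def split_segments(series, n_segments):
    """Step 1: cut every series into equal-length segments (remainder dropped)."""
    seg_len = len(series[0]) // n_segments
    return [[s[i * seg_len:(i + 1) * seg_len] for s in series]
            for i in range(n_segments)]

def dist_matrix(rows):
    """Pairwise Euclidean distances (assumed metric)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])) ** 0.5
    return d

def agglomerate(d, k):
    """Step 2: average-linkage hierarchical clustering, cut at k clusters."""
    clusters = [[i] for i in range(len(d))]
    def link(a, b):
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)      # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def consensus(labelings, k):
    """Ensemble step: co-association distances, re-clustered hierarchically."""
    n = len(labelings[0])
    d = [[1.0 - sum(l[i] == l[j] for l in labelings) / len(labelings)
          for j in range(n)] for i in range(n)]
    return agglomerate(d, k)

def silhouette(d, labels):
    """Internal criterion (assumed): mean silhouette coefficient."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    if len(groups) < 2:
        return -1.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue                  # a singleton contributes 0
        a = sum(d[i][j] for j in own) / (len(own) - 1)
        b = min(sum(d[i][j] for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def select_segments(series, n_segments=4, k=2, pop=8, gens=10, seed=0):
    """Step 3: a genetic algorithm searches binary masks over segments (n_segments >= 2)."""
    rng = random.Random(seed)
    full_d = dist_matrix(series)
    labelings = [agglomerate(dist_matrix(seg), k)
                 for seg in split_segments(series, n_segments)]
    cache = {}
    def fitness(mask):
        if mask not in cache:
            chosen = [l for l, bit in zip(labelings, mask) if bit]
            cache[mask] = (silhouette(full_d, consensus(chosen, k))
                           if chosen else float("-inf"))
        return cache[mask]
    # The all-segments mask is seeded as a baseline individual.
    population = [(1,) * n_segments] + [
        tuple(rng.randint(0, 1) for _ in range(n_segments)) for _ in range(pop - 1)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop // 2]           # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n_segments)    # one-point crossover
            children.append(tuple(bit ^ (rng.random() < 0.1)   # bit-flip mutation
                                  for bit in p[:cut] + q[cut:]))
        population = parents + children
    best = max(population, key=fitness)
    return best, consensus([l for l, bit in zip(labelings, best) if bit], k)
```

Calling `select_segments(series, n_segments=4, k=2)` on a list of equal-length numeric series returns the selected segment mask and the consensus labels. The pure-Python clustering here is only suitable for toy inputs; a library implementation of hierarchical clustering would be the natural replacement for `agglomerate` at realistic sizes.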

Main Subjects

D. Data

Volume 12, Issue 2
April 2024
Pages 273-286