Document Type: Original/Review Paper

Authors

Electrical and Computer Engineering Department, Semnan University, Semnan, Iran.

DOI: 10.22044/jadm.2025.15883.2704

Abstract

Anomaly detection is becoming increasingly crucial across various fields, including cybersecurity, financial risk management, and health monitoring, yet it faces significant challenges when dealing with large-scale, high-dimensional, and unlabeled datasets. This study focuses on decision-tree-based methods for anomaly detection because of their scalability, interpretability, and effectiveness in managing high-dimensional data. Although Isolation Forest (iForest) and its extended variant, Extended Isolation Forest (EIF), are widely used, they exhibit limitations in identifying anomalies, particularly in handling normal data distributions and preventing the formation of ghost clusters. The Rotated Isolation Forest (RIF) was developed to address these challenges, employing randomized rotations in feature space to better distinguish true anomalies from normal variation. Building on this approach, we propose the Discrete Rotated Isolation Forest (DRIF) model, which integrates an Autoencoder for dimensionality reduction; using a discrete probability distribution together with an Autoencoder improves computational efficiency. Experimental evaluations on synthetic and real-world datasets demonstrate that the proposed model outperforms iForest, EIF, and RIF, achieving higher Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) scores and significantly faster execution times. These findings establish the proposed model as a robust, scalable, and efficient approach for unsupervised anomaly detection in high-dimensional datasets.
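The core rotation idea behind RIF can be sketched in a few lines of NumPy (an illustrative sketch for intuition only, not the authors' implementation; all variable names are invented): draw a random orthogonal matrix via QR decomposition of a Gaussian matrix and rotate the feature space before building isolation trees. Because the rotation preserves distances, the isolation structure of the data is unchanged, while axis-parallel splits in the rotated space act like oblique splits in the original space.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # feature dimension (illustrative)

# Random orthogonal rotation: QR decomposition of a Gaussian matrix.
A = rng.standard_normal((d, d))
Q, R = np.linalg.qr(A)
Q *= np.sign(np.diag(R))  # sign fix so the rotation is uniformly (Haar) distributed

X = rng.standard_normal((100, d))  # placeholder data
X_rot = X @ Q                      # rotated feature space

# Orthogonal rotation preserves norms, hence pairwise distances.
print(np.allclose(np.linalg.norm(X, axis=1), np.linalg.norm(X_rot, axis=1)))  # → True
```

Without the sign fix, the columns of `Q` from a plain QR decomposition are not uniformly distributed over all rotations, which would bias the ensemble toward certain orientations.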


[1] V. Monemizadeh and K. Kiani. “Detecting anomalies using rotated isolation forest.” Data Mining and Knowledge Discovery 39, 24 (2025).

[2] B. Chugh, N. Malik, D. Gupta, and B. S. Alkahtani. “A probabilistic approach driven credit card anomaly detection with CBLOF and isolation forest models.” Alexandria Engineering Journal 114 (2025).

[3] A. D. Buchdadi and A. S. M. Al-Rawahna. “Anomaly Detection in Open Metaverse Blockchain Transactions Using Isolation Forest and Autoencoder Neural Networks.” International Journal Research on Metaverse 2, no. 1 (2025).

[4] R. Morshedi and S. M. Matinkhah. “Anomaly Detection in IoT Traffic in the Presence of Gaussian Noise Using Deep Neural Networks.” Journal of AI and Data Mining (2025).
 
[5] V. Chandola, A. Banerjee, and V. Kumar. “Anomaly detection: A survey.” ACM Computing Surveys 41, no. 3 (2009): 1–58.

[6] M. Çelik, F. Dadaşer-Çelik, and A. Ş. Dokuz. “Anomaly detection in temperature data using DBSCAN algorithm.” In 2011 International Symposium on Innovations in Intelligent Systems and Applications, pp. 91–95. IEEE, (2011).

[7] G. Münz, S. Li, and G. Carle. “Traffic anomaly detection using k-means clustering.” In GI/ITG Workshop MMBnet, vol. 7, p. 9, (2007).

[8] F. T. Liu, K. M. Ting, and Z. Zhou. “Isolation forest.” In 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422. IEEE, (2008).

[9] F. T. Liu, K. M. Ting, and Z. Zhou. “Isolation-based anomaly detection.” ACM Transactions on Knowledge Discovery from Data (TKDD) 6, no. 1 (2012).
 
[10] S. Hariri, M. Carrasco Kind, and R. J. Brunner. “Extended isolation forest.” IEEE Transactions on Knowledge and Data Engineering 33, no. 4 (2019).

[11] A. Maćkiewicz and W. Ratajczak. “Principal components analysis (PCA).” Computers & Geosciences 19, no. 3 (1993).

[12] W. B. Johnson and J. Lindenstrauss. “Extensions of Lipschitz mappings into a Hilbert space.” Contemporary Mathematics 26 (1984): 189–206.

[13] R. Chalapathy, A. K. Menon, and S. Chawla. “Anomaly detection using one-class neural networks.” arXiv preprint arXiv:1802.06360, (2018).

[14] S. Mascaro, A. E. Nicholson, and K. B. Korb. “Anomaly detection in vessel tracks using Bayesian networks.” International Journal of Approximate Reasoning 55, no. 1 (2014): 84–98.

[15] K. Li, H. Huang, S. Tian, and W. Xu. “Improving one-class SVM for anomaly detection.” In Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 03EX693), vol. 5, pp. 3077–3081. IEEE, (2003).

[16] N. Duffield, P. Haffner, B. Krishnamurthy, and H. Ringberg. “Rule-based anomaly detection on IP flows.” In IEEE INFOCOM 2009, pp. 424–432. IEEE, (2009).

[17] R. Laxhammar. “Anomaly detection for sea surveillance.” In 2008 11th International Conference on Information Fusion, pp. 1–8. IEEE, (2008).
 
[18] O. Salem, A. Guerassimov, A. Mehaoua, A. Marcus, and B. Furht. “Anomaly detection in medical wireless sensor networks using SVM and linear regression models.” International Journal of E-Health and Medical Communications (IJEHMC) 5, no. 1 (2014): 20–45.

[19] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. “LOF: Identifying density-based local outliers.” In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, New York, NY, USA, (2000).

[20] J. Tang, Z. Chen, A. W.-C. Fu, and D. W. Cheung. “Enhancing effectiveness of outlier detections for low density patterns.” In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 535–548. Springer, (2002).

[21] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. “LOCI: Fast outlier detection using the local correlation integral.” In Proceedings 19th International Conference on Data Engineering (Cat. No. 03CH37405), pp. 315–326. IEEE, (2003).

[22] W. Jin, A. K. H. Tung, J. Han, and W. Wang. “Ranking outliers using symmetric neighborhood relationship.” In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 577–593. Springer, (2006).

[23] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. “LoOP: Local outlier probabilities.” In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1649–1652, (2009).
 
[24] V. Chandola, A. Banerjee, and V. Kumar. “Anomaly detection: A survey.” ACM Computing Surveys 41, no. 3 (2009): 1–58.

[25] P. N. Tan, M. Steinbach, and V. Kumar. “Introduction to data mining.” Addison-Wesley, (2005).

[26] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. “A density-based algorithm for discovering clusters in large spatial databases with noise.” In KDD, vol. 96, pp. 226–231, (1996).

[27] S. Guha, N. Mishra, G. Roy, and O. Schrijvers. “Robust random cut forest-based anomaly detection on streams.” In International Conference on Machine Learning, pp. 2712–2721. PMLR, (2016).

[28] D. Achlioptas. “Database-friendly random projections.” In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 274–281, (2001).
[35] P. Rambaud et al. “Binary classification vs. anomaly detection on imbalanced tabular medical datasets.” In 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE), pp. 1–5. IEEE, (2023).

[36] F. Melo. “Area under the ROC Curve.” Encyclopedia of Systems Biology, (2013).

[37] D. Brzezinski. “Random Similarity Isolation Forests.” arXiv preprint arXiv:2502.19122, (2025).