Document Type: Applied Article

Authors

School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.

Abstract

Fraud in financial data is a significant concern for both businesses and individuals. Credit card transactions carry many features, some of which are irrelevant to the classifier and can lead to overfitting. Feature selection is therefore a pivotal step in fraud detection, because it strongly affects both model accuracy and execution time. In this paper, we introduce "X-SHAoLIM", an ensemble-based, explainable feature selection framework built on the SHAP and LIME algorithms. We applied the framework to various combinations of the best-performing models from previous studies and compared it, both quantitatively and qualitatively, with other feature selection methods. Averaged across these model combinations, the quantitative evaluation showed improvements in Precision (+5.6), Recall (+1.5), F1-Score (+3.5), and AUC-PR (+6.75). Beyond these gains, because the framework relies on explainable algorithms such as SHAP and LIME, it clarifies how much each feature contributes to model predictions and gives system users effective explanations.
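
The idea of selecting features from the combined output of two explainers can be illustrated with a minimal sketch. It is not the X-SHAoLIM procedure itself, which the abstract does not specify. It assumes per-feature importance scores have already been obtained from SHAP and from LIME, and it uses an assumed aggregation rule (the mean of the normalized absolute scores). The function names, feature names, and score values are all hypothetical.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale absolute importance scores so that they sum to 1."""
    total = sum(abs(value) for value in scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: abs(value) / total for name, value in scores.items()}


def select_features(
    shap_scores: Dict[str, float],
    lime_scores: Dict[str, float],
    k: int,
) -> List[str]:
    """Average the normalized scores of both explainers and keep the top k.

    Assumption: a simple mean is used as the ensemble rule; the paper's
    actual combination rule may differ.
    """
    shap_norm = normalize(shap_scores)
    lime_norm = normalize(lime_scores)
    shared = set(shap_norm) & set(lime_norm)
    combined = {name: (shap_norm[name] + lime_norm[name]) / 2 for name in shared}
    # Sort by descending combined score; break ties alphabetically.
    ranked = sorted(combined, key=lambda name: (-combined[name], name))
    return ranked[:k]


# Made-up importance scores, for illustration only.
shap_scores = {"amount": 0.40, "hour": 0.10, "merchant": 0.30, "age": 0.20}
lime_scores = {"amount": 0.50, "hour": 0.05, "merchant": 0.15, "age": 0.30}

selected = select_features(shap_scores, lime_scores, k=2)
print(selected)
```

In practice the two score dictionaries would come from the explainers applied to a trained fraud classifier, and the retained features would then be used to retrain and re-evaluate the model.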
