Document Type : Technical Paper

Authors

1 Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran.

2 Department of Computer Engineering, Faculty of Engineering, Yazd University, Yazd, Iran.

Abstract

Software Cost Estimation (SCE) is one of the most widely used and effective activities in project management. In machine learning methods, some features adversely affect estimation accuracy, so preprocessing that removes ineffective features can improve it. Clustering techniques group samples into clusters according to their semantic similarity. Accordingly, to improve SCE accuracy, the proposed method first clusters the samples based on their original features and then performs feature selection (FS) separately for each cluster. The proposed FS method combines filter and wrapper FS methods, using the advantages of both to select the effective features of each cluster with less computational complexity and higher accuracy. Furthermore, because the assessment criterion has a significant impact on wrapper methods, a fused criterion is also used. The proposed method was applied to the Desharnais, COCOMO81, COCONASA93, Kemerer, and Albrecht datasets, and the obtained Mean Magnitude of Relative Error (MMRE) values for these datasets were 0.2173, 0.6489, 0.3129, 0.4898, and 0.4245, respectively. Compared with previous studies, these results show an improvement in the SCE error rate.
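To make the pipeline in the abstract concrete, the following is a minimal Python sketch of per-cluster hybrid feature selection scored by MMRE. It is not the authors' implementation. The abstract does not name the clustering algorithm, the filter metric, the wrapper learner, or the fused criterion, so everything here is an assumption chosen for illustration: cluster labels are taken as given, the filter stage ranks features by absolute Pearson correlation with effort, the wrapper stage is a greedy forward search around a leave-one-out 1-nearest-neighbour estimator, and plain MMRE stands in for the fused criterion. All function names (`mmre`, `filter_stage`, `wrapper_stage`, `select_per_cluster`) are hypothetical.

```python
from math import sqrt


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_stage(X, y, keep):
    """Assumed filter: keep the `keep` features most correlated with effort."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


def loo_1nn_mmre(X, y, features):
    """Leave-one-out MMRE of a 1-nearest-neighbour estimator on `features`."""
    preds = []
    for i, row in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in features),
        )
        preds.append(y[nearest])
    return mmre(y, preds)


def wrapper_stage(X, y, candidates):
    """Assumed wrapper: greedy forward selection over the filtered candidates."""
    selected, best_err = [], float("inf")
    improved = True
    while improved:
        improved, best_j = False, None
        for j in candidates:
            if j in selected:
                continue
            err = loo_1nn_mmre(X, y, selected + [j])
            if err < best_err:
                best_err, best_j, improved = err, j, True
        if improved:
            selected.append(best_j)
    return selected, best_err


def select_per_cluster(X, y, labels, keep=2):
    """Run filter then wrapper separately inside each cluster.

    Each cluster needs at least two samples for the leave-one-out step.
    """
    result = {}
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        Xc, yc = [X[i] for i in idx], [y[i] for i in idx]
        result[c] = wrapper_stage(Xc, yc, filter_stage(Xc, yc, keep))
    return result


if __name__ == "__main__":
    # Toy data: feature 0 tracks effort, feature 1 is noise, feature 2 is constant.
    X = [[1, 5, 7], [2, 1, 7], [3, 4, 7], [4, 2, 7], [5, 3, 7]]
    y = [10, 20, 30, 40, 50]
    print(select_per_cluster(X, y, labels=[0, 0, 0, 0, 0]))
```

In this sketch the filter stage cheaply narrows the candidate set, so the more expensive wrapper search evaluates fewer subsets. That is one plausible reading of how combining the two could reduce computational cost, though the paper's actual mechanism may differ.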

