H.6.3.2. Feature evaluation and selection
Sayyed Mohammad Hoseini; Majid Ebtia; Mohanna Dehgardi
Abstract
The abundance of high-dimensional datasets, together with the computational limits of data analysis methods when applied to such data, has made clear the importance of developing feature selection methods. Irrelevant variables harm prediction, and redundant attributes add unnecessary computation; both degrade classifier results and performance. Feature selection is therefore applied to facilitate a better understanding of the datasets, reduce computational time, and enhance prediction accuracy. In this research, we develop a composite method for feature selection that combines support vector machines and principal component analysis. The method is then applied to the k-nearest neighbor and Naïve Bayes algorithms. Three datasets from the UCI Machine Learning Repository are used to assess the performance of the proposed models. Additionally, a dataset gathered from the central library of Ayatollah Boroujerdi University was considered. This dataset encompasses 1,910 instances with 30 attributes, including gender, native status, entry term, faculty code, cumulative GPA, and the number of books borrowed. After applying the proposed feature selection method, an accuracy of 70% was obtained with only five features. Experimental results demonstrate that the proposed feature selection method selects an appropriate feature subset. The approach yields enhanced classification performance, as evaluated by metrics such as accuracy, F-score, and Matthews correlation coefficient.
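The three evaluation metrics named in the abstract can be computed from a confusion matrix. The following is a minimal sketch, not the authors' code: it assumes a binary classification task, takes the F-score to be the balanced F1 variant (the abstract does not say which variant is used), and the function name `binary_metrics` and the example counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F1-score and Matthews correlation coefficient (MCC).

    Assumption: a binary task described by the four confusion-matrix
    counts. Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC uses all four cells, so it stays informative when the
    # classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts, chosen only to illustrate the call.
scores = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

Reporting MCC alongside accuracy is a common choice because accuracy alone can look favorable when one class dominates the data.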
C.1. General
L. Khalvati; M. Keshtgary; N. Rikhtegar
Abstract
Information security and intrusion detection systems (IDSs) play a critical role on the Internet. An IDS is an essential tool for detecting different kinds of attacks in a network and for maintaining data integrity, confidentiality, and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. The main goal of the paper is to generate an efficient training dataset. Much work on intrusion detection combines clustering and feature selection to exploit the strengths of both, and the proposed method uses these techniques as well. First, a new training dataset is created by K-Medoids clustering and SVM-based feature selection. A Naïve Bayes classifier is then used for evaluation. The proposed method is compared with another hybrid algorithm and with 10-fold cross-validation. Experimental results on the KDD CUP'99 dataset show that the proposed method achieves better accuracy, detection rate, and false alarm rate than the alternatives.
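Detection rate and false alarm rate, two of the measures reported above, have standard definitions in intrusion detection: the fraction of attack records flagged as attacks, and the fraction of normal records wrongly flagged as attacks. The sketch below illustrates them under stated assumptions; it is not the authors' implementation. The function name, the 0/1 label encoding, and the toy labels are hypothetical.

```python
def detection_and_false_alarm_rate(y_true, y_pred, attack_label=1):
    """Return (detection_rate, false_alarm_rate) for a binary IDS output.

    Assumption: every record is labelled either `attack_label` or
    something else (treated as normal traffic).
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == attack_label:
            if pred == attack_label:
                tp += 1  # attack correctly detected
            else:
                fn += 1  # attack missed
        else:
            if pred == attack_label:
                fp += 1  # normal record raised an alarm
            else:
                tn += 1  # normal record correctly passed

    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Toy labels for illustration only: 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
dr, far = detection_and_false_alarm_rate(y_true, y_pred)
```

A better method in this sense has a higher detection rate and a lower false alarm rate, so the two values are best read together rather than in isolation.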