H.6.3.2. Feature evaluation and selection
Farhad Abedinzadeh Torghabeh; Yeganeh Modaresnia; Seyyed Abed Hosseini
Abstract
Various data analysis research has recently become necessary in to find and select relevant features without class labels using Unsupervised Feature Selection (UFS) approaches. Despite the fact that several open-source toolboxes provide feature selection techniques to reduce redundant features, data ...
Read More
Various data analysis research has recently become necessary in to find and select relevant features without class labels using Unsupervised Feature Selection (UFS) approaches. Despite the fact that several open-source toolboxes provide feature selection techniques to reduce redundant features, data dimensionality, and computation costs, these approaches require programming knowledge, which limits their popularity and has not adequately addressed unlabeled real-world data. Automatic UFS Toolbox (Auto-UFSTool) for MATLAB, proposed in this study, is a user-friendly and fully-automatic toolbox that utilizes several UFS approaches from the most recent research. It is a collection of 25 robust UFS approaches, most of which were developed within the last five years. Therefore, a clear and systematic comparison of competing methods is feasible without requiring a single line of code. Even users without any previous programming experience may utilize the actual implementation by the Graphical User Interface (GUI). It also provides the opportunity to evaluate the feature selection results and generate graphs that facilitate the comparison of subsets of varying sizes. It is freely accessible in the MATLAB File Exchange repository and includes scripts and source code for each technique. The link to this toolbox is freely available to the general public on: bit.ly/AutoUFSTool
H.6.3.2. Feature evaluation and selection
E. Enayati; Z. Hassani; M. Moodi
Abstract
Breast cancer is one of the most common cancer in the world. Early detection of cancers cause significantly reduce in morbidity rate and treatment costs. Mammography is a known effective diagnosis method of breast cancer. A way for mammography screening behavior identification is women's awareness evaluation ...
Read More
Breast cancer is one of the most common cancer in the world. Early detection of cancers cause significantly reduce in morbidity rate and treatment costs. Mammography is a known effective diagnosis method of breast cancer. A way for mammography screening behavior identification is women's awareness evaluation for participating in mammography screening programs. Todays, intelligence systems could identify main factors on specific incident. These could help to the experts in the wide range of areas specially health scopes such as prevention, diagnosis and treatment. In this paper we use a hybrid model called H-BwoaSvm which BWOA is used for detecting effective factors on mammography screening behavior and SVM for classification. Our model is applied on a data set which collected from a segmental analytical descriptive study on 2256 women. Proposed model is operated on data set with 82.27 and 98.89 percent accuracy and select effective features on mammography screening behavior.
H.6.3.2. Feature evaluation and selection
A. Zangooei; V. Derhami; F. Jamshidi
Abstract
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy ...
Read More
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that is considered as feature cost in this paper. Here, two novel features are proposed. They use semantic similarity measure to determine the relationship between the content and the URL of a page. Since suggested features don't apply third-party services such as search engines result, the features extraction time decreases dramatically. Login form pre-filer is utilized to reduce unnecessary calculations and false positive rate. In this paper, a cost-based feature selection is presented as the most effective feature. The selected features are employed in the suggested PWDS. Extreme learning machine algorithm is used to classify webpages. The experimental results demonstrate that suggested PWDS achieves high accuracy of 97.6% and short average detection time of 120.07 milliseconds.
H.6.3.2. Feature evaluation and selection
Sh kashef; H. Nezamabadi-pour
Abstract
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label ...
Read More
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific features. Label-specific features means that each class label is supposed to have its own characteristics and is determined by some specific features that are the most discriminative features for that label. LIFT employs clustering methods to discover the properties of data. More precisely, LIFT divides the training instances into positive and negative clusters for each label which respectively consist of the training examples with and without that label. It then selects representative centroids in the positive and negative instances of each label by k-means clustering and replaces the original features of a sample by the distances to these representatives. Constructing new features, the dimensionality of the new space reduces significantly. However, to construct these new features, the original features are needed. Therefore, the complexity of the process of multi-label classification does not diminish, in practice. In this paper, we make a modification on LIFT to reduce the computational burden of the classifier and improve or at least preserve the performance of it, as well. The experimental results show that the proposed algorithm has obtained these goals, simultaneously.
H.6.3.2. Feature evaluation and selection
M. Imani; H. Ghassemian
Abstract
Feature extraction is a very important preprocessing step for classification of hyperspectral images. The linear discriminant analysis (LDA) method fails to work in small sample size situations. Moreover, LDA has poor efficiency for non-Gaussian data. LDA is optimized by a global criterion. Thus, it ...
Read More
Feature extraction is a very important preprocessing step for classification of hyperspectral images. The linear discriminant analysis (LDA) method fails to work in small sample size situations. Moreover, LDA has poor efficiency for non-Gaussian data. LDA is optimized by a global criterion. Thus, it is not sufficiently flexible to cope with the multi-modal distributed data. We propose a new feature extraction method in this paper, which uses the boundary semi-labeled samples for solving small sample size problem. The proposed method, which called hybrid feature extraction based on boundary semi-labeled samples (HFE-BSL), uses a hybrid criterion that integrates both the local and global criteria for feature extraction. Thus, it is robust and flexible. The experimental results with three real hyperspectral images show the good efficiency of HFE-BSL compared to some popular and state-of-the-art feature extraction methods.
H.6.3.2. Feature evaluation and selection
E. Golpar-Rabooki; S. Zarghamifar; J. Rezaeenour
Abstract
Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. ...
Read More
Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels due to orientation analysis of different aspects of an area. In this paper, two methods are introduced for feature extraction. The recommended methods consist of four main stages. At the first stage, opinion-mining lexicon for Persian is created. This lexicon is used to determine the orientation of users’ reviews. The second one is the preprocessing stage including unification of writing, tokenization, creating parts-of-speech tagging and syntactic dependency parsing for documents. The third stage involves the extraction of features using two methods including frequency-based feature extraction and association rule based feature extraction. In the fourth stage, the features and polarities of the word reviews extracted in the previous stage are modified and the final features' polarity is determined. To assess the suggested techniques, a set of user reviews in both scopes of university and cell phone areas were collected and the results of the two methods were compared.
H.6.3.2. Feature evaluation and selection
Maryam Imani; Hassan Ghassemian
Abstract
When the number of training samples is limited, feature reduction plays an important role in classification of hyperspectral images. In this paper, we propose a supervised feature extraction method based on discriminant analysis (DA) which uses the first principal component (PC1) to weight the scatter ...
Read More
When the number of training samples is limited, feature reduction plays an important role in classification of hyperspectral images. In this paper, we propose a supervised feature extraction method based on discriminant analysis (DA) which uses the first principal component (PC1) to weight the scatter matrices. The proposed method, called DA-PC1, copes with the small sample size problem and has not the limitation of linear discriminant analysis (LDA) in the number of extracted features. In DA-PC1, the dominant structure of distribution is preserved by PC1 and the class separability is increased by DA. The experimental results show the good performance of DA-PC1 compared to some state-of-the-art feature extraction methods.