An Analysis of Feature Selection Methods for Multiclass Text Classification

MAYANK, KALBHOR and SANJAY, AGARWAL (2018) An Analysis of Feature Selection Methods for Multiclass Text Classification. In: Eighth International Conference On Advances in Computing, Electronics and Electrical Technology - CEET 2018, 03-04 February, 2018, Kuala Lumpur, Malaysia.

20180307_105047.pdf - Published Version

Official URL: https://www.seekdl.org/conferences/paper/details/9...

Abstract

Features play a vital role in classifying objects into different classes, so identifying the best features is the backbone of the classification process. In text classification the features are simply words, and the resulting feature space has very high dimensionality, so finding the most appropriate feature set is a major challenge. This paper analyzes several feature selection methods for multiclass text classification and evaluates their results with different classifiers on an email classification task. We run our experiments on the 20 Newsgroups and PU corpora datasets. The experiments cover well-known feature selection methods such as Term Selection, Document Frequency, Mutual Information, Odds Ratio and Chi-square. The paper concludes that Mutual Information and Chi-square are the most appropriate for text classification.
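
As a rough illustration of two of the methods named in the abstract, the sketch below computes Document Frequency and a Chi-square score for each (term, class) pair on a toy corpus. It is not the authors' implementation: the corpus, the function names (document_frequency, chi_square, top_terms) and the choice to rank a term by its maximum Chi-square score over all classes are assumptions made for this example.

```python
from collections import Counter

# Toy labelled corpus (hypothetical data, for illustration only).
DOCS = [
    (["cheap", "pills", "offer"], "spam"),
    (["cheap", "offer", "now"], "spam"),
    (["meeting", "agenda", "now"], "work"),
    (["meeting", "notes"], "work"),
    (["game", "score", "now"], "sports"),
    (["game", "team"], "sports"),
]


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df


def chi_square(docs, term, label):
    """Chi-square statistic for one (term, class) pair from a 2x2 table."""
    a = b = c = d = 0
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_terms(docs, k):
    """Rank terms by their maximum chi-square score over all classes.

    Taking the maximum over classes is one common multiclass aggregation;
    it is an assumption here, not necessarily the paper's choice.
    """
    labels = {label for _tokens, label in docs}
    vocab = {t for tokens, _label in docs for t in tokens}
    scores = {t: max(chi_square(docs, t, c) for c in labels) for t in vocab}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    print(document_frequency(DOCS).most_common(3))
    print(top_terms(DOCS, 3))
```

In this toy corpus a word such as "now" appears once in every class, so its Chi-square score is zero for each class and it is ranked below class-specific words such as "cheap" or "meeting", even though its document frequency is the highest.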

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Feature Selection, Text Classification, MultiClass Classification
Depositing User: Mr. John Steve
Date Deposited: 10 Mar 2019 09:19
Last Modified: 10 Mar 2019 09:19
URI: http://publications.theired.org/id/eprint/233
