DOI

10.3906/elk-1102-1015

Abstract

Feature selection and feature extraction are frequently used to overcome the curse of dimensionality in text classification problems. We introduce an extraction method that summarizes the features of document samples: each new feature aggregates how much evidence a document contains for one class. We project the high-dimensional document features onto a new feature space whose dimensionality equals the number of classes, forming the abstract features. We test our method with 7 text classification algorithms that follow different classifier design approaches. We examine the performance of these classifiers on standard text categorization test collections and show the improvements achieved by applying our extraction method. We compare the classification performance of our method with that of popular, well-known feature selection and feature extraction schemes. The results show that our summarizing abstract feature extraction method improves classification performance on most of the classifiers when compared with the other methods.
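
The projection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything below is an assumption made for illustration only: each term's weight for a class is taken to be the share of that term's training occurrences that fall in the class, and a document's abstract feature for a class is the count-weighted sum of those shares, normalized to sum to 1. The function names `fit_term_class_weights` and `abstract_features` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fit_term_class_weights(docs, labels):
    """Estimate, for every term, how its training occurrences split across classes.

    docs   -- list of dicts mapping term -> count (bag of words)
    labels -- list of class names, one per document
    Returns (classes, weights) with weights[term][cls] in [0, 1].
    Assumed weighting scheme, not the formula from the paper.
    """
    classes = sorted(set(labels))
    totals = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, count in doc.items():
            totals[term][label] += count

    weights = {}
    for term, per_class in totals.items():
        total = sum(per_class.values())
        if total == 0:
            continue  # term never actually occurred; no evidence to share out
        weights[term] = {c: per_class.get(c, 0.0) / total for c in classes}
    return classes, weights


def abstract_features(doc, classes, weights):
    """Project one bag-of-words document onto len(classes) dimensions."""
    scores = [0.0] * len(classes)
    for term, count in doc.items():
        if term not in weights:
            continue  # unseen terms carry no class evidence
        for i, cls in enumerate(classes):
            scores[i] += count * weights[term][cls]

    total = sum(scores)
    if total == 0:
        # No known terms: fall back to a uniform vector (an arbitrary choice).
        return [1.0 / len(classes)] * len(classes)
    return [s / total for s in scores]


# Toy usage with two classes, so each document becomes a 2-dimensional vector.
train_docs = [{"goal": 2, "match": 1}, {"vote": 3, "match": 1}]
train_labels = ["sports", "politics"]
classes, weights = fit_term_class_weights(train_docs, train_labels)
print(classes)
print(abstract_features({"goal": 1, "match": 1}, classes, weights))
```

Whatever the vocabulary size, the output vector has one entry per class, which is the dimensionality reduction the abstract describes; the resulting vectors would then be passed to any downstream classifier.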

Keywords

Dimensionality reduction, feature extraction, preprocessing for classification, probabilistic abstract features

First Page

1137

Last Page

1159

Recommended Citation

BİRİCİK, GÖKSEL; DİRİ, BANU; and SÖNMEZ, AHMET COŞKUN (2012) "Abstract feature extraction for text classification," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 20: No. 7, Article 9. https://doi.org/10.3906/elk-1102-1015
Available at: https://journals.tubitak.gov.tr/elektrik/vol20/iss7/9

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

Abstract feature extraction for text classification
