Turkish Journal of Electrical Engineering and Computer Sciences
Abstract
In this study, we describe a keyword extraction technique that uses latent semantic analysis (LSA) to identify semantically important single topic words, or keywords. We compare our method against two other automated keyword extractors, tf-idf (term frequency-inverse document frequency) and MetaMap, using human-annotated keywords as a reference. Our results suggest that the LSA-based keyword extraction method performs comparably to the other techniques. Therefore, in an incremental update setting, the LSA-based method may be preferable to existing keyword extraction methods for extracting keywords from text descriptions in big data.
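As a rough illustration of the general idea of LSA-based keyword ranking, the sketch below builds a term-by-document count matrix, takes a truncated singular value decomposition, and ranks each term by the length of its vector in the reduced latent space. The abstract does not specify the weighting scheme, the number of latent dimensions, or the scoring rule used in the study, so the function name `lsa_keywords`, the parameters `k` and `top_n`, the whitespace tokenizer, and the norm-based score are all assumptions made for illustration only.

```python
import numpy as np


def lsa_keywords(docs, k=2, top_n=3):
    """Rank terms by the norm of their vectors in a rank-k LSA space.

    Hypothetical helper: the scoring rule is an assumption, not the
    method evaluated in the study.
    """
    # Naive whitespace tokenization (assumption; no stemming or stop words).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    index = {term: i for i, term in enumerate(vocab)}

    # Term-by-document matrix of raw counts.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for term in doc:
            counts[index[term], j] += 1.0

    # Thin SVD; keep only the k largest singular values.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))

    # Term vectors in the latent space, scaled by the singular values.
    term_vectors = u[:, :k] * s[:k]
    scores = np.linalg.norm(term_vectors, axis=1)

    # Highest-scoring terms first.
    order = np.argsort(-scores)
    return [(vocab[i], float(scores[i])) for i in order[:top_n]]


if __name__ == "__main__":
    corpus = [
        "gene protein binding",
        "protein binding site",
        "protein structure",
    ]
    for term, score in lsa_keywords(corpus, k=2, top_n=3):
        print(f"{term}\t{score:.3f}")
```

In an incremental update setting, a production system would likely update the decomposition as new documents arrive rather than recompute it from scratch; this sketch recomputes the full SVD for simplicity.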
DOI
10.3906/elk-1511-203
Keywords
Bioinformatics, text mining, information retrieval
First Page
1784
Last Page
1794
Recommended Citation
SÜZEK, T. Ö. (2017). Using latent semantic analysis for automated keyword extraction from large document corpora. Turkish Journal of Electrical Engineering and Computer Sciences 25 (3): 1784-1794. https://doi.org/10.3906/elk-1511-203
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons