Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.3906/elk-1511-203
Abstract
In this study, we describe a keyword extraction technique that uses latent semantic analysis (LSA) to identify semantically important single topic words or keywords. We compare our method against two other automated keyword extractors, tf-idf (term frequency-inverse document frequency) and MetaMap, using human-annotated keywords as a reference. Our results suggest that the LSA-based keyword extraction method performs comparably to the other techniques. Therefore, in an incremental update setting, the LSA-based method may be preferable to existing keyword extraction methods for extracting keywords from text descriptions in big data.
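The abstract does not give the scoring procedure, so the following is only a minimal sketch of the general idea behind LSA-based keyword ranking, not the article's method. It assumes whitespace tokenization, a raw term-document count matrix, and a scoring rule (the length of each term's vector in the rank-k latent space) chosen purely for illustration; the function name `lsa_keywords` and its parameters are hypothetical.

```python
import numpy as np


def lsa_keywords(docs, k=2, top_n=3):
    """Rank terms by the length of their vector in a rank-k LSA space.

    Illustrative only: tokenization, weighting, and the scoring rule
    are assumptions, not taken from the article.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}

    # Term-document count matrix A (rows: terms, columns: documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for tok in doc:
            A[index[tok], j] += 1.0

    # Singular value decomposition: A = U diag(s) Vt.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep the k largest singular directions (the latent "topics").
    k = min(k, len(s))
    term_vectors = U[:, :k] * s[:k]

    # Assumed scoring rule: Euclidean norm of each term's latent vector.
    scores = np.linalg.norm(term_vectors, axis=1)
    order = np.argsort(-scores)[:top_n]
    return [(vocab[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    example_docs = [
        "protein kinase binds protein",
        "protein structure protein folding",
        "kinase inhibitor assay",
    ]
    for term, score in lsa_keywords(example_docs, k=2, top_n=3):
        print(f"{term}\t{score:.3f}")
```

In this toy corpus the most frequent and most widely shared term ranks first. A realistic pipeline would likely add stop-word removal and a weighting scheme such as tf-idf before the decomposition, and would use a sparse truncated SVD for large corpora.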
Keywords
Bioinformatics, text mining, information retrieval
First Page
1784
Last Page
1794
Recommended Citation
SÜZEK, TUĞBA ÖNAL (2017) "Using latent semantic analysis for automated keyword extraction from large document corpora," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 25: No. 3, Article 15.
https://doi.org/10.3906/elk-1511-203
Available at:
https://journals.tubitak.gov.tr/elektrik/vol25/iss3/15
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons