Turkish Journal of Electrical Engineering and Computer Sciences
Using latent semantic analysis for automated keyword extraction from large document corpora
In this study, we describe a keyword extraction technique that uses latent semantic analysis (LSA) to identify semantically important single-word topics, or keywords. We compare our method against two other automated keyword extractors, tf-idf (term frequency-inverse document frequency) and MetaMap, using human-annotated keywords as a reference. Our results suggest that the LSA-based keyword extraction method performs comparably to the other two techniques. In an incremental update setting, the LSA-based method may therefore be preferable to existing keyword extraction methods for extracting keywords from text descriptions in big data.
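As a rough illustration of the general idea, the following minimal sketch builds a tf-idf weighted term-document matrix, reduces it with a truncated singular value decomposition (the core step of LSA), and ranks each document's terms by their weight in the low-rank reconstruction. It is not the method evaluated in the article: the function names `tfidf_matrix` and `lsa_keywords`, the whitespace tokenizer, the idf smoothing, the choice of `k`, and the rule of ranking only terms that occur in the document are all assumptions made for this example.

```python
import numpy as np


def tfidf_matrix(docs):
    """Return (terms, X), where X is a terms-by-documents tf-idf matrix."""
    # Assumption: naive lowercase whitespace tokenization, no stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for doc in tokenized for tok in doc})
    index = {term: i for i, term in enumerate(terms)}

    counts = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(tokenized):
        for tok in doc:
            counts[index[tok], j] += 1

    # Document frequency is at least 1 for every term in the vocabulary.
    df = (counts > 0).sum(axis=1)
    # Assumption: "+ 1" keeps terms that occur in every document from vanishing.
    idf = np.log(len(docs) / df) + 1.0
    return terms, counts * idf[:, None]


def lsa_keywords(docs, k=2, top_n=3):
    """Rank each document's own terms by their rank-k (LSA) weight."""
    terms, X = tfidf_matrix(docs)

    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

    keywords = []
    for j in range(len(docs)):
        # Assumption: only terms that actually occur in the document are candidates.
        present = np.flatnonzero(X[:, j] > 0)
        order = np.argsort(-X_k[present, j], kind="stable")
        keywords.append([terms[i] for i in present[order][:top_n]])
    return keywords


if __name__ == "__main__":
    corpus = [
        "gene protein binding gene",
        "protein kinase binding",
        "stock market price",
        "market price index stock",
    ]
    for doc, kws in zip(corpus, lsa_keywords(corpus, k=2, top_n=2)):
        print(doc, "->", kws)
```

Ranking the same candidate terms by the raw tf-idf column `X[:, j]` instead of `X_k[:, j]` gives a plain tf-idf baseline for comparison. A full SVD is recomputed on every call here; an incremental update setting would need a decomposition that can be updated as documents arrive, which this sketch does not attempt.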
Bioinformatics, text mining, information retrieval
SÜZEK, TUĞBA ÖNAL
"Using latent semantic analysis for automated keyword extraction from large document corpora," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 25: No. 3, Article 15.
Available at: https://journals.tubitak.gov.tr/elektrik/vol25/iss3/15
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons