Turkish Journal of Electrical Engineering and Computer Sciences

Exploring the power of supervised learning methods for company name disambiguation in microblog posts

NAFİYE POLAT
ALİ ÇAKMAK
RABİA TURAN

DOI

10.3906/elk-1809-167

Abstract

Twitter is an online social networking website where people can post short messages on any subject, and these messages become visible to other users. Users intentionally express their opinions about companies or products via microblogging texts. Analyzing such messages might help explore what customers think about company products, or what the broad feelings of customers are. Identifying tweets that refer to products and companies has recently become an important tool. However, company names are often ambiguous. Hence, the first step is to locate the messages that are relevant to a company. In this paper, we present a number of supervised learning techniques to decide whether a given tweet is about a company, e.g., whether a message containing the term 'amazon' is related to the company Amazon Inc. or not. Solving this task is challenging in comparison to the classical classification process. The main difficulty is that tweets and company names carry limited information. To make this task tractable, external resources are used to obtain richer data about a company. More specifically, we generate several profiles for each organization, which contain richer information. Then we perform feature extraction to obtain both numerical and categorical features, and we apply feature selection to identify the attributes most relevant to our task. Finally, we train several supervised classifiers. Our constructed classifiers and carefully selected features provide high accuracy on the WePS-3 dataset. Our results show a considerable accuracy improvement of 11% over baseline approaches.
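The profile-based feature extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two hand-written profiles, and the three features below are assumptions chosen only to show how a tweet containing an ambiguous term might be turned into numerical and categorical features for a supervised classifier.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def profile_features(tweet, company_term, positive_profile, negative_profile):
    """Build a small feature dictionary for one tweet.

    The profiles are hypothetical keyword sets: one with words that
    suggest the company sense, one with words that suggest other senses.
    """
    # Drop the ambiguous term itself, since it appears in every tweet.
    tokens = [t for t in tokenize(tweet) if t != company_term]
    n = max(len(tokens), 1)  # avoid division by zero on empty tweets
    pos_hits = sum(t in positive_profile for t in tokens)
    neg_hits = sum(t in negative_profile for t in tokens)
    return {
        "pos_overlap": pos_hits / n,         # numerical feature
        "neg_overlap": neg_hits / n,         # numerical feature
        "has_url": "http" in tweet.lower(),  # categorical (boolean) feature
    }


# Hypothetical profiles for the term 'amazon' (illustrative only).
positive_profile = {"order", "prime", "kindle", "shipping", "delivery"}
negative_profile = {"river", "rainforest", "brazil", "jungle"}

related = profile_features(
    "My amazon order shipped with prime delivery",
    "amazon", positive_profile, negative_profile,
)
unrelated = profile_features(
    "Boat trip on the amazon river in brazil",
    "amazon", positive_profile, negative_profile,
)
```

Feature dictionaries of this kind could then be passed through feature selection and used to train any supervised classifier; the paper itself should be consulted for the actual profiles, features, and models.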

Keywords

Text processing, name disambiguation, entity resolution, supervised classification, microblogs

First Page

2400

Last Page

2415

Recommended Citation

POLAT, NAFİYE; ÇAKMAK, ALİ; and TURAN, RABİA (2020) "Exploring the power of supervised learning methods for company name disambiguation in microblog posts," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 28: No. 5, Article 3. https://doi.org/10.3906/elk-1809-167
Available at: https://journals.tubitak.gov.tr/elektrik/vol28/iss5/3

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons
