Authors: IJAZ HUSSAIN, SOHAIL ASGHAR
Abstract: Author name ambiguity occurs when multiple authors share the same name or when a single author appears under different name variations. This degrades search results and leads to incorrect attribution in bibliographic databases. Existing solutions require either the actual number of ambiguous authors or extra information collected from the Web. However, in many scenarios, obtaining such auxiliary information is impossible or requires considerable extra effort. An effective and scalable method, ASONET, is proposed that uses graph community detection algorithms and graph operations to disambiguate namesakes. The citation dataset is preprocessed and ambiguous author blocks are formed. A graph structural clustering algorithm, gSkeletonClu, is applied to identify hubs, outliers, and clusters of nodes in the coauthor graph. Namesakes are resolved by splitting these clusters across the hub if their feature vector similarity is less than a predefined threshold. ASONET uses only coauthors and titles, which are reliably available in all bibliographic databases. To validate ASONET's performance, experiments are performed on two real-world datasets, Arnetminer and DBLP. The results confirm that ASONET is scalable and outperforms the baselines.
Keywords: Author name disambiguation, namesakes, graph structural clustering, community detection
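The split-or-merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: gSkeletonClu is replaced by plain connected components of the coauthor graph with the ambiguous author (the hub) removed, the feature vector is a bag of title words, similarity is cosine similarity, and the threshold of 0.3, the function names, and the sample papers are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coauthor_clusters(papers, hub):
    """Group paper indices by connected component of the coauthor graph
    with the hub removed (an assumed stand-in for gSkeletonClu)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for paper in papers:
        others = [a for a in paper["authors"] if a != hub]
        for a in others[1:]:
            parent[find(others[0])] = find(a)  # union coauthors of one paper

    groups = defaultdict(list)
    for i, paper in enumerate(papers):
        others = [a for a in paper["authors"] if a != hub]
        key = find(others[0]) if others else ("solo", i)
        groups[key].append(i)
    return list(groups.values())


def disambiguate(papers, hub, threshold=0.3):
    """Keep clusters apart when title similarity is below the threshold,
    otherwise merge them as the same real-world author."""
    merged = []  # list of (paper indices, title word vector)
    for cluster in coauthor_clusters(papers, hub):
        vec = Counter(w for i in cluster for w in papers[i]["title"].lower().split())
        for indices, mvec in merged:
            if cosine(mvec, vec) >= threshold:
                indices.extend(cluster)
                mvec.update(vec)
                break
        else:
            merged.append((list(cluster), vec))
    return [sorted(indices) for indices, _ in merged]


# Hypothetical ambiguous author block for the name "J. Smith".
papers = [
    {"title": "graph clustering for community detection", "authors": ["J. Smith", "A. Khan"]},
    {"title": "community detection in large graph", "authors": ["J. Smith", "A. Khan", "B. Lee"]},
    {"title": "protein folding simulation", "authors": ["J. Smith", "C. Wu"]},
    {"title": "scalable graph clustering", "authors": ["J. Smith", "D. Roy"]},
]
print(disambiguate(papers, "J. Smith"))  # [[0, 1, 3], [2]]
```

Each inner list is the set of papers attributed to one inferred author. Raising the threshold makes the sketch split more aggressively, which mirrors the role of the predefined threshold in the abstract.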