Turkish Journal of Electrical Engineering and Computer Sciences




Time is an important aspect of temporal information retrieval (TIR), a subfield of information retrieval (IR). Web search engines such as Google and Bing are common examples of IR systems. An important component of a search engine is news retrieval, where users express their information needs as temporal queries, since they are usually interested in news documents that focus on a particular time period. Existing search engines rarely meet these temporal information needs because they ignore the temporal information in the content of news documents, in particular the time period a document focuses on, known as the document focus time. Furthermore, when a news document contains information related to multiple time periods, identifying its focus time becomes a challenging task. Therefore, news documents must first be classified by temporal specificity before their temporal information can be used in the retrieval process. In this study, we formulate temporal specificity as a time-based classification problem in which news documents are assigned to one of three temporal classes: high, medium, and low temporal specificity. Two classification approaches are proposed: a rule-based approach and a temporal specificity score (TSS)-based approach. The former classifies news documents using a defined set of rules based on temporal features; the latter computes a TSS from the temporal features and classifies documents according to that score. The results of the proposed techniques are compared with those of four machine learning classification algorithms: Bayes net, support vector machine, random forest, and decision tree. The results show that the proposed rule-based classifier outperforms all four algorithms, achieving 82% accuracy, whereas TSS-based classification achieves 77% accuracy.
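The abstract does not specify the temporal features, the rule set, or the TSS formula, so the following is only a minimal illustrative sketch of how a rule-based and a score-based three-class temporal specificity classifier could be structured. Everything in it is an assumption and not the authors' method: temporal expressions are simplified to four-digit years (1800 to 2099), the features (`total_mentions`, `distinct_years`, `dominance`), the rules in the hypothetical `classify_rule_based`, the hypothetical score in `temporal_specificity_score`, and the thresholds `high=0.7` and `medium=0.3` are all invented for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: a "temporal expression" is simplified to a four-digit year 1800-2099.
YEAR_PATTERN = re.compile(r"\b(?:1[89]|20)\d{2}\b")


@dataclass
class TemporalFeatures:
    total_mentions: int  # number of year mentions in the document
    distinct_years: int  # number of different years mentioned
    dominance: float     # share of mentions taken by the most frequent year


def extract_features(text: str) -> TemporalFeatures:
    """Compute illustrative (hypothetical) temporal features from raw text."""
    years = YEAR_PATTERN.findall(text)
    if not years:
        return TemporalFeatures(0, 0, 0.0)
    counts = Counter(years)
    top_count = counts.most_common(1)[0][1]
    return TemporalFeatures(len(years), len(counts), top_count / len(years))


def classify_rule_based(f: TemporalFeatures) -> str:
    """Hypothetical rule set mapping features to a specificity class."""
    if f.total_mentions == 0:
        return "low"     # no temporal evidence at all
    if f.distinct_years == 1:
        return "high"    # every mention points to the same year
    if f.dominance >= 0.6:
        return "medium"  # one year clearly dominates, others also appear
    return "low"         # mentions are spread over many years


def temporal_specificity_score(f: TemporalFeatures) -> float:
    """Hypothetical score in [0, 1]: high when one year dominates and few years appear."""
    if f.total_mentions == 0:
        return 0.0
    return f.dominance / f.distinct_years


def classify_by_score(f: TemporalFeatures, high: float = 0.7, medium: float = 0.3) -> str:
    """Threshold the hypothetical score into three classes (thresholds are assumptions)."""
    score = temporal_specificity_score(f)
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    doc = "Floods hit in 2011. By late 2011 damage exceeded that of 2010."
    feats = extract_features(doc)
    print(feats, classify_rule_based(feats), classify_by_score(feats))
```

For the sample document, the years 2011, 2011, and 2010 give two distinct years with a dominance of 2/3, so both sketches assign "medium". Separating feature extraction from classification mirrors the abstract's description of two classifiers built on the same temporal features, and it would also let the same feature vectors be fed to standard machine learning classifiers for comparison.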


Keywords: Text classification, temporal classification, temporal specificity, temporal information retrieval, specificity score
