Turkish Journal of Electrical Engineering and Computer Sciences

A word spotting method for Farsi machine-printed document images

Abstract

In this paper, a word spotting approach for Farsi printed document images is presented. The main idea is to recognize the font of the Farsi document image and to modify the query word to match that font before searching. This increases the similarity between the query word image and its instances in the document image and therefore improves the performance of the word spotting system. After the query word is modified, the query word image rectangle is searched for in the text lines of the document image using an XNOR similarity measure. To increase the recall rate, a relatively low value is used as the acceptance/rejection threshold (\delta); to increase the precision rate, additional features are used, e.g., the number of holes, ascenders, descenders, and dots. Multilevel matching with these features solves the problem caused by justification (aligning the text to both the left and right margins), which occurs when Farsi documents are typeset. The approach was applied to a computer-generated dataset of 440 Farsi printed document images, yielding a precision rate of 97.5% at a recall rate of 92.1%. On a dataset of 224 scanned Farsi document images, it yielded a precision rate of 87.6% at a recall rate of 79.3%.
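The XNOR matching step can be illustrated with a minimal sketch. The abstract does not give the exact normalization, search strategy, or value of \delta, so everything below is an assumption made for illustration: images are lists of 0/1 rows, the similarity is the fraction of agreeing pixels, and the names `xnor_similarity` and `spot_word` are hypothetical rather than taken from the paper.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, 1 = ink (assumed encoding)


def xnor_similarity(a: Image, b: Image) -> float:
    """Fraction of pixel positions where two equal-sized binary images agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must be non-empty")
    # XNOR of two bits is 1 exactly when the bits are equal.
    agree = sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa == pb)
    return agree / total


def spot_word(line: Image, query: Image, delta: float) -> List[int]:
    """Slide the query rectangle along a text line.

    Returns the column offsets whose window similarity is at least delta.
    """
    height, q_width = len(query), len(query[0])
    if len(line) != height:
        raise ValueError("line and query must have the same height")
    hits = []
    for x in range(len(line[0]) - q_width + 1):
        window = [row[x:x + q_width] for row in line]
        if xnor_similarity(window, query) >= delta:
            hits.append(x)
    return hits


# Toy data: the query occurs exactly once, at column offset 2.
query = [[1, 0, 1],
         [0, 1, 0]]
line = [[0, 0, 1, 0, 1, 0],
        [0, 0, 0, 1, 0, 0]]

strict_hits = spot_word(line, query, 1.0)  # exact matches only
loose_hits = spot_word(line, query, 0.6)   # lower threshold admits more candidates
```

In this toy case the strict threshold returns only the true position, while the lower threshold also accepts a partially matching window at offset 0. That mirrors the trade-off described in the abstract: a low \delta favors recall, and extra features such as holes, ascenders, descenders, and dots would then be needed to filter out the false candidates.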

DOI

10.3906/elk-1107-26

Keywords

Farsi document image, font recognition, word spotting, retrieval

First Page

734

Last Page

746

Recommended Citation

POURASAD, Y, HASSIBI, H, & GHORBANI, A (2013). A word spotting method for Farsi machine-printed document images. Turkish Journal of Electrical Engineering and Computer Sciences 21 (3): 734-746. https://doi.org/10.3906/elk-1107-26

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons
