Turkish Journal of Electrical Engineering and Computer Sciences
Abstract
This paper proposes a new representation of Farsi words to address the keyword spotting problem in Farsi document image retrieval. We define a signature for each Farsi word based on the layout of its connected components. The signature is represented as a set of boxes; vertical and horizontal lines are then drawn to construct a grid over each word, which serves as a new descriptor. One advantage of this method is that it can be applied to both handwritten and machine-printed text. To evaluate the system against other methods, we tested it on a database of 19,582 printed Farsi words and obtained a recall of 98.1% and a precision of 94.3%.
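The abstract does not spell out how the boxes and grid are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a binarized word image given as a list of rows of 0/1 values, takes the bounding box of each 8-connected ink region as a "box", and cuts the word area along every box edge to form the grid. The function names `component_boxes` and `layout_grid` and the coverage rule for grid cells are hypothetical choices made for illustration.

```python
from collections import deque


def component_boxes(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions.

    Assumption: `image` is a non-empty list of equal-length rows, 1 = ink.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected component.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def layout_grid(boxes):
    """Cut the word area along every box edge and mark cells covered by a box.

    Assumption: this grid is an illustrative descriptor, not the paper's exact one.
    """
    # Half-open edges: a box on rows top..bottom occupies [top, bottom + 1).
    ys = sorted({b[0] for b in boxes} | {b[2] + 1 for b in boxes})
    xs = sorted({b[1] for b in boxes} | {b[3] + 1 for b in boxes})
    grid = []
    for y0, y1 in zip(ys, ys[1:]):
        row = []
        for x0, x1 in zip(xs, xs[1:]):
            covered = any(
                top <= y0 and y1 <= bottom + 1 and left <= x0 and x1 <= right + 1
                for top, left, bottom, right in boxes
            )
            row.append(1 if covered else 0)
        grid.append(tuple(row))
    return tuple(grid)


if __name__ == "__main__":
    # Toy "word": a dot above two separate body strokes.
    word = [
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 1, 1],
        [1, 1, 0, 1, 1, 1],
    ]
    print(layout_grid(component_boxes(word)))
```

Two word images could then be compared by matching their grids, for example after normalizing for scale; how the paper actually matches signatures is not stated in the abstract.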
DOI
10.3906/elk-1804-92
Keywords
Farsi document image retrieval, word spotting, word layout signature, optical character recognition
First Page
1477
Last Page
1488
Recommended Citation
ERGÜN, C., & NOROZPOUR, S. (2019). Farsi document image recognition system using word layout signature. Turkish Journal of Electrical Engineering and Computer Sciences, 27(2), 1477-1488. https://doi.org/10.3906/elk-1804-92
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons