Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.3906/elk-1804-92
Abstract
In this paper, a new representation of Farsi words is proposed to address the keyword spotting problem in Farsi document image retrieval. We define a signature for each Farsi word based on the layout of the word's connected components. The signature is represented as a set of boxes, and by drawing vertical and horizontal lines we construct a grid over each word that serves as a new descriptor. One advantage of this method is that it can be used for both handwritten and machine-printed text. To evaluate the performance of the system in comparison with other methods, a database of 19,582 printed Farsi words is examined; the approach obtains a recall rate of 98.1% and a precision rate of 94.3%.
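The abstract gives only an outline of the descriptor, so the following is a minimal sketch of one way the idea could be realized, not the authors' implementation. It assumes a binarized word image given as a list of 0/1 rows, 8-connected components, and grid lines placed at the edges of the component bounding boxes; the function names `component_boxes` and `layout_grid` are hypothetical and do not come from the paper.

```python
from collections import deque


def component_boxes(img):
    """Return half-open bounding boxes (top, left, bottom, right) of the
    8-connected foreground components of a binary image (assumed input)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one component.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return boxes


def layout_grid(boxes):
    """Cut the word area with horizontal and vertical lines at every box
    edge (an assumption) and mark each cell 1 if a box covers it, else 0."""
    ys = sorted({b[0] for b in boxes} | {b[2] for b in boxes})
    xs = sorted({b[1] for b in boxes} | {b[3] for b in boxes})
    grid = []
    for y0, y1 in zip(ys, ys[1:]):
        row = []
        for x0, x1 in zip(xs, xs[1:]):
            covered = any(t <= y0 and y1 <= b and l <= x0 and x1 <= r
                          for t, l, b, r in boxes)
            row.append(1 if covered else 0)
        grid.append(row)
    return grid


# Toy "word": a dot above a main body, plus a separate stroke on the right.
toy_word = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
]
signature = layout_grid(component_boxes(toy_word))
```

In this sketch the resulting 0/1 grid plays the role of the word's layout signature; how the paper actually places the lines, encodes the cells, and matches two signatures is not stated in the abstract.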
Keywords
Farsi document image retrieval, word spotting, word layout signature, optical character recognition
First Page
1477
Last Page
1488
Recommended Citation
ERGÜN, CEM and NOROZPOUR, SAJEDEH (2019) "Farsi document image recognition system using word layout signature," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 27: No. 2, Article 56.
https://doi.org/10.3906/elk-1804-92
Available at:
https://journals.tubitak.gov.tr/elektrik/vol27/iss2/56
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons