Turkish Journal of Electrical Engineering and Computer Sciences
In this paper, a new representation of Farsi words is proposed to address the keyword spotting problem in Farsi document image retrieval. We define a signature for each Farsi word based on the layout of its connected components. The signature is represented as boxes, and by drawing vertical and horizontal lines we construct a grid over each word that serves as a new descriptor. One advantage of this method is that it can be used for both handwritten and machine-printed text. Finally, to evaluate the performance of the system in comparison with other methods, we test it on a database containing 19,582 printed Farsi words and obtain a recall rate of 98.1% and a precision rate of 94.3%.
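The abstract does not spell out how the boxes and grid are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a binarized word image given as a list of 0/1 rows, uses 8-connectivity, takes the bounding box of each connected component as a "box", and forms the grid from the lines through the box edges. The function names `component_boxes` and `layout_grid`, and the choice of a binary occupancy grid as the descriptor, are all hypothetical.

```python
from collections import deque


def component_boxes(img):
    """Return inclusive bounding boxes (top, left, bottom, right) of the
    8-connected foreground components, in raster-scan discovery order."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def layout_grid(boxes):
    """Cut the word area with horizontal and vertical lines through every
    box edge, then mark each grid cell 1 if some box covers it, else 0.
    (Assumed descriptor; the paper may define the grid differently.)"""
    ys = sorted({b[0] for b in boxes} | {b[2] + 1 for b in boxes})
    xs = sorted({b[1] for b in boxes} | {b[3] + 1 for b in boxes})
    grid = []
    for y0, y1 in zip(ys, ys[1:]):
        row = []
        for x0, x1 in zip(xs, xs[1:]):
            covered = any(t <= y0 and y1 <= b + 1 and l <= x0 and x1 <= r + 1
                          for t, l, b, r in boxes)
            row.append(1 if covered else 0)
        grid.append(row)
    return grid


# Toy "word": a main body with a single dot above it.
word = [
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
]
boxes = component_boxes(word)
signature = layout_grid(boxes)
print(boxes)      # [(0, 3, 0, 3), (2, 0, 3, 4)]
print(signature)  # [[0, 1, 0], [0, 0, 0], [1, 1, 1]]
```

In this toy case the dot and the body yield two boxes, and the resulting 3x3 grid records that the dot sits above the middle of the body. Because such a grid depends only on the relative arrangement of the boxes, two renderings of the same word at different scales would map to the same grid under these assumptions.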
Farsi document image retrieval, word spotting, word layout signature, optical character recognition
ERGÜN, CEM and NOROZPOUR, SAJEDEH
"Farsi document image recognition system using word layout signature,"
Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 27: No. 2, Article 56.
Available at: https://journals.tubitak.gov.tr/elektrik/vol27/iss2/56
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons