Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.3906/elk-1304-165
Abstract
We introduce a fast bitwise exact pattern-matching algorithm, which speeds up short-length pattern searches on large-sized DNA databases. Our contributions are two-fold. First, we introduce a novel exact matching algorithm designed specifically for modern processor architectures. Second, we conduct a detailed comparative performance analysis of bitwise exact matching algorithms by utilizing hardware counters. Our algorithmic technique is based on condensed bitwise operators and multifunction variables, which minimize register spills and instruction counts during searches. In addition, the technique aims to efficiently utilize CPU branch predictors and to ensure smooth instruction flow through the processor pipeline. Analyzing letter occurrence probability estimations for DNA databases, we develop a skip mechanism to reduce memory accesses. For comparison, we exploit the complete \textit{Mus musculus} sequence, a commonly used DNA sequence that is larger than 2 GB. Compared to five state-of-the-art pattern-matching algorithms, experimental results show that our technique outperforms the best algorithm even for the worst-case DNA pattern for our technique.
Keywords
Computer architecture, bitwise string match, short DNA pattern, packed variables, 32-bit word
First Page
1405
Last Page
1417
Recommended Citation
ÖZCAN, GIYASETTİN and ÜNSAL, OSMAN SABRİ
(2015)
"Fast bitwise pattern-matching algorithm for DNA sequences on modern hardware,"
Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 23:
No.
5, Article 16.
https://doi.org/10.3906/elk-1304-165
Available at:
https://journals.tubitak.gov.tr/elektrik/vol23/iss5/16
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons