Abstract: Applying information technologies to big data on the SARS-CoV-2 genome provides insight for tracking variations and examining the evolution of the virus. Nevertheless, storing, processing, aligning and analysing such large numbers of genomes remains a challenge. In this study, over 1 million SARS-CoV-2 genomes were analysed to show the distribution and relationships of variations that could shed light on the development and evolution of the virus. Across all genomes analysed, a total of over 215M single nucleotide variants (SNVs) were detected, and the average number of SNVs per isolate was 21.83. The SNV average reached 31.25 in March 2021 alone. The average number of variations per isolate is increasing, in line with total case numbers around the world. Remarkably, cytosine deamination, one of the most important biochemical processes in the evolutionary development of coronaviruses, accounts for 46% of all SNVs seen in SARS-CoV-2 genomes within 16 months. In terms of the number of genomes analysed, this is one of the most comprehensive SARS-CoV-2 genomic analyses in an academic publication so far, and the reported results could be useful in monitoring the development of SARS-CoV-2.
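
The two summary statistics quoted in the abstract (average SNVs per isolate and the share of SNVs attributable to cytosine deamination) can be illustrated with a minimal sketch. It is not the study's pipeline: the function name `summarize_snvs`, the input layout, and the toy records are hypothetical, and the sketch assumes that cytosine deamination (C to U in the RNA genome) appears as a C>T substitution relative to the reference sequence.

```python
from collections import Counter


def summarize_snvs(snvs_by_isolate):
    """Summarize SNV calls.

    snvs_by_isolate: dict mapping an isolate id to a list of
    (position, ref_base, alt_base) tuples (hypothetical layout).
    Returns (total SNVs, mean SNVs per isolate, C>T share of all SNVs).
    """
    total = sum(len(calls) for calls in snvs_by_isolate.values())
    mean_per_isolate = total / len(snvs_by_isolate) if snvs_by_isolate else 0.0

    # Count each substitution type, e.g. ("C", "T") for C>T.
    substitutions = Counter(
        (ref, alt)
        for calls in snvs_by_isolate.values()
        for _, ref, alt in calls
    )

    # Assumption: only C>T on the reference strand is counted as deamination.
    c_to_t_share = substitutions[("C", "T")] / total if total else 0.0
    return total, mean_per_isolate, c_to_t_share


# Toy input for illustration only, not data from the study.
toy = {
    "isolate_a": [(241, "C", "T"), (3037, "C", "T"), (23403, "A", "G")],
    "isolate_b": [(14408, "C", "T")],
}
total, mean_per_isolate, c_to_t_share = summarize_snvs(toy)
print(total, mean_per_isolate, c_to_t_share)
```

At the scale described in the abstract, the same counts would more plausibly be accumulated in a streaming fashion over variant files rather than held in a single in-memory dictionary.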

Keywords: SARS-CoV-2, COVID-19, mutation, variation, genomic analysis, single nucleotide variant (SNV), 1 million genomes
