Finding SNP Sites in the Whole Genome of a Plant Using GATK
As discussed in the previous post, we already have an aligned SAM (.sam) file that can be used to analyze genomic variation. In this post, I walk through the steps involved in analyzing genomic variation among closely related organisms. In particular, I present a workflow pipeline that uses the GATK (Genome Analysis Toolkit) HaplotypeCaller to detect SNP (Single Nucleotide Polymorphism) sites and snpEff to annotate them.

Note: To detect SNP sites with GATK, you need a SAM file for the whole genome sequence of the organism, GATK version 4.1.0.1 or later (McKenna et al., 2010) with its accessory packages, Samtools, and snpEff 4.3t (Cingolani et al., 2012), all installed on a Linux server.

Image 1. A screenshot of the home page of the GATK website (source: https://software.broadinstitute.org/gatk/)

In brief
1. get the SAM file of the whole gen...