YoVDO

WGS Variant Calling - Variant Calling with GATK - Part 1 - Detailed NGS Analysis Workflow

Offered By: Bioinformagician via YouTube

Tags

Bioinformatics Courses
Quality Control Courses
Bash Scripting Courses

Course Description

Overview

This tutorial walks through variant calling from whole genome sequencing (WGS) data using the GATK Best Practices workflow. It shows how to set up a bash pipeline on Linux that pre-processes and aligns reads and ultimately produces a VCF file. The steps covered are quality control with FastQC, alignment with BWA-MEM, marking duplicate reads, Base Quality Score Recalibration (BQSR), post-alignment QC, and variant calling with HaplotypeCaller. For each step, the tutorial explains the intuition behind it along with its expected runtime and memory requirements. The code, data sources, and additional resources on the SAM file format, SAM flags, and the VCF file format are provided. It is aimed at bioinformaticians and researchers who want to learn variant calling for genomic analysis.
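As a small illustration of the SAM flags mentioned in the overview, the FLAG field (column 2 of a SAM record) is a bitwise combination of values defined in the SAM format specification. The Python sketch below is not part of the course, whose pipeline is written in bash; it decodes a flag into its component bits. The helper names `decode_flag` and `is_duplicate` are hypothetical, and restricting input to the 12 defined bits is a choice made for this sketch.

```python
# Bit values of the FLAG field, as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}

DUPLICATE_BIT = 0x400  # the bit that duplicate marking sets


def decode_flag(flag: int) -> list[str]:
    """Return a description for every bit set in a SAM FLAG value."""
    # This sketch only accepts values made of the 12 defined bits.
    if not 0 <= flag <= 0xFFF:
        raise ValueError(f"SAM FLAG must be in 0..4095, got {flag}")
    return [desc for bit, desc in SAM_FLAGS.items() if flag & bit]


def is_duplicate(flag: int) -> bool:
    """True if the read has been marked as a duplicate."""
    return bool(flag & DUPLICATE_BIT)


if __name__ == "__main__":
    print(decode_flag(99))
    # ['read paired', 'read mapped in proper pair', 'mate reverse strand', 'first in pair']
    print(is_duplicate(99), is_duplicate(99 | DUPLICATE_BIT))
    # False True
```

Duplicate marking flags reads rather than deleting them, so a duplicate copy of the read above would carry flag 1123 (99 + 1024) and stay in the BAM. Downstream tools can then skip it by checking this bit.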

Syllabus

Intro
Aim & Intuition behind variant calling
What is GATK?
Somatic vs Germline variants
GATK best practice workflow steps
Data pre-processing steps - alignment
A note on Read Groups
Data pre-processing steps - mark duplicate reads
Data pre-processing steps - Base Quality Score Recalibration (BQSR)
Variant discovery
Data used for demonstration
System requirements
Setting up directories
Download data
Download reference FASTA and known sites; create supporting files (.fai, .dict)
Setting directory paths
Step 1: Perform QC - FastQC
Step 2: Align reads - BWA-MEM
Step 3: Mark Duplicate Reads - GATK MarkDuplicatesSpark
Step 4: Base Quality Score Recalibration - GATK BaseRecalibrator + ApplyBQSR
Step 5: Post Alignment QC - GATK CollectAlignmentSummaryMetrics and CollectInsertSizeMetrics
Create multiQC report of post alignment metrics
Step 6: Call variants - GATK HaplotypeCaller
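The six numbered steps in the syllabus map onto a short chain of commands, where each step consumes the previous step's output. The Python sketch below only assembles those commands in order and can optionally run them; it is not the course's own bash script. The file names, sample name, `PL:ILLUMINA` platform tag, thread count, and the helper names `build_pipeline` and `run` are hypothetical. The tool options used (`-t` and `-R` for `bwa mem`; `-I`, `-O`, `-R`, `--known-sites`, `--bqsr-recal-file` and `-H` for the GATK4 tools) are standard, but check them against the versions you have installed.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(sample, ref, r1, r2, known_sites, outdir, threads=4):
    """Return (step_name, argv, stdout_path) tuples in pipeline order.

    All paths are hypothetical placeholders, not the course's layout.
    """
    out = Path(outdir)
    sam = str(out / f"{sample}.paired.sam")
    dedup = str(out / f"{sample}_sorted_dedup_reads.bam")
    table = str(out / f"{sample}_recal_data.table")
    bqsr = str(out / f"{sample}_sorted_dedup_bqsr_reads.bam")
    vcf = str(out / f"{sample}_raw_variants.vcf")
    # Read group header line; bwa mem converts the literal "\t" escapes to tabs.
    read_group = rf"@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}"
    return [
        # Step 1: QC of the raw reads
        ("fastqc", ["fastqc", r1, r2, "-o", str(out)], None),
        # Step 2: alignment; bwa mem writes SAM to stdout
        ("bwa_mem", ["bwa", "mem", "-t", str(threads), "-R", read_group,
                     ref, r1, r2], sam),
        # Step 3: flag duplicate reads (bit 0x400) rather than removing them
        ("mark_duplicates", ["gatk", "MarkDuplicatesSpark",
                             "-I", sam, "-O", dedup], None),
        # Step 4a: model systematic base-quality errors using known variant sites
        ("base_recalibrator", ["gatk", "BaseRecalibrator", "-I", dedup, "-R", ref,
                               "--known-sites", known_sites, "-O", table], None),
        # Step 4b: apply the recalibration table to the reads
        ("apply_bqsr", ["gatk", "ApplyBQSR", "-I", dedup, "-R", ref,
                        "--bqsr-recal-file", table, "-O", bqsr], None),
        # Step 5: post-alignment QC metrics, then a MultiQC summary
        ("alignment_metrics", ["gatk", "CollectAlignmentSummaryMetrics", "-R", ref,
                               "-I", bqsr, "-O", str(out / "alignment_metrics.txt")], None),
        ("insert_size_metrics", ["gatk", "CollectInsertSizeMetrics", "-I", bqsr,
                                 "-O", str(out / "insert_size_metrics.txt"),
                                 "-H", str(out / "insert_size_histogram.pdf")], None),
        ("multiqc", ["multiqc", str(out)], None),
        # Step 6: call SNPs and indels
        ("haplotype_caller", ["gatk", "HaplotypeCaller", "-R", ref,
                              "-I", bqsr, "-O", vcf], None),
    ]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for name, argv, stdout_path in steps:
        redirect = f" > {stdout_path}" if stdout_path else ""
        print(f"[{name}] {shlex.join(argv)}{redirect}")
        if dry_run:
            continue
        if stdout_path:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, stdout=fh, check=True)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(build_pipeline("sample1", "ref/genome.fa", "reads/s1_1.fastq.gz",
                       "reads/s1_2.fastq.gz", "ref/known_sites.vcf", "results"))
```

Keeping each step as an argument list rather than a shell string avoids quoting problems with the tab-separated read group. The dry-run default lets you inspect the whole chain before launching any long-running step.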


Taught by

Bioinformagician
