
Performing Basic Analysis

Bacterial Genome Data Analysis

We will now run some sample code.

First, let’s check our tools:

which bwa

Output shows where bwa is installed.

which samtools

Output shows where samtools is installed.
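
The two checks above can be folded into one loop that also covers the tools used later in this section. This is a minimal sketch: the function name `check_tools` is a hypothetical helper, not part of the course material.

```shell
# check_tools: report where each named tool is installed, or that it is
# missing. Returns non-zero if at least one tool is not on the PATH.
# (check_tools is an illustrative name, not part of the course material.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_tools bwa samtools lofreq spades.py
```

Running the check once up front avoids discovering a missing program halfway through a long mapping or assembly job.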

Basic Bacterial Genome Sequence Analysis

  1. Get a reference sequence:
mkdir -p /tmp/outbreaks/SG-M1
cd /tmp/outbreaks/SG-M1
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/275/545/GCF_001275545.2_ASM127554v2/GCF_001275545.2_ASM127554v2_genomic.fna.gz
gunzip GCF_001275545.2_ASM127554v2_genomic.fna.gz
mv GCF_001275545.2_ASM127554v2_genomic.fna SG-M1.fna
  2. Map and call SNPs:
    Note: For an annotation of the programs used below and other bioinformatics tools, check out our course GitHub page.

Reference indexing

bwa index SG-M1.fna

Mapping

bwa mem SG-M1.fna /tmp/fastq/SRR6327950/SRR6327950_1.fastq /tmp/fastq/SRR6327950/SRR6327950_2.fastq | samtools view -bS - > SRR6327950.bam

BAM Sorting

samtools sort SRR6327950.bam -o SRR6327950-sort.bam

BAM Indexing

samtools index SRR6327950-sort.bam

Variant calling

lofreq faidx SG-M1.fna
lofreq call -f SG-M1.fna -r NZ_CP012419.2:400000-500000 SRR6327950-sort.bam > SRR6327950-400k.vcf
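
The sequence name passed to `-r` (here `NZ_CP012419.2`) has to match a header line in the reference FASTA. If you are working with a different reference, one way to list the available names is sketched below; `list_contigs` is a hypothetical helper name.

```shell
# list_contigs: print the sequence names from a FASTA file, i.e. the text
# after '>' up to the first whitespace on each header line.
# (list_contigs is an illustrative name, not part of the course material.)
list_contigs() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

# Usage: list_contigs SG-M1.fna
```

The same names also appear in the first column of the `.fai` index that the faidx step writes next to the reference.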

Mapping takes ~5 min on a t2.medium. Sorting takes ~2 min. Running lofreq on this limited section of the genome takes ~1 min.
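
If you need to repeat the mapping and variant-calling steps for more than one sample, they can be wrapped in a shell function. The sketch below reuses the same commands as above; the function name `map_and_call` and its argument order are assumptions made for illustration.

```shell
# map_and_call: run indexing, mapping, sorting, BAM indexing, and variant
# calling for one sample, stopping at the first step that fails.
# The name and argument order are illustrative assumptions.
# Arguments: reference FASTA, sample ID, FASTQ directory, region, output VCF
map_and_call() {
    ref=$1; sample=$2; fastq_dir=$3; region=$4; vcf=$5

    bwa index "$ref" || return 1

    # Only the exit status of the last command in the pipe (samtools view)
    # is checked here, so a bwa failure is not detected by this line alone.
    bwa mem "$ref" \
        "${fastq_dir}/${sample}_1.fastq" \
        "${fastq_dir}/${sample}_2.fastq" \
        | samtools view -bS - > "${sample}.bam" || return 1

    samtools sort "${sample}.bam" -o "${sample}-sort.bam" || return 1
    samtools index "${sample}-sort.bam" || return 1

    lofreq faidx "$ref" || return 1
    lofreq call -f "$ref" -r "$region" "${sample}-sort.bam" > "$vcf" || return 1
}

# Usage (same inputs as the commands above):
# map_and_call SG-M1.fna SRR6327950 /tmp/fastq/SRR6327950 \
#     NZ_CP012419.2:400000-500000 SRR6327950-400k.vcf
```

Stopping at the first failed step keeps a broken BAM file from being passed silently to the variant caller.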

  3. Assembly (runs for ~4 min, then runs out of RAM if you’re on a t2.medium):
spades.py -t 2 -1 /tmp/fastq/SRR6327950/SRR6327950_1.fastq.gz -2 /tmp/fastq/SRR6327950/SRR6327950_2.fastq.gz -o SRR6327950_spades

NOTE: The assembly above completes on a t3a.large and takes about 5 hours.

Excellent! This is a pretty routine task that can easily be run on an AWS EC2 instance. As you experienced when running the assembly in Step 3, selecting the right machine for the job is incredibly important. If you run out of RAM or disk space, your job may quit. Luckily, both problems are easy to address by changing your instance type or by attaching another EBS volume to your machine.
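
Before starting a long job, it can help to check how much RAM and disk space the instance actually has. The following is a rough sketch, assuming a Linux instance (it reads `/proc/meminfo`); the helper names `free_disk_gb` and `total_ram_gb` are hypothetical.

```shell
# free_disk_gb: whole gigabytes available on the filesystem that holds the
# given path (defaults to the current directory). Uses POSIX df output,
# where column 4 of the second line is the available space in 1K blocks.
free_disk_gb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# total_ram_gb: whole gigabytes of installed RAM (Linux only, since it
# relies on the MemTotal line of /proc/meminfo, which is reported in kB).
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Usage:
# echo "disk free: $(free_disk_gb /tmp) GB, RAM: $(total_ram_gb) GB"
```

Both values are rounded down to whole gigabytes, so treat them as a quick sanity check rather than an exact measurement.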