Running a Basic Analysis
Bacterial Genome Data Analysis
We will now run some sample code.
First, let’s check our tools:
which bwa
Output shows where bwa is installed.
which samtools
Output shows where samtools is installed.
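The two checks above can be extended to every tool used on this page. The following is a minimal sketch, not part of the course material: `check_tools` is a hypothetical helper name, and the tool list is simply read off the commands in this section.

```shell
# Report which of the given tools are on the PATH.
# check_tools is a hypothetical helper, not part of any of these packages.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "found: $tool -> $path"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tool names taken from the commands used later in this section.
check_tools bwa samtools lofreq spades.py || echo "Install the missing tools before continuing."
```

The function returns a non-zero status if any tool is missing, so it can also be used as a guard at the top of a longer script.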
Basic Bacterial Genome Sequence Analysis
- Get a reference sequence:
mkdir -p /tmp/outbreaks/SG-M1
cd /tmp/outbreaks/SG-M1
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/275/545/GCF_001275545.2_ASM127554v2/GCF_001275545.2_ASM127554v2_genomic.fna.gz
gunzip GCF_001275545.2_ASM127554v2_genomic.fna.gz
mv GCF_001275545.2_ASM127554v2_genomic.fna SG-M1.fna
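Before mapping, it can help to confirm that the reference downloaded and unpacked correctly. The sketch below lists the sequence IDs in the FASTA file and checks for the contig `NZ_CP012419.2`, which is the name used in the `lofreq call -r` region on this page. `list_contigs` is a hypothetical helper name chosen for this example.

```shell
# Print the sequence ID (first word of each header line) in a FASTA file.
# list_contigs is a hypothetical helper written for this example.
list_contigs() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

ref=SG-M1.fna
region_contig=NZ_CP012419.2   # contig named in the lofreq -r option on this page

if [ -f "$ref" ]; then
    list_contigs "$ref"
    # -x matches the whole line, -F treats the dots in the name literally.
    list_contigs "$ref" | grep -qxF "$region_contig" \
        || echo "Warning: $region_contig not found in $ref"
fi
```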
- Map and call SNPs:
Note: For descriptions of the programs used below and of other bioinformatics tools, check out our course GitHub page.
Reference indexing
bwa index SG-M1.fna
Mapping
bwa mem SG-M1.fna /tmp/fastq/SRR6327950/SRR6327950_1.fastq /tmp/fastq/SRR6327950/SRR6327950_2.fastq | samtools view -bS - > SRR6327950.bam
BAM Sorting
samtools sort SRR6327950.bam -o SRR6327950-sort.bam
BAM Indexing
samtools index SRR6327950-sort.bam
Variant calling
lofreq faidx SG-M1.fna
lofreq call -f SG-M1.fna -r NZ_CP012419.2:400000-500000 SRR6327950-sort.bam > SRR6327950-400k.vcf
Mapping takes ~5 min on a t2.medium. Sorting takes ~2 min. Running lofreq on this limited section of the genome takes ~1 min.
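Once these steps finish, a quick sanity check of the outputs is worthwhile. The sketch below prints mapping statistics with `samtools flagstat` and counts the variant records in the VCF, assuming the file names used above. `count_variants` is a hypothetical helper that counts every line not starting with `#`.

```shell
# Count variant records in a VCF: every line that is not a '#' header line.
# grep -c exits with status 1 when the count is 0, so "|| true" keeps the
# function usable in scripts that run with "set -e".
count_variants() {
    grep -vc '^#' "$1" || true
}

bam=SRR6327950-sort.bam
vcf=SRR6327950-400k.vcf

# Mapping summary (total reads, mapped reads, properly paired reads, ...).
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    samtools flagstat "$bam"
fi

if [ -f "$vcf" ]; then
    echo "variants called in region: $(count_variants "$vcf")"
fi
```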
- Assembly (runs ~4 min then will run out of RAM if you’re on a t2.medium):
spades.py -t 2 -1 /tmp/fastq/SRR6327950/SRR6327950_1.fastq.gz -2 /tmp/fastq/SRR6327950/SRR6327950_2.fastq.gz -o SRR6327950_spades
NOTE: The assembly above will complete on a t3a.large, where it takes about 5 hours.
Excellent! This is a pretty routine task that can easily be run on an AWS EC2 instance. As you saw when running the assembly in Step 3, selecting the right machine for the job is incredibly important. If you run out of RAM or disk space, your job may quit. Luckily, both problems are easy to address by changing your instance type or by attaching another EBS volume to your machine.
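One way to catch this early is to check memory and disk before starting a long job. The sketch below is illustrative only: the helper names are hypothetical, and the two thresholds are placeholder values rather than measured requirements. The RAM threshold is set between the roughly 4 GiB of a t2.medium and the roughly 8 GiB of a t3a.large, and should be adjusted for your own data.

```shell
# Total RAM in MiB, read from a meminfo-style file (Linux only).
# Defaults to /proc/meminfo, which is present on a Linux EC2 instance.
total_ram_mib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Free space in MiB on the filesystem that holds the given directory.
free_disk_mib() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Placeholder thresholds (assumptions for this example only).
min_ram_mib=6000
min_disk_mib=20000

if [ -r /proc/meminfo ]; then
    ram=$(total_ram_mib)
    echo "RAM: ${ram} MiB"
    [ "$ram" -ge "$min_ram_mib" ] \
        || echo "Warning: low RAM; consider a larger instance type."
fi

disk=$(free_disk_mib .)
echo "Free disk in current directory: ${disk} MiB"
[ "$disk" -ge "$min_disk_mib" ] \
    || echo "Warning: low disk space; consider attaching another EBS volume."
```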