Practical tasks¶
Determine the platformn and coding¶
Analyze the format and quality for each of the read files
and determine the sequencing platform and version.
Hints¶
You might use fastqc. Take into account that the solid reads might be double encoded.
Clean adapters and contaminants¶
We have some 454
and Illumina
reads that might have adapters in them due to the cloning process.
Do they really have adapters
?
These reads also come from a RNA-Seq experiment and might have ribosomic RNA
that we should filter out before doing any further analysis.
Try to assess these problems with fastqc.
Hints¶
Evaluate the quality, look for overrepresented k-mers and take a look at the base composition of the first bases of the reads with fastqc.
To remove the adaptor you could use cutadapt, scythe or fastx.
To trim the quality you could use fastx, sequence cleaner, or sickle.
Mithocondrial assembly¶
We have recived 20000 human mithocondrial reads
from a yoruba individual.
The reads are Illumina paired end and we want to assemble them:
Reads Quality Assessment.
Clean the reads.
Assemble the reads with velvet.
Calculate statistics for the result.
Take a look at the contigs.
Hint¶
We have already seen how to assemble a mithocondrial genome using velvet.
Map to the mithocondrial assembly¶
As the result of the mithocondrial assembly that we have just done with we obtained a fasta file with the scaffolds. Map the reads used to create the assembly using bwa to the scaffolds. Once you have the mapping note how many reads had been mapped and open the mapping IGV.
Human Mithocondrial SNP calling¶
We want to compare the mithocondrial DNA of a yoruba individual with the reference
cambrigde mithocondrial sequence.
We have 40000 Illumnia reads
.
Do two SNP callings one with bowtie2 and samtools and another with ngs_backbone and compare them.
Hints¶
Why ngs_backbone does not find any SNP at the first attempt? Modify the backbone.conf parameters related with the quality and look for the SNPs again (e.j. min_quality=21, min_reads_for_allele=10).
You can compare the SNP callings with bedtools or grep.
Take a look at the SNPs with IGV.
Obtain the SNP flanking sequence¶
Search for the common SNPs of the two previous SNP calls and create a fasta file with 100 pair bases around each SNP.
Hints¶
You can use the intersectBed, flankBed and fastaFromBed tools.
Cucurbita EST SNP calling¶
We have ordered two 454 and two Illumina reads from a Cucurbita transcriptome.
The 454 sequences came from two individuals (indi1 and indi2) and the Illumina came from a mix of different individuals.
The sequencing service has send the reads
in sanger fastq format.
Do a quality check of the reads provided by the sequencing service.
Clean the reads and check the cleaning process.
Map the reads.
Call the SNPs.