Determine the platformn and coding¶
Analyze the format and quality for each of the
read files and determine the sequencing platform and version.
You might use guess_seq_format and calculate_stats from seq_crumbs or fastqc. Take into account that the solid reads might be double encoded.
Clean adapters and contaminants¶
We have some
Illumina reads that might have adapters in them due to the cloning process.
Do they really have
These reads also come from a RNA-Seq experiment and might have
ribosomic RNA that we should filter out before doing any further analysis.
Try to assess and solve these problems with fastqc and seq_crumbs.
Evaluate the quality, look for overrepresented k-mers and take a look at the base composition of the first bases of the reads with fastqc or seq_crumbs.
To filter the small reads you could use filter_by_length (from seq_crumbs) and to remove the ribosomic reads filter_by_blast (seq_crumbs).
Split 454 matepairs¶
We have received some 454 titanium mate pair
First we need to split both ends of the mate pair for each reads and then we have to trim using a quality criteria, filter out the small read segments (smaller than 40 base pairs).
Finally we need to obtain two files with the interleaved forward and reverse fragments of the reads that still have both and another file with the orphaned segments.
In seq_crumbs there are scripts for all steps.
We have recived 20000 human mithocondrial
reads from a yoruba individual.
The reads are Illumina paired end and we want to assemble them:
- Reads Quality Assessment.
- Clean the reads.
- Assemble the reads with velvet.
- Calculate statistics for the result.
- Take a look at the contigs.
We have already seen how to assemble a mithocondrial genome using velvet.
Map to the mithocondrial assembly¶
As the result of the mithocondrial assembly that we have just done with we obtained a fasta file with the scaffolds. Map the reads used to create the assembly using bwa to the scaffolds. Once you have the mapping note how many reads had been mapped and open the mapping IGV.
Human Mithocondrial SNP calling¶
We want to compare the mithocondrial DNA of a yoruba individual with the
reference cambrigde mithocondrial sequence.
We have 40000 Illumnia
Do two SNP callings one with bowtie2 and samtools and another with ngs_backbone and compare them.
Why ngs_backbone does not find any SNP at the first attempt? Modify the backbone.conf parameters related with the quality and look for the SNPs again (e.j. min_quality=21, min_reads_for_allele=10).
You can compare the SNP callings with bedtools or grep.
Take a look at the SNPs with IGV.
Obtain the SNP flanking sequence¶
Search for the common SNPs of the two previous SNP calls and create a fasta file with 100 pair bases around each SNP.
You can use the intersectBed, flankBed and fastaFromBed tools.
Cucurbita EST SNP calling¶
We have ordered two 454 and two Illumina reads from a Cucurbita transcriptome.
The 454 sequences came from two individuals (indi1 and indi2) and the Illumina came from a mix of different individuals.
The sequencing service has send the
reads in sanger fastq format.
- Do a quality check of the reads provided by the sequencing service.
- Clean the reads and check the cleaning process.
- Map the reads.
- Call the SNPs.
There a walkthroughs for some of these tasks.