Bioinformatics at COMAV

Practical tasks

Determine the platformn and coding

Analyze the format and quality for each of the read files and determine the sequencing platform and version.


You might use fastqc. Take into account that the solid reads might be double encoded.

Clean adapters and contaminants

We have some 454 and Illumina reads that might have adapters in them due to the cloning process. Do they really have adapters? These reads also come from a RNA-Seq experiment and might have ribosomic RNA that we should filter out before doing any further analysis. Try to assess these problems with fastqc.


Evaluate the quality, look for overrepresented k-mers and take a look at the base composition of the first bases of the reads with fastqc.

To remove the adaptor you could use cutadapt, scythe or fastx.

To trim the quality you could use fastx, sequence cleaner, or sickle.

Mithocondrial assembly

We have recived 20000 human mithocondrial reads from a yoruba individual. The reads are Illumina paired end and we want to assemble them:

  1. Reads Quality Assessment.

  2. Clean the reads.

  3. Assemble the reads with velvet.

  4. Calculate statistics for the result.

  5. Take a look at the contigs.


We have already seen how to assemble a mithocondrial genome using velvet.

Map to the mithocondrial assembly

As the result of the mithocondrial assembly that we have just done with we obtained a fasta file with the scaffolds. Map the reads used to create the assembly using bwa to the scaffolds. Once you have the mapping note how many reads had been mapped and open the mapping IGV.

Human Mithocondrial SNP calling

We want to compare the mithocondrial DNA of a yoruba individual with the reference cambrigde mithocondrial sequence. We have 40000 Illumnia reads. Do two SNP callings one with bowtie2 and samtools and another with ngs_backbone and compare them.


Why ngs_backbone does not find any SNP at the first attempt? Modify the backbone.conf parameters related with the quality and look for the SNPs again (e.j. min_quality=21, min_reads_for_allele=10).

You can compare the SNP callings with bedtools or grep.

Take a look at the SNPs with IGV.

Obtain the SNP flanking sequence

Search for the common SNPs of the two previous SNP calls and create a fasta file with 100 pair bases around each SNP.


You can use the intersectBed, flankBed and fastaFromBed tools.

Cucurbita EST SNP calling

We have ordered two 454 and two Illumina reads from a Cucurbita transcriptome. The 454 sequences came from two individuals (indi1 and indi2) and the Illumina came from a mix of different individuals. The sequencing service has send the reads in sanger fastq format.

  1. Do a quality check of the reads provided by the sequencing service.

  2. Clean the reads and check the cleaning process.

  3. Map the reads.

  4. Call the SNPs.