Bioinformatics at COMAV

NGS expression analysis

There are several interesting RNA-Seq reviews: From RNA-seq reads to differential expression results, Microarrays, deep sequencing and the true measure of the transcriptome, Computational methods for transcriptome annotation and quantification using RNA-seq, Next-generation transcriptome assembly, An opinionated guide to the proper care and feeding of your transcriptome and Evaluation of de novo transcriptome assemblies from RNA-Seq data.

There are also some interesting studies on the post-processing and quality of assembled transcriptomes, on how successfully different algorithms reconstruct alternative transcripts, and on the evaluation of different assembly strategies.

StringTie, a transcript reconstruction algorithm.

transrate, software for evaluating the quality of transcriptome assemblies.

Practical tasks

Mapping and counting reads

The objective is to map and count reads using an example file from an RNA-Seq experiment with a single sample. We use TopHat to map the reads against the genome and Cufflinks to assemble the transcripts and count the reads.

First, we create an index of the genome file.

ngs_user@machine:~$ mkdir rnaseq
ngs_user@machine:~$ cd rnaseq
ngs_user@machine:~/rnaseq$ bowtie2-build sl2.fasta sl2
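Before mapping, it can be worth checking that bowtie2-build finished and wrote every index file. The following is a minimal sketch; the function name check_bt2_index is hypothetical, and it assumes the usual six small-index files ending in .bt2. Very large genomes get a large index with .bt2l suffixes instead, which this check does not cover.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a bowtie2 index prefix has all six
# small-index files (.bt2). Large indexes (.bt2l) are NOT checked here.
check_bt2_index() {
    local prefix="$1"
    local suffix
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        # -s: the file must exist and be non-empty
        if [ ! -s "${prefix}.${suffix}" ]; then
            echo "missing or empty: ${prefix}.${suffix}" >&2
            return 1
        fi
    done
    echo "index ${prefix} looks complete"
}
```

In this example it would be called as check_bt2_index sl2 from the rnaseq directory.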

Now, we map the reads.

ngs_user@machine:~/rnaseq$ tophat -o SRR45 sl2 SRR45_region.fastq

ngs_user@machine:~/rnaseq$ less ./SRR45/align_summary.txt
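When several samples are mapped, it is handy to pull out just the overall mapping rate instead of reading each summary. This is a sketch under an assumption: it expects align_summary.txt to contain a line of the form "95.0% overall read mapping rate." The function name mapping_rate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the overall mapping rate (without the % sign)
# from a TopHat align_summary.txt. ASSUMES a line like
# "95.0% overall read mapping rate." is present in the file.
mapping_rate() {
    local summary="$1"
    awk '/overall read mapping rate/ { sub(/%.*/, "", $1); print $1 }' "$summary"
}
```

For this sample it would be called as mapping_rate ./SRR45/align_summary.txt.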

We sort and index the BAM file so that it can be opened in IGV.

ngs_user@machine:~/rnaseq$ samtools sort -o ./SRR45/accepted_hits_45.sort.bam ./SRR45/accepted_hits.bam

ngs_user@machine:~/rnaseq$ samtools index ./SRR45/accepted_hits_45.sort.bam
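IGV needs an index to jump to a region, and an index can only be built when alignments are ordered by reference and position. The toy function below (hypothetical name toy_coordinate_sort) shows that ordering on plain SAM alignment lines: by reference name (column 3), then numerically by position (column 4). It is only an illustration: samtools sort works on compressed BAM and follows the reference order of the @SQ header lines, not alphabetical order.

```shell
#!/usr/bin/env bash
# Toy illustration only (not a samtools replacement): sort tab-separated
# SAM alignment lines read from stdin by RNAME (col 3), then POS (col 4).
toy_coordinate_sort() {
    sort -t "$(printf '\t')" -k3,3 -k4,4n
}
```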

Now we can assemble the transcripts and estimate the number of reads in each one. For this, we also need the genome annotation file.

ngs_user@machine:~/rnaseq$ cufflinks -g sl2.gff3 ./SRR45/accepted_hits.bam
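Cufflinks reports abundances as FPKM (fragments per kilobase of transcript per million mapped fragments); with single-end reads like these, each read counts as one fragment. The naive definition is FPKM = count × 10^9 / (length in bp × total mapped fragments). The sketch below (hypothetical function naive_fpkm) computes only that textbook formula; Cufflinks estimates FPKM statistically, using effective lengths and resolving multi-mapping and isoform ambiguity, so its values will not match this exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: naive FPKM = count * 1e9 / (length_bp * total_mapped).
# Not Cufflinks' estimator; shown only to make the unit concrete.
naive_fpkm() {
    local count="$1" length_bp="$2" total_mapped="$3"
    awk -v c="$count" -v l="$length_bp" -v n="$total_mapped" \
        'BEGIN { printf "%.2f\n", c * 1e9 / (l * n) }'
}
```

For example, 100 reads on a 1,000 bp transcript in a library of 1,000,000 mapped reads gives naive_fpkm 100 1000 1000000, that is 100.00.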

ngs_user@machine:~/rnaseq$ ls

ngs_user@machine:~/rnaseq$ less transcripts.gtf

ngs_user@machine:~/rnaseq$ less genes.fpkm_tracking
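To compare genes quickly, we may want only the gene identifier and its FPKM from genes.fpkm_tracking. This sketch (hypothetical function gene_fpkm) finds the gene_id and FPKM columns by their header names rather than by fixed positions, assuming the file is tab-separated with a header line, as Cufflinks writes it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "gene_id<TAB>FPKM" for every row of a
# Cufflinks *.fpkm_tracking file, locating columns by header name.
gene_fpkm() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "gene_id") g = i
                if ($i == "FPKM") f = i
            }
            if (!g || !f) { print "gene_id or FPKM column not found" > "/dev/stderr"; exit 1 }
            next
        }
        { print $g "\t" $f }
    ' "$1"
}
```

For this sample it would be called as gene_fpkm genes.fpkm_tracking | sort -t "$(printf '\t')" -k2,2gr | head to list the most expressed genes.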

Load the accepted_hits_45.sort.bam file, transcripts.gtf and the annotation file sl2.gff3 into IGV to check the mapping and the transcript assembly done with Cufflinks. The next steps will be to merge the transcript assemblies of all samples into a new annotation file and to use it to quantify the reads of each sample again. After that, we will start the normalization, the differential expression analysis, etc.
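One way to merge the assemblies within the Tuxedo tools is cuffmerge, which reads a text file listing one transcripts.gtf path per line. The sketch below (hypothetical function build_assembly_list) writes that list, assuming each sample was assembled into its own directory, for example with cufflinks -o. The directory names in the usage lines are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one transcripts.gtf path per sample directory
# into a manifest file; directories without transcripts.gtf are skipped.
build_assembly_list() {
    local out="$1"; shift
    : > "$out"                      # start with an empty manifest
    local dir
    for dir in "$@"; do
        if [ -f "$dir/transcripts.gtf" ]; then
            printf '%s\n' "$dir/transcripts.gtf" >> "$out"
        else
            echo "skipping $dir: no transcripts.gtf" >&2
        fi
    done
}
```

A possible usage, with hypothetical per-sample directories:

    build_assembly_list assemblies.txt sampleA_cufflinks sampleB_cufflinks
    cuffmerge -g sl2.gff3 -s sl2.fasta assemblies.txt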
