Bioinformatics at COMAV

NGS expression analysis

There are several interesting RNA-Seq reviews: A survey of best practices for RNA-seq data analysis, RNA-seq workflow: gene-level exploratory analysis and differential expression, From RNA-seq reads to differential expression results, Microarrays, deep sequencing and the true measure of the transcriptome, Computational methods for transcriptome annotation and quantification using RNA-seq, Next-generation transcriptome assembly, An opinionated guide to the proper care and feeding of your transcriptome and Evaluation of de novo transcriptome assemblies from RNA-Seq data.

There are also some interesting studies on post-processing, on the quality of assembled transcriptomes, on how well different algorithms reconstruct alternative transcripts, and on the evaluation of different assembly strategies.

StringTie reconstruction algorithm.

transrate, a program for evaluating transcriptome assembly quality.

Practical tasks

Mapping and counting sequences

The objective is to map and count reads using an example file from an RNA-Seq experiment with a single sample. We use HISAT2 to map the reads against the genome and StringTie to count the reads.

First, we download and unzip the index files of the genome.

ngs_user@machine:~$ mkdir rnaseq
ngs_user@machine:~$ cd rnaseq
ngs_user@machine:~/rnaseq$ unzip

Now, we map the reads.

ngs_user@machine:~/rnaseq$ hisat2 -x sl2_index -U SRR45_region.fastq -S SRR45.sam
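
As a quick sanity check on the mapping step, the fraction of mapped reads can be estimated directly from a SAM file. The sketch below assumes single-end reads (as with the -U option) and uses a tiny made-up file, demo.sam, in place of SRR45.sam. It relies on the SAM convention that a read is unmapped when bit 0x4 of the FLAG column is set, and it skips secondary alignments (bit 0x100) so that each read is counted once.

```shell
#!/bin/sh
# Hypothetical three-read SAM file, for illustration only.
# With real data, point "sam" at SRR45.sam instead.
tmpdir=$(mktemp -d)
sam="$tmpdir/demo.sam"
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf '@SQ\tSN:chr1\tLN:1000\n' >> "$sam"
printf 'r1\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n' >> "$sam"
printf 'r2\t16\tchr1\t50\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n' >> "$sam"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\n' >> "$sam"

# Skip header lines (starting with @) and secondary alignments (FLAG bit 0x100).
# A read counts as mapped when FLAG bit 0x4 is not set.
mapping_summary=$(awk -F '\t' '
    /^@/ { next }
    int($2 / 256) % 2 == 1 { next }
    { total++; if (int($2 / 4) % 2 == 0) mapped++ }
    END {
        if (total == 0) { print "no reads found"; exit }
        printf "%d/%d reads mapped (%.1f%%)", mapped, total, 100 * mapped / total
    }
' "$sam")

echo "$mapping_summary"
rm -rf "$tmpdir"
```

With the demo file this prints "2/3 reads mapped (66.7%)". HISAT2 also prints its own alignment summary when it finishes, so this check is mainly useful when only the SAM file is available.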

We convert the SAM file to BAM, then sort and index the BAM file so that it can be opened with IGV.

ngs_user@machine:~/rnaseq$ samtools view -hb -o SRR45.bam -S SRR45.sam

ngs_user@machine:~/rnaseq$ samtools sort SRR45.bam -o SRR45_sort.bam

ngs_user@machine:~/rnaseq$ samtools index SRR45_sort.bam

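Now we can assemble the transcripts and estimate their expression levels. The -o option sets the GTF file with the assembled transcripts, and the -A option writes a table of gene abundances.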

ngs_user@machine:~/rnaseq$ stringtie SRR45_sort.bam -o SRR45.gtf -A SRR45_exp.txt

To use the gene models as a reference, we also need the genome annotation file (sl2.gff3), which is passed with the -G option.

ngs_user@machine:~/rnaseq$ stringtie SRR45_sort.bam -G sl2.gff3 -o SRR45_ref.gtf -A SRR45_exp_ref.txt
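
The gene abundance tables written with -A (SRR45_exp.txt and SRR45_exp_ref.txt) are tab-delimited text, so they can be inspected with standard shell tools. The sketch below assumes the usual StringTie layout with a header line and the TPM value in the ninth column (Gene ID, Gene Name, Reference, Strand, Start, End, Coverage, FPKM, TPM); it is worth checking the header of your own file first. The table demo_exp.txt, its gene IDs and its values are invented for illustration.

```shell
#!/bin/sh
# Hypothetical abundance table; gene IDs and values are made up.
# With real data, point "abund" at SRR45_exp.txt or SRR45_exp_ref.txt.
tmpdir=$(mktemp -d)
abund="$tmpdir/demo_exp.txt"
printf 'Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n' > "$abund"
printf 'geneA\t-\tchr1\t+\t100\t900\t12.5\t30.0\t45.0\n' >> "$abund"
printf 'geneB\t-\tchr1\t-\t1500\t2400\t80.1\t200.0\t310.5\n' >> "$abund"
printf 'geneC\t-\tchr1\t+\t3000\t3600\t0.0\t0.0\t0.0\n' >> "$abund"

tab=$(printf '\t')

# Drop the header, keep genes with TPM > 0 (column 9, assumed layout),
# print gene ID and TPM, and sort by TPM from highest to lowest.
top_genes=$(tail -n +2 "$abund" \
    | awk -F '\t' '$9 > 0 { print $1 "\t" $9 }' \
    | sort -t "$tab" -k2,2nr)

echo "$top_genes"
rm -rf "$tmpdir"
```

For the demo table this lists geneB first, then geneA, and leaves out geneC, which has no expression. Piping the result through head limits the list to the most highly expressed genes.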

Load the sorted BAM file, the GTF files and the annotation file sl2.gff3 into IGV to check the mapping and the transcript assembly done with StringTie. The next steps will be to integrate the transcript assemblies of all samples to create a new annotation file and to use it to quantify the reads of each sample again. After that, we will start the normalization, differential expression analysis, etc.
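
The integration and re-quantification steps can be sketched with StringTie's merge mode (--merge) and its -e option, which restricts abundance estimation to the transcripts given with -G. This is only an outline under stated assumptions: the sample names SRR46 and SRR47 and the output names merged.gtf, *_merged.gtf and *_exp_merged.txt are hypothetical, since this example has a single sample. The script only builds and prints the commands (a dry run), so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical sample names; each one is assumed to have
# <sample>_sort.bam and <sample>.gtf from the steps above.
samples="SRR45 SRR46 SRR47"
annotation="sl2.gff3"
merged="merged.gtf"   # hypothetical name for the merged annotation

# 1) Merge the per-sample assemblies into one non-redundant annotation,
#    using the reference annotation as a guide.
gtf_list=""
for s in $samples; do
    gtf_list="$gtf_list $s.gtf"
done
merge_cmd="stringtie --merge -G $annotation -o $merged$gtf_list"

# 2) Quantify each sample again, restricted (-e) to the merged transcripts.
quant_cmds=""
for s in $samples; do
    quant_cmds="${quant_cmds}stringtie ${s}_sort.bam -e -G $merged -o ${s}_merged.gtf -A ${s}_exp_merged.txt
"
done

# Dry run: print the commands instead of executing them.
echo "$merge_cmd"
printf '%s' "$quant_cmds"
```

Once the printed commands look right, they can be pasted into the terminal or the echo and printf lines can be replaced by direct calls.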