NGS expression analysis
There are several interesting RNA-Seq reviews: A survey of best practices for RNA-seq data analysis; RNA-seq workflow: gene-level exploratory analysis and differential expression; From RNA-seq reads to differential expression results; Microarrays, deep sequencing and the true measure of the transcriptome; Computational methods for transcriptome annotation and quantification using RNA-seq; Next-generation transcriptome assembly; An opinionated guide to the proper care and feeding of your transcriptome; and Evaluation of de novo transcriptome assemblies from RNA-Seq data.
There are also some interesting studies on the post-processing and quality of assembled transcriptomes, on how well different algorithms reconstruct alternative transcripts, and on the evaluation of different assembly strategies.
StringTie reconstruction algorithm.
transrate, a tool to evaluate transcriptome assembly quality.
Practical tasks
Mapping and counting sequences
The objective is to map and count reads using an example file
from an RNA-seq experiment with a single sample. We use HISAT2 to map the reads against the genome
and StringTie to count the reads.
First we download the index files of the genome.
ngs_user@machine:~$ mkdir rnaseq
ngs_user@machine:~$ cd rnaseq
ngs_user@machine:~/rnaseq$ unzip sl2.index.zip
Now, we map the reads.
ngs_user@machine:~/rnaseq$ hisat2 -x sl2_index -U SRR45_region.fastq -S SRR45.sam
We convert the SAM file to BAM, then sort and index it so that it can be opened with IGV.
ngs_user@machine:~/rnaseq$ samtools view -hb -o SRR45.bam -S SRR45.sam
ngs_user@machine:~/rnaseq$ samtools sort SRR45.bam -o SRR45_sort.bam
ngs_user@machine:~/rnaseq$ samtools index SRR45_sort.bam
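Now we can assemble the transcripts and estimate their expression. The -o option writes the assembled transcripts as a GTF file and the -A option writes a tab-delimited table of gene abundance estimates.
ngs_user@machine:~/rnaseq$ stringtie SRR45_sort.bam -o SRR45.gtf -A SRR45_exp.txt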
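Because the gene abundance table is tab-delimited, it can be inspected with standard shell tools. The sketch below assumes the column layout described in the StringTie documentation for this file (gene ID, gene name, reference, strand, start, end, coverage, FPKM, TPM). It runs on a tiny made-up table standing in for SRR45_exp.txt; the gene IDs and values are purely illustrative.

```shell
# Made-up stand-in for SRR45_exp.txt (tab-delimited, header on the first line).
exp_file=$(mktemp)
printf 'Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n' > "$exp_file"
printf 'geneA\t-\tchr1\t+\t100\t900\t12.5\t30.1\t45.2\n' >> "$exp_file"
printf 'geneB\t-\tchr1\t-\t1500\t2400\t3.0\t7.2\t10.8\n' >> "$exp_file"
printf 'geneC\t-\tchr1\t+\t3000\t4100\t40.0\t96.3\t144.0\n' >> "$exp_file"

# Skip the header, keep gene ID and TPM (columns 1 and 9),
# then sort numerically by TPM, highest first.
tab=$(printf '\t')
top_genes=$(tail -n +2 "$exp_file" | cut -f 1,9 | sort -t "$tab" -k2,2nr)
printf '%s\n' "$top_genes"

rm -f "$exp_file"
```

On the real data, replacing the temporary file with SRR45_exp.txt and appending `| head` would list the most highly expressed genes.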
To use the gene models as a reference, we also need the genome annotation file, which is passed to StringTie with the -G option.
ngs_user@machine:~/rnaseq$ stringtie SRR45_sort.bam -G sl2.gff3 -o SRR45_ref.gtf -A SRR45_exp_ref.txt
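One quick way to compare the assembly made without a reference against the reference-guided one is to count the transcript records in each GTF file. The sketch below defines a small helper for that; the name `count_transcripts` is ours and not part of StringTie. It is run here on a tiny made-up GTF, and on the real data it would be called on SRR45.gtf and SRR45_ref.gtf.

```shell
# Count the records whose feature type (column 3) is "transcript",
# ignoring comment lines that start with "#".
count_transcripts() {
    awk -F '\t' '$0 !~ /^#/ && $3 == "transcript" { n++ } END { print n + 0 }' "$1"
}

# Made-up GTF standing in for a StringTie output file:
# two transcripts, the first with two exons and the second with one.
gtf_file=$(mktemp)
printf '# illustrative header line\n' > "$gtf_file"
printf 'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";\n' >> "$gtf_file"
printf 'chr1\tStringTie\texon\t100\t400\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";\n' >> "$gtf_file"
printf 'chr1\tStringTie\texon\t600\t900\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";\n' >> "$gtf_file"
printf 'chr1\tStringTie\ttranscript\t1500\t2400\t1000\t-\t.\tgene_id "STRG.2"; transcript_id "STRG.2.1";\n' >> "$gtf_file"
printf 'chr1\tStringTie\texon\t1500\t2400\t1000\t-\t.\tgene_id "STRG.2"; transcript_id "STRG.2.1";\n' >> "$gtf_file"

n_transcripts=$(count_transcripts "$gtf_file")
echo "transcripts: $n_transcripts"

rm -f "$gtf_file"
```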
Load the sorted BAM file, the GTF files and the annotation file sl2.gff3 into IGV to check the mapping and the transcript assembly done with StringTie. The next steps will be to merge the transcript assemblies of all samples into a new annotation file and to use it to quantify the reads of each sample again. After that we will start the normalization, differential expression analysis, etc.
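As a rough preview of the merge and re-quantification step, the sketch below prints the StringTie commands that could be used, without running them. It assumes StringTie's `--merge` mode and its `-e` option (estimate abundances only for the transcripts given with -G). The sample names, the output file names and the `run` helper are hypothetical, chosen only for illustration.

```shell
# Print each command; execute it only when DRY_RUN is set to 0.
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical sample names; each is assumed to have <name>_sort.bam
# and a reference-guided assembly <name>_ref.gtf.
samples="sampleA sampleB"

plan_commands() {
    gtfs=""
    for s in $samples; do
        gtfs="$gtfs ${s}_ref.gtf"
    done
    # Merge the per-sample assemblies into one annotation.
    # $gtfs is left unquoted on purpose so that it splits into one argument per file.
    run stringtie --merge -G sl2.gff3 -o merged.gtf $gtfs
    # Quantify each sample again, restricted to the merged transcripts.
    for s in $samples; do
        run stringtie -e -G merged.gtf -o "${s}_merged.gtf" -A "${s}_exp_merged.txt" "${s}_sort.bam"
    done
}

plan=$(plan_commands)
printf '%s\n' "$plan"
```

Keeping the commands behind a dry-run switch makes it easy to review the plan for every sample before anything is executed.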