Bioinformatics at COMAV

Practical tasks

Mapping and counting sequences

The objective is to map and count the reads of an example RNA-seq experiment file that contains a single sample. We use TopHat to map the reads against the genome and Cufflinks to assemble the transcripts and quantify the reads assigned to them.

First, we build a Bowtie2 index of the genome file.

ngs_user@machine:~$ mkdir rnaseq
ngs_user@machine:~$ cd rnaseq
ngs_user@machine:~/rnaseq$ bowtie2-build sl2.fasta sl2
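
Before mapping, it can help to confirm that bowtie2-build produced every index file. The following is a minimal sketch, assuming a small genome so that the index files end in .bt2 (large indexes use .bt2l instead); the function name check_bt2_index is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the six Bowtie2 index files exist and are non-empty.
# Assumption: a small index (*.bt2); large genomes produce *.bt2l files instead.
check_bt2_index() {
    local base="$1" missing=0 suffix
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [[ ! -s "${base}.${suffix}" ]]; then
            echo "missing or empty: ${base}.${suffix}" >&2
            missing=1
        fi
    done
    # Returns 0 only when every expected file is present.
    return "$missing"
}
```

For example, check_bt2_index sl2 && echo "index OK" run inside ~/rnaseq should print "index OK" once bowtie2-build has finished.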

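Now, we map the reads. TopHat writes its results to the tophat_out directory, and align_summary.txt reports how many reads were mapped.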

ngs_user@machine:~/rnaseq$ tophat sl2 SRR45_region.fastq

ngs_user@machine:~/rnaseq$ cd tophat_out

ngs_user@machine:~/rnaseq/tophat_out$ less align_summary.txt
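
When several samples are processed, the overall mapping rate can be pulled out of align_summary.txt instead of reading each file. This is a minimal sketch, assuming the TopHat2 summary contains a line of the form "NN.N% overall read mapping rate."; the function name mapping_rate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the overall mapping rate (without the % sign).
# Assumption: TopHat2 writes a line like "95.0% overall read mapping rate."
# Exits with status 1 if no such line is found.
mapping_rate() {
    local summary="${1:-align_summary.txt}"
    awk '/overall read mapping rate/ { sub(/%.*/, "", $1); print $1; found = 1 }
         END { exit !found }' "$summary"
}
```

Run from tophat_out, mapping_rate with no argument reads align_summary.txt in the current directory.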

We sort and index the BAM file so that it can be opened in IGV.

ngs_user@machine:~/rnaseq/tophat_out$ samtools sort -o accepted_hits.sort.bam accepted_hits.bam

ngs_user@machine:~/rnaseq/tophat_out$ samtools index accepted_hits.sort.bam
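
The two samtools commands can be wrapped in one function so that the same step is easy to repeat for other samples. A minimal sketch: the function name sort_and_index is hypothetical, and the .sort.bam naming simply follows the file names used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: coordinate-sort a BAM file and index the result for IGV.
# Writes <name>.sort.bam and <name>.sort.bam.bai next to the input file
# and prints the path of the sorted BAM.
sort_and_index() {
    local in="$1"
    local out="${in%.bam}.sort.bam"
    samtools sort -o "$out" "$in" || return 1
    samtools index "$out" || return 1
    echo "$out"
}
```

For example, sort_and_index accepted_hits.bam should produce the same accepted_hits.sort.bam and accepted_hits.sort.bam.bai as the commands above.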

Now, we can assemble the transcripts and estimate how many reads fall in each one (reported as FPKM). For this we also need the genome annotation file.

ngs_user@machine:~/rnaseq/tophat_out$ cufflinks -g ../sl2.gff3 accepted_hits.bam

ngs_user@machine:~/rnaseq/tophat_out$ ls

ngs_user@machine:~/rnaseq/tophat_out$ less transcripts.gtf

ngs_user@machine:~/rnaseq/tophat_out$ less genes.fpkm_tracking
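
To see which genes Cufflinks estimates as most expressed, genes.fpkm_tracking can be sorted by its FPKM column. A minimal sketch: it locates the column by its FPKM header instead of assuming a fixed position, and the function name top_fpkm is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N entries with the highest FPKM
# (tracking_id and FPKM, tab-separated) from a Cufflinks *.fpkm_tracking file.
top_fpkm() {
    local tracking="${1:-genes.fpkm_tracking}" n="${2:-10}"
    awk -F'\t' -v OFS='\t' '
        # Find the column whose header is exactly "FPKM".
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FPKM") col = i; next }
        col     { print $1, $col }
    ' "$tracking" | sort -t $'\t' -k2,2gr | head -n "$n"
}
```

The same function should also work on isoforms.fpkm_tracking, since it only relies on the FPKM header.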

Load accepted_hits.sort.bam, transcripts.gtf and the annotation file sl2.gff3 into IGV to check the mapping and the transcript assembly done with Cufflinks. The next steps will be to merge the transcript assemblies of all samples into a new annotation file and use it to quantify the reads of each sample again. After that, we will start the normalization, differential expression analysis, etc.
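
In the Cufflinks suite, merging the per-sample assemblies is usually done with cuffmerge, which takes a text file listing one transcripts.gtf path per line. A minimal sketch of building that list, assuming each sample was run in its own output directory; the function name write_assembly_list and the directory layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the assembly list for cuffmerge,
# one <sample_dir>/transcripts.gtf path per line.
# Assumption: each sample has its own directory containing transcripts.gtf.
write_assembly_list() {
    local out="$1"; shift
    : > "$out"    # start with an empty list
    local dir
    for dir in "$@"; do
        if [[ -s "$dir/transcripts.gtf" ]]; then
            echo "$dir/transcripts.gtf" >> "$out"
        else
            echo "skipping $dir: no transcripts.gtf" >&2
        fi
    done
}
```

For example, write_assembly_list assemblies.txt sampleA_out sampleB_out followed by cuffmerge -g sl2.gff3 -s sl2.fasta -o merged_asm assemblies.txt should leave the merged annotation in merged_asm/merged.gtf (sample directory names are illustrative).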
