Case Study: Aggregate Salmon Quant
In an earlier case study, we went through the whole process of mapping 16 bulk RNA-seq samples with salmon and obtained one transcript-level quantification table for each sample. On this page, we aggregate the transcript-level counts into gene-level counts for downstream Differentially Expressed Gene (DEG) analysis. To do so, we use an R package called "tximport".
In terms of coding, this case study teaches two things:
More examples of manipulating tables with pandas;
How to seamlessly integrate R, including off-the-shelf R packages, into Python.
In this notebook, we use the GENCODE GTF file to extract the information needed for the next step. This is the same GTF file I used to build the salmon index when preparing the data. It is IMPORTANT to use the same reference (e.g., genome FASTA, gene annotation GTF) consistently throughout a single project.
We extract the gene_id for each transcript_id from the GTF and save this mapping in a separate table.
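A minimal pandas sketch of this extraction, assuming a GENCODE-style GTF in which every `transcript` row carries `gene_id "..."`, `transcript_id "..."` and `gene_name "..."` attributes. The function name `read_tx2gene`, the toy records, and keeping `gene_name` as a third column are illustrative choices, not the notebook's actual code. The transcript IDs must match the `Name` column of salmon's `quant.sf`; with GENCODE transcript FASTA headers this usually means the index was built with `salmon index --gencode`, which cuts each header at the first `|`.

```python
import csv
import io

import pandas as pd

# The nine standard GTF columns; the last holds semicolon-separated attributes.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]


def read_tx2gene(gtf):
    """Return a transcript_id -> gene_id (+ gene_name) table from a GTF.

    `gtf` may be a file path or a file-like object. Header lines starting
    with '#' are skipped.
    """
    table = pd.read_csv(
        gtf, sep="\t", header=None, names=GTF_COLUMNS,
        comment="#", quoting=csv.QUOTE_NONE, dtype=str,
    )
    # Each "transcript" row names exactly one transcript and its parent gene,
    # so there is no need to scan the (much more numerous) exon rows.
    attrs = table.loc[table["feature"] == "transcript", "attribute"]
    tx2gene = pd.DataFrame({
        "transcript_id": attrs.str.extract(r'transcript_id "([^"]+)"', expand=False),
        "gene_id": attrs.str.extract(r'gene_id "([^"]+)"', expand=False),
        "gene_name": attrs.str.extract(r'gene_name "([^"]+)"', expand=False),
    })
    return tx2gene.drop_duplicates().reset_index(drop=True)


# Toy GTF with made-up IDs, only to show the expected input and output.
toy_gtf = io.StringIO(
    "##format: gtf\n"
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "GENE1"; gene_name "GeneA";\n'
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1"; gene_name "GeneA";\n'
    'chr1\tHAVANA\texon\t100\t300\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1"; gene_name "GeneA";\n'
    'chr1\tHAVANA\ttranscript\t150\t800\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX2"; gene_name "GeneA";\n'
    'chr2\tENSEMBL\ttranscript\t50\t400\t.\t-\t.\tgene_id "GENE2"; transcript_id "TX3"; gene_name "GeneB";\n'
)
tx2gene = read_tx2gene(toy_gtf)
print(tx2gene)
# In practice: pass the real GTF path, then save the table, e.g.
# tx2gene.to_csv("tx2gene.csv", index=False)  (hypothetical file name)
```

Reading only the `transcript` rows keeps the table one row per transcript; `quoting=csv.QUOTE_NONE` stops pandas from treating the double quotes inside the attribute column as CSV quoting.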
We then use the R package "tximport" to aggregate the transcript-level quantification into gene-level quantification for each sample. This step also shows how R and Python can be combined in the same notebook, taking advantage of both worlds!
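To make the aggregation concrete, the core of what tximport does for salmon output with its default `countsFromAbundance = "no"` can be sketched in pandas: sum the estimated read counts (`NumReads`) and the `TPM` values of all transcripts that belong to the same gene. This is a simplified stand-in to illustrate the idea, not a replacement for tximport, which also returns an abundance-weighted average transcript length per gene (used by downstream tools such as DESeq2); that part is omitted here. The function name `summarize_to_gene` and the toy values are hypothetical.

```python
import io

import pandas as pd


def summarize_to_gene(quant_sf, tx2gene):
    """Sum salmon NumReads and TPM over the transcripts of each gene.

    A simplified, tximport-like summary (counts and abundance only; the
    abundance-weighted gene length that tximport also returns is omitted).
    """
    quant = pd.read_csv(quant_sf, sep="\t")
    tx_to_gene = tx2gene.set_index("transcript_id")["gene_id"]
    quant["gene_id"] = quant["Name"].map(tx_to_gene)

    # Fail loudly on ID mismatches instead of silently dropping reads.
    unmapped = quant["gene_id"].isna()
    if unmapped.any():
        raise ValueError(f"{unmapped.sum()} transcripts are missing from tx2gene, "
                         f"e.g. {quant.loc[unmapped, 'Name'].iloc[0]}")

    return quant.groupby("gene_id")[["NumReads", "TPM"]].sum()


# Toy inputs with made-up IDs and values; the column names follow salmon's quant.sf.
toy_quant_sf = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "TX1\t1000\t850.0\t300000.0\t120.0\n"
    "TX2\t800\t650.0\t200000.0\t60.0\n"
    "TX3\t1500\t1350.0\t500000.0\t400.0\n"
)
toy_tx2gene = pd.DataFrame({
    "transcript_id": ["TX1", "TX2", "TX3"],
    "gene_id": ["GENE1", "GENE1", "GENE2"],
})

gene_quant = summarize_to_gene(toy_quant_sf, toy_tx2gene)
print(gene_quant)
```

Raising on unmapped transcripts is a deliberate choice in this sketch: a mismatch such as unsplit GENCODE headers or stripped version suffixes should stop the run rather than quietly lose reads.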
At the end, I also create one large table for the whole dataset that contains the gene-level information of all samples. This table, a CSV file of only 6 MB, is the starting point for all our future analyses.
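One possible layout for such a combined table, assuming a wide genes-by-samples matrix of gene-level counts (the actual table may hold more information per sample, such as TPM): concatenate one `pandas.Series` per sample, keyed by sample name. The function name `build_gene_table`, the sample names, the toy values, and the output file name are hypothetical.

```python
import io

import pandas as pd


def build_gene_table(per_sample):
    """Combine {sample_name: gene-level Series} into one genes x samples table."""
    table = pd.concat(per_sample, axis=1)  # dict keys become the column names
    # All samples should share one gene index when the same reference is used;
    # a NaN here means the indices disagree, so stop instead of guessing a value.
    if table.isna().any().any():
        raise ValueError("samples do not share the same set of genes")
    table.index.name = "gene_id"
    return table.sort_index()


# Toy per-sample gene counts with made-up values.
per_sample = {
    "sample_1": pd.Series({"GENE1": 180.0, "GENE2": 400.0}),
    "sample_2": pd.Series({"GENE2": 35.0, "GENE1": 90.0}),
}
gene_table = build_gene_table(per_sample)

# For the real dataset, pass a file path instead (e.g. a hypothetical "gene_counts.csv").
buffer = io.StringIO()
gene_table.to_csv(buffer)
print(buffer.getvalue())
```

Because every sample was quantified against the same reference, `pd.concat` aligns the samples on the gene index regardless of row order, and the missing-value check turns any disagreement into an error instead of silently filling gaps.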