πŸŽ‰Case Study: Aggregate Salmon Quant

In the last case study, we went through the whole process of mapping 16 bulk RNA-seq samples with salmon, and we got 16 transcript-level quantification tables for each sample. In this page, we want to aggregate the transcript-level counts into gene-level counts (the reason is described in this paperarrow-up-right) for further Differentially Expressed Gene (DEG) analysis. In order to do so, we will use a R package called "tximport".

For coding, you will learn two things in this case study:

  1. More examples of manipulating table using pandas;

  2. How to integrate R seamlessly include some open-box R packages into python.

Step 1. Prepare transcript2gene map

See Jupyter Notebookarrow-up-right

In this notebook, we will use the GENCODE GTF file to extract informations related to next step. The same GTF file is used in salmon index when I prepare the data. It is IMPORTANT to keep using the same reference (e.g., genome FASTA, gene annotation GTF) throughout your single project.

We will extract gene_id for each transcript_id from the GTF, then save them into another table.

Step 2. Transcript Quant To Gene Quant

See Jupyter Notebookarrow-up-right

In this notebook, we will use the R package "tximport" to aggregate transcript quant into gene quant for each sample. This step also provides an example how we can combine R and python in the same notebook, taking advantages from both world!

I also created one large table in the end for the whole dataset, which contains all samples' gene-level information. This is the start point of all our future analysis, just a 6Mb CSV table.

Last updated

Was this helpful?