Case Study: Mapping bulk RNA-seq reads with salmon
In this book, we use bulk RNA-seq data from the developing mouse forebrain as an example. In the GitHub repo, I have already provided . On this page, we use a small subset of the data to reproduce these salmon quant tables with exactly the same process.
The input consists of 25 FASTQ files (truncated for demonstration purposes) downloaded from ENCODE, belonging to 16 samples. The output is 16 salmon quant tables containing transcript-level quantification for each sample.
The files for this case study are located in py_genome_sci_book/analysis/salmon_demo.
The analysis contains four main steps, each associated with a Jupyter notebook in that directory.
On GitHub, you can see the executed version of each notebook. You should be able to execute them in your local environment and compare the results with the online version.
In this step, we create a salmon index for the mouse GENCODE transcriptome annotation.
In this step, we go through the 25 FASTQ files, give them meaningful names via soft links, and create a metadata table for them.
In this step, we trim the FASTQ files using trim_galore. This step generates a new set of 25 trimmed FASTQ files, so we also update the metadata table.
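As a rough sketch of what the indexing notebook runs, the helper below builds a `salmon index` command line from Python. The FASTA and index directory names are hypothetical placeholders, and the thread count is an arbitrary choice; `--gencode` asks salmon to cut GENCODE-style transcript headers at the first `|` so transcript IDs stay short.

```python
import shlex
from pathlib import Path


def salmon_index_cmd(transcript_fasta, index_dir, kmer_len=31, threads=4):
    """Build a `salmon index` command as a list of arguments (not executed here)."""
    return [
        "salmon", "index",
        "-t", str(transcript_fasta),  # transcriptome FASTA, e.g. from GENCODE
        "-i", str(index_dir),         # directory where salmon writes the index
        "-k", str(kmer_len),          # k-mer length; 31 is salmon's default
        "-p", str(threads),           # number of threads
        "--gencode",                  # split GENCODE headers at the first '|'
    ]


if __name__ == "__main__":
    # Hypothetical file names; substitute the GENCODE release you downloaded.
    cmd = salmon_index_cmd(Path("gencode_mouse_transcripts.fa.gz"), Path("salmon_index"))
    print(shlex.join(cmd))
    # To actually run it (requires salmon on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning the argument list instead of running it directly makes it easy to print the exact command into the notebook before spending time on the index build.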
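A minimal sketch of the renaming step, assuming each raw file is described by a dict with the hypothetical keys `raw_path`, `sample` and `new_name`: it creates soft links under readable names (the original ENCODE files are left untouched and nothing is copied) and writes a small CSV metadata table next to the links. The column names are illustrative, not the ones used in the book's notebook.

```python
import csv
from pathlib import Path


def link_fastqs(records, link_dir, metadata_path):
    """Soft-link raw FASTQ files under readable names and write a metadata CSV."""
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for rec in records:
        raw = Path(rec["raw_path"]).resolve()  # absolute target keeps the link valid
        link = link_dir / rec["new_name"]
        if link.is_symlink():
            link.unlink()  # re-running the notebook replaces old links only
        link.symlink_to(raw)  # raises FileExistsError if a real file has that name
        rows.append({"sample": rec["sample"], "fastq": str(link), "raw_path": str(raw)})

    # Metadata table: one row per FASTQ file, stored beside the data.
    with open(metadata_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["sample", "fastq", "raw_path"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Only existing symlinks are replaced, so an accidental name clash with a real file fails loudly instead of deleting data.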
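To update the metadata table we need to know which file trim_galore writes for each input. The sketch below builds the command for one single-end file or one R1/R2 pair and predicts the output names, assuming Trim Galore's default naming (`<stem>_trimmed.fq.gz` for single-end, `<stem>_val_1.fq.gz` / `<stem>_val_2.fq.gz` for paired-end, with `--gzip`). Please check these names against the files it actually produces; the helper names and the `cores` value are our own choices.

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def _stem(path):
    """File name without its FASTQ extension, e.g. 's1_R1.fastq.gz' -> 's1_R1'."""
    name = Path(path).name
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return Path(path).stem


def trim_galore_cmd(fastqs, out_dir, cores=4):
    """Return (command, expected_output_paths) for one sample's FASTQ file(s)."""
    out_dir = Path(out_dir)
    cmd = ["trim_galore", "--gzip", "--cores", str(cores), "-o", str(out_dir)]
    if len(fastqs) == 1:  # single-end
        outputs = [out_dir / f"{_stem(fastqs[0])}_trimmed.fq.gz"]
    elif len(fastqs) == 2:  # paired-end: R1 then R2
        cmd.append("--paired")
        outputs = [out_dir / f"{_stem(fastqs[0])}_val_1.fq.gz",
                   out_dir / f"{_stem(fastqs[1])}_val_2.fq.gz"]
    else:
        raise ValueError("expected one FASTQ (single-end) or two (paired-end)")
    cmd.extend(str(f) for f in fastqs)
    return cmd, outputs
```

The predicted paths can then replace the raw-file column in the metadata table, keeping the sample names unchanged.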
When doing analysis, keep your files organized in subdirectories;
Always keep a metadata table next to your data files, and include all necessary information in it;
Document all steps in Jupyter Notebooks, or at least in files containing the commands for each step. Doing analysis is just like doing a wet lab experiment 🧪: we need to write down what we did.
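One practical benefit of keeping the metadata table beside the data is that it can be checked automatically. The sketch below lists every path in the table that is missing on disk, resolving relative paths against the folder that holds the table; the `fastq` column name is a hypothetical default.

```python
import csv
from pathlib import Path


def check_metadata(metadata_path, path_column="fastq"):
    """Return paths listed in the metadata table that do not exist on disk."""
    metadata_path = Path(metadata_path)
    base = metadata_path.parent  # relative paths are taken relative to the table
    with open(metadata_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    missing = []
    for row in rows:
        p = Path(row[path_column])
        if not p.is_absolute():
            p = base / p
        if not p.exists():
            missing.append(str(p))
    return missing
```

Running such a check at the top of each notebook catches renamed or deleted files before a long step starts.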
In this step, we use salmon quant to quantify the 25 trimmed FASTQ files into 16 salmon quant tables. These tables have the same layout as the ones provided in our, but most of the numbers are zero because we used much smaller FASTQ files as input. For any further analysis, please refer to the .
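The reason 25 FASTQ files become 16 tables is that all files of one sample go into a single salmon quant run. A minimal sketch, assuming metadata rows with the hypothetical keys `sample`, `fastq` and `read` (`R1`, `R2` or `SE`): it groups files per sample and builds one command each, using automatic library-type detection (`-l A`). A small reader for the resulting `quant.sf` (tab-separated columns Name, Length, EffectiveLength, TPM, NumReads) is included for inspecting the output.

```python
import csv
from collections import defaultdict
from pathlib import Path


def salmon_quant_cmds(rows, index_dir, out_root, threads=4):
    """Build one `salmon quant` command per sample from metadata rows."""
    by_sample = defaultdict(lambda: defaultdict(list))
    for row in rows:  # keep table order so R1/R2 files stay paired
        by_sample[row["sample"]][row["read"]].append(row["fastq"])

    cmds = {}
    for sample, reads in by_sample.items():
        cmd = ["salmon", "quant", "-i", str(index_dir), "-l", "A",
               "-p", str(threads), "-o", str(Path(out_root) / sample)]
        if "SE" in reads:
            if "R1" in reads or "R2" in reads:
                raise ValueError(f"{sample}: mixes single- and paired-end files")
            cmd += ["-r", *reads["SE"]]  # several single-end files in one run
        else:
            r1, r2 = reads.get("R1", []), reads.get("R2", [])
            if not r1 or len(r1) != len(r2):
                raise ValueError(f"{sample}: R1/R2 files do not pair up")
            cmd += ["-1", *r1, "-2", *r2]
        cmds[sample] = cmd
    return cmds


def read_quant_sf(path):
    """Map transcript ID -> estimated read count from a salmon quant.sf file."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row["Name"]: float(row["NumReads"]) for row in reader}
```

With the truncated demo FASTQ files, most `NumReads` values read this way will be zero, which is expected.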