Chapter Contents

Scope of Writing

I want to write a practical coding book that mainly covers how to prepare data for analysis and how to handle analysis results for visualization. I will not explain the theory behind the analysis in detail (e.g., statistics, machine learning, or biological aspects), but I will point to materials when I mention them. I hope to keep this brief and straightforward rather than write an encyclopedia of genome science analysis 🧐.

Chapter 1. Understand Raw Data

In this chapter, I will talk about the process of generating the "raw" data from Illumina sequencing, with an emphasis on the universal principles shared by different technologies. I will also introduce the datasets used throughout the book:

  • A bulk RNA-seq dataset from ENCODE

  • A single-cell RNA-seq dataset from the Allen Institute for Brain Science

  • A single-cell snmC-seq2 dataset from my research project

Chapter 2. Work Environment

In this chapter, I will walk through the steps of setting up my work environment, as if I had just been given a new computer or server account.

  • How to install Python and all the genome science tools, packages, and software?

  • How to do data analysis in Jupyter Notebook/Lab?

  • Some tips at the system/shell level

Chapter 3. Data Cleaning

In this chapter, I will talk about using pandas with different genome data files. If you do data analysis, expect to spend more than 60% of your time cleaning your data. If you do not know pandas well, data cleaning can take even longer.

Chapter 4. Genome Science Data

In this chapter, I will explain the genome science data formats in detail. I will also introduce the essential tools associated with each data format, including their Python versions!

Chapter 5. Python Basics

In this chapter, I will summarize critical concepts related to the Python language, such as "pointer" and "everything is an object". I will also list some language skills that make your code cleaner and significantly improve your efficiency.

My guide books for this chapter are "Python Cookbook" and "Fluent Python". They changed my understanding of the Python language.
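As a small preview of the "pointer" and "everything is an object" ideas mentioned above, here is a minimal sketch. It assumes that "pointer" refers to the way Python names act as references to objects; the variable and function names are made up for illustration and are not taken from the chapter itself.

```python
# Names in Python are references to objects, not boxes that hold values.
a = [1, 2, 3]
b = a              # b now refers to the same list object as a
b.append(4)        # mutating through b is also visible through a

same_object = a is b   # identity check: one object, two names

c = list(a)        # a shallow copy creates a new, separate list object
c.append(5)        # a is not affected by changes to c


# Functions are objects too, so they can be bound to other names.
def double(x):
    return 2 * x


func_alias = double

print(a)                            # the list shared by a and b
print(c)                            # the independent copy
print(same_object)                  # whether a and b are the same object
print(func_alias(21))               # calling the function through its alias
print(isinstance(double, object))   # functions are instances of object
```

The practical consequence for data work is that assigning a list (or a DataFrame) to a new name does not copy it; an explicit copy is needed when the original must stay unchanged.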

Chapter 6. Data Visualization

In this chapter, I will mainly talk about using the matplotlib and seaborn packages to make publication-level figures. I will explain the matplotlib package in detail and reproduce complex figures line by line from scratch.

Chapter 7. Use R in Python

In this chapter, I will talk about using rpy2 to integrate useful R packages into Python.
