Chapter Contents

Scope of Writing

I want to write a practical coding book that mainly covers how to prepare data for analysis and how to handle analysis results for visualization. I will not explain the theory behind the analysis in detail (e.g., statistics, machine learning, or biology), but I will point to reference materials when I mention them. I hope to keep this brief and straightforward rather than write an encyclopedia of genome science analysis 🧐.

Chapter 1. Understand Raw Data

In this chapter, I will talk about the process of generating the "raw" data from Illumina sequencing, with an emphasis on the universal principles shared by different technologies. I will also introduce the datasets used throughout the book:

  • A bulk RNA-seq dataset from ENCODE

  • A single-cell RNA-seq dataset from Allen Brain Institute

  • A single-cell snmC-seq2 dataset from my research project

Chapter 2. Work Environment

In this chapter, I will walk through the steps of setting up my work environment, as if I had just been given a new computer or server account.

  • How to install Python and all the genome science tools/packages/software?

  • How to do data analysis in Jupyter Notebook/Lab?

  • Some tips on the system/shell level

Chapter 3. Data Cleaning

In this chapter, I will talk about using pandas with different genome data files. If you do data analysis, you will spend more than 60% of your time cleaning your data. If you do not know pandas well, data cleaning can take even longer.

Chapter 4. Genome Science Data

In this chapter, I will explain the genome science data formats in detail. I will also introduce the essential tools associated with each data format, including their Python versions!

Chapter 5. Python Basics

In this chapter, I will summarize critical concepts of the Python language, such as "pointers" and "everything is an object". I will also list some language skills that make your code cleaner and significantly improve your efficiency.
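As a small preview of the two concepts named above, here is a minimal sketch using only built-in Python. The variable and function names (`a`, `b`, `c`, `double`, `alias`) are made up for illustration and are not taken from the chapter itself.

```python
# Names are references ("pointers") to objects, not boxes holding copies.
a = [1, 2, 3]
b = a              # b refers to the same list object as a
b.append(4)        # a change made through b is also visible through a

c = list(a)        # an explicit shallow copy creates a new list object
c.append(5)        # this change does not affect a

same_object = a is b           # one object, two names
copy_is_distinct = a is not c  # the copy is a different object


# "Everything is an object": numbers, strings, functions, types and None
# are all instances of `object`.
def double(x):
    return 2 * x


all_objects = all(
    isinstance(thing, object) for thing in (1, "text", double, int, None)
)

alias = double     # a function can be bound to another name like any value

print(a, c, same_object, copy_is_distinct, all_objects, alias(3))
```

The takeaway is that assignment never copies data on its own, so when an independent copy is needed (for example before modifying a list in place) it has to be requested explicitly.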

My guide books for this chapter are "Python Cookbook" and "Fluent Python". They changed my understanding of the Python language.

Chapter 6. Data Visualization

In this chapter, I will mainly talk about using the matplotlib and seaborn packages to make publication-quality figures. I will explain the matplotlib package in detail and reproduce complex figures line by line from scratch.

Chapter 7. Use R in Python

In this chapter, I will talk about using rpy2 to integrate useful R packages into Python.
