Essential Python For Genome Science

Chapter Contents

Scope of Writing

I want to write a practical coding book that mainly covers how to prepare data for analysis and how to handle analysis results for visualization. I will not explain the theory behind the analysis in detail (e.g., statistics, machine learning, or biological aspects), but I will point to materials when I mention them. I hope to keep this brief and straightforward rather than write an encyclopedia of genome science analysis 🧐.

Chapter 1. Understand Raw Data

In this chapter, I will talk about the process of generating the "raw" data from Illumina sequencing, with an emphasis on the principles shared across different technologies. I will also introduce the datasets used throughout the book:

  • A bulk RNA-seq dataset from ENCODE

  • A single-cell RNA-seq dataset from the Allen Institute for Brain Science

  • A single-cell snmC-seq2 dataset from my research project

Chapter 2. Work Environment

In this chapter, I will walk through the steps of setting up my work environment, as if I had just received a new computer or server account.

  • How to install Python and all the genome science tools, packages, and software?

  • How to do data analysis in Jupyter Notebook/Lab?

  • Some tips on the system/shell level

Chapter 3. Data Cleaning

Chapter 4. Genome Science Data

In this chapter, I will explain the genome science data formats in detail. I will also introduce the essential tools associated with each data format, including their Python packages!

Chapter 5. Python Basics

In this chapter, I will summarize critical concepts of the Python language, such as "pointers" and "everything is an object". I will also list some language skills that tidy up your code and significantly improve your efficiency.
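
As a small illustration of the two concepts named in this chapter summary, the sketch below shows that Python names behave like references to objects, and that functions and integers are objects too. The names used here (`genes`, `alias`, `independent`, `count_genes`) are made up for this example and do not come from the book.

```python
# Names are references ("pointers") to objects, not copies of them.
genes = ["GAPDH", "ACTB"]
alias = genes                  # alias refers to the same list object as genes
alias.append("TP53")           # a change made through one name is seen through the other

same_object = alias is genes   # both names refer to one object
independent = list(genes)      # an explicit shallow copy creates a new list object
independent.append("MYC")      # this does not touch the original list


# Everything is an object: a function can be bound to another name and passed around.
def count_genes(names):
    """Return the number of gene names in a list."""
    return len(names)


counter = count_genes          # bind the same function object to a second name
n_genes = counter(genes)       # call count_genes through the new name

print(genes)
print(same_object, n_genes)
print(isinstance(count_genes, object), isinstance(3, object))
```

The explicit `list(genes)` copy is the step to reach for when a function should not modify the data it receives.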

Chapter 6. Data Visualization

In this chapter, I will mainly talk about using the matplotlib and seaborn packages to make publication-quality figures. I will explain the matplotlib package in detail and reproduce complex figures line by line from scratch.

Chapter 7. Use R in Python

In this chapter, I will talk about using rpy2 to integrate useful R packages into Python.


Last updated 5 years ago


In Chapter 3 (Data Cleaning), I will talk about using pandas with different genome data files. If you do data analysis, you will spend > 60% of your time cleaning your data. If you do not know pandas well, data cleaning can take even longer.

The guide books for me in Chapter 5 (Python Basics) are "Python Cookbook" and "Fluent Python". They changed my understanding of the Python language.