Essential Python For Genome Science
Coding Environment

Last updated 5 years ago

Python Environments

There are multiple ways to write and run Python code. Some people use only a text editor and the plain Python interpreter; some use an IDE; for data-analysis-oriented coding, I highly recommend the Jupyter notebook.

Python interpreter

$ python
Python 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 22:45:16)
[Clang 9.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("hello world!")
hello world!

This is the basic Python interpreter. I do not use it directly for production, but all the other environments below are built on it; they make this basic interpreter more convenient to use. I will explain what a Python interpreter is on another page.

ipython kernel

The ipython kernel is built on top of the Python interpreter and provides a more convenient way to run Python. It is the default Python kernel used by Jupyter notebook. I do not use it directly from the shell; I use it through its high-level GUI, the Jupyter Notebook.
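To check which of these layers a piece of code is running in, one common trick is to look for `get_ipython`, a function that IPython defines and the plain interpreter does not. The snippet below is only a minimal sketch: the function name `detect_environment` is made up for this example, and the shell class names it compares against are the ones used by common IPython releases, so treat them as an assumption.

```python
import sys


def detect_environment():
    """Guess whether the code runs in plain Python, an IPython terminal, or a Jupyter kernel."""
    try:
        # get_ipython() only exists when IPython has started the session
        shell_name = get_ipython().__class__.__name__
    except NameError:
        return "python interpreter"
    if shell_name == "ZMQInteractiveShell":
        # Shell class used by the ipython kernel behind Jupyter (assumed name)
        return "jupyter kernel"
    if shell_name == "TerminalInteractiveShell":
        # Shell class used by the `ipython` command in a terminal (assumed name)
        return "ipython terminal"
    return "other ipython shell"


print(detect_environment())
# Path of the underlying interpreter that every layer ends up using
print(sys.executable)
```

Running the same lines in a notebook cell and in a plain `python` session should give different answers for the first line, while `sys.executable` points to the interpreter of the active conda environment in both cases.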

Jupyter notebook or lab

Jupyter notebook is my only working environment for daily analysis. Here are the main reasons I like it:

  • Like a lab notebook, it integrates all your code, annotations, and figures in one file. Everything is self-explanatory.

  • It executes not only Python code but also shell commands. Using different kernels, it can also execute many other languages like R, or even run Python and R in the same notebook. For Python, the ipython kernel is used.

  • Abundant notebook extensions make my life much easier.

For analysis, I use Jupyter notebook; Jupyter notebook uses the ipython kernel; the ipython kernel uses the Python interpreter.

Jupyter Lab is another GUI from Jupyter, billed as "next-generation". Check it out if you like, but I haven't fully switched to it as of 2020; maybe I will when it's more mature.

IDE

An IDE is a more comprehensive development environment. I use PyCharm when I develop Python packages for the basic infrastructure of my project, but I do not use it for daily analysis or for this book. When you become more familiar with programming and start to build your own tools, you will need to know more about PyCharm.

For package development, I use PyCharm; PyCharm uses the Python interpreter.

Start Jupyter Notebook

Jupyter Notebook runs as a web server: you start the server process, keep it running, and use it through a web browser:

# remember to activate the environment in which you installed jupyter notebook and all other packages
$ conda activate genome_book
(genome_book)

# Starting jupyter is just one command:
$ jupyter notebook
[I 17:44:42.958 NotebookApp] [jupyter_nbextensions_configurator] enabled 0.4.1
[I 17:44:44.049 NotebookApp] JupyterLab extension loaded from /Users/hq/miniconda3/envs/genome_book/lib/python3.7/site-packages/jupyterlab
[I 17:44:44.049 NotebookApp] JupyterLab application directory is /Users/hq/miniconda3/envs/genome_book/share/jupyter/lab
[I 17:44:44.060 NotebookApp] Serving notebooks from local directory: /Users/hq/Documents/pkg/py_genome_sci_book/analysis/hello_world
[I 17:44:44.060 NotebookApp] The Jupyter Notebook is running at:
[I 17:44:44.060 NotebookApp] http://localhost:8888/?token=3bf94daea7b735bb796af59ea4a31e9c8061b47632600af8
[I 17:44:44.060 NotebookApp]  or http://127.0.0.1:8888/?token=3bf94daea7b735bb796af59ea4a31e9c8061b47632600af8
[I 17:44:44.061 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:44:44.067 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///Users/hq/Library/Jupyter/runtime/nbserver-19557-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=3bf94daea7b735bb796af59ea4a31e9c8061b47632600af8
     or http://127.0.0.1:8888/?token=3bf94daea7b735bb796af59ea4a31e9c8061b47632600af8

By default, http://localhost:8888/ is the URL to open your Jupyter notebook navigator. The token is just a temporary password. Setting up a real password for Jupyter is more convenient:

$ jupyter notebook password
Enter password:  ****
Verify password: ****
[NotebookPasswordApp] Wrote hashed password to /Users/you/.jupyter/jupyter_notebook_config.json

I suggest you use a named screen for the jupyter command, because the server needs to stay alive the whole time you are doing your analysis.

ipynb file

Here is a simple example of a Jupyter notebook. When you save it from the Jupyter notebook page, it becomes a .ipynb file, which is a special JSON format that contains all your code, markdown notes, and other metadata.

It can be viewed in:

  • The Jupyter notebook web server

  • nbviewer, a website that renders any notebook from a public GitHub repository

  • nteract, a GUI app for Jupyter

Notebook extensions

Jupyter notebook comes with rich third-party extensions that make it highly efficient. In the installation step, I added the jupyter_contrib_nbextensions package, which is a collection of most Jupyter notebook extensions. Because it's installed, you can see an additional panel called Nbextensions in your Jupyter navigator. You can activate any extension from there. Not all of them are useful, but here are the ones I used:

Spend a few minutes reading the ones that come with documentation, and customize your Jupyter notebook environment!

Papermill

Oftentimes, after finishing an analysis in a Jupyter notebook, we want to rerun it with different input files or test different parameters. Copying notebook files and changing all the values by hand is tedious and error-prone. Here is a great tool that solves this problem: Papermill. With simple modifications, it turns any notebook into a parameterized pipeline:

Step 1: write a notebook template with a parameter cell

Step 2: write an execution notebook or use the command line interface to automatically execute the template notebook with different parameters!

Papermill is not very good at taking complex input (such as a dict with special objects in it), so it is better to use simple data types (numbers, strings, simple lists, etc.) in the parameter cell. If you really have complex needs, I usually solve that in two ways:

  • Write additional logic in another cell that derives the complex input from the simple parameters

  • Use a more sophisticated config file, such as an .ini or .yaml file, and only provide its path as a parameter to Papermill. I rarely find this necessary.
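As a rough sketch of the two Papermill steps (not the actual pipeline used in this book): Papermill looks for a cell tagged `parameters` in the template notebook and injects the overriding values right after it. The snippet below builds a tiny template with the standard-library `json` module, which also shows that an .ipynb file is ordinary JSON, and then wraps `papermill.execute_notebook` in a small helper. The file names, the `sample_id` and `min_count` parameters, and the helpers `make_template`, `find_parameter_cells`, and `run_for_samples` are all hypothetical.

```python
import json
from pathlib import Path


def make_template(path):
    """Step 1: write a minimal template notebook with a tagged parameter cell."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                # The "parameters" tag marks the cell holding default values
                "metadata": {"tags": ["parameters"]},
                "outputs": [],
                "source": ["sample_id = 'demo'\n", "min_count = 10"],
            },
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["print(sample_id, min_count)"],
            },
        ],
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    Path(path).write_text(json.dumps(notebook, indent=1))
    return notebook


def find_parameter_cells(path):
    """Return the indices of cells tagged 'parameters' in a saved notebook."""
    notebook = json.loads(Path(path).read_text())
    return [
        index
        for index, cell in enumerate(notebook["cells"])
        if "parameters" in cell.get("metadata", {}).get("tags", [])
    ]


def run_for_samples(template, sample_ids, out_dir):
    """Step 2: execute the template once per sample, keeping one output notebook each."""
    import papermill as pm  # third-party package, imported only when needed

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for sample_id in sample_ids:
        pm.execute_notebook(
            str(template),
            str(out_dir / f"{sample_id}.ipynb"),
            # Only simple values here, as recommended above
            parameters={"sample_id": sample_id, "min_count": 5},
        )
```

A call such as `run_for_samples("template.ipynb", ["S1", "S2"], "runs")` would leave one executed notebook per sample in `runs/`. The command line interface does the same for a single run, along the lines of `papermill template.ipynb runs/S1.ipynb -p sample_id S1`.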