Coding Environment

Python Environments

There are multiple ways to write and run Python code. Some people use only a text editor and the plain Python interpreter; others use an IDE. For data-analysis-oriented coding, I highly recommend Jupyter Notebook.

Python interpreter

$ python
Python 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 22:45:16)
[Clang 9.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("hello world!")
hello world!

This is the basic Python interpreter. I do not use it directly for production, but all the environments below are built on it; they make this basic interpreter more convenient to use. I will explain what a Python interpreter is on another page.

The IPython kernel is built on top of the Python interpreter and provides a more convenient way to run Python. It is the default Python kernel used by Jupyter Notebook. I do not use it directly from the shell; instead, I use it through its high-level GUI, Jupyter Notebook.

Jupyter notebook or lab

Jupyter notebook is my only working environment for daily analysis. Here are the main reasons I like it:

JupyterLab is another GUI from Jupyter, described as the "next-generation" interface. Check it out if you like; as of 2020 I haven't fully switched to it, but I may once it is more mature.


For analysis, I use Jupyter Notebook; Jupyter Notebook uses the IPython kernel, and the IPython kernel uses the Python interpreter.
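To see this layering from inside a notebook cell, a small sketch like the one below can report which Python interpreter backs the current session and whether an IPython shell wraps it. It relies only on the standard sys module and on get_ipython(), which IPython injects as a builtin inside its sessions; the helper name describe_environment is illustrative, not part of any library.

```python
import sys


def describe_environment():
    """Report the underlying interpreter and, if present, the IPython shell type."""
    try:
        # get_ipython() is injected as a builtin by IPython; it does not exist
        # in a plain `python` session, so a NameError means no IPython layer.
        shell = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        shell = None
    return {
        "executable": sys.executable,        # path of the interpreter binary
        "version": sys.version.split()[0],   # e.g. "3.7.6"
        "ipython_shell": shell,
    }


if __name__ == "__main__":
    print(describe_environment())
```

In a Jupyter notebook, ipython_shell is typically "ZMQInteractiveShell"; in a terminal IPython session it is "TerminalInteractiveShell"; in the bare interpreter it is None. In every case, executable points at the same kind of Python interpreter binary.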

IDE

An IDE is a more comprehensive development environment. I use PyCharm when I develop Python packages for the basic infrastructure of my projects, but I do not use it for daily analysis or for this book. As you become more familiar with programming and start building your own tools, it is worth learning more about PyCharm.


For package development, I use PyCharm, and PyCharm uses the Python interpreter.

Start Jupyter Notebook

Jupyter Notebook runs as a web server: you start the server process with the jupyter notebook command, keep it running, and use it through a web browser.
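If you prefer to start the server from a script rather than typing the command, a minimal sketch with subprocess is below. It assumes the jupyter executable is on your PATH; --port and --no-browser are standard jupyter notebook options, while the helpers build_jupyter_command and start_jupyter are hypothetical names used only for illustration.

```python
import subprocess


def build_jupyter_command(port=8888, open_browser=False):
    """Assemble the argument list for `jupyter notebook` (hypothetical helper)."""
    cmd = ["jupyter", "notebook", f"--port={port}"]
    if not open_browser:
        # Useful on a remote machine where no browser is available.
        cmd.append("--no-browser")
    return cmd


def start_jupyter(port=8888, open_browser=False):
    """Launch the notebook server as a background process and return its handle.

    The server keeps running until the process is stopped, e.g. proc.terminate().
    """
    return subprocess.Popen(build_jupyter_command(port, open_browser))
```

For example, proc = start_jupyter(port=8889) starts a server you can then reach at http://localhost:8889/, and proc.terminate() stops it.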

By default, http://localhost:8888/ is the URL of your Jupyter Notebook navigator. The token shown at startup is just a temporary password. Setting up a real password for Jupyter (for example, by running jupyter notebook password once) is more convenient; it's explained here.
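When the server starts, it prints a login URL that carries the token as a query parameter, typically of the form http://localhost:8888/?token=<long hex string>. The sketch below pulls the token out of such a line with the standard urllib.parse module, which can help when you need to paste it into the login page; extract_token is a hypothetical helper and the URLs in the checks are made up.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_token(log_line):
    """Return the token from the first URL in a Jupyter startup log line, or None."""
    match = re.search(r"https?://\S+", log_line)
    if match is None:
        return None
    query = parse_qs(urlparse(match.group(0)).query)
    # parse_qs maps each key to a list of values; take the first token if present.
    tokens = query.get("token")
    return tokens[0] if tokens else None
```

Because it parses the query string rather than relying on a fixed prefix, the same helper works whether the printed URL ends in /?token=... or /tree?token=....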


I suggest running the jupyter command inside a named screen session (for example, one started with screen -S jupyter), because the server needs to stay alive throughout your analysis.

ipynb file

Here is a simple example of a Jupyter notebook. When you save it from the Jupyter Notebook page, it becomes a .ipynb file, which is a special JSON format that contains all your code, outputs, Markdown notes, and other metadata.
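Because a .ipynb file is plain JSON, you can inspect it with the standard json module without starting Jupyter. The sketch below assumes the current notebook format (nbformat 4), where the top level holds a cells list and each cell records its cell_type and source; the function names load_notebook and summarize are illustrative.

```python
import json
from collections import Counter


def load_notebook(path):
    """Parse a .ipynb file; it is ordinary JSON, so json.load is enough."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(nb):
    """Count cells by type and return the source of the first code cell."""
    cells = nb.get("cells", [])
    counts = Counter(cell["cell_type"] for cell in cells)
    first_code = next((c for c in cells if c["cell_type"] == "code"), None)
    source = None
    if first_code is not None:
        src = first_code["source"]
        # nbformat 4 allows "source" to be a single string or a list of lines.
        source = src if isinstance(src, str) else "".join(src)
    return {
        "nbformat": nb.get("nbformat"),
        "cell_counts": dict(counts),
        "first_code_source": source,
    }
```

For example, summarize(load_notebook("analysis.ipynb")) gives a quick overview of a notebook file (the file name is illustrative). For validated reading and writing, the official nbformat package offers nbformat.read(path, as_version=4).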

It can be viewed in:

Notebook extensions

Jupyter Notebook has a rich ecosystem of third-party extensions that make it highly efficient. In the installation step, I added the jupyter_contrib_nbextensions package, which bundles most Jupyter Notebook extensions. With it installed, an additional panel called Nbextensions appears in your Jupyter navigator, where you can activate any extension. Not all of them are useful, but here are the ones I use:

Spend a few minutes reading the documentation that comes with each extension, and customize your Jupyter Notebook environment!
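The Nbextensions panel is a front end for a JSON config file. With the classic notebook, enabled extensions are usually recorded under a load_extensions key in ~/.jupyter/nbconfig/notebook.json, mapping each extension's require path (such as toc2/main) to true or false. Assuming that layout, the sketch below lists which extensions are switched on; the helper names and the default path are assumptions to adjust for your setup (jupyter --config-dir prints your config directory).

```python
import json
from pathlib import Path

# Assumed default location of the classic notebook's front-end config.
DEFAULT_NBCONFIG = Path.home() / ".jupyter" / "nbconfig" / "notebook.json"


def enabled_nbextensions(config):
    """Return the sorted names of extensions marked true in a parsed notebook.json."""
    load = config.get("load_extensions", {})
    return sorted(name for name, enabled in load.items() if enabled)


def read_nbconfig(path=DEFAULT_NBCONFIG):
    """Load the config file, returning an empty dict if it does not exist yet."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling enabled_nbextensions(read_nbconfig()) is a quick way to record your current selection, for instance before setting up Jupyter on a new machine.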

Often, after finishing an analysis in a Jupyter notebook, we want to rerun it with different input files or test different parameters. Copying notebook files and changing every value by hand is tedious and error-prone. Papermill is a great tool that solves this problem: with simple modifications, it turns any notebook into a parameterized pipeline.
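Papermill's Python entry point is papermill.execute_notebook(input_path, output_path, parameters=...): it runs a copy of the notebook, injecting the given values in a new cell after the cell tagged parameters. The sketch below runs one template over several parameter sets. The helper run_sweep, its output naming scheme, and the injectable execute argument (used so the loop can be checked without running notebooks) are my own illustrative choices, not part of Papermill.

```python
from pathlib import Path


def run_sweep(template, param_sets, out_dir, execute=None):
    """Execute `template` once per parameter dict and return the output paths.

    `out_dir` is assumed to exist already.
    """
    if execute is None:
        # Imported lazily so the helper can be exercised without papermill.
        import papermill as pm
        execute = pm.execute_notebook
    outputs = []
    for i, params in enumerate(param_sets):
        # e.g. analysis.ipynb -> results/analysis_run0.ipynb, analysis_run1.ipynb, ...
        out_path = Path(out_dir) / f"{Path(template).stem}_run{i}.ipynb"
        execute(str(template), str(out_path), parameters=params)
        outputs.append(out_path)
    return outputs
```

For a single run from the shell, the Papermill CLI does the same thing: papermill analysis.ipynb output.ipynb -p alpha 0.5 (the notebook and parameter names here are illustrative).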


Papermill is not very good at handling complex input (such as a dict containing special objects), so it is better to use simple data types (numbers, strings, simple lists, etc.) in the parameters cell. If you really have complex needs, I usually solve them in one of two ways:

  • Write additional logic in another cell to derive the complex input from simple parameters.

  • Use a more sophisticated config file, such as an .ini or .yaml file, and provide only its path as a parameter to Papermill. I rarely find this necessary.
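Both workarounds can be sketched in plain Python. The parameters cell holds only simple values (a comma-separated string and a file path here); a following cell derives richer objects from them and, for the second approach, reads an .ini file with the standard configparser module. The parameter names, the grouping rule, and the file path are hypothetical examples.

```python
import configparser

# --- Cell tagged "parameters": simple values papermill can override ---
samples = "A1,A2,B1"   # comma-separated string instead of a list of objects
config_path = ""       # optional path to an .ini file; empty means "none"


# --- Next cell: derive complex input from the simple parameters ---
def parse_samples(text):
    """Group 'A1,A2,B1' into {'A': ['A1', 'A2'], 'B': ['B1']} by first letter."""
    groups = {}
    for name in (s.strip() for s in text.split(",") if s.strip()):
        groups.setdefault(name[0], []).append(name)
    return groups


def load_settings(path):
    """Read an .ini file into a dict of sections; {} when no path is given."""
    if not path:
        return {}
    parser = configparser.ConfigParser()
    parser.read(path)  # silently skips files that cannot be opened
    return {section: dict(parser[section]) for section in parser.sections()}


sample_groups = parse_samples(samples)
settings = load_settings(config_path)
```

Keeping the parameters cell to strings and numbers means Papermill can always override it, while all the structure lives in ordinary code that is easy to test.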
