About Python
Here I list the prerequisites for reading on. Don't be overwhelmed by the long list: a full understanding of everything below is not required, and I will also explain these topics throughout the book. In the beginning, you only need to know these names and understand their primary usage and purpose. The best way to do so is to take a well-organized course or read a book for beginners. I don't want to repeat their content here, because they do a much better job than I could.
Here are my recommendations (choose any one of them; they cover the same material):
Courses 1, 2, and 3 of this Coursera specialization:
Introduction to Python 3 by RealPython:
If you prefer a book: Beginning Python. This was my first introduction to Python, before Coursera became popular. The book also has a Chinese version.
It might take several days to finish the introductory material, but it helps you build a reliable framework of knowledge for future study.
Basic data types: int, float, str
Basic data structures: list, dict, set
Control flow: if, for, break, continue, try...except...
Built-in functions: range(), any(), all(), enumerate(), dir(), help(), isinstance(), open(), print() and others.
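As a quick self-check on the basics listed above, here is a minimal sketch that touches each item once. The gene names, counts, and variable names are made up for illustration and are not part of any real dataset.

```python
# Basic data types: int, float, str
read_count = 42
gc_fraction = 0.41
gene_name = "TP53"

# Basic data structures: list, dict, set
genes = ["TP53", "BRCA1", "MYC", "TP53", "UNKNOWN1"]  # illustrative values
gene_to_chrom = {"TP53": "chr17", "BRCA1": "chr17", "MYC": "chr8"}
unique_genes = set(genes)  # the duplicate "TP53" is dropped

# Control flow: for, if, continue, break, try...except
first_seen = {}
for position, gene in enumerate(genes):
    if gene in first_seen:
        continue  # skip genes we have already recorded
    try:
        chrom = gene_to_chrom[gene]
    except KeyError:
        print(f"{gene} has no chromosome annotation, stopping")
        break
    first_seen[gene] = (position, chrom)

# Built-in functions: all(), any(), isinstance()
all_str = all(isinstance(g, str) for g in genes)
any_chr8 = any(chrom == "chr8" for _, chrom in first_seen.values())
print(first_seen, all_str, any_chr8)
```

If every line here reads naturally to you, the introductory material has done its job; if not, `help()` and `dir()` on any of these objects are a good next step.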
Python built-in modules (just know what they are and their general usage; Google and YouTube can be helpful):
System and file related: pathlib, subprocess, json, gzip, multiprocessing, concurrent.futures
Regular expressions: re
Other enhancements to control flow or data structures: collections, random, itertools
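To give a feel for how a few of these built-in modules fit together, the sketch below writes some records to a gzipped JSON file, reads them back, and summarizes them. The records and file name are hypothetical, and `tempfile` (not in the list above) is used here only so the example cleans up after itself.

```python
import gzip
import json
import re
import tempfile
from collections import Counter
from itertools import combinations
from pathlib import Path

# Hypothetical records, made up for this example.
records = [
    {"sample": "s1", "chrom": "chr1"},
    {"sample": "s2", "chrom": "chrX"},
    {"sample": "s3", "chrom": "chr1"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "records.json.gz"
    # Text mode ("wt"/"rt") lets json read and write str directly.
    with gzip.open(path, "wt") as handle:
        json.dump(records, handle)
    with gzip.open(path, "rt") as handle:
        loaded = json.load(handle)

# collections.Counter: count records per chromosome
chrom_counts = Counter(record["chrom"] for record in loaded)

# re: keep only chromosome names made of "chr" followed by digits
autosomes = [c for c in chrom_counts if re.fullmatch(r"chr\d+", c)]

# itertools.combinations: all unordered pairs of samples
sample_pairs = list(combinations([r["sample"] for r in loaded], 2))

print(chrom_counts, autosomes, sample_pairs)
```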
Python built-in modules: The built-in modules come with the Python installation and can be considered part of the Python language. Python is a versatile language for almost any programming application, and so are its built-in modules. Not all modules are necessary for daily genome science.
NumPy: The "N-dimensional array" data structure in Python. Everything related to linear algebra is based on it. In other words, everything is based on it.
pandas: The "Excel" of Python; it handles your tables. A must-learn for genome science.
SciPy, scikit-learn, and statsmodels: The statistics and basic machine learning packages for Python (with many other applications beyond my knowledge). They all contain tons of functions, but here are simple examples of each:
scipy: some basic statistical tests (t, Wilcoxon, fisher_exact), building dendrograms, sparse matrix formats.
scikit-learn: PCA, KMeans, RandomForest, and many other models/algorithms.
statsmodels: building linear models, multiple-testing correction, ANOVA.
matplotlib and seaborn: The must-learn Python visualization packages. For publication purposes, these two are enough for any figure; your imagination is the limit.
10 minutes to pandas: explains pandas usage in 10 minutes, or maybe an hour...
This YouTube video explains all the packages listed here.
The seaborn documentation is a beautiful tutorial, not only for the package but also for data visualization principles.
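To make the "N-dimensional array" and "Excel in Python" descriptions above a little more concrete, here is a minimal sketch using NumPy and pandas. It assumes both packages are installed, and the gene names, sample names, and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up expression values: 3 genes (rows) x 2 samples (columns).
values = np.array([[1.0, 3.0],
                   [2.0, 2.0],
                   [0.0, 4.0]])

# NumPy: compute along an axis; axis=1 gives one mean per row (gene).
gene_means = values.mean(axis=1)

# pandas: the same numbers as a labeled table.
table = pd.DataFrame(
    values,
    index=["geneA", "geneB", "geneC"],
    columns=["sample1", "sample2"],
)
table["mean"] = gene_means

# Filter rows the way you would filter a spreadsheet.
high_in_sample2 = table[table["sample2"] > 2.5]

print(values.shape)
print(table)
print(high_in_sample2.index.tolist())
```

The point is the division of labor: the array holds the numbers and does the math, while the DataFrame adds row and column labels so you can select and filter by name.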