About Python
Take An Introductory Course
Here I list the prerequisites for reading on. Don't be overwhelmed by the long list: you don't need a full understanding of everything below, and I will also explain these topics throughout the book. To begin with, you only need to know these names and understand their primary usage and purpose. The best way to get there is to take a well-organized course or read a book for beginners. I won't repeat their material here, because they do a much better job of it than I could.
Here are my recommendations (choose any one of them; they cover largely the same material):
Class 1, 2, 3 of this Coursera specialization: https://www.coursera.org/specializations/python
Introduction to Python 3 by RealPython: https://realpython.com/learning-paths/python3-introduction/
(If you prefer a book, this was my first introduction to Python, before Coursera became popular. The book also has a Chinese edition.) Beginning Python: https://www.amazon.com/Beginning-Python-Professional-Magnus-Hetland-ebook/dp/B06XGVVVMG
It might take several days to finish the introductory material, but it helps you build a reliable knowledge graph for future study.
The Python Language
Basic data types: int, float, str
Basic data structures: list, dict, set
Control flow: if, for, break, continue, try...except...
Python built-in functions: range(), any(), all(), enumerate(), dir(), help(), isinstance(), open(), print() and others.
Python built-in (standard library) modules (you only need to know what they are and their general usage; Google and YouTube can help):
System and file related: pathlib, subprocess, json, gzip, multiprocessing, concurrent.futures
Other enhancements to control flow or data structures: collections, random, itertools
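To give a feeling for how these pieces fit together, here is a minimal sketch that touches the basic types, control flow, a few built-in functions, and three of the modules named above (collections, pathlib, json). The gene names and numbers are made up for illustration only.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Basic data types and structures (all values here are hypothetical)
gene_lengths = {"geneA": 1200, "geneB": 850, "geneC": 0}  # dict: str -> int
names = list(gene_lengths)                   # list of the dict keys
unique_lengths = set(gene_lengths.values())  # set drops duplicate values

# Control flow: for, if, continue, try...except, plus enumerate() and isinstance()
densities = {}
ranks = {}
for index, name in enumerate(names, start=1):
    ranks[name] = index
    length = gene_lengths[name]
    if not isinstance(length, int):
        continue  # skip anything that is not an integer length
    try:
        densities[name] = round(100 / length, 3)
    except ZeroDivisionError:
        densities[name] = None  # a zero-length entry has no density

# any() and all() summarize an iterable of booleans
has_missing = any(value is None for value in densities.values())
all_named = all(name.startswith("gene") for name in names)

# collections.Counter counts items, here the letters of a short sequence
base_counts = Counter("ACGTTGCA")

# pathlib and json: write a dict to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "densities.json"
    out_path.write_text(json.dumps(densities))
    reloaded = json.loads(out_path.read_text())

print(has_missing, all_named, base_counts["A"], reloaded == densities)
```

If most of this reads naturally to you, you are ready to continue; if not, one of the courses above should fill the gaps.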
Third-Party Packages
NumPy: The "N-dimensional array" data structure in Python. Everything related to linear algebra is built on it; in other words, almost everything else is built on it.
pandas: The "Excel" of Python; it handles your tables. A must-learn for genome science.
scipy, scikit-learn, and statsmodels: The statistics and basic machine learning packages for Python (with many other applications beyond my knowledge). Each contains a huge number of functions, but here are a few simple examples from each:
scipy: some basic statistical tests (t-test, Wilcoxon, fisher_exact), building dendrograms, sparse matrix formats.
scikit-learn: PCA, k-means, random forests, and many other models and algorithms
statsmodels: building linear models, multiple-testing correction, ANOVA
matplotlib and seaborn: The must-learn Python visualization packages. For publication purposes, these two are enough for any figure; your imagination is the limit.
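As a small taste of the first two packages in this list, the sketch below builds a NumPy array and then wraps the same numbers in a pandas table. It assumes numpy and pandas are installed; the gene and sample names and the count values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np
import pandas as pd

# NumPy: a 2-D array of made-up counts (3 genes x 2 samples)
counts = np.array([[10, 20],
                   [0, 5],
                   [30, 15]])
library_sizes = counts.sum(axis=0)   # one total per column (sample)
fractions = counts / library_sizes   # broadcasting divides each column by its total

# pandas: the same numbers as a labeled table
table = pd.DataFrame(
    counts,
    index=["geneA", "geneB", "geneC"],
    columns=["sample1", "sample2"],
)
table["mean"] = table[["sample1", "sample2"]].mean(axis=1)  # row-wise mean
expressed = table[table["mean"] > 5]                        # keep rows passing a filter

print(fractions.sum(axis=0))
print(expressed)
```

The array gives you fast arithmetic on whole blocks of numbers, while the table adds row and column labels on top, which is why the two are usually learned together.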