Essential Python For Genome Science
Git and Github

Last updated 5 years ago

Git and Github are fundamental tools in computer science, and I don't think I've fully explored them. Still, basic usage of git and Github is essential for my work, so here I only give a basic introduction based on what I know. I may add more in the future as I learn more about them.

The minimum you should know:

  • Github is the host I use to save the code and small data files and share them with you.

  • It's based on git.

  • Use git and Github for each one of your projects.

Git

From the git home page:

Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git helps you:

  1. Manage the history of ALL your files related to a certain project (usually inside a single directory, which is called a repository). It adds "save" and "rollback" buttons to your whole directory.

  2. Together with Github, back up your files online in a remote repository.

  3. Collaborate with other people. Indeed, many popular packages hosted on Github have dozens or even hundreds of contributors.
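To make the "save" and "rollback" idea concrete, here is a minimal sketch that drives git from Python with subprocess. It assumes git is on your PATH (it returns None otherwise), and the file name, commit message, and demo identity are all made up for illustration. It works in a temporary directory, so it does not touch any of your real projects.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo_dir, *args):
    """Run one git command inside repo_dir and return its stdout."""
    cmd = [
        "git",
        # Throwaway settings so the commit also works on a machine without git config.
        "-c", "user.name=Demo User",
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(cmd, cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return result.stdout


def save_and_rollback_demo():
    """Commit a file, change it, then restore the committed version."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        notes = repo / "notes.txt"
        git(repo, "init")                            # make the directory a repository
        notes.write_text("version 1\n")
        git(repo, "add", "notes.txt")                # stage the file
        git(repo, "commit", "-m", "Save version 1")  # the "save" button
        notes.write_text("an edit I regret\n")       # change the file after saving
        git(repo, "checkout", "--", "notes.txt")     # the "rollback" button
        return notes.read_text()


restored = save_and_rollback_demo()
print(restored)
```

In daily work you would type the same commands (git init, git add, git commit, git checkout) directly in the terminal. The Python wrapper is only there to keep the example self-contained.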

Github

While git controls your local files, Github is the remote host where you upload your local repository as a Github repository. It's free!

A Github repo is not intended for hosting large files (> 100 MB). It's usually used for hosting code, text documentation (markdown), Jupyter notebook files, and some small datasets. If you keep large raw data in the same directory, you can use a .gitignore file to exclude it from being uploaded to Github. Use other ways (Google Drive, etc.) to back up large files.
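As a rough sketch of how you might spot large files before they end up in a commit, the helper below walks a directory, lists files above a size limit, and appends them to .gitignore. The function names, the demo file names, and the tiny demo limit are my own assumptions for illustration. It also assumes plain file names: names containing characters that are special in .gitignore patterns (such as * or a leading #) would need escaping, which this sketch does not do.

```python
import os
import tempfile
from pathlib import Path

SIZE_LIMIT = 100 * 1024 * 1024  # 100 MB, the size mentioned above


def find_large_files(repo_dir, limit=SIZE_LIMIT):
    """Return repo-relative paths of files larger than `limit` bytes."""
    repo_dir = Path(repo_dir)
    large = []
    for folder, subdirs, files in os.walk(repo_dir):
        if ".git" in subdirs:
            subdirs.remove(".git")  # skip git's own internal directory
        for name in files:
            path = Path(folder) / name
            if path.is_file() and path.stat().st_size > limit:
                large.append(path.relative_to(repo_dir).as_posix())
    return sorted(large)


def add_to_gitignore(repo_dir, rel_paths):
    """Append paths to .gitignore (anchored to the repo root); return the new lines."""
    gitignore = Path(repo_dir) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = set(text.splitlines())
    new_lines = [f"/{p}" for p in rel_paths if f"/{p}" not in existing]
    if new_lines:
        with gitignore.open("a") as handle:
            if text and not text.endswith("\n"):
                handle.write("\n")  # keep the new entries on their own lines
            handle.write("\n".join(new_lines) + "\n")
    return new_lines


# Demo in a temporary directory, with a tiny limit so no huge file is needed.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "reads.fastq").write_bytes(b"A" * 2000)
    (Path(tmp) / "analysis.py").write_text("print('hi')\n")
    large_files = find_large_files(tmp, limit=1000)
    added = add_to_gitignore(tmp, large_files)
    added_again = add_to_gitignore(tmp, large_files)  # nothing new the second time
    gitignore_text = (Path(tmp) / ".gitignore").read_text()

print(large_files, added)
```

In practice it is often simpler to ignore a whole data directory or a file extension with a single pattern. Listing files one by one is just the most explicit version of the idea.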

Use git and Github

I do not want to explain git in detail, because there are tons of material introducing git and Github for beginners. Here are my suggestions to start:

  • Check in your terminal whether git is installed.
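If you prefer to do this check from Python, for example at the top of a notebook, a minimal sketch could look like the one below. The helper names are hypothetical; shutil.which and subprocess.run are from the standard library.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of a command-line tool, or None if it is not on PATH."""
    return shutil.which(name)


def git_version():
    """Return the output of `git --version`, or None if git is not installed."""
    if find_tool("git") is None:
        return None
    result = subprocess.run(["git", "--version"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(git_version())
```

If this prints None, git is not on your PATH and you need to install it first.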

My Github repo will keep being updated until this tip disappears. If you cloned my repo, use git pull to update it.

Clone Github Repository of This Book

To get all the data and jupyter notebooks for this book:

$ cd /To/The/Place/You/Want/To/Save/This/Repo/

$ git clone https://github.com/lhqing/py_genome_sci_book.git

$ cd py_genome_sci_book/data/
# This is the data directory

$ cd ../analysis/
# This is the analysis directory, which will contain the .ipynb files

# If you want to update this repository from github, go back to the repo dir
$ cd /To/The/Place/You/Want/To/Save/This/Repo/py_genome_sci_book

$ git pull

I do not recommend changing this repository directory directly, because you may run into git errors when you run git pull. If that happens and you can't solve it, delete the directory and rerun git clone. But that means all your custom changes will be gone.
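The clone-or-pull decision above can also be scripted. This is a minimal sketch, not part of the book's repository: the function names and the dry_run parameter are my own, and by default it only returns the command it would run, so nothing is downloaded until you pass dry_run=False.

```python
import subprocess
import tempfile
from pathlib import Path

REPO_URL = "https://github.com/lhqing/py_genome_sci_book.git"


def build_sync_command(parent_dir, repo_url=REPO_URL):
    """Return a `git clone` command, or `git pull` if the repo is already cloned."""
    repo_name = repo_url.rstrip("/").split("/")[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[:-4]
    repo_dir = Path(parent_dir) / repo_name
    if (repo_dir / ".git").exists():
        # -C makes git run as if it had been started inside repo_dir
        return ["git", "-C", str(repo_dir), "pull"]
    return ["git", "clone", repo_url, str(repo_dir)]


def sync_repo(parent_dir, repo_url=REPO_URL, dry_run=True):
    """Build the command and, unless dry_run is True, actually run it."""
    cmd = build_sync_command(parent_dir, repo_url)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Dry-run demo in a temporary directory (no network access needed).
with tempfile.TemporaryDirectory() as tmp:
    first = sync_repo(tmp)  # nothing cloned yet, so this is a clone command
    # Pretend a clone already exists by creating an empty .git directory.
    (Path(tmp) / "py_genome_sci_book" / ".git").mkdir(parents=True)
    second = sync_repo(tmp)  # now this is a pull command
    expected_dir = str(Path(tmp) / "py_genome_sci_book")

print(first)
print(second)
```

To really clone or update, call sync_repo("/To/The/Place/You/Want/To/Save/This/Repo/", dry_run=False) with your own path.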

The rest of my suggestions:

  • Create a Github account. (This is mine.)

  • Watch this 3min video from Github.

  • Read this neat introduction: https://rogerdudler.github.io/git-guide/

  • Once you understand the basic git commands (git clone, add, commit, push, checkout, merge), you can use Github Desktop - GUI from Github for easier control.

  • Clone or fork (in Github, fork means making a copy of a repo under your own Github account) my Github repo for this book.

  • Learn a bit more about git ignore to prevent adding large files or temp files to your git repo.