Methylome Pseudo-bulk
Prepare the Snakemake file for generating pseudo-bulk files
After single-cell level analysis, the next step is to merge the single-cell methylomes (ALLC files) to the pseudo-bulk level to increase whole-genome coverage. This allows us to generate genome browser tracks and identify potential regulatory elements at high genomic resolution.
Input
single-cell ALLC files from mapping
cell group labels from the metadata or clustering analysis
These two kinds of inputs need to be organized in a tab-separated ALLC table with no header, one cell per row (ALLC path, then group label), like this:
/path/to/allc/file/cell1.allc.tsv.gz	group1
/path/to/allc/file/cell2.allc.tsv.gz	group1
/path/to/allc/file/cell3.allc.tsv.gz	group2
/path/to/allc/file/cell4.allc.tsv.gz	group2
...	...
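As a minimal sketch of how such a table could be assembled, the Python snippet below writes one path and group pair per row, tab-separated and without a header. The function name write_allc_table, the two input dictionaries, and the cell IDs are hypothetical and only for illustration; in practice the group labels would come from your own metadata or clustering results.

```python
import csv
import io


def write_allc_table(cell_to_group, allc_paths, out):
    """Write tab-separated 'ALLC path, group' rows with no header.

    cell_to_group: dict mapping cell ID -> group label (hypothetical input).
    allc_paths: dict mapping cell ID -> ALLC file path (hypothetical input).
    out: any writable text file object.
    Returns the number of rows written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    n_rows = 0
    for cell, path in sorted(allc_paths.items()):
        group = cell_to_group.get(cell)
        if group is None:
            # Cells without a group label are left out of the table.
            continue
        writer.writerow([path, group])
        n_rows += 1
    return n_rows


# Illustrative inputs only; replace with your own paths and labels.
allc_paths = {
    "cell1": "/path/to/allc/file/cell1.allc.tsv.gz",
    "cell2": "/path/to/allc/file/cell2.allc.tsv.gz",
    "cell3": "/path/to/allc/file/cell3.allc.tsv.gz",
    "cell4": "/path/to/allc/file/cell4.allc.tsv.gz",
}
cell_to_group = {
    "cell1": "group1",
    "cell2": "group1",
    "cell3": "group2",
    "cell4": "group2",
}

buffer = io.StringIO()
n_rows = write_allc_table(cell_to_group, allc_paths, buffer)
allc_table_text = buffer.getvalue()
print(allc_table_text, end="")
```

To write to disk instead, pass an open file handle (for example, one opened with newline="") in place of the in-memory buffer.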
Generate Snakefile
Just like mapping, we use Snakemake to execute all the specific commands. YAP just helps prepare a Snakefile for the whole process, which will:
Generate a group-merged pseudo-bulk ALLC file for each group. These ALLC files have the same format as single-cell ALLC files. All the base counts in the same position are added together.
Generate BigWig files for each pseudo-bulk ALLC file. The mC context and bin size of the BigWig are provided by you.
OPTIONAL: Extract mCG sites from the merged ALLC files. These files can be used to call CpG Differentially Methylated Regions (CG-DMRs).
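The merging rule in the first step (base counts at the same position are added together) can be illustrated with a small Python sketch. This is not how allcools merge is implemented; it only shows the arithmetic on in-memory records. It assumes each record carries chromosome, position, strand, context, methylated count (mc), and total coverage (cov); the function name and the example records are made up for illustration.

```python
from collections import defaultdict


def merge_allc_records(cells):
    """Sum mc and cov counts across cells for each cytosine position.

    cells: iterable of per-cell record lists; each record is a tuple
           (chrom, pos, strand, context, mc, cov). This layout is an
           assumption made for the sketch.
    Returns a dict mapping (chrom, pos, strand, context) -> (mc, cov).
    """
    merged = defaultdict(lambda: [0, 0])
    for records in cells:
        for chrom, pos, strand, context, mc, cov in records:
            counts = merged[(chrom, pos, strand, context)]
            counts[0] += mc   # methylated base calls
            counts[1] += cov  # total base calls
    return {key: tuple(value) for key, value in sorted(merged.items())}


# Two toy cells belonging to the same group (illustrative values only).
cell1 = [("chr1", 100, "+", "CGA", 1, 1), ("chr1", 250, "-", "CAT", 0, 2)]
cell2 = [("chr1", 100, "+", "CGA", 0, 1), ("chr2", 40, "+", "CGT", 1, 1)]

pseudo_bulk = merge_allc_records([cell1, cell2])
for key, (mc, cov) in pseudo_bulk.items():
    print(*key, mc, cov, sep="\t")
```

Positions covered in only one cell keep that cell's counts, while shared positions accumulate coverage, which is why the pseudo-bulk profile has higher whole-genome coverage than any single cell.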
To generate the Snakefile, run yap mc-bulk. After it finishes, there will be a Snakefile in the output_dir; this Snakefile contains all the steps needed to generate the output files.
Execute Snakefile
Single command
To execute the Snakefile, simply run
Snakemake will parallelize all the jobs for you.
Execute in batches
If you are merging a large number of files, you may want to execute the Snakefile in multiple batches and run the batches in parallel to speed up merging.
To execute the Snakefile in batches, simply run
You can use any number of batches between 2 and the number of cell groups; just be consistent across your commands so that no group is missed. Snakemake will automatically determine the subset to run in each command.
These commands can be executed in parallel. For example, if you split the work into five batches, you can submit five qsub/sbatch jobs to run them in parallel.
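To see why the batch count must stay the same across commands, here is a small Python sketch that splits cell groups into a fixed number of batches. It is only an illustration under stated assumptions: the round-robin rule and the function name split_into_batches are hypothetical, and Snakemake's own partitioning may assign groups differently.

```python
def split_into_batches(groups, n_batches):
    """Assign each group to exactly one of n_batches (round-robin).

    Hypothetical helper for illustration; not Snakemake's algorithm.
    """
    if not 1 <= n_batches <= len(groups):
        raise ValueError("n_batches must be between 1 and the number of groups")
    ordered = sorted(groups)
    # Batch i takes every n_batches-th group, starting at index i.
    return [ordered[i::n_batches] for i in range(n_batches)]


# Seven made-up cell groups split into five batches.
groups = [f"group{i}" for i in range(1, 8)]
batches = split_into_batches(groups, 5)

# With one consistent batch count, every group lands in exactly one batch.
covered = sorted(g for batch in batches for g in batch)
for index, batch in enumerate(batches, start=1):
    print(f"batch {index}/5:", ", ".join(batch))
```

If one command used five batches and another used four, the two partitions would no longer line up, so some groups could be processed twice and others not at all.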
Currently, the merging step, which uses allcools merge, is still very I/O intensive, especially when each of your groups contains hundreds of cells. Running it with a large number of cores can easily cause very high server load. Please monitor your jobs and use an appropriate number of cores.