Methylome Pseudo-bulk

Prepare the Snakemake file for generating pseudo-bulk files

After single-cell level analysis, the next step is to merge the single-cell methylomes (ALLC files) to the pseudo-bulk level to increase genome-wide coverage. This allows us to generate genome browser tracks and identify potential regulatory elements at high genomic resolution.

Input

  • single-cell ALLC files from mapping

  • cell group labels from the metadata or clustering analysis

These two kinds of inputs need to be organized in a Tab-separated ALLC table like this:

NO HEADER

/path/to/allc/file/cell1.allc.tsv.gz	group1
/path/to/allc/file/cell2.allc.tsv.gz	group1
/path/to/allc/file/cell3.allc.tsv.gz	group2
/path/to/allc/file/cell4.allc.tsv.gz	group2
...	...
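As a minimal sketch of how such a table could be written, the Python snippet below turns a cell-to-group mapping into tab-separated lines with no header. The function name write_allc_table, the cell IDs, and the assumption that each ALLC file is named {cell_id}.allc.tsv.gz inside one directory are all illustrative and not part of YAP; adapt them to your own file layout.

```python
import csv
import io


def write_allc_table(cell_to_group, allc_dir, out_handle):
    """Write one 'ALLC path<TAB>group' line per cell, without a header row."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    for cell_id, group in sorted(cell_to_group.items()):
        # Assumed naming scheme: <allc_dir>/<cell_id>.allc.tsv.gz
        allc_path = f"{allc_dir}/{cell_id}.allc.tsv.gz"
        writer.writerow([allc_path, group])


# Hypothetical group labels, e.g. taken from clustering results.
cell_to_group = {
    "cell1": "group1",
    "cell2": "group1",
    "cell3": "group2",
    "cell4": "group2",
}

buffer = io.StringIO()
write_allc_table(cell_to_group, "/path/to/allc/file", buffer)
allc_table_text = buffer.getvalue()
print(allc_table_text, end="")
```

In practice you would pass an open file handle instead of the in-memory buffer, so that the table is saved to disk.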

Generate Snakefile

Just like mapping, we use Snakemake to execute all the specific commands. YAP only helps prepare a Snakefile for the whole process, which will:

  1. Generate a group-merged pseudo-bulk ALLC file for each group. These ALLC files have the same format as single-cell ALLC files. All the base counts at the same position are added together.

  2. Generate BigWig files for each pseudo-bulk ALLC file. The mC context and bin size of the BigWig are provided by you.

  3. OPTIONAL: Extract mCG sites from the merged ALLC files. These files can be used to call CpG Differentially Methylated Regions (CG-DMRs).
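To illustrate what "added together" means in step 1, here is a toy in-memory sketch in Python. It is not the merging code used by YAP; it assumes a simplified record of (chrom, pos, strand, context, mc, cov), and the function name merge_allc_records is hypothetical. Real ALLC files are compressed, far larger, and may carry additional columns.

```python
from collections import defaultdict


def merge_allc_records(cells):
    """Sum mc and cov counts across cells for every shared genomic position.

    `cells` is an iterable of per-cell record lists; each record is a tuple
    (chrom, pos, strand, context, mc, cov). This layout is an assumption
    made for illustration only.
    """
    totals = defaultdict(lambda: [0, 0])
    for records in cells:
        for chrom, pos, strand, context, mc, cov in records:
            key = (chrom, pos, strand, context)
            totals[key][0] += mc   # methylated base count
            totals[key][1] += cov  # total coverage
    # Return records sorted by chromosome and position.
    return [key + (mc, cov) for key, (mc, cov) in sorted(totals.items())]


# Two hypothetical cells from the same group.
cell1 = [("chr1", 100, "+", "CGA", 1, 1), ("chr1", 200, "-", "CAT", 0, 2)]
cell2 = [("chr1", 100, "+", "CGA", 0, 1)]

merged = merge_allc_records([cell1, cell2])
for record in merged:
    print(record)
```

The position covered by both cells ends up with the summed counts, while the position seen in only one cell is carried over unchanged.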

To generate the Snakefile, just run

After running yap mc-bulk, there will be a Snakefile in the output_dir. This Snakefile contains all the steps needed to generate the output files.

Execute Snakefile

Single command

To execute the Snakefile, simply run

Snakemake will parallelize all the jobs for you.

Execute in batches

If you are merging a large number of files, you may want to execute the Snakefile in multiple batches and run the batches in parallel to speed up merging.

To execute the Snakefile in batches, simply run

You can use any number of batches between 2 and the number of cell groups; just be consistent among your commands so that no group is missed. Snakemake will automatically determine the subset to run in each command.

These commands can be executed in parallel. For example, if you split the work into five batches, you can submit five qsub/sbatch jobs to run them in parallel.
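As a rough sketch of what a consistent set of batch commands might look like, the Python helper below builds one command string per batch using Snakemake's --batch RULE=BATCH/BATCHES option. The rule name TARGET_RULE, the core count, and the helper batch_commands are placeholders introduced here for illustration; check the generated Snakefile for the actual rule to batch on before using anything like this.

```python
def batch_commands(output_dir, rule, n_batches, cores):
    """Build one snakemake command per batch, all sharing the same batch total."""
    return [
        f"snakemake -d {output_dir} --snakefile {output_dir}/Snakefile "
        f"-j {cores} --batch {rule}={i}/{n_batches}"
        for i in range(1, n_batches + 1)
    ]


# TARGET_RULE is a placeholder, not a rule name confirmed by the YAP docs.
commands = batch_commands("output_dir", "TARGET_RULE", n_batches=5, cores=10)
for command in commands:
    print(command)
```

Each printed line could then be wrapped in its own qsub/sbatch submission script. Because every command uses the same denominator, the batches together cover all groups.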
