File Formats
Base level methylation counts
Both ALLC and MCDS files are generated by ALLCools. Read more details about ALLCools .
The ALLC (ALL Cytosine) format is a tab-separated table containing base-level methylation and coverage counts. The ALLC format was originally defined by , a Python package developed in our lab for bulk WGBS-seq data analysis. Each row in an ALLC file corresponds to one cytosine in the genome. An ALLC file contains 7 mandatory columns and no header. YAP uses ALLCools to generate ALLC files, compressed with bgzip and indexed with tabix, from the .
Each ALLC file generated by YAP contains information from a single cell, but ALLC files from multiple cells can also be merged (e.g., by cluster) into a bulk-level methylation table.
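To make the merging idea concrete, here is a minimal Python sketch (not the ALLCools implementation) that pools several single-cell ALLC tables by summing `mc` and `cov` at each cytosine. The function name `merge_allc_lines` is hypothetical, and the sketch assumes each input is an iterable of uncompressed tab-separated lines with the 7 columns listed in the table. Because no significance test is run on the pooled counts, it simply writes 1 in the `methylated` column.

```python
def merge_allc_lines(*cells):
    """Pool single-cell ALLC tables by summing mc and cov at each cytosine.

    Each argument is an iterable of tab-separated ALLC lines (7 columns, no header).
    Everything is held in memory, so this only suits small illustrative inputs.
    """
    counts = {}  # (chrom, pos, strand) -> [context, mc_sum, cov_sum]
    for lines in cells:
        for line in lines:
            chrom, pos, strand, context, mc, cov, _methylated = line.rstrip("\n").split("\t")
            key = (chrom, int(pos), strand)
            if key not in counts:
                counts[key] = [context, 0, 0]  # keep the context from the first record seen
            counts[key][1] += int(mc)
            counts[key][2] += int(cov)
    merged = []
    # Sorting is lexicographic on the chromosome name (chr10 sorts before chr2).
    for (chrom, pos, strand), (context, mc, cov) in sorted(counts.items()):
        # "1" in the last column: no significance test is run on the pooled counts.
        merged.append("\t".join([chrom, str(pos), strand, context, str(mc), str(cov), "1"]))
    return merged
```

For example, merging a cell with `mc=1, cov=1` and another with `mc=0, cov=1` at the same CpG yields one row with `mc=1, cov=2`. Summing raw counts rather than averaging per-cell fractions keeps the pooled table in the same ALLC format, so it can be treated like any other ALLC file. A genome-scale merge would instead stream sorted, bgzipped inputs rather than holding everything in a dictionary.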
| Index | Column name | Example | Note |
|---|---|---|---|
| 1 | chromosome | chr12 | Same naming as the genome FASTA |
| 2 | position | 18283342 | 1-based |
| 3 | strand | + | Either + or - |
| 4 | sequence context | CGT | Can be more than 3 bases; used to determine the mC type |
| 5 | mc | 1 | Count of reads supporting methylation |
| 6 | cov | 2 | Read coverage; cov >= mc |
| 7 | methylated | 1 | Indicator of significant methylation (1 if no test is performed) |
Using bgzip and tabix, we can compress the ALLC file while still allowing region queries. YAP (through ALLCools) does this automatically, but here is an example of the exact command to do so:
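For reference, with the htslib command-line tools the compression and indexing would look roughly like `bgzip cell.allc.tsv` followed by `tabix -s 1 -b 2 -e 2 cell.allc.tsv.gz` (chromosome in column 1, position in column 2 used as both start and end), after which a region can be extracted with `tabix cell.allc.tsv.gz chr12:18000000-18500000`. The file name is hypothetical and YAP's exact options may differ. To show what such a region query returns, the Python sketch below performs the same filtering by scanning every line. It relies on the fact that a bgzip file is still a valid gzip file, so Python's `gzip` module can read it; what tabix adds is its `.tbi` index, which lets it jump to the relevant compressed blocks instead of reading the whole file. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def iter_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield ALLC lines on `chrom` whose 1-based position lies in [start, end]."""
    for line in lines:
        fields = line.split("\t", 2)  # only chromosome and position are needed
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line.rstrip("\n")


def query_region(path: str, chrom: str, start: int, end: int) -> list[str]:
    """Linear-scan region query on a bgzip-compressed ALLC file (no index used)."""
    with gzip.open(path, "rt") as handle:
        return list(iter_region(handle, chrom, start, end))
```

A call such as `query_region("cell.allc.tsv.gz", "chr12", 18000000, 18500000)` returns the same rows as the tabix query above, only more slowly on large files.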
The ALLC file records all the raw methylation counts, but for clustering analysis we need to "bin" them to get feature-level (genomic-region-level) raw counts.
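Binning amounts to summing counts over genomic windows. The sketch below is a simplified stand-in for the feature counting that ALLCools performs when building an MCDS: it groups 1-based positions into fixed windows of `bin_size` bases and sums `mc` and `cov` separately for CG or non-CG (CH) sites. The function name, the `(chrom, pos, context, mc, cov)` input layout, and the default 100 kb bin size are assumptions for illustration.

```python
from collections import defaultdict


def bin_counts(records, bin_size=100_000, mc_type="CG"):
    """Sum mc and cov per fixed-size genomic bin for one methylation type.

    `records` yields (chrom, pos, context, mc, cov) tuples with 1-based positions.
    `mc_type` is "CG" (second context base is G) or "CH" (anything else).
    Returns {(chrom, bin_index): [mc_sum, cov_sum]}, bin_index = (pos - 1) // bin_size.
    """
    if mc_type not in ("CG", "CH"):
        raise ValueError("mc_type must be 'CG' or 'CH'")
    bins = defaultdict(lambda: [0, 0])
    for chrom, pos, context, mc, cov in records:
        is_cg = context[1].upper() == "G"
        if is_cg != (mc_type == "CG"):
            continue  # site belongs to the other methylation type
        key = (chrom, (pos - 1) // bin_size)
        bins[key][0] += mc
        bins[key][1] += cov
    return dict(bins)
```

Keeping `mc` and `cov` as separate sums, rather than averaging per-site fractions, means well-covered cytosines carry more weight, and the bin-level fraction can still be computed afterwards as `mc_sum / cov_sum`.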
Unlike the ALLC file, which has a fixed format, the MCDS file is a flexible dataset that stores many kinds of feature counts (genes, genomic bins of different lengths) for different methylation types (CpH, CpG) in a single file.
An MCDS file is generated and manipulated as an xarray.Dataset (from the Python package xarray), which allows easy combination, selection, and transformation of the multi-dimensional raw count array into a cell-by-feature 2-D array for clustering.
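The dimension bookkeeping behind an MCDS can be illustrated without xarray. The sketch below stores counts in a plain dictionary keyed by `(cell, feature, mc_type, count_type)`, an assumed layout loosely modeled on the named dimensions of an xarray.Dataset rather than the actual MCDS schema, and reduces it to a cell-by-feature matrix of methylation fractions for one methylation type. With a real xarray.Dataset, the same step would roughly be a `.sel(mc_type=..., count_type="mc")` on a count variable divided by the matching `"cov"` selection; all names here are hypothetical.

```python
def cell_by_feature(counts, cells, features, mc_type):
    """Reduce a 4-D count table to a cell-by-feature matrix of mC fractions.

    `counts` maps (cell, feature, mc_type, count_type) -> int, where count_type
    is "mc" or "cov". Selecting one mc_type and dividing the "mc" slice by the
    "cov" slice mirrors a label-based selection on named dimensions.
    """
    matrix = []
    for cell in cells:
        row = []
        for feature in features:
            mc = counts.get((cell, feature, mc_type, "mc"), 0)
            cov = counts.get((cell, feature, mc_type, "cov"), 0)
            row.append(mc / cov if cov else None)  # None marks features with no coverage
        matrix.append(row)
    return matrix
```

Features without coverage come back as `None`, which is one reason real pipelines filter or impute before clustering; labeled dimensions in xarray make this kind of selection and alignment much less error-prone than hand-built keys.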
I provide a toy dataset in MCDS format and some example code for reading it with xarray. For more details about MCDS and further clustering steps, please go to .