Key Mapping Metrics

Important mapping metrics for evaluating cell quality

snmC-seq 2/3

Below are the minimum mapping-metric cutoffs I use to evaluate cell and library quality before any computational analysis is done. The cutoffs depend on how the library is sequenced. The numbers given here assume 16 plates (3,072 wells) loaded in a MiSeq run, or in a NovaSeq run using an S4 flowcell. If your library is loaded differently (e.g., 32 plates in a NovaSeq run using an S4 flowcell), you need to adjust the cutoffs accordingly.
  • FinalmCReads: the final number of reads used in methylation calling, and therefore the real genome coverage of the cell.
    • MiSeq cutoff: FinalmCReads > 100
    • NovaSeq cutoff: FinalmCReads > 500,000
    • The library average is ~1.6 M reads/cell.
  • mCCCFrac: the upper bound of the non-conversion rate. The mCCC fraction is usually close to the non-conversion rate measured by the lambda DNA spike-in. However, it is positively correlated with the cell's mCH fraction, so it can be somewhat higher in cells with high mCH (e.g., some inhibitory neurons). I therefore recommend different thresholds for neurons and for other tissues:
    • For neuron-related samples, use mCCC fraction < 0.03
    • For other tissues known to have a low mCH fraction, use mCCC fraction < 0.01
  • R1MappingRate: this metric is species-specific; the library average is usually between 65% and 75%. I use R1MappingRate > 50% as the cutoff. A low mapping rate indicates potential contamination.
  • R2MappingRate: this metric is lower than R1MappingRate because R2 base quality is not as good as R1 base quality. The average is usually about 10% lower than R1, but the two are highly correlated.
  • R1(R2)DuplicationRate: the library average is usually between 25% and 35%. I do not filter cells based on this metric.
  • Overall success rate: after filtering by FinalmCReads, mCCCFrac, and R1MappingRate, ~80% of wells (or cells) usually remain. The success rates of the MiSeq and NovaSeq runs should be very close. If the MiSeq success rate is below 65% (fewer than ~2,000 successful wells out of 3,072), I do not recommend proceeding to NovaSeq: there must be quality issues, either during FACS or in library preparation.
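The filtering and go/no-go logic above can be sketched in Python as follows. This is a minimal sketch, not the pipeline used to produce these numbers: the `CellMetrics` record, the function names, and the example wells are hypothetical, and mapping rates are assumed to be stored as fractions (0.70 rather than 70%). The `scale_read_cutoff` helper encodes one possible assumption about adjusting cutoffs for a different loading, namely that per-cell read depth scales inversely with the number of plates per run. Only the FinalmCReads cutoff is scaled; the ratio-based filters (mCCCFrac, R1MappingRate) are left unchanged.

```python
from dataclasses import dataclass

# Cutoffs from the text, for 16 plates (3,072 wells) per MiSeq or NovaSeq S4 run.
MIN_FINAL_MC_READS = {"miseq": 100, "novaseq": 500_000}
MAX_MCCC_FRAC = {"neuron": 0.03, "low_mch": 0.01}
MIN_R1_MAPPING_RATE = 0.50
MIN_MISEQ_SUCCESS_RATE = 0.65


@dataclass
class CellMetrics:
    """Hypothetical per-well record; fields mirror the metric names above."""
    cell_id: str
    final_mc_reads: int
    mccc_frac: float
    r1_mapping_rate: float  # stored as a fraction, e.g. 0.70 for 70%


def scale_read_cutoff(cutoff, plates, reference_plates=16):
    """ASSUMPTION: reads per cell scale inversely with plates loaded per run."""
    return cutoff * reference_plates / plates


def passes_qc(cell, platform, tissue, plates=16):
    """Apply the FinalmCReads, mCCCFrac and R1MappingRate cutoffs.

    R2MappingRate and duplication rate are intentionally not used as filters.
    """
    min_reads = scale_read_cutoff(MIN_FINAL_MC_READS[platform], plates)
    return (
        cell.final_mc_reads > min_reads
        and cell.mccc_frac < MAX_MCCC_FRAC[tissue]
        and cell.r1_mapping_rate > MIN_R1_MAPPING_RATE
    )


def success_rate(cells, platform, tissue, total_wells=3072, plates=16):
    """Fraction of wells passing QC (wells without a record count as failures)."""
    n_pass = sum(passes_qc(c, platform, tissue, plates) for c in cells)
    return n_pass / total_wells


def proceed_to_novaseq(miseq_success_rate):
    """Go/no-go check: do not proceed if the MiSeq success rate is below 65%."""
    return miseq_success_rate >= MIN_MISEQ_SUCCESS_RATE


# Hypothetical wells, for illustration only.
cells = [
    CellMetrics("A1", 1_600_000, 0.008, 0.71),  # passes
    CellMetrics("A2", 300_000, 0.006, 0.70),    # too few reads for NovaSeq
    CellMetrics("A3", 900_000, 0.020, 0.68),    # mCCC too high unless neuronal
    CellMetrics("A4", 800_000, 0.005, 0.42),    # low R1 mapping rate
]
print(f"{success_rate(cells, 'novaseq', 'low_mch', total_wells=len(cells)):.0%}")  # 25%
print(proceed_to_novaseq(2000 / 3072))  # True
```

Under the inverse-scaling assumption, loading 32 plates on a NovaSeq S4 run would halve the FinalmCReads cutoff to 250,000. It is worth checking such an adjusted cutoff against the library's actual read distribution before relying on it.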

🚧 snmCT-seq

🚧 snm3C-seq