Key Mapping Metrics
Important mapping metrics for evaluating cell quality
Below are the minimum mapping metrics I use to evaluate cell and library quality before any computational analysis. Some cutoffs depend on how the library is sequenced. The numbers given here assume 16 plates (3,072 wells) loaded in either a MiSeq run or a NovaSeq run on an S4 flowcell. If your library is loaded differently (e.g., 32 plates in a NovaSeq S4 run), adjust the cutoffs accordingly.
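How much to change the cutoffs is left to the reader. One simple starting point, sketched below, is to scale the read-count cutoff by plate count. This rests on an assumption that is not stated above: the run's total read output is fixed, so reads per cell scale inversely with the number of plates loaded. The helper name `scale_read_cutoff` is hypothetical. Only the FinalmCReads cutoff is depth dependent; the mCCCFrac thresholds do not need rescaling.

```python
def scale_read_cutoff(base_cutoff: float, plates: int, base_plates: int = 16) -> float:
    """Hypothetical helper: rescale a FinalmCReads cutoff calibrated at
    `base_plates` plates to a run loaded with `plates` plates.

    Assumption (not from the original text): the run's total read output is
    fixed, so reads per cell scale as base_plates / plates.
    """
    if plates <= 0:
        raise ValueError("plates must be positive")
    return base_cutoff * base_plates / plates


# NovaSeq S4 cutoff calibrated at 16 plates, rescaled for a 32-plate run.
print(scale_read_cutoff(500_000, plates=32))  # 250000.0
```

Treat the result as a first guess and check it against the observed FinalmCReads distribution of the actual run.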
- FinalmCReads: the final number of reads used in methylation calling, so it reflects the real genome coverage.
    - MiSeq cutoff: FinalmCReads > 100
    - NovaSeq cutoff: FinalmCReads > 500,000
    - The library average is ~1.6 M reads per cell.
- mCCCFrac: an upper bound of the bisulfite non-conversion rate. The mCCC fraction is usually close to the non-conversion rate measured by the lambda DNA spike-in. However, it is positively correlated with the cell's mCH fraction, so it can be somewhat higher in cells with high mCH (e.g., some inhibitory neurons). I therefore recommend different thresholds for neurons and for other tissues:
    - For neuron-related samples, use mCCCFrac < 0.03
    - For other tissues known to have a low mCH fraction, use mCCCFrac < 0.01
- R1DuplicationRate / R2DuplicationRate: the library average is usually between 25% and 35%. I do not filter cells on this metric.
- Overall success rate: after filtering by FinalmCReads, mCCCFrac, and R1MappingRate, ~80% of wells (cells) usually remain. The success rates on MiSeq and NovaSeq should be very close. If the MiSeq success rate is below 65% (fewer than ~2,000 passing cells out of 3,072 wells), I do not recommend proceeding to NovaSeq. A rate that low points to quality issues, either during FACS or during library preparation.
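The cell-level cutoffs and the MiSeq go/no-go rule above can be combined into a small filter. The sketch below uses only the Python standard library. `CellStats`, `passes_qc`, `success_rate`, and `ok_to_sequence_on_novaseq` are hypothetical names, not part of any pipeline's API. The text names R1MappingRate as a filter but gives no threshold, so it appears only as an optional, user-supplied cutoff. The success rate is computed over all loaded wells (3,072 by default), matching the "out of 3,072 wells" framing.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Cutoffs quoted in the text, calibrated for 16 plates (3,072 wells) per run.
READ_CUTOFF = {"miseq": 100, "novaseq": 500_000}  # FinalmCReads must exceed this
MCCC_CUTOFF = {"neuron": 0.03, "other": 0.01}     # mCCCFrac must be below this ("other" = low-mCH tissue)
MISEQ_MIN_SUCCESS = 0.65                          # go/no-go threshold for NovaSeq


@dataclass
class CellStats:
    """Hypothetical per-cell record; field names mirror the metric names."""
    cell_id: str
    final_mc_reads: int     # FinalmCReads
    mccc_frac: float        # mCCCFrac
    r1_mapping_rate: float  # R1MappingRate


def passes_qc(cell: CellStats, platform: str, tissue: str,
              min_r1_mapping_rate: Optional[float] = None) -> bool:
    """Apply the FinalmCReads and mCCCFrac cutoffs, plus an optional
    R1MappingRate cutoff (no value is given in the text)."""
    if cell.final_mc_reads <= READ_CUTOFF[platform]:
        return False
    if cell.mccc_frac >= MCCC_CUTOFF[tissue]:
        return False
    if min_r1_mapping_rate is not None and cell.r1_mapping_rate < min_r1_mapping_rate:
        return False
    return True


def success_rate(cells: Iterable[CellStats], platform: str, tissue: str,
                 total_wells: int = 3072,
                 min_r1_mapping_rate: Optional[float] = None) -> float:
    """Fraction of all loaded wells whose cell passes QC."""
    n_pass = sum(passes_qc(c, platform, tissue, min_r1_mapping_rate) for c in cells)
    return n_pass / total_wells


def ok_to_sequence_on_novaseq(miseq_success_rate: float) -> bool:
    """Go/no-go rule from the text: do not proceed below 65% on MiSeq."""
    return miseq_success_rate >= MISEQ_MIN_SUCCESS


# Toy MiSeq example with four wells (made-up values for illustration only).
cells = [
    CellStats("A1", 250, 0.005, 0.70),
    CellStats("A2", 80, 0.004, 0.70),   # too few reads for the MiSeq cutoff
    CellStats("A3", 300, 0.020, 0.70),  # passes the neuron cutoff, fails the low-mCH one
    CellStats("A4", 400, 0.008, 0.70),
]
rate = success_rate(cells, "miseq", "neuron", total_wells=len(cells))
print(rate, ok_to_sequence_on_novaseq(rate))  # 0.75 True
```

Keeping the cutoffs in dictionaries keyed by platform and tissue type makes it easy to swap in rescaled values for runs loaded differently from the 16-plate setup.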