Demultiplex
Related Commands
Purpose
The bcl2fastq command only demultiplexes the PCR index. Therefore, each set of raw FASTQ files still contains reads from several cells (8 cells in V1; 64 cells in V2). This step further demultiplexes the random index at the 5' end of R1, generating cell-level R1 and R2 FASTQ files.
The random index is trimmed after demultiplexing. The random index name appears in the FASTQ file name, where it combines with the preceding information to form the cell ID.
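The idea above can be illustrated with a minimal Python sketch. It is not how yap implements this step (yap relies on cutadapt, which also tolerates mismatches); it only shows exact-prefix matching and trimming. The index sequences, the index names, and the `-` separator used to build the cell ID are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple

# Hypothetical random-index table (sequence -> index name).
# These 6 bp sequences are placeholders, not the real V1 indexes.
RANDOM_INDEXES: Dict[str, str] = {
    "ACGATG": "A1",
    "TGCTCA": "A2",
}


def demultiplex_read(
    seq: str, qual: str, indexes: Dict[str, str]
) -> Optional[Tuple[str, str, str]]:
    """Match the 5' end of an R1 read against the index table.

    Returns (index_name, trimmed_seq, trimmed_qual), or None when
    no index matches exactly.
    """
    index_len = len(next(iter(indexes)))  # assumes all indexes share one length
    name = indexes.get(seq[:index_len])
    if name is None:
        return None
    # Trim the random index from both the sequence and the quality string.
    return name, seq[index_len:], qual[index_len:]


def make_cell_id(fastq_set_name: str, index_name: str) -> str:
    """Combine the FASTQ set name with the index name (separator is an assumption)."""
    return f"{fastq_set_name}-{index_name}"
```

Calling `demultiplex_read("ACGATGTTTTGGGG", "IIIIIIJJJJKKKK", RANDOM_INDEXES)` would assign the read to the placeholder index `A1` and drop its first 6 bases.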
This step also prepares Snakefiles that contain all the commands for mapping (using snakemake).
Input
FASTQ file sets created by Illumina bcl2fastq.
For MiSeq, each set has two files (R1 & R2 from one lane).
For NovaSeq, each set has 2 * N_lane files, where N_lane depends on the flowcell used and how it was loaded.
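As a quick sanity check before running this step, the expected number of files per set can be computed from the lane count. The helper below is a hypothetical illustration, not part of yap.

```python
def expected_fastq_files(n_lanes: int) -> int:
    """Number of FASTQ files in one set: one R1 and one R2 file per lane."""
    if n_lanes < 1:
        raise ValueError("n_lanes must be at least 1")
    return 2 * n_lanes


print(expected_fastq_files(1))  # MiSeq, one lane
print(expected_fastq_files(4))  # a hypothetical 4-lane NovaSeq run
```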
Demultiplex
For Ecker Lab users: do not run this step on the DDN drive. cutadapt demultiplexing (which yap is based on) constantly raises errors on the DDN drive. To be safe, do not run mapping on the DDN drive either.
Output
The random index sequence is removed from the reads:
6 bp are removed from the 5' end of R1 in V1 indexed libraries.
8 bp are removed from the 5' end of R1 in V2 indexed libraries.
Each cell will have two FASTQ files in the output directory, with a fixed name pattern:
{cell_id}-R1.fq.gz for R1
{cell_id}-R2.fq.gz for R2
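Downstream scripts can rebuild these file names from a cell ID. The function below is a small sketch; its name and the `output_dir` argument are hypothetical, and it assumes `output_dir` is the directory that directly holds the cell's files.

```python
from pathlib import Path
from typing import Tuple


def cell_fastq_paths(output_dir: str, cell_id: str) -> Tuple[Path, Path]:
    """Return the expected (R1, R2) FASTQ paths for one cell."""
    base = Path(output_dir)
    return base / f"{cell_id}-R1.fq.gz", base / f"{cell_id}-R2.fq.gz"


r1, r2 = cell_fastq_paths("output", "example_cell")
print(r1.name, r2.name)
```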
Files are organized in the following structure; a minimal example is also attached below.