Mapping Form Cell-Level FASTQ Files
If you start from yap demultiplex
, you do not need to follow this page. This page is for the situation where you directly start from cell-level FASTQ files.
If you directly downloaded demultiplexed cell-level FASTQ files from the database, you can skip the yap demultiplex
step and directly prepare the Snakefiles
for mapping using yap start-from-cell-fastq
.
Requirements on FASTQ files
The FASTQ file name must follow the following pattern, especially the suffix part:
For R1 FASTQ:
{cell_id}-R1.fq.gz
For R2 FASTQ:
{cell_id}-R2.fq.gz
The
{cell_id}
must be unique, it will be used as cells' ID in the mapping summary.Each cell must have one R1 and one R2 FASTQ file.
Input
FASTQ files prepared as the above section
Mapping config INI file of corresponding technology, see how to prepare that file here.
If you don't know the barcode version, you may have a guess based on the file name pattern. If the file name does not follow a normal pattern, just use "V2"
the algorithm will skip generating plate information if the file name pattern is not following YAP's pattern, but plate information is not necessary for the following analysis.
Prepare Mapping
This is a single command that finishes quickly. It does not run any mapping command, instead, it generates a very similar directory structure as the yap demultiplex
will do. It also generates all the Snakefile
that contain the actual mapping commands and can be executed using snakemake
.
One technical note, here I randomly group FASTQ pairs into several groups, each group got one Snakefile
and will be mapped together. This step does not impact any of the mapping results.
Execute Mapping
Just like the previous page, all you need to do is execute the snakemake commands. All commands should be summarized in the {output_dir}/snakemake
directory.
Output
The output is just like this example. The MappingSummary.csv.gz
in each group directory is the final target file, if this file exists, all command must be executed successfully in that group directory.
After mapping, you can also run yap summary
as explained on the next page.
More customized mapping
The scope of YAP is just for some fixed mapping pipeline that consistent with our publications. If you do want to have more finner control of the mapping steps, I suggest you read the Snakefile templates and make your own versions by yourself.
The main work YAP did is generating the Snakefile
based on file inputs and the Snakefile
templates. The resulting Snakefile
contains all steps to map the data.
Snakefile templates are part of YAP's source code and are pretty straightforward to read. If you need any help on the Snakefile format, you can go to snakemake's documentation.
Last updated