Mapping From Cell-Level FASTQ Files
If you start from yap demultiplex, you do not need to follow this page. This page is for the situation where you start directly from cell-level FASTQ files. If you downloaded demultiplexed cell-level FASTQ files from the database, you can skip the yap demultiplex step and directly prepare the Snakefiles for mapping using yap start-from-cell-fastq.

- The FASTQ file names must follow this pattern, especially the suffix:
  - R1 FASTQ: {cell_id}-R1.fq.gz
  - R2 FASTQ: {cell_id}-R2.fq.gz
- The {cell_id} must be unique, because it is used as the cell ID in the mapping summary.
- Each cell must have exactly one R1 and one R2 FASTQ file.
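The naming rules above can be checked before running anything. The following is a minimal sketch, not part of YAP, that assumes file names end exactly in -R1.fq.gz or -R2.fq.gz; the function name pair_cell_fastqs is hypothetical. It reports names that do not match the suffix rule, cell IDs that appear more than once, and cells missing one of the two reads.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Naming rule from this page: {cell_id}-R1.fq.gz and {cell_id}-R2.fq.gz
FASTQ_NAME = re.compile(r"^(?P<cell_id>.+)-(?P<read>R[12])\.fq\.gz$")


def pair_cell_fastqs(
    paths: Iterable[str],
) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Group FASTQ paths by cell ID and collect naming problems.

    Returns (pairs, problems): pairs maps cell_id -> {"R1": path, "R2": path}
    for cells that have both reads; problems is a list of messages.
    """
    cells: Dict[str, Dict[str, str]] = defaultdict(dict)
    problems: List[str] = []
    for path in paths:
        name = os.path.basename(path)
        match = FASTQ_NAME.match(name)
        if match is None:
            problems.append(f"bad name: {name}")
            continue
        cell_id, read = match.group("cell_id"), match.group("read")
        if read in cells[cell_id]:
            # Same cell ID seen twice (e.g. in two directories): IDs must be unique.
            problems.append(f"duplicate {read} for cell {cell_id}")
            continue
        cells[cell_id][read] = path
    pairs: Dict[str, Dict[str, str]] = {}
    for cell_id, reads in cells.items():
        if set(reads) == {"R1", "R2"}:
            pairs[cell_id] = reads
        else:
            missing = ({"R1", "R2"} - set(reads)).pop()
            problems.append(f"cell {cell_id} is missing {missing}")
    return pairs, problems
```

An empty problems list suggests the files satisfy the rules on this page; YAP itself may apply additional checks.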
- FASTQ files prepared as described in the section above
If you don't know the barcode version, you can guess it from the file name pattern. If the file names do not follow a recognizable pattern, just use "V2": the algorithm skips generating plate information when the file names do not follow YAP's pattern, but plate information is not required for the downstream analysis.
This is a single command that finishes quickly. It does not run any mapping commands; instead, it generates a directory structure very similar to the one yap demultiplex creates. It also generates all the Snakefiles, which contain the actual mapping commands and can be executed using snakemake.

$ yap start-from-cell-fastq -h
usage: yap start-from-cell-fastq [-h] --output_dir OUTPUT_DIR --config_path
                                 CONFIG_PATH --fastq_pattern FASTQ_PATTERN

optional arguments:
  -h, --help            show this help message and exit

Required inputs:
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        Pipeline output directory, if not exist, will create
                        recursively. (default: None)
  --config_path CONFIG_PATH, -config CONFIG_PATH
                        Path to the mapping config, see 'yap default-mapping-
                        config' about how to generate this file. (default:
                        None)
  --fastq_pattern FASTQ_PATTERN, -fq FASTQ_PATTERN
                        Path pattern with wildcard to match all cell-level
                        FASTQ files, pattern with wildcard must be quoted.
                        (default: None)
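The help text says the wildcard pattern must be quoted: if it is not, the shell expands it into many file names before yap sees it. The sketch below assumes yap matches the pattern with standard glob-style wildcards. It previews which files a pattern matches and uses Python's shlex.join to build a correctly quoted command line with the -o, -config, and -fq flags from the help output. The helper names and all paths are hypothetical placeholders.

```python
import glob
import shlex
from typing import List


def preview_fastq_pattern(pattern: str) -> List[str]:
    """Return the files a wildcard pattern matches, assuming glob-style syntax."""
    return sorted(glob.glob(pattern))


def build_command(output_dir: str, config_path: str, fastq_pattern: str) -> str:
    """Build a shell command line for yap start-from-cell-fastq.

    shlex.join single-quotes any argument containing shell metacharacters
    such as '*', so the shell passes the pattern to yap unexpanded.
    """
    args = [
        "yap", "start-from-cell-fastq",
        "-o", output_dir,
        "-config", config_path,
        "-fq", fastq_pattern,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    pattern = "/data/cell_fastq/*.fq.gz"
    print(len(preview_fastq_pattern(pattern)), "files match", pattern)
    print(build_command("mapping_output", "mapping_config.ini", pattern))
    # yap start-from-cell-fastq -o mapping_output -config mapping_config.ini -fq '/data/cell_fastq/*.fq.gz'
```

shlex.join needs Python 3.8 or later; typing the single-quoted pattern by hand in the shell has the same effect.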
One technical note: here I randomly divide the FASTQ pairs into several groups; each group gets one Snakefile and is mapped together. This grouping does not affect any of the mapping results.

Just like on the previous page, all you need to do is execute the snakemake commands, which are summarized in the {output_dir}/snakemake directory. The output is similar to this example.
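The random grouping described in the technical note can be pictured with a minimal sketch. The group size, seed, and "groupN" names below are hypothetical choices for illustration; YAP's actual grouping logic and directory names may differ. The property that matters is that every cell lands in exactly one group, which is consistent with the note that grouping does not change the mapping results.

```python
import random
from typing import Dict, List


def group_cells(
    cell_ids: List[str], group_size: int = 64, seed: int = 0
) -> Dict[str, List[str]]:
    """Randomly split cell IDs into groups of at most group_size cells.

    Every cell lands in exactly one group; the group_size, seed and
    "groupN" names are illustrative choices, not YAP's settings.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    shuffled = sorted(cell_ids)  # fixed starting order so the seed is reproducible
    random.Random(seed).shuffle(shuffled)
    return {
        f"group{i // group_size}": shuffled[i:i + group_size]
        for i in range(0, len(shuffled), group_size)
    }
```

Using a seeded random.Random instance keeps the assignment reproducible between runs without touching Python's global random state.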
MappingSummary.csv.gz in each group directory is the final target file: if it exists, all commands in that group directory have been executed successfully. After mapping, you can also run yap summary, as explained on the next page.

The scope of YAP is limited to a fixed mapping pipeline consistent with our publications. If you want finer control over the mapping steps, I suggest reading the Snakefile templates and writing your own versions.
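Because MappingSummary.csv.gz marks a finished group, a quick way to find groups that still need to be run (or rerun) is to look for group directories that lack this file. This minimal sketch assumes each group directory sits directly under {output_dir}, contains its own Snakefile, and holds MappingSummary.csv.gz at its top level; directories without a Snakefile, such as snakemake, are ignored. The function name unfinished_groups is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def unfinished_groups(output_dir: str) -> List[str]:
    """List group directories whose final target file is missing.

    Assumes each group directory sits directly under output_dir and contains
    its own Snakefile; directories without a Snakefile (e.g. snakemake/) are
    ignored.
    """
    group_dirs = sorted(p.parent for p in Path(output_dir).glob("*/Snakefile"))
    return [d.name for d in group_dirs
            if not (d / "MappingSummary.csv.gz").exists()]


if __name__ == "__main__":
    # Build a throwaway layout: two groups, only group0 finished.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("group0", "group1"):
            (Path(tmp) / name).mkdir()
            (Path(tmp) / name / "Snakefile").touch()
        (Path(tmp) / "snakemake").mkdir()
        (Path(tmp) / "group0" / "MappingSummary.csv.gz").touch()
        print(unfinished_groups(tmp))  # ['group1']
```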
The main work YAP does is generating the Snakefile based on the input files and the Snakefile templates. The resulting Snakefile contains all the steps needed to map the data. The Snakefile templates are part of YAP's source code and are fairly straightforward to read. If you need help with the Snakefile format, see the snakemake documentation.