Mapping Via Sbatch

Input

After demultiplexing, all the snakemake commands are also summarized in the {output_dir}/snakemake/sbatch directory.

The snakemake_cmd.txt file contains the snakemake commands for all PCR index sub-directories.

The sbatch.sh file is a submission script that submits all of these commands automatically via yap sbatch. yap sbatch controls the total number of jobs running in parallel through sbatch.

output_dir
├── snakemake
│   ├── sbatch
│   │   ├── sbatch.sh
│   │   └── snakemake_cmd.txt
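
If you want a quick sanity check before submitting, you can count and preview the commands (optional; the exact command contents depend on your library configuration):

# on tacc, inside {output_dir}/snakemake/sbatch
# one snakemake command per PCR index sub-directory
wc -l snakemake_cmd.txt

# preview the first few commands
head -n 3 snakemake_cmd.txt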

sbatch.sh

# this command runs all the snakemake commands for mapping
yap sbatch \
--project_name mc-V2 \
--command_file_path $SCRATCH/{lib_name}/snakemake/sbatch/snakemake_cmd.txt \
--working_dir $SCRATCH/{lib_name}/snakemake/sbatch \
--time_str 12:00:00
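
The flags shown above come straight from the generated script; yap sbatch has further options (for example, the limit on parallel jobs mentioned above). Assuming the standard --help flag is available, you can list them with:

# list all yap sbatch options, e.g. the parallel job limit
yap sbatch --help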

Transfer Files to Stampede2 Scratch

# print your scratch dir location
# on tacc login node
echo $SCRATCH

# then make a soft link to it in your home dir
# on tacc login node
ln -s $SCRATCH ~/scratch

# on local server
rsync -arv {output_dir} tacc:scratch/
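
The rsync command above assumes tacc is an SSH host alias on your local server. If you have not set one up, a minimal ~/.ssh/config entry could look like the sketch below; the alias name and username are placeholders, and the hostname is Stampede2's standard login address.

# ~/.ssh/config on the local server (example entry, adjust to your account)
Host tacc
    HostName stampede2.tacc.utexas.edu
    User your_tacc_username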

Execute

Just like qsub, you only need to execute sbatch.sh. It generates an sbatch script for each snakemake command, submits them, and waits for all commands to finish before exiting. I do not recommend running this as a separate sbatch job because the execution time is long. You can just execute it in a screen or with nohup.

# open a screen
screen -R sbatch

# in that screen, activate the mapping environment
conda activate mapping

# run the submitter interactively
sh sbatch.sh
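
Because the submitter runs for a long time, you will typically detach from the screen session and come back later to check progress:

# detach from the screen (the submitter keeps running): press Ctrl-a, then d
# reattach later to check progress
screen -r sbatch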

Output

Transfer Files Back

After mapping, you can rsync the whole output_dir from the remote server back to the same local location. If you rsync to the same path, you can skip the FASTQ files because they are unchanged during mapping.

# the {output_dir} is the same dir uploaded to tacc
rsync -arv --exclude "*fq.gz" tacc:scratch/{lib_name} {output_dir}
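
If you want to confirm what will be transferred (and that the FASTQ exclude pattern works) before copying anything, add rsync's dry-run flag:

# dry run: list the files that would be transferred without copying them
rsync -arvn --exclude "*fq.gz" tacc:scratch/{lib_name} {output_dir}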
