Chapter 5 Processes

This chapter covers some additional features of Nextflow processes.

A full description of processes can be found in the Nextflow docs

5.1 Directives

You can include directives in each process. These can be used to specify the execution requirements of a process.

Link to full list of process directives

5.1.1 cpus

For example, you can use the cpus directive to specify the number of CPUs to be used by a process.

process RUN {
  cpus 2
}
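
The requested value is exposed to the script as task.cpus, so the command can be told how many CPUs it was given. A minimal sketch (the echo command is just a stand-in for a real tool):

process RUN {
  cpus 2

  script:
  """
  echo "Running with ${task.cpus} CPUs"
  """
}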

5.1.2 publishDir

The basics of publishDir

My preferred method is to assign an overall output/results directory as a parameter (params).

params.outdir = "./results"

process RUN {
  publishDir params.outdir, mode: 'copy'
}

You can also set a subdirectory of the output directory in the process.

params.outdir = "./results"

process RUN {
  publishDir "${params.outdir}/stage_1", mode: 'copy'
}

You can even do this with input variables.

params.outdir = "./results"

process RUN {
  publishDir "${params.outdir}/stage_1/${sample_id}", mode: 'copy'

  input:
  val sample_id
}
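
Note that publishDir only publishes files declared in the output block. A rough sketch pulling the pieces together, where the out.txt output and the echo command are just placeholders:

params.outdir = "./results"

process RUN {
  publishDir "${params.outdir}/stage_1/${sample_id}", mode: 'copy'

  input:
  val sample_id

  output:
  path "out.txt"

  script:
  """
  echo "${sample_id}" > out.txt
  """
}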

5.1.3 conda

A conda environment can be specified with the conda directive.

Further info on using conda environments

5.1.3.1 Local environment

You can specify a conda environment you have locally created.

process RUN {
  conda "/home/minforge3/envs/run_env"
}

If you are using a locally installed environment, it is best to specify it as a parameter so it is quicker to add or edit across multiple processes.

params.conda_env = "/home/miniforge3/envs/run_env"

process RUN {
  conda params.conda_env
}

5.1.3.2 URI-based environment

You can have Nextflow install conda packages for a specific process.

process RUN {
  conda "bioconda::samtools=1.20"
}

You can find which packages can be installed this way through Seqera Containers.

For an easy example, search for bioconda::samtools via the link above.

5.1.3.3 Enabling conda usage in a workflow

When you want to use the specified conda environments in a workflow you must either:

Include -with-conda in the nextflow run command

or:

Better yet, add conda.enabled = true to your nextflow.config file
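
For example, a minimal nextflow.config entry:

// nextflow.config
// Enable the conda directives defined in the processes
conda.enabled = true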

5.2 Script

5.2.1 Variables

Within the script block, Nextflow variables are referenced as ${sample_id}.

Bash variables are escaped as \${sample_id}, so that Nextflow does not try to interpolate them.
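
A small sketch showing both in one script block, assuming a sample_id input value (the outfile bash variable is just for illustration):

process RUN {
  input:
  val sample_id

  script:
  """
  outfile="${sample_id}.txt"   # bash variable set from the Nextflow variable
  echo "Processing ${sample_id}" > \${outfile}
  """
}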

5.2.2 Other languages

Other languages can be used within the Nextflow script section.

For example, Python:

script:
"""
#!/usr/bin/env python
"""

5.3 Modules

The primary main.nf can become quite large when it contains a lot of processes. To counteract this, each process can be stored in a separate main.nf file.

The recommendation is to store them in a directory called modules/local within the main workflow directory. Each process then lives in its own main.nf file within a subdirectory.

A module main.nf would look like so:

#!/usr/bin/env nextflow
process RUN {
  // process contents
}

Modules are imported as processes like so:

/*
 * Processes
 */
include { RUN } from './modules/local/run/main.nf'

It is common to have subdirectories within modules/local grouped by tool. For example, if you were performing 16S analysis with QIIME 2 you might have some of the following modules:

include { IMPORT } from './modules/local/qiime2/import_data/main.nf'
include { CUTADAPT } from './modules/local/qiime2/cutadapt/main.nf'
include { DADA2 } from './modules/local/qiime2/dada2/main.nf'
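
Once included, these processes are called in the workflow block like any locally defined process. A rough sketch, where params.input and the channel wiring are placeholders:

workflow {
  // Create an input channel from a path parameter
  ch_raw = Channel.fromPath(params.input)

  // Chain the imported processes together
  IMPORT(ch_raw)
  CUTADAPT(IMPORT.out)
  DADA2(CUTADAPT.out)
}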