Chapter 3 Basic layout

When using Nextflow each project should be within its own directory.

Within the main directory of the project you will have all the Nextflow files you need.

There are many Nextflow files you can have but the only essential file is a main .nf file which many people like to name as main.nf.

This chapter will go over a basic main.nf file and its different parts.

3.1 main.nf

The main.nf is a plain text file that contains Nextflow code. Below is an example:

#!/usr/bin/env nextflow
/*
  *Params
*/
params.dir = "."
params.outdir = "./results"
params.input_file = "/path/to/data/input.txt"

/*
  * Processes
*/
process STEP1 {
  publishDir:
    params.outdir, mode: 'copy'
  input:
    path(input_data)
  output:
    path("output1.txt")
  script:
  """
  create_output -i $input_data -o output1.txt
  """
}

process STEP2 {
  publishDir:
    params.outdir, mode: 'copy'
  input:
    path(input_data)
  output:
    path("output2.txt")
  script:
  """
  create_output_2 -i $input_data -o output2.txt
  """
}

/*
  * workflow
*/
workflow {
  /// Set input params to channels
  input_ch=file(params.input_file)
  /// Process 1
  STEP1(input_ch)
  /// Process 2
  STEP2(step1.out)
}

3.2 Params

The first section is the various initial parameters. This is useful for specifying input information for the workflow including:

  • Input & output directories
  • Input data
  • Metadata information
  • Reference files

Any time they are used anywhere within the script they must include params. as a prefix.

3.3 Processes

Processes are how the tasks of a workflow are specified and have many parts.

Process reference

3.3.1 Initialistion

A process is defined by process NAME and the process body is cotnained with {}.

The Process name is arbitrary and decided by the workflow designer. However some suggestions are:

  • Do not start the name with numbers.
  • Capitalise all letters used, this is normal convention and makes it easier to see what are processes in your workflow block.
  • Separate words with _.
  • Ensure all process names are unique and somewhat descriptive within your workflow.

3.3.2 publishDir

All files created by the Process will be contained with the workflow's work directory (more info later). All files specified in the output section will be stored in the directory specified by publishDir, if the directory does not exist Nextflow will create it. The output data is stored in the specified based on the mode:.

The modes are:

  • copy: Copies the output files into the publish directory.
  • copyNoFollow: Copies the output files into the publish directory without following symlinks ie. copies the links themselves.
  • link: Creates a hard link in the publish directory for each output file.
  • move: Moves the output files into the publish directory. Note: this is only supposed to be used for a terminal process i.e. a process whose output is not consumed by any other downstream process.
  • rellink: Creates a relative symbolic link in the publish directory for each output file.
  • symlink: Creates an absolute symbolic link in the publish directory for each output file (default).

More info on publishDir

3.3.3 input

These specify the input names and type. Input names are arbitrary but should follow variables rules.

The 2 basic input types are:

  • path: Specifies paths, generally poiting to files.
  • val: Specifies values, these are generally text or numbers such as sample ids.

3.3.4 output

These specify the output.

If the output is a file it needs to be the actual name of the output file in quotes (double quotes are normally preferred).

3.3.5 script

The script block, denoted by flanking triple double quotes ("""), contains the code the process will carry out. By default this will be bash code.

Input variables can be denoted by a $ and output variables are written like normal script but can also use input variables such as sample ids.

3.4 workflow

The workflow section is where all the other sections come together so the workflow knows how to utilise them.