├── .gitignore ├── LICENSE ├── README.md ├── main.nf ├── modules ├── align.nf ├── fastqc.nf ├── manifest.nf ├── multiqc.nf └── quality.nf ├── nextflow.config ├── templates ├── bwa.sh ├── bwa_index.sh ├── fastqc.sh ├── flagstats.sh ├── multiqc.sh ├── quality_trim.sh ├── template_workflow.zip └── validate_manifest.py └── test_data ├── fastq ├── ERR098010_R1.fastq.gz ├── SRR2057021_R1.fastq.gz ├── SRR2057021_R2.fastq.gz ├── SRR4051738_R1.fastq.gz └── SRR4051738_R2.fastq.gz ├── genome_fasta └── NC_001422.1.fasta ├── manifest.csv ├── output ├── alignments │ ├── SRR2057021 │ │ └── aligned.bam │ ├── SRR4051738 │ │ └── aligned.bam │ └── multiqc_report.html ├── input │ └── multiqc_report.html └── quality_trimmed │ └── multiqc_report.html └── test.sh /.gitignore: -------------------------------------------------------------------------------- 1 | *.nextflow.log* 2 | .nextflow/ 3 | *.code-workspace 4 | report.html* 5 | work/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Fred Hutchinson Cancer Research Center 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Workflow Template - Nextflow 2 | Template for building a small workflow in Nextflow 3 | 4 | ## Background 5 | 6 | [Nextflow](https://www.nextflow.io/) is a free and open source software 7 | project which makes it easier to run a computational workflow consisting 8 | of a series of interconnected steps. There are many different ways that 9 | Nextflow can be used, and the [documentation](https://www.nextflow.io/docs/latest/index.html) 10 | can be overwhelming. This repository provides an opinionated example of 11 | how a bioinformatician can structure their code to be run using Nextflow. 12 | 13 | ## Use DSL-2 14 | 15 | After getting started, Nextflow added a lot of extremely useful functionality in 16 | a major release. Since that new functionality was not always backwards compatible, 17 | the new syntax was called "DSL-2". There is [extensive documentation](https://www.nextflow.io/docs/latest/dsl2.html) 18 | on the DSL-2 syntax. The only thing you need to do inside a workflow to use 19 | these features is to add `nextflow.enable.dsl=2` at the top of `main.nf`. 20 | 21 | It is worth your time to read over the DSL-2 documentation if you want to 22 | write workflows which are elegant and easy to maintain. 
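As a minimal sketch, the top of a DSL-2 `main.nf` can look like the following. The `include` line and the `quality_wf` call mirror the ones used in this repository; the `reads_ch` channel and the `reads/` folder are illustrative assumptions only, not part of the template.

```nextflow
#!/usr/bin/env nextflow

// Opt in to the DSL-2 syntax before any workflow code
nextflow.enable.dsl=2

// Import a sub-workflow from the modules/ folder
include { quality_wf } from './modules/quality'

workflow {
    // Hypothetical input channel, shown only for illustration:
    // emits [specimen, R1, R2] for each pair of FASTQ files
    reads_ch = Channel
        .fromFilePairs("reads/*_R{1,2}*fastq.gz")
        .map { [it[0], it[1][0], it[1][1]] }

    // Invoke the imported sub-workflow on that channel
    quality_wf(reads_ch)
}
```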
23 | 24 | ## Repository Structure 25 | 26 | The essential components of the workflow repository are: 27 | - `main.nf`: Contains the primary workflow code which pulls in all additional code from the repository 28 | - `modules/`: Contains all of the sub-workflows which are used to organize large chunks of analysis 29 | - `templates/`: Contains all of the code which is executed in each individual step of the workflow 30 | 31 | To help you get up and running with this structure, download the minimal template 32 | directory provided in the zip archive `templates/template_workflow.zip`. 33 | 34 | ## Parameter Inheritance 35 | 36 | When running a workflow you can tell it what to do by passing in parameters with 37 | `--param_name param_value`. To make this work easily in Nextflow, make sure to 38 | set up the default value in `nextflow.config` in the `params` scope (e.g. `params{param_name = 'default_value'}`). 39 | If a user passes in a value, then `params.param_name` will have that value. If they 40 | do not, it will be `default_value`. The really useful thing about the `params` is that 41 | they are inherited by every sub-workflow and process that is invoked. In other words, 42 | without having to do _anything_ else, I can use `${params.param_name}` in one of the 43 | script files in `templates/`, and I know that it will contain the value that was provided 44 | by the user. 45 | 46 | There are options to override this parameter inheritance if you want to get really fancy, 47 | but this default behavior is extremely useful if you just want to write code and not 48 | worry about explicitly piping together each of the variables into each sub-workflow 49 | as it is imported. 50 | 51 | ### User Input of Parameters 52 | 53 | There are two ways that users can most easily provide their own inputs to a workflow, 54 | with command-line flags or with a params file. 55 | 56 | On the command line, parameters are provided using two dashes before the parameter 57 | name, e.g. 
`--param_name value`. One limitation of this approach is that the provided 58 | value will be interpreted as a string. The best example of this is the edge case of the 59 | negative boolean (`false`), which will be interpreted by Nextflow as a string (`'false'`). 60 | The second limitation is that the command line string starts to become rather long. 61 | Another consideration when providing parameters on the command line is that they may be 62 | interpreted by the shell before execution. For example, in the context of a BASH script 63 | `--param_name *.fastq.gz` will first be expanded to the list of files which match that 64 | pattern (e.g., `--param_name 1.fastq.gz 2.fastq.gz 3.fastq.gz`), which may not be the 65 | intention. This behavior can be prevented explicitly with single quotes in BASH, with 66 | `--param_name '*.fastq.gz'` being unaltered by the shell before execution. 67 | 68 | By using a params file, the user is able to more explicitly define the set of parameters 69 | which will be provided. The params file can be formatted as JSON or YAML, with the example 70 | below shown in JSON. 71 | 72 | ``` 73 | { 74 | "param_name": "*.fastq.gz", 75 | "second_param": false, 76 | "third_param": 5 77 | } 78 | ``` 79 | 80 | The params file is provided by the user with the `-params-file` flag. 81 | While this approach requires the user to create an additional file, it also provides a 82 | method for defining variables without worrying about the nuances of the shell interpreter. 83 | 84 | If both methods are used for providing parameters, the command line flags will take 85 | precedence over the params file ([docs](https://www.nextflow.io/docs/latest/config.html)). 86 | 87 | ## Templates 88 | 89 | One of the options for defining the code that is run inside a Nextflow process 90 | is to use their [template syntax](https://www.nextflow.io/docs/latest/process.html#template). 
91 | The advantage of this approach is that the code can be defined in a separate file 92 | with the appropriate file extension which can be recognized by your favorite IDE 93 | and linter. Any variables from Nextflow will be interpolated using an easy `${var_name}` 94 | syntax, and all other code will be native to the desired language. 95 | 96 | The one 'gotcha' for the template structure is that backslashes are used to escape Nextflow interpolation (meaning that internal BASH variables can be specified with `\$INTERNAL_VAR_NAME`), 97 | and so any use of backslashes for special characters must have two backslashes. Put simply, 98 | if you want to strip the newline character in Python, you would need to write `str.strip('\\n')` 99 | instead of `str.strip('\n')`. 100 | 101 | ## Software Containers 102 | 103 | Each individual step in a workflow should be run inside a container (using 104 | either Docker or Singularity) which has the required dependencies. There is a 105 | long list of public images with commonly used bioinformatics tools available 106 | at the [BioContainers Registry](https://biocontainers.pro/registry). Specific builds 107 | should be identified from the [corresponding repository](https://quay.io/repository/biocontainers/bwa?tab=tags) 108 | for use in a workflow. 109 | 110 | Software containers should be defined as parameters in `nextflow.config`, which allows 111 | the value to propagate automatically to all imported sub-workflows, while also 112 | being easy for the user to override if need be. 113 | 114 | Practically speaking, this means that every process should have a `container` 115 | declared which follows the pattern `container "${params.container__toolname}"`, 116 | and which was set in `nextflow.config` with `params{container__toolname = "quay.io/org/image:tag"}`. 117 | It is crucial that the parameter be set _before_ the subworkflows are imported, which is 118 | the case when it is set in `nextflow.config`, as in this example workflow. 
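A minimal sketch of this pattern, using the `container__bwa` parameter and the `bwa_index` process from this repository (the two files are shown in one block only for brevity):

```nextflow
// nextflow.config: set the default image in the params scope
params {
    container__bwa = "quay.io/hdc-workflows/bwa-samtools:latest"
}

// modules/align.nf: the process reads the image name from params
process bwa_index {
    container "${params.container__bwa}"

    input:
    path genome_fasta

    output:
    path "ref.tar.gz"

    script:
    template 'bwa_index.sh'
}
```

A user can then swap in a different build at runtime with `--container__bwa quay.io/org/image:tag` (a placeholder image name), without editing any workflow code.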
119 | 120 | ## Workflow Style Guide 121 | 122 | While a workflow could be made in almost any way imaginable, there are some 123 | tips and tricks which make debugging and development easier. This is a highly 124 | opinionated list, and should be taken simply as one perspective on the topic. 125 | 126 | - Never use file names to encode metadata (like specimen name, `.trimmed`, etc.) 127 | - Always publish files with `mode: 'copy', overwrite: true` 128 | - Use `.toSortedList()` instead of `.collect()` for reproducible ordering 129 | - Add `set -Eeuo pipefail` to the header of any BASH script 130 | - Every process uses a `container`, which is defined as a `params.container__toolname` in `nextflow.config` 131 | - Never use `.baseName` to remove a file extension, instead use (e.g.) `.name.replaceAll('.fastq.gz', '')` 132 | 133 | ## Going Further 134 | 135 | If you are interested in writing workflows in a way which can be best 136 | shared with the worldwide community of Nextflow developers, please join the 137 | [nf-core](https://nf-co.re/) community. In addition to providing a catalog of 138 | incredibly useful workflows, this group of core bioinformaticians has created an entire 139 | software suite for authoring workflows using community-driven best practices. 140 | 141 | The nf-core codebase can be used to quickly create workflow templates (`nf-core create`) 142 | which are far more sophisticated and robust than this repository. The code here is 143 | for a quick-and-dirty launch into Nextflow. If you want to go deeper, connect with 144 | the other people around the world who have already put in the work to build a 145 | community, and you will go farther together. 
146 | -------------------------------------------------------------------------------- /main.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | // Using DSL-2 4 | nextflow.enable.dsl=2 5 | 6 | // All of the default parameters are being set in `nextflow.config` 7 | 8 | // Import sub-workflows 9 | include { validate_manifest } from './modules/manifest' 10 | include { quality_wf } from './modules/quality' 11 | include { align_wf } from './modules/align' 12 | 13 | 14 | // Function which prints help message text 15 | def helpMessage() { 16 | log.info""" 17 | Usage: 18 | 19 | nextflow run FredHutch/workflow-template-nextflow 20 | 21 | Required Arguments: 22 | 23 | Input Data: 24 | --fastq_folder Folder containing paired-end FASTQ files ending with .fastq.gz, 25 | containing either "_R1" or "_R2" in the filename. 26 | or 27 | --manifest Single file with the location of all input data. Must be formatted 28 | as a CSV with columns: sample,R1,R2 29 | 30 | Reference Data: 31 | --genome_fasta Reference genome to use for alignment, in FASTA format 32 | 33 | Output Location: 34 | --output_folder Folder for output files 35 | 36 | Optional Arguments: 37 | --min_qvalue Minimum quality score used to trim data (default: ${params.min_qvalue}) 38 | --min_align_score Minimum alignment score (default: ${params.min_align_score}) 39 | """.stripIndent() 40 | } 41 | 42 | 43 | // Main workflow 44 | workflow { 45 | 46 | // Show help message if the user specifies the --help flag at runtime 47 | // or if any required params are not provided 48 | if ( params.help || params.output_folder == false || params.genome_fasta == false ){ 49 | // Invoke the function above which prints the help message 50 | helpMessage() 51 | // Exit out and do not run anything else 52 | exit 1 53 | } 54 | 55 | // The user should specify --fastq_folder OR --manifest, but not both 56 | if ( params.fastq_folder && params.manifest ){ 57 | log.info""" 58 | 
User may specify --fastq_folder OR --manifest, but not both 59 | """.stripIndent() 60 | // Exit out and do not run anything else 61 | exit 1 62 | } 63 | if ( ! params.fastq_folder && ! params.manifest ){ 64 | log.info""" 65 | User must specify --fastq_folder or --manifest. 66 | Run with --help for more details. 67 | """.stripIndent() 68 | // Exit out and do not run anything else 69 | exit 1 70 | } 71 | 72 | // If the --fastq_folder input option was provided 73 | if ( params.fastq_folder ){ 74 | 75 | // Make a channel with the input FASTQ read pairs from the --fastq_folder 76 | // After calling `fromFilePairs`, the structure must be changed from 77 | // [specimen, [R1, R2]] 78 | // to 79 | // [specimen, R1, R2] 80 | // with the map{} expression 81 | 82 | // Define the pattern which will be used to find the FASTQ files 83 | fastq_pattern = "${params.fastq_folder}/*_R{1,2}*fastq.gz" 84 | 85 | // Set up a channel from the pairs of files found with that pattern 86 | fastq_ch = Channel 87 | .fromFilePairs(fastq_pattern) 88 | .ifEmpty { error "No files found matching the pattern ${fastq_pattern}" } 89 | .map{ 90 | [it[0], it[1][0], it[1][1]] 91 | } 92 | 93 | // Otherwise, they must have provided --manifest 94 | } else { 95 | 96 | // Parse the CSV file which was provided by the user 97 | // and make sure that it has the expected set of columns 98 | // (this is the most common user error with manifest files) 99 | validate_manifest( 100 | Channel.fromPath(params.manifest) 101 | ) 102 | 103 | // Make a channel which includes 104 | // The sample name from the first column 105 | // The file which is referenced in the R1 column 106 | // The file which is referenced in the R2 column 107 | fastq_ch = validate_manifest 108 | .out 109 | .splitCsv(header: true) 110 | .flatten() 111 | .map {row -> [row.sample, file(row.R1), file(row.R2)]} 112 | 113 | // The code above is an example of how we can take a flat file 114 | // (the manifest), split it into each row, and then parse 115 | // 
the location of the files which are pointed to by their 116 | // paths in two of the columns (but not the first one, which 117 | // is just a string) 118 | 119 | } 120 | 121 | // Perform quality trimming on the input 122 | quality_wf( 123 | fastq_ch 124 | ) 125 | // output: 126 | // reads: 127 | // tuple val(specimen), path(read_1), path(read_2) 128 | 129 | // Align the quality-trimmed reads to the reference genome 130 | align_wf( 131 | quality_wf.out.reads, 132 | file(params.genome_fasta) 133 | ) 134 | // output: 135 | // bam: 136 | // tuple val(specimen), path(bam) 137 | 138 | } -------------------------------------------------------------------------------- /modules/align.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | // Using DSL-2 4 | nextflow.enable.dsl=2 5 | 6 | // Import the multiqc process, while specifying the location for the outputs 7 | include { multiqc as multiqc_flagstats } from './multiqc' addParams(output_subfolder: 'alignments') 8 | 9 | // Index a genome for alignment with BWA MEM 10 | process bwa_index { 11 | container "${params.container__bwa}" 12 | 13 | input: 14 | path genome_fasta 15 | 16 | output: 17 | path "ref.tar.gz" 18 | 19 | script: 20 | template 'bwa_index.sh' 21 | 22 | } 23 | 24 | // Align reads with BWA MEM 25 | process bwa { 26 | container "${params.container__bwa}" 27 | publishDir "${params.output_folder}/alignments/${specimen}/", mode: 'copy', overwrite: true 28 | 29 | input: 30 | tuple val(specimen), path(R1), path(R2) 31 | path ref 32 | 33 | output: 34 | tuple val(specimen), path("aligned.bam"), emit: bam 35 | 36 | script: 37 | template 'bwa.sh' 38 | 39 | } 40 | 41 | // Count up the number of aligned reads 42 | process flagstats { 43 | container "${params.container__bwa}" 44 | 45 | input: 46 | tuple val(specimen), path(bam) 47 | 48 | output: 49 | file "${specimen}.flagstats" 50 | 51 | script: 52 | template 'flagstats.sh' 53 | 54 | } 55 | 56 | workflow 
align_wf{ 57 | 58 | take: 59 | reads_ch 60 | genome_fasta 61 | 62 | main: 63 | 64 | // Index the reference genome 65 | bwa_index(genome_fasta) 66 | 67 | // Align the reads 68 | bwa(reads_ch, bwa_index.out) 69 | 70 | // Count up the reads 71 | flagstats(bwa.out.bam) 72 | 73 | // Combine the flagstats reports 74 | multiqc_flagstats(flagstats.out.toSortedList()) 75 | 76 | emit: 77 | bam = bwa.out.bam 78 | 79 | } -------------------------------------------------------------------------------- /modules/fastqc.nf: -------------------------------------------------------------------------------- 1 | 2 | // Assess quality of input data 3 | process fastqc { 4 | container "${params.container__fastqc}" 5 | 6 | input: 7 | tuple val(specimen), path(R1), path(R2) 8 | 9 | output: 10 | path "fastqc/*.zip", emit: zip 11 | path "fastqc/*.html", emit: html 12 | 13 | script: 14 | template 'fastqc.sh' 15 | 16 | } -------------------------------------------------------------------------------- /modules/manifest.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | // Using DSL-2 4 | nextflow.enable.dsl=2 5 | 6 | process validate_manifest { 7 | // Run inside a container with Python/Pandas installed 8 | container "${params.container__pandas}" 9 | 10 | input: 11 | path manifest_csv 12 | 13 | output: 14 | file "manifest.csv" 15 | 16 | script: 17 | template 'validate_manifest.py' 18 | 19 | } -------------------------------------------------------------------------------- /modules/multiqc.nf: -------------------------------------------------------------------------------- 1 | 2 | // Combine all FASTQC data into a single report 3 | process multiqc { 4 | container "${params.container__multiqc}" 5 | publishDir "${params.output_folder}/${params.output_subfolder}/", mode: 'copy', overwrite: true 6 | 7 | input: 8 | path "*" 9 | 10 | output: 11 | path "multiqc_report.html" 12 | 13 | script: 14 | template 'multiqc.sh' 15 | 16 | } 
-------------------------------------------------------------------------------- /modules/quality.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | // Using DSL-2 4 | nextflow.enable.dsl=2 5 | 6 | // Import the fastqc and multiqc processes so that they can 7 | // each be used in two places independently 8 | include { fastqc as fastqc_input } from './fastqc' 9 | include { fastqc as fastqc_trimmed } from './fastqc' 10 | // Using the output_subfolder parameter, we can publish the output from 11 | // each invocation of multiqc to a different location 12 | include { multiqc as multiqc_input } from './multiqc' addParams(output_subfolder: 'input') 13 | include { multiqc as multiqc_trimmed } from './multiqc' addParams(output_subfolder: 'quality_trimmed') 14 | 15 | // Perform quality trimming on the input FASTQ data 16 | process quality_trim { 17 | container "${params.container__cutadapt}" 18 | 19 | input: 20 | tuple val(specimen), path(R1), path(R2) 21 | 22 | output: 23 | tuple val(specimen), path("${R1.name.replaceAll(/.fastq.gz/, '')}.trimmed.fastq.gz"), path("${R2.name.replaceAll(/.fastq.gz/, '')}.trimmed.fastq.gz"), emit: reads 24 | tuple val(specimen), path("${specimen}.cutadapt.json"), emit: log 25 | 26 | script: 27 | template 'quality_trim.sh' 28 | 29 | } 30 | 31 | workflow quality_wf{ 32 | 33 | take: 34 | reads_ch 35 | // tuple val(specimen), path(read_1), path(read_2) 36 | 37 | main: 38 | 39 | // Generate quality metrics for the input data 40 | fastqc_input(reads_ch) 41 | 42 | // Combine all of the FASTQC data for the input data 43 | multiqc_input(fastqc_input.out.zip.flatten().toSortedList()) 44 | 45 | // Run quality trimming 46 | quality_trim(reads_ch) 47 | 48 | // Generate quality metrics for the trimmed data 49 | fastqc_trimmed(quality_trim.out.reads) 50 | 51 | // Combine all of the FASTQC data for the trimmed data 52 | multiqc_trimmed(fastqc_trimmed.out.zip.flatten().toSortedList()) 53 | 
54 | emit: 55 | reads = quality_trim.out.reads 56 | 57 | } -------------------------------------------------------------------------------- /nextflow.config: -------------------------------------------------------------------------------- 1 | profiles { 2 | docker { 3 | docker { 4 | enabled = true 5 | temp = 'auto' 6 | } 7 | } 8 | } 9 | 10 | /* 11 | Set default parameters 12 | 13 | Any parameters provided by the user with a -params-file or 14 | with -- command-line arguments will override the values 15 | defined below. 16 | */ 17 | params { 18 | help = false 19 | fastq_folder = false 20 | manifest = false 21 | genome_fasta = false 22 | output_folder = false 23 | 24 | // Quality trimming 25 | min_qvalue = 20 26 | min_align_score = 40 27 | 28 | // Set the containers to use for each component 29 | container__cutadapt = "quay.io/biocontainers/cutadapt:3.5--py36hc5360cc_0" 30 | container__fastqc = "quay.io/biocontainers/fastqc:0.11.9--hdfd78af_1" 31 | container__multiqc = "quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0" 32 | container__bwa = "quay.io/hdc-workflows/bwa-samtools:latest" 33 | container__pandas = "quay.io/fhcrc-microbiome/python-pandas:0fd1e29" 34 | 35 | } -------------------------------------------------------------------------------- /templates/bwa.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -euo pipefail 4 | 5 | echo "Specimen: $specimen" 6 | echo "R1: $R1" 7 | echo "R2: $R2" 8 | 9 | # Unpack the reference tarball 10 | tar xzvf ${ref} 11 | 12 | REF=\$(find -name "*.amb" | sed 's/.amb//') 13 | echo "REF=\$REF" 14 | 15 | echo "Running BWA MEM" 16 | bwa \ 17 | mem \ 18 | -a \ 19 | -t3 \ 20 | \$REF \ 21 | ${R1} \ 22 | ${R2} \ 23 | | samtools \ 24 | sort \ 25 | -m3G \ 26 | -@3 \ 27 | -o aligned.bam - 28 | 29 | echo "DONE" -------------------------------------------------------------------------------- /templates/bwa_index.sh: 
-------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -euo pipefail 4 | 5 | echo "Input: $genome_fasta" 6 | 7 | echo "Building index" 8 | bwa \ 9 | index \ 10 | "${genome_fasta}" 11 | 12 | ls -lahtr 13 | 14 | echo "Combining into a tar" 15 | tar -czvf ref.tar.gz ${genome_fasta}* 16 | 17 | echo "DONE" -------------------------------------------------------------------------------- /templates/fastqc.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -euo pipefail 4 | 5 | echo "Creating output folder" 6 | mkdir fastqc 7 | 8 | echo "Running FASTQC" 9 | fastqc -o fastqc "$R1" "$R2" 10 | 11 | echo "DONE" -------------------------------------------------------------------------------- /templates/flagstats.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -euo pipefail 4 | 5 | # Count up the number of aligned reads 6 | samtools flagstats "${bam}" > "${specimen}.flagstats" 7 | -------------------------------------------------------------------------------- /templates/multiqc.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -euo pipefail 4 | 5 | multiqc . 
-------------------------------------------------------------------------------- /templates/quality_trim.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -euo pipefail 4 | 5 | echo "Processing specimen: $specimen" 6 | echo "R1: $R1" 7 | echo "R2: $R2" 8 | echo "--quality-cutoff=${params.min_qvalue}" 9 | echo "--minimum-length=${params.min_align_score}" 10 | echo "--json="${specimen}.cutadapt.json"" 11 | 12 | cutadapt \ 13 | --pair-filter=any \ 14 | --quality-cutoff=${params.min_qvalue} \ 15 | --minimum-length=${params.min_align_score} \ 16 | -o "${R1.name.replaceAll(/.fastq.gz/, '')}.trimmed.fastq.gz" \ 17 | -p "${R2.name.replaceAll(/.fastq.gz/, '')}.trimmed.fastq.gz" \ 18 | --json="${specimen}.cutadapt.json" \ 19 | "$R1" \ 20 | "$R2" 21 | 22 | echo DONE 23 | -------------------------------------------------------------------------------- /templates/template_workflow.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/templates/template_workflow.zip -------------------------------------------------------------------------------- /templates/validate_manifest.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import pandas as pd 4 | import os 5 | 6 | # The file path of the manifest CSV (provided by the user) 7 | # is encoded by Nextflow by the following expression 8 | manifest_csv = "${manifest_csv}" 9 | # Now the Python variable `manifest_csv` contains the local filepath 10 | # which contains the file which was specified by the user 11 | 12 | # Make sure we can find the file 13 | assert os.path.exists(manifest_csv), f"Cannot find {manifest_csv} in the local process folder" 14 | 15 | # Try to read in the file as a CSV 16 | print(f"Reading in {manifest_csv} as CSV") 17 | df = 
pd.read_csv(manifest_csv) 18 | print(f"Read in {df.shape[0]:,} rows and {df.shape[1]:,} columns") 19 | 20 | # Note in the lines below how the newline character needs to be escaped 21 | # This is due to the way that Nextflow treats template files, and interpolating 22 | # variables from the process namespace 23 | column_list_str = "\\n".join([n for n in df.columns.values]) 24 | print(f"Columns: \\n{column_list_str}") 25 | 26 | # Now we need to make sure that all of the expected columns are present 27 | for cname in ['sample', 'R1', 'R2']: 28 | assert cname in df.columns.values, f"Manifest file must contain a column {cname}" 29 | 30 | # At this point, everything checks out 31 | 32 | # Write out the file, which should help remove any carriage returns which the user may 33 | # have left in the file (if they made it on a Windows machine) 34 | df.to_csv("manifest.csv", index=None) -------------------------------------------------------------------------------- /test_data/fastq/ERR098010_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/test_data/fastq/ERR098010_R1.fastq.gz -------------------------------------------------------------------------------- /test_data/fastq/SRR2057021_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/test_data/fastq/SRR2057021_R1.fastq.gz -------------------------------------------------------------------------------- /test_data/fastq/SRR2057021_R2.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/test_data/fastq/SRR2057021_R2.fastq.gz 
-------------------------------------------------------------------------------- /test_data/fastq/SRR4051738_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/test_data/fastq/SRR4051738_R1.fastq.gz -------------------------------------------------------------------------------- /test_data/fastq/SRR4051738_R2.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/test_data/fastq/SRR4051738_R2.fastq.gz -------------------------------------------------------------------------------- /test_data/genome_fasta/NC_001422.1.fasta: -------------------------------------------------------------------------------- 1 | >NC_001422.1 Coliphage phi-X174, complete genome 2 | GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTT 3 | GATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAA 4 | ATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTG 5 | TCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTA 6 | GATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATC 7 | TGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTT 8 | TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTT 9 | CGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCT 10 | TGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCG 11 | TCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTAC 12 | GGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTA 13 | CGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAG 14 | TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACT 15 | 
AAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGC 16 | CCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCA 17 | TCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC 18 | TCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTA 19 | CTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAA 20 | GGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTT 21 | GGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACA 22 | ACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGC 23 | TCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTT 24 | TCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGC 25 | ATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAAC 26 | CTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTT 27 | GATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC 28 | CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGAC 29 | TAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTG 30 | TATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGT 31 | TTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGA 32 | AGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGAT 33 | TATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTT 34 | ATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAAC 35 | GCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGC 36 | TTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGT 37 | TCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTA 38 | TATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTG 39 | TCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGC 40 | CTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTG 41 | 
AATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGC 42 | CGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGT 43 | TTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTG 44 | CTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAA 45 | AGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCT 46 | GGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTG 47 | GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGA 48 | TAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTAT 49 | CTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGG 50 | TTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA 51 | GATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGAC 52 | CAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTA 53 | TGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCA 54 | AACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGAC 55 | TTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTT 56 | CTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGA 57 | TACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCG 58 | TCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTT 59 | CTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTAT 60 | TGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGC 61 | ATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATG 62 | TTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGA 63 | ATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGG 64 | GACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCC 65 | CTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATT 66 | GCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAG 67 | 
GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTT 68 | ATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCG 69 | CAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGC 70 | CGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTC 71 | GTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT 72 | CGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAG 73 | CCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATA 74 | TGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACT 75 | TCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTG 76 | TCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGC 77 | AGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACC 78 | TGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA 79 | 80 | -------------------------------------------------------------------------------- /test_data/manifest.csv: -------------------------------------------------------------------------------- 1 | sample,R1,R2 2 | SRR2057021,test_data/fastq/SRR2057021_R1.fastq.gz,test_data/fastq/SRR2057021_R2.fastq.gz 3 | SRR4051738,test_data/fastq/SRR4051738_R1.fastq.gz,test_data/fastq/SRR4051738_R2.fastq.gz 4 | -------------------------------------------------------------------------------- /test_data/output/alignments/SRR2057021/aligned.bam: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/test_data/output/alignments/SRR2057021/aligned.bam -------------------------------------------------------------------------------- /test_data/output/alignments/SRR4051738/aligned.bam: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/FredHutch/workflow-template-nextflow/25f3253bf650486a6d2418663657f0bdd24080ff/test_data/output/alignments/SRR4051738/aligned.bam -------------------------------------------------------------------------------- /test_data/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Run the workflow on the test data, and write the output to output/ 4 | nextflow \ 5 | run \ 6 | -profile docker \ 7 | ../main.nf \ 8 | --fastq_folder fastq \ 9 | --genome_fasta genome_fasta/NC_001422.1.fasta \ 10 | --output_folder output \ 11 | -with-report \ 12 | -resume 13 | --------------------------------------------------------------------------------