├── .gitattributes
├── .gitignore
├── docs
│   ├── images
│   │   ├── preseq_plot.png
│   │   ├── saturation.png
│   │   ├── cutadapt_plot.png
│   │   ├── dupRadar_plot.png
│   │   ├── infer_experiment.png
│   │   ├── nfcore-rnaseq_logo.ai
│   │   ├── read_duplication.png
│   │   ├── junction_saturation.png
│   │   ├── nf-core-rnaseq_logo.png
│   │   ├── star_alignment_plot.png
│   │   ├── inner_distance_concept.png
│   │   ├── mqc_hcplot_hocmzpdjsq.png
│   │   ├── mqc_hcplot_ltqchiyxfz.png
│   │   ├── mqc_hcplot_wtnqrdhkuc.png
│   │   ├── rseqc_read_dups_plot.png
│   │   ├── preseq_complexity_curve.png
│   │   ├── featureCounts_biotype_plot.png
│   │   ├── rseqc_infer_experiment_plot.png
│   │   ├── rseqc_inner_distance_plot.png
│   │   ├── featureCounts_assignment_plot.png
│   │   ├── rseqc_read_distribution_plot.png
│   │   ├── rseqc_junction_saturation_plot.png
│   │   └── rseqc_junction_annotation_junctions_plot.png
│   ├── README.md
│   ├── output.md
│   └── usage.md
├── assets
│   ├── nf-core-rnaseq_logo.png
│   ├── biotypes_header.txt
│   ├── heatmap_header.txt
│   ├── mdsplot_header.txt
│   ├── rrna-db-defaults.txt
│   ├── multiqc_config.yaml
│   ├── sendmail_template.txt
│   ├── where_are_my_files.txt
│   ├── email_template.txt
│   └── email_template.html
├── .github
│   ├── markdownlint.yml
│   ├── workflows
│   │   ├── branch.yml
│   │   ├── ci.yml
│   │   └── linting.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── feature_request.md
│   │   └── bug_report.md
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── CONTRIBUTING.md
├── Dockerfile
├── conf
│   ├── awsbatch.config
│   ├── test.config
│   ├── test_gz.config
│   ├── base.config
│   └── igenomes.config
├── LICENSE
├── environment.yml
├── bin
│   ├── markdown_to_html.r
│   ├── se.r
│   ├── filter_gtf_for_genes_in_genome.py
│   ├── mqc_features_stat.py
│   ├── tximport.r
│   ├── parse_gtf.py
│   ├── edgeR_heatmap_MDS.r
│   ├── scrape_software_versions.py
│   ├── gtf2bed
│   └── dupRadar.r
├── .travis.yml
├── CODE_OF_CONDUCT.md
├── README.md
├── nextflow.config
├── CHANGELOG.md
└── parameters.settings.json

/.gitattributes:
--------------------------------------------------------------------------------
1 | *.config linguist-language=nextflow
2 |
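The `.gitattributes` rule above tells GitHub's Linguist to classify `*.config` files as Nextflow. Attribute rules like this can be sanity-checked with `git check-attr`; a minimal sketch in Python (scratch repo, illustrative paths, assumes `git` is on the PATH):

```python
import pathlib
import subprocess
import tempfile

# Create a scratch repository containing the same .gitattributes rule.
tmp = pathlib.Path(tempfile.mkdtemp())
subprocess.run(["git", "init", "-q", str(tmp)], check=True)
(tmp / ".gitattributes").write_text("*.config linguist-language=nextflow\n")
(tmp / "nextflow.config").touch()

# git check-attr reports the attribute value resolved for a given path.
out = subprocess.run(
    ["git", "-C", str(tmp), "check-attr", "linguist-language", "--", "nextflow.config"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)  # nextflow.config: linguist-language: nextflow
```

Any path matching `*.config` resolves to the same attribute value, which is what makes GitHub colour and count these files as Nextflow rather than plain text.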
-------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .nextflow* 2 | work/ 3 | data/ 4 | results/ 5 | .DS_Store 6 | tests/test_data 7 | *.pyc 8 | -------------------------------------------------------------------------------- /docs/images/preseq_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/preseq_plot.png -------------------------------------------------------------------------------- /docs/images/saturation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/saturation.png -------------------------------------------------------------------------------- /assets/nf-core-rnaseq_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/assets/nf-core-rnaseq_logo.png -------------------------------------------------------------------------------- /docs/images/cutadapt_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/cutadapt_plot.png -------------------------------------------------------------------------------- /docs/images/dupRadar_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/dupRadar_plot.png -------------------------------------------------------------------------------- /docs/images/infer_experiment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/infer_experiment.png 
-------------------------------------------------------------------------------- /docs/images/nfcore-rnaseq_logo.ai: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/nfcore-rnaseq_logo.ai -------------------------------------------------------------------------------- /docs/images/read_duplication.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/read_duplication.png -------------------------------------------------------------------------------- /docs/images/junction_saturation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/junction_saturation.png -------------------------------------------------------------------------------- /docs/images/nf-core-rnaseq_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/nf-core-rnaseq_logo.png -------------------------------------------------------------------------------- /docs/images/star_alignment_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/star_alignment_plot.png -------------------------------------------------------------------------------- /docs/images/inner_distance_concept.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/inner_distance_concept.png -------------------------------------------------------------------------------- /docs/images/mqc_hcplot_hocmzpdjsq.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/mqc_hcplot_hocmzpdjsq.png -------------------------------------------------------------------------------- /docs/images/mqc_hcplot_ltqchiyxfz.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/mqc_hcplot_ltqchiyxfz.png -------------------------------------------------------------------------------- /docs/images/mqc_hcplot_wtnqrdhkuc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/mqc_hcplot_wtnqrdhkuc.png -------------------------------------------------------------------------------- /docs/images/rseqc_read_dups_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/rseqc_read_dups_plot.png -------------------------------------------------------------------------------- /docs/images/preseq_complexity_curve.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/preseq_complexity_curve.png -------------------------------------------------------------------------------- /docs/images/featureCounts_biotype_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/featureCounts_biotype_plot.png -------------------------------------------------------------------------------- /docs/images/rseqc_infer_experiment_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/rseqc_infer_experiment_plot.png -------------------------------------------------------------------------------- 
/docs/images/rseqc_inner_distance_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/rseqc_inner_distance_plot.png
--------------------------------------------------------------------------------
/docs/images/featureCounts_assignment_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/featureCounts_assignment_plot.png
--------------------------------------------------------------------------------
/docs/images/rseqc_read_distribution_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/rseqc_read_distribution_plot.png
--------------------------------------------------------------------------------
/docs/images/rseqc_junction_saturation_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/rseqc_junction_saturation_plot.png
--------------------------------------------------------------------------------
/docs/images/rseqc_junction_annotation_junctions_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/viklund/rnaseq/master/docs/images/rseqc_junction_annotation_junctions_plot.png
--------------------------------------------------------------------------------
/.github/markdownlint.yml:
--------------------------------------------------------------------------------
 1 | # Markdownlint configuration file
 2 | default: true
 3 | line-length: false
 4 | no-multiple-blanks: 0
 5 | blanks-around-headers: false
 6 | blanks-around-lists: false
 7 | header-increment: false
 8 | no-duplicate-header:
 9 |   siblings_only: true
10 |
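One YAML pitfall worth noting for configs like the markdownlint file above: YAML has no trailing commas, so a value accidentally written as `true,` (a JSON habit) is parsed as the *string* `"true,"` rather than a boolean, silently changing the config's meaning. A quick illustration with PyYAML (assumed installed; not part of this repository):

```python
import yaml  # PyYAML, assumed installed

# A bare scalar `true` resolves to a YAML boolean...
assert yaml.safe_load("default: true") == {"default": True}

# ...but with a stray trailing comma it is just a plain string.
assert yaml.safe_load("default: true,") == {"default": "true,"}
```

Because the broken value is still syntactically valid YAML, a plain syntax check will not flag it; the consumer simply sees an unexpected string.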
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM nfcore/base:1.7
2 | LABEL authors="phil.ewels@scilifelab.se" \
3 |     description="Docker image containing all requirements for the nfcore/rnaseq pipeline"
4 |
5 | COPY environment.yml /
6 | RUN conda env create -f /environment.yml && conda clean -a
7 | ENV PATH /opt/conda/envs/nf-core-rnaseq-1.4.2/bin:$PATH
8 |
--------------------------------------------------------------------------------
/assets/biotypes_header.txt:
--------------------------------------------------------------------------------
 1 | # id: 'biotype-counts'
 2 | # section_name: 'Biotype Counts'
 3 | # description: "shows reads overlapping genomic features of different biotypes,
 4 | #     counted by featureCounts."
 5 | # plot_type: 'bargraph'
 6 | # anchor: 'featurecounts_biotype'
 7 | # pconfig:
 8 | #     id: "featureCounts_biotype_plot"
 9 | #     title: "featureCounts: Biotypes"
10 | #     xlab: "# Reads"
11 | #     cpswitch_counts_label: "Number of Reads"
12 |
--------------------------------------------------------------------------------
/assets/heatmap_header.txt:
--------------------------------------------------------------------------------
1 | # id: 'sample-similarity'
2 | # section_name: 'edgeR: Sample Similarity'
3 | # description: "is generated from normalised gene counts through
4 | #     edgeR.
5 | #     Pearson's correlation between log2 normalised CPM values is then calculated and clustered."
 6 | # plot_type: 'heatmap'
 7 | # anchor: 'ngi_rnaseq-sample_similarity'
 8 | # pconfig:
 9 | #     title: "edgeR: Pearson's correlation"
10 | #     xlab: True
11 | #     reverseColors: True
12 |
--------------------------------------------------------------------------------
/assets/mdsplot_header.txt:
--------------------------------------------------------------------------------
 1 | # id: 'edgeR-sample-distances'
 2 | # section_name: 'MDS Plot'
 3 | # description: "shows relatedness between samples in a project.
 4 | #     These values are calculated using edgeR
 5 | #     in the edgeR_heatmap_MDS.r script."
 6 | # plot_type: 'scatter'
 7 | # anchor: 'ngi_rnaseq-mds_plot'
 8 | # pconfig:
 9 | #     xlab: 'Leading'
10 | #     title: 'MDS Plot'
11 | #     ylab: 'logFC'
12 |
--------------------------------------------------------------------------------
/docs/README.md:
--------------------------------------------------------------------------------
 1 | # nf-core/rnaseq: Documentation
 2 |
 3 | The nf-core/rnaseq documentation is split into the following files:
 4 |
 5 | 1. [Installation](https://nf-co.re/usage/installation)
 6 | 2. Pipeline configuration
 7 |     * [Local installation](https://nf-co.re/usage/local_installation)
 8 |     * [Adding your own system config](https://nf-co.re/usage/adding_own_config)
 9 |     * [Reference genomes](https://nf-co.re/usage/reference_genomes)
10 | 3. [Running the pipeline](usage.md)
11 | 4. [Output and how to interpret the results](output.md)
12 | 5. 
[Troubleshooting](https://nf-co.re/usage/troubleshooting)
13 |
--------------------------------------------------------------------------------
/.github/workflows/branch.yml:
--------------------------------------------------------------------------------
 1 | name: nf-core/rnaseq branch protection
 2 | # This workflow is triggered on PRs to master branch on the repository
 3 | on:
 4 |   pull_request:
 5 |     branches:
 6 |       - master
 7 |
 8 | jobs:
 9 |   test:
10 |     runs-on: ubuntu-latest
11 |     steps:
12 |       # PRs are only ok if coming from an nf-core dev or patch branch
13 |       - uses: actions/checkout@v1
14 |       - name: Check PRs
15 |         run: |
16 |           [[ $(git remote get-url origin) == *nf-core/rnaseq ]] && [[ ${GITHUB_BASE_REF} = "master" ]] && { [[ ${GITHUB_HEAD_REF} = "dev" ]] || [[ ${GITHUB_HEAD_REF} = "patch" ]]; }
17 |
--------------------------------------------------------------------------------
/conf/awsbatch.config:
--------------------------------------------------------------------------------
 1 | /*
 2 |  * -------------------------------------------------
 3 |  *  Nextflow config file for running on AWS batch
 4 |  * -------------------------------------------------
 5 |  * Base config needed for running with -profile awsbatch
 6 |  */
 7 | params {
 8 |   config_profile_name = 'AWSBATCH'
 9 |   config_profile_description = 'AWSBATCH Cloud Profile'
10 |   config_profile_contact = 'Alexander Peltzer (@apeltzer)'
11 |   config_profile_url = 'https://aws.amazon.com/de/batch/'
12 | }
13 |
14 | aws.region = params.awsregion
15 | process.executor = 'awsbatch'
16 | process.queue = params.awsqueue
17 | executor.awscli = '/home/ec2-user/miniconda/bin/aws'
18 | params.tracedir = './'
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | Hi there!
2 |
3 | Thanks for suggesting a new feature for the pipeline!
Please delete this text and anything that's not relevant from the template below: 4 | 5 | #### Is your feature request related to a problem? Please describe. 6 | A clear and concise description of what the problem is. 7 | Ex. I'm always frustrated when [...] 8 | 9 | #### Describe the solution you'd like 10 | A clear and concise description of what you want to happen. 11 | 12 | #### Describe alternatives you've considered 13 | A clear and concise description of any alternative solutions or features you've considered. 14 | 15 | #### Additional context 16 | Add any other context about the feature request here. 17 | -------------------------------------------------------------------------------- /assets/rrna-db-defaults.txt: -------------------------------------------------------------------------------- 1 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/rfam-5.8s-database-id98.fasta 2 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/rfam-5s-database-id98.fasta 3 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/silva-arc-16s-id95.fasta 4 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/silva-arc-23s-id98.fasta 5 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/silva-bac-16s-id90.fasta 6 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/silva-bac-23s-id98.fasta 7 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/silva-euk-18s-id95.fasta 8 | https://raw.githubusercontent.com/biocore/sortmerna/master/rRNA_databases/silva-euk-28s-id98.fasta -------------------------------------------------------------------------------- /assets/multiqc_config.yaml: -------------------------------------------------------------------------------- 1 | extra_fn_clean_exts: 2 | - '_R1' 3 | - '_R2' 4 | - '.hisat' 5 | - '_subsamp' 6 | - '.sorted' 7 | 8 | report_comment: > 9 | This report has been generated by the 
nf-core/rnaseq
10 |   analysis pipeline. For information about how to interpret these results, please see the
11 |   documentation.
12 |
13 | top_modules:
14 |   - 'edgeR-sample-distances'
15 |   - 'sample-similarity'
16 |   - 'DupRadar'
17 |   - 'biotype-counts'
18 |
19 | report_section_order:
20 |   software_versions:
21 |     order: -1000
22 |   nf-core-rnaseq-summary:
23 |     order: -1100
24 |
25 | table_columns_visible:
26 |   FastQC:
27 |     percent_duplicates: False
28 |
29 | export_plots: true
30 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
 1 | Hi there!
 2 |
 3 | Thanks for telling us about a problem with the pipeline. Please delete this text and anything that's not relevant from the template below:
 4 |
 5 | #### Describe the bug
 6 | A clear and concise description of what the bug is.
 7 |
 8 | #### Steps to reproduce
 9 | Steps to reproduce the behaviour:
10 | 1. Command line: `nextflow run ...`
11 | 2. See error: _Please provide your error message_
12 |
13 | #### Expected behaviour
14 | A clear and concise description of what you expected to happen.
15 |
16 | #### System:
17 | - Hardware: [e.g. HPC, Desktop, Cloud...]
18 | - Executor: [e.g. slurm, local, awsbatch...]
19 | - OS: [e.g. CentOS Linux, macOS, Linux Mint...]
20 | - Version: [e.g. 7, 10.13.6, 18.3...]
21 |
22 | #### Nextflow Installation:
23 | - Version: [e.g. 0.31.0]
24 |
25 | #### Container engine:
26 | - Engine: [e.g. Conda, Docker or Singularity]
27 | - Version: [e.g. 1.0.0]
28 | - Image tag: [e.g. nfcore/rnaseq:1.0.0]
29 |
30 | #### Additional context
31 | Add any other context about the problem here.
32 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | Many thanks for contributing to nf-core/rnaseq!
2 | 3 | To ensure that your build passes, please make sure your pull request is to the `dev` branch rather than to `master`. Thank you! 4 | 5 | Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs). 6 | 7 | ## PR checklist 8 | - [ ] PR is to `dev` rather than `master` 9 | - [ ] This comment contains a description of changes (with reason) 10 | - [ ] If you've fixed a bug or added code that should be tested, add tests! 11 | - [ ] If necessary, also make a PR on the [nf-core/rnaseq branch on the nf-core/test-datasets repo]( https://github.com/nf-core/test-datasets/pull/new/nf-core/rnaseq) 12 | - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). 13 | - [ ] Make sure your code lints (`nf-core lint .`). 14 | - [ ] Documentation in `docs` is updated 15 | - [ ] `CHANGELOG.md` is updated 16 | - [ ] `README.md` is updated 17 | 18 | **Learn more about contributing:** https://github.com/nf-core/rnaseq/tree/master/.github/CONTRIBUTING.md 19 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Phil Ewels, Rickard Hammarén 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
 1 | # You can use this file to create a conda environment for this pipeline:
 2 | #   conda env create -f environment.yml
 3 | name: nf-core-rnaseq-1.4.2
 4 | channels:
 5 |   - conda-forge
 6 |   - bioconda
 7 |   - defaults
 8 | dependencies:
 9 |   ## conda-forge packages, sorted alphabetically (ignoring any channel prefix)
10 |   - matplotlib=3.0.3 # Current 3.1.0 build incompatible with multiqc=1.7
11 |   - r-base=3.6.1
12 |   - conda-forge::r-data.table=1.12.4
13 |   - conda-forge::r-gplots=3.0.1.1
14 |   - conda-forge::r-markdown=1.1
15 |
16 |   ## bioconda packages, see above
17 |   - bioconductor-dupradar=1.14.0
18 |   - bioconductor-edger=3.26.5
19 |   - bioconductor-tximeta=1.2.2
20 |   - bioconductor-summarizedexperiment=1.14.0
21 |   - deeptools=3.3.1
22 |   - fastqc=0.11.8
23 |   - gffread=0.11.4
24 |   - hisat2=2.1.0
25 |   - multiqc=1.7
26 |   - picard=2.21.1
27 |   - preseq=2.0.3
28 |   - qualimap=2.2.2c
29 |   - rseqc=3.0.1
30 |   - salmon=0.14.2
31 |   - samtools=1.9
32 |   - sortmerna=2.1b # for metatranscriptomics
33 |   - star=2.6.1d # Don't upgrade me - 2.7X indices incompatible with iGenomes.
34 | - stringtie=2.0 35 | - subread=1.6.4 36 | - trim-galore=0.6.4 37 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: nf-core/rnaseq CI 2 | # This workflow is triggered on pushes and PRs to the repository. 3 | on: [push, pull_request] 4 | 5 | jobs: 6 | test: 7 | runs-on: ubuntu-latest 8 | strategy: 9 | matrix: 10 | nxf_ver: ['19.04.0', ''] 11 | aligner: ["--aligner 'hisat2'", "--aligner 'star'", "--pseudo_aligner 'salmon'"] 12 | options: ['--skipQC', '--remove_rRNA', '--saveUnaligned', '--skipTrimming', '--star_index false'] 13 | steps: 14 | - uses: actions/checkout@v1 15 | - name: Install Nextflow 16 | run: | 17 | export NXF_VER=${{ matrix.nxf_ver }} 18 | wget -qO- get.nextflow.io | bash 19 | sudo mv nextflow /usr/local/bin/ 20 | - name: Download image 21 | run: | 22 | docker pull nfcore/rnaseq:dev && docker tag nfcore/rnaseq:dev nfcore/rnaseq:1.4.2 23 | - name: Basic workflow tests 24 | run: | 25 | nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.aligner }} ${{ matrix.options }} 26 | - name: Basic workflow, gzipped input 27 | run: | 28 | nextflow run ${GITHUB_WORKSPACE} -profile test_gz,docker ${{ matrix.aligner }} ${{ matrix.options }} 29 | -------------------------------------------------------------------------------- /assets/sendmail_template.txt: -------------------------------------------------------------------------------- 1 | To: $email 2 | Subject: $subject 3 | Mime-Version: 1.0 4 | Content-Type: multipart/related;boundary="nfcoremimeboundary" 5 | 6 | --nfcoremimeboundary 7 | Content-Type: text/html; charset=utf-8 8 | 9 | $email_html 10 | 11 | --nfcoremimeboundary 12 | Content-Type: image/png;name="nf-core-rnaseq_logo.png" 13 | Content-Transfer-Encoding: base64 14 | Content-ID: 15 | Content-Disposition: inline; filename="nf-core-rnaseq_logo.png" 16 | 17 | <% out << new 
File("$baseDir/assets/nf-core-rnaseq_logo.png"). 18 | bytes. 19 | encodeBase64(). 20 | toString(). 21 | tokenize( '\n' )*. 22 | toList()*. 23 | collate( 76 )*. 24 | collect { it.join() }. 25 | flatten(). 26 | join( '\n' ) %> 27 | 28 | <% 29 | if (mqcFile){ 30 | def mqcFileObj = new File("$mqcFile") 31 | if (mqcFileObj.length() < mqcMaxSize){ 32 | out << """ 33 | --nfcoremimeboundary 34 | Content-Type: text/html; name=\"multiqc_report\" 35 | Content-Transfer-Encoding: base64 36 | Content-ID: 37 | Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\" 38 | 39 | ${mqcFileObj. 40 | bytes. 41 | encodeBase64(). 42 | toString(). 43 | tokenize( '\n' )*. 44 | toList()*. 45 | collate( 76 )*. 46 | collect { it.join() }. 47 | flatten(). 48 | join( '\n' )} 49 | """ 50 | }} 51 | %> 52 | 53 | --nfcoremimeboundary-- 54 | -------------------------------------------------------------------------------- /bin/markdown_to_html.r: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | # Command line argument processing 4 | args = commandArgs(trailingOnly=TRUE) 5 | if (length(args) < 2) { 6 | stop("Usage: markdown_to_html.r ", call.=FALSE) 7 | } 8 | markdown_fn <- args[1] 9 | output_fn <- args[2] 10 | 11 | # Load / install packages 12 | if (!require("markdown")) { 13 | install.packages("markdown", dependencies=TRUE, repos='http://cloud.r-project.org/') 14 | library("markdown") 15 | } 16 | 17 | base_css_fn <- getOption("markdown.HTML.stylesheet") 18 | base_css <- readChar(base_css_fn, file.info(base_css_fn)$size) 19 | custom_css <- paste(base_css, " 20 | body { 21 | padding: 3em; 22 | margin-right: 350px; 23 | max-width: 100%; 24 | } 25 | #toc { 26 | position: fixed; 27 | right: 20px; 28 | width: 300px; 29 | padding-top: 20px; 30 | overflow: scroll; 31 | height: calc(100% - 3em - 20px); 32 | } 33 | #toc_header { 34 | font-size: 1.8em; 35 | font-weight: bold; 36 | } 37 | #toc > ul { 38 | padding-left: 0; 39 | 
list-style-type: none; 40 | } 41 | #toc > ul ul { padding-left: 20px; } 42 | #toc > ul > li > a { display: none; } 43 | img { max-width: 800px; } 44 | ") 45 | 46 | markdownToHTML( 47 | file = markdown_fn, 48 | output = output_fn, 49 | stylesheet = custom_css, 50 | options = c('toc', 'base64_images', 'highlight_code') 51 | ) 52 | -------------------------------------------------------------------------------- /conf/test.config: -------------------------------------------------------------------------------- 1 | /* 2 | * ------------------------------------------------- 3 | * Nextflow config file for running tests 4 | * ------------------------------------------------- 5 | * Defines bundled input files and everything required 6 | * to run a fast and simple test. Use as follows: 7 | * nextflow run nf-core/rnaseq -profile test 8 | */ 9 | 10 | params { 11 | config_profile_name = 'Test profile' 12 | config_profile_description = 'Minimal test dataset to check pipeline function' 13 | // Limit resources so that this can run CI 14 | max_cpus = 2 15 | max_memory = 6.GB 16 | max_time = 48.h 17 | 18 | // Input data 19 | singleEnd = true 20 | readPaths = [ 21 | ['SRR4238351', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238351_subsamp.fastq.gz']], 22 | ['SRR4238355', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238355_subsamp.fastq.gz']], 23 | ['SRR4238359', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238359_subsamp.fastq.gz']], 24 | ['SRR4238379', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238379_subsamp.fastq.gz']], 25 | ] 26 | // Genome references 27 | fasta = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fa' 28 | gtf = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf' 29 | gff = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gff' 30 | transcript_fasta = 
'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/transcriptome.fasta' 31 | } 32 | -------------------------------------------------------------------------------- /.github/workflows/linting.yml: -------------------------------------------------------------------------------- 1 | name: nf-core/rnaseq linting 2 | # This workflow is triggered on pushes and PRs to the repository. 3 | on: [push, pull_request] 4 | 5 | jobs: 6 | Markdown: 7 | runs-on: ubuntu-latest 8 | steps: 9 | - uses: actions/checkout@v1 10 | - uses: actions/setup-node@v1 11 | with: 12 | node-version: '10' 13 | - name: Install markdownlint 14 | run: | 15 | npm install -g markdownlint-cli 16 | - name: Run Markdownlint 17 | run: | 18 | markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml 19 | YAML: 20 | runs-on: ubuntu-latest 21 | steps: 22 | - uses: actions/checkout@v1 23 | - uses: actions/setup-node@v1 24 | with: 25 | node-version: '10' 26 | - name: Install yamllint 27 | run: | 28 | npm install -g yaml-lint 29 | - name: Run yamllint 30 | run: | 31 | yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") 32 | nf-core: 33 | runs-on: ubuntu-latest 34 | steps: 35 | - uses: actions/checkout@v1 36 | - name: Install Nextflow 37 | run: | 38 | wget -qO- get.nextflow.io | bash 39 | sudo mv nextflow /usr/local/bin/ 40 | - uses: actions/setup-python@v1 41 | with: 42 | python-version: '3.6' 43 | architecture: 'x64' 44 | - name: Install pip 45 | run: | 46 | sudo apt install python3-pip 47 | pip install --upgrade pip 48 | - name: Install nf-core tools 49 | run: | 50 | pip install nf-core 51 | - name: Run nf-core lint 52 | run: | 53 | nf-core lint ${GITHUB_WORKSPACE} -------------------------------------------------------------------------------- /assets/where_are_my_files.txt: -------------------------------------------------------------------------------- 1 | ===================== 2 | Where are my files? 
3 | ===================== 4 | 5 | By default, the nf-core/rnaseq pipeline does not save large intermediate files to the 6 | results directory. This is to try to conserve disk space. 7 | 8 | These files can be found in the pipeline `work` directory if needed. 9 | Alternatively, re-run the pipeline using `-resume` in addition to one of 10 | the below command-line options and they will be copied into the results directory: 11 | 12 | `--saveAlignedIntermediates` 13 | The final BAM files created after the Picard MarkDuplicates step are always saved 14 | and can be found in the `markDuplicates/` folder. 15 | Specify this flag to also copy out BAM files from STAR / HISAT2 alignment and sorting steps. 16 | 17 | `--saveTrimmed` 18 | Specify to save trimmed FastQ files to the results directory. 19 | 20 | `--saveReference` 21 | Save any downloaded or generated reference genome files to your results folder. 22 | These can then be used for future pipeline runs, reducing processing times. 23 | 24 | ----------------------------------- 25 | Setting defaults in a config file 26 | ----------------------------------- 27 | If you would always like these files to be saved without having to specify this on 28 | the command line, you can save the following to your personal configuration file 29 | (eg. 
`~/.nextflow/config`):
30 |
31 | params.saveReference = true
32 | params.saveTrimmed = true
33 | params.saveAlignedIntermediates = true
34 |
35 | For more help, see the following documentation:
36 |
37 | https://github.com/nf-core/rnaseq/blob/master/docs/usage.md
38 | https://www.nextflow.io/docs/latest/getstarted.html
39 | https://www.nextflow.io/docs/latest/config.html
40 |
--------------------------------------------------------------------------------
/bin/se.r:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env Rscript
 2 |
 3 | args = commandArgs(trailingOnly=TRUE)
 4 | if (length(args) < 3) {
 5 |     stop("Usage: tximeta.r <coldata> <counts> <tpm>", call.=FALSE)
 6 | }
 7 |
 8 | coldata = args[1]
 9 | counts_fn = args[2]
10 | tpm_fn = args[3]
11 |
12 | tx2gene = "tx2gene.csv"
13 | info = file.info(tx2gene)
14 | if (info$size == 0){
15 |     tx2gene = NULL
16 | }else{
17 |     rowdata = read.csv(tx2gene, header = FALSE)
18 |     colnames(rowdata) = c("tx", "gene_id", "gene_name")
19 |     tx2gene = rowdata[,1:2]
20 | }
21 |
22 | counts = read.csv(counts_fn, row.names=1)
23 | tpm = read.csv(tpm_fn, row.names=1)
24 |
25 | if (length(intersect(rownames(counts), rowdata[["tx"]])) > length(intersect(rownames(counts), rowdata[["gene_id"]]))){
26 |     by_what = "tx"
27 | } else {
28 |     by_what = "gene_id"
29 |     rowdata = unique(rowdata[,2:3])
30 | }
31 |
32 | if (file.exists(coldata)){
33 |     coldata = read.csv(coldata)
34 |     coldata = coldata[match(colnames(counts), coldata[,1]),]
35 |     coldata = cbind(files = colnames(counts), coldata)
36 | }else{
37 |     message("ColData not available ", coldata)
38 |     coldata = data.frame(files = colnames(counts), names = colnames(counts))
39 | }
40 | library(SummarizedExperiment)
41 |
42 | rownames(coldata) = coldata[["names"]]
43 | extra = setdiff(rownames(counts), as.character(rowdata[[by_what]]))
44 | if (length(extra) > 0){
45 |     rowdata = rbind(rowdata,
46 |                     data.frame(tx=extra,
47 |                                gene_id=extra,
48 |                                gene_name=extra))
49 | }
50 |
51 | rowdata =
rowdata[match(rownames(counts), as.character(rowdata[[by_what]])),] 52 | rownames(rowdata) = rowdata[[by_what]] 53 | se = SummarizedExperiment(assays = list(counts = counts, 54 | abundance = tpm), 55 | colData = DataFrame(coldata), 56 | rowData = rowdata) 57 | 58 | saveRDS(se, file = paste0(tools::file_path_sans_ext(counts_fn), ".rds")) 59 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | sudo: required 2 | language: python 3 | jdk: openjdk8 4 | services: docker 5 | python: '3.6' 6 | cache: pip 7 | matrix: 8 | fast_finish: true 9 | 10 | before_install: 11 | # PRs to master are only ok if coming from dev branch 12 | - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && ([ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ] || [ $TRAVIS_PULL_REQUEST_BRANCH = "patch" ]))' 13 | # Pull the docker image first so the test doesn't wait for this 14 | - docker pull nfcore/rnaseq:dev 15 | # Fake the tag locally so that the pipeline runs properly 16 | # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1) 17 | - docker tag nfcore/rnaseq:dev nfcore/rnaseq:1.4.2 18 | 19 | install: 20 | # Install Nextflow 21 | - mkdir /tmp/nextflow && cd /tmp/nextflow 22 | - wget -qO- get.nextflow.io | bash 23 | - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow 24 | # Install nf-core/tools 25 | - pip install --upgrade pip 26 | - pip install nf-core 27 | # Reset 28 | - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests 29 | # Install markdownlint-cli 30 | - sudo apt-get install npm && npm install -g markdownlint-cli 31 | 32 | env: 33 | - NXF_VER='19.04.0' # Specify a minimum NF version that should be tested and work 34 | - NXF_VER='' # Plus: get the latest NF version and check that it works 35 | 36 | script: 37 | # Lint the pipeline 
code 38 | - nf-core lint ${TRAVIS_BUILD_DIR} 39 | # Lint the documentation 40 | - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml 41 | # Run, build reference genome with STAR 42 | - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker 43 | # Run, build reference genome with HISAT2 44 | - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --aligner hisat2 45 | # Mini Test for gzipped stuff with STAR 46 | - nextflow run ${TRAVIS_BUILD_DIR} -profile test_gz,docker --fasta false --pseudo_aligner 'salmon' --skipAlignment 47 | -------------------------------------------------------------------------------- /conf/test_gz.config: -------------------------------------------------------------------------------- 1 | /* 2 | * ------------------------------------------------- 3 | * Nextflow config file for running tests 4 | * ------------------------------------------------- 5 | * Defines bundled input files and everything required 6 | * to run a fast and simple test. 
Use as follows: 7 | * nextflow run nf-core/rnaseq -profile test_gz 8 | */ 9 | 10 | params { 11 | config_profile_name = 'Test profile - gzipped inputs' 12 | config_profile_description = 'Minimal test dataset to check pipeline function with gzipped input files' 13 | // Limit resources so that this can run on Travis 14 | max_cpus = 2 15 | max_memory = 6.GB 16 | max_time = 48.h 17 | // Input data 18 | singleEnd = true 19 | readPaths = [ 20 | ['SRR4238351', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238351_subsamp.fastq.gz']], 21 | ['SRR4238355', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238355_subsamp.fastq.gz']], 22 | ['SRR4238359', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238359_subsamp.fastq.gz']], 23 | ['SRR4238379', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238379_subsamp.fastq.gz']], 24 | ] 25 | // Genome references 26 | fasta = 'https://github.com/czbiohub/test-datasets/raw/olgabot/subset-chrom-I-gzip/reference/genome.fa.gz' 27 | gtf = 'https://github.com/czbiohub/test-datasets/raw/olgabot/subset-chrom-I-gzip/reference/genes.gtf.gz' 28 | gff = 'https://github.com/czbiohub/test-datasets/raw/olgabot/subset-chrom-I-gzip/reference/genes.gff.gz' 29 | transcript_fasta = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/transcriptome.fasta.gz' 30 | hisat2_index = 'https://github.com/czbiohub/test-datasets/raw/olgabot/subset-chrom-I-gzip/reference/hisat2.tar.gz' 31 | star_index = 'https://github.com/czbiohub/test-datasets/raw/olgabot/subset-chrom-I-gzip/reference/star.tar.gz' 32 | salmon_index = 'https://github.com/czbiohub/test-datasets/raw/olgabot/subset-chrom-I-gzip/reference/salmon_index.tar.gz' 33 | compressedReference = true 34 | } 35 | -------------------------------------------------------------------------------- /assets/email_template.txt: -------------------------------------------------------------------------------- 1 | 
---------------------------------------------------- 2 | ,--./,-. 3 | ___ __ __ __ ___ /,-._.--~\\ 4 | |\\ | |__ __ / ` / \\ |__) |__ } { 5 | | \\| | \\__, \\__/ | \\ |___ \\`-._,-`-, 6 | `._,._,' 7 | nf-core/rnaseq v${version} 8 | ---------------------------------------------------- 9 | 10 | Run Name: $runName 11 | 12 | <% if (!success){ 13 | out << """#################################################### 14 | ## nf-core/rnaseq execution completed unsuccessfully! ## 15 | #################################################### 16 | The exit status of the task that caused the workflow execution to fail was: $exitStatus. 17 | The full error message was: 18 | 19 | ${errorReport} 20 | """ 21 | } else if(skipped_poor_alignment.size() > 0) { 22 | out << """################################################## 23 | ## nf-core/rnaseq execution completed with warnings ## 24 | ################################################## 25 | The pipeline finished successfully, but the following samples were skipped 26 | due to very low alignment (less than 5%): 27 | 28 | - ${skipped_poor_alignment.join("\n - ")} 29 | """ 30 | } else { 31 | out << "## nf-core/rnaseq execution completed successfully! ##" 32 | } 33 | %> 34 | 35 | The workflow was completed at $dateComplete (duration: $duration) 36 | 37 | The command used to launch the workflow was as follows: 38 | 39 | $commandLine 40 | 41 | Pipeline Configuration: 42 | ----------------------- 43 | <% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %> 44 | 45 | -- 46 | nf-core/rnaseq 47 | https://github.com/nf-core/rnaseq 48 | -------------------------------------------------------------------------------- /conf/base.config: -------------------------------------------------------------------------------- 1 | /* 2 | * ------------------------------------------------- 3 | * nf-core/rnaseq Nextflow base config file 4 | * ------------------------------------------------- 5 | * A 'blank slate' config file, appropriate for general 6 | * use on most high performance compute environments. 7 | * Assumes that all software is installed and available 8 | * on the PATH. Runs in `local` mode - all jobs will be 9 | * run on the machine you are logged in to. 10 | */ 11 | 12 | process { 13 | 14 | cpus = { check_max( 2, 'cpus' ) } 15 | memory = { check_max( 8.GB * task.attempt, 'memory' ) } 16 | time = { check_max( 4.h * task.attempt, 'time' ) } 17 | 18 | errorStrategy = { task.exitStatus in [143,137,104,134,139] ?
'retry' : 'terminate' } 19 | maxRetries = 1 20 | maxErrors = '-1' 21 | 22 | // Process-specific resource requirements 23 | withLabel: low_memory { 24 | memory = { check_max( 16.GB * task.attempt, 'memory' ) } 25 | } 26 | withLabel: mid_memory { 27 | cpus = { check_max (4, 'cpus')} 28 | memory = { check_max( 28.GB * task.attempt, 'memory' ) } 29 | time = { check_max( 8.h * task.attempt, 'time' ) } 30 | } 31 | withLabel: high_memory { 32 | cpus = { check_max (10, 'cpus')} 33 | memory = { check_max( 70.GB * task.attempt, 'memory' ) } 34 | time = { check_max( 8.h * task.attempt, 'time' ) } 35 | } 36 | 37 | withName: makeHISATindex { 38 | cpus = { check_max( 10, 'cpus' ) } 39 | memory = { check_max( 200.GB * task.attempt, 'memory' ) } 40 | time = { check_max( 5.h * task.attempt, 'time' ) } 41 | } 42 | withName: trim_galore { 43 | time = { check_max( 8.h * task.attempt, 'time' ) } 44 | } 45 | withName: sortmerna { 46 | cpus = { check_max( 16 * task.attempt, 'cpus' ) } 47 | time = { check_max( 24.h * task.attempt, 'time' ) } 48 | maxRetries = 2 49 | } 50 | withName: markDuplicates { 51 | // Actually the -Xmx value should be kept lower, 52 | // and is set through the markdup_java_options 53 | cpus = { check_max( 8, 'cpus' ) } 54 | memory = { check_max( 8.GB * task.attempt, 'memory' ) } 55 | } 56 | withLabel: salmon { 57 | cpus = { check_max( 8, 'cpus' ) } 58 | memory = { check_max( 16.GB * task.attempt, 'memory' ) } 59 | } 60 | withName: 'get_software_versions' { 61 | memory = { check_max( 2.GB * task.attempt, 'memory' ) } 62 | cache = false 63 | } 64 | withName: 'multiqc' { 65 | memory = { check_max( 2.GB * task.attempt, 'memory' ) } 66 | cache = false 67 | } 68 | } 69 | 70 | params { 71 | // Defaults only, expecting to be overwritten 72 | max_memory = 128.GB 73 | max_cpus = 16 74 | max_time = 240.h 75 | igenomes_base = 's3://ngi-igenomes/igenomes/' 76 | } 77 | -------------------------------------------------------------------------------- 
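The `base.config` above defers all sizing to the `check_max` helper, capped by the `params.max_*` defaults at the bottom of the file. Site-specific tuning is normally done by layering an extra config on top with `-c` rather than editing the base file. A minimal sketch (the file name `custom.config` and every value in it are illustrative assumptions, not settings shipped with this repository):

```groovy
// custom.config -- hypothetical user-side overrides, applied with:
//   nextflow run nf-core/rnaseq -profile docker -c custom.config ...
params {
    // Raise the ceilings that check_max() enforces on this cluster
    max_cpus   = 32
    max_memory = 256.GB
    max_time   = 72.h
}

process {
    // Give a single named process more memory than the base config default;
    // check_max() still clips the request at params.max_memory on retry.
    withName: markDuplicates {
        memory = { check_max( 16.GB * task.attempt, 'memory' ) }
    }
}
```

Because Nextflow merges config files in order, anything set here overrides the same setting in `conf/base.config` without touching the pipeline itself.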
/bin/filter_gtf_for_genes_in_genome.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from __future__ import print_function 3 | import logging 4 | from itertools import groupby 5 | import argparse 6 | 7 | # Create a logger 8 | logging.basicConfig(format='%(name)s - %(asctime)s %(levelname)s: %(message)s') 9 | logger = logging.getLogger(__file__) 10 | logger.setLevel(logging.INFO) 11 | 12 | def is_header(line): 13 | return line[0] == '>' 14 | 15 | 16 | def extract_fasta_seq_names(fasta_name): 17 | """ 18 | modified from Brent Pedersen 19 | Correct Way To Parse A Fasta File In Python 20 | given a fasta file. yield tuples of header, sequence 21 | from https://www.biostars.org/p/710/ 22 | """ 23 | # first open the file outside 24 | fh = open(fasta_name) 25 | 26 | # ditch the boolean (x[0]) and just keep the header or sequence since 27 | # we know they alternate. 28 | faiter = (x[1] for x in groupby(fh, is_header)) 29 | 30 | for i, header in enumerate(faiter): 31 | line = next(header) 32 | if is_header(line): 33 | # drop the ">" 34 | headerStr = line[1:].strip().split()[0] 35 | yield headerStr 36 | 37 | 38 | def extract_genes_in_genome(fasta, gtf_in, gtf_out): 39 | seq_names_in_genome = set(extract_fasta_seq_names(fasta)) 40 | logger.info("Extracted chromosome sequence names from : %s" % fasta) 41 | logger.info("All chromosome names: " + ", ".join(sorted(x for x in seq_names_in_genome))) 42 | seq_names_in_gtf = set([]) 43 | 44 | n_total_lines = 0 45 | n_lines_in_genome = 0 46 | with open(gtf_out, 'w') as f: 47 | with open(gtf_in) as g: 48 | 49 | for line in g.readlines(): 50 | n_total_lines += 1 51 | seq_name_gtf = line.split("\t")[0] 52 | seq_names_in_gtf.add(seq_name_gtf) 53 | if seq_name_gtf in seq_names_in_genome: 54 | n_lines_in_genome += 1 55 | f.write(line) 56 | logger.info("Extracted %d / %d lines from %s matching sequences in %s" % 57 | (n_lines_in_genome, n_total_lines, gtf_in, fasta)) 58 | 
logger.info("All sequence IDs from GTF: " + ", ".join(sorted(seq_names_in_gtf))) 59 | 60 | logger.info("Wrote matching lines to %s" % gtf_out) 61 | 62 | 63 | if __name__ == "__main__": 64 | parser = argparse.ArgumentParser(description="""Filter GTF only for features in the genome""") 65 | parser.add_argument("--gtf", type=str, help="GTF file") 66 | parser.add_argument("--fasta", type=str, help="Genome fasta file") 67 | parser.add_argument("-o", "--output", dest='output', 68 | default='genes_in_genome.gtf', 69 | type=str, help="GTF features on fasta genome sequences") 70 | 71 | args = parser.parse_args() 72 | extract_genes_in_genome(args.fasta, args.gtf, args.output) 73 | -------------------------------------------------------------------------------- /bin/mqc_features_stat.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import logging 5 | import os 6 | 7 | # Create a logger 8 | logging.basicConfig(format='%(name)s - %(asctime)s %(levelname)s: %(message)s') 9 | logger = logging.getLogger(__file__) 10 | logger.setLevel(logging.INFO) 11 | 12 | mqc_main = """#id: 'biotype-gs' 13 | #plot_type: 'generalstats' 14 | #pconfig:""" 15 | 16 | mqc_pconf="""# percent_{ft}: 17 | # title: '% {ft}' 18 | # namespace: 'Biotype Counts' 19 | # description: '% reads overlapping {ft} features' 20 | # max: 100 21 | # min: 0 22 | # scale: 'RdYlGn-rev' 23 | # format: '{{:.2f}}%'""" 24 | 25 | def mqc_feature_stat(bfile, features, outfile, sname=None): 26 | 27 | # If sample name not given use file name 28 | if not sname: 29 | sname = os.path.splitext(os.path.basename(bfile))[0] 30 | 31 | # Try to parse and read biocount file 32 | fcounts = {} 33 | try: 34 | with open(bfile, 'r') as bfl: 35 | for ln in bfl: 36 | if ln.startswith('#'): 37 | continue 38 | ft, cn = ln.strip().split('\t') 39 | fcounts[ft] = float(cn) 40 | except Exception: 41 | logger.error("Trouble reading the biocount file {}".format(bfile))
42 | return 43 | 44 | total_count = sum(fcounts.values()) 45 | if total_count == 0: 46 | logger.error("No biocounts found, exiting") 47 | return 48 | 49 | # Calculate percentage for each requested feature 50 | fpercent = {f: (fcounts[f]/total_count)*100 if f in fcounts else 0 for f in features} 51 | if all(f not in fcounts for f in features): 52 | logger.error("None of the given features '{}' were found in the biocount file {}".format(", ".join(features), bfile)) 53 | return 54 | 55 | # Prepare the output strings 56 | out_head, out_value, out_mqc = ("Sample", "'{}'".format(sname), mqc_main) 57 | for ft, pt in fpercent.items(): 58 | out_head = "{}\tpercent_{}".format(out_head, ft) 59 | out_value = "{}\t{}".format(out_value, pt) 60 | out_mqc = "{}\n{}".format(out_mqc, mqc_pconf.format(ft=ft)) 61 | 62 | # Write the output to a file 63 | with open(outfile, 'w') as ofl: 64 | out_final = "\n".join([out_mqc, out_head, out_value]).strip() 65 | ofl.write(out_final + "\n") 66 | 67 | if __name__ == "__main__": 68 | parser = argparse.ArgumentParser(description="""Calculate features percentage for biotype counts""") 69 | parser.add_argument("biocount", type=str, help="File with all biocounts") 70 | parser.add_argument("-f", "--features", dest='features', required=True, nargs='+', help="Features to count") 71 | parser.add_argument("-s", "--sample", dest='sample', type=str, help="Sample Name") 72 | parser.add_argument("-o", "--output", dest='output', default='biocount_percent.tsv', type=str, help="Output file name") 73 | args = parser.parse_args() 74 | mqc_feature_stat(args.biocount, args.features, args.output, args.sample) 75 | -------------------------------------------------------------------------------- /bin/tximport.r: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | args = commandArgs(trailingOnly=TRUE) 4 | if (length(args) < 3) { 5 | stop("Usage: tximport.r <coldata> <salmon_dir> <sample_name>", call.=FALSE) 6 | } 7 | 8 | path = args[2] 9 | coldata = args[1] 10 | 11 |
sample_name = args[3] 12 | 13 | prefix = paste(c(sample_name, "salmon"), collapse="_") 14 | 15 | tx2gene = "tx2gene.csv" 16 | info = file.info(tx2gene) 17 | if (info$size == 0){ 18 | tx2gene = NULL 19 | }else{ 20 | rowdata = read.csv(tx2gene, header = FALSE) 21 | colnames(rowdata) = c("tx", "gene_id", "gene_name") 22 | tx2gene = rowdata[,1:2] 23 | } 24 | 25 | fns = list.files(path, pattern = "quant.sf", recursive = T, full.names = T) 26 | names = basename(dirname(fns)) 27 | names(fns) = names 28 | 29 | if (file.exists(coldata)){ 30 | coldata = read.csv(coldata) 31 | coldata = coldata[match(names, coldata[,1]),] 32 | coldata = cbind(files = fns, coldata) 33 | }else{ 34 | message("ColData not available ", coldata) 35 | coldata = data.frame(files = fns, names = names) 36 | } 37 | 38 | library(SummarizedExperiment) 39 | library(tximport) 40 | 41 | txi = tximport(fns, type = "salmon", txOut = TRUE) 42 | rownames(coldata) = coldata[["names"]] 43 | extra = setdiff(rownames(txi[[1]]), as.character(rowdata[["tx"]])) 44 | if (length(extra) > 0){ 45 | rowdata = rbind(rowdata, 46 | data.frame(tx=extra, 47 | gene_id=extra, 48 | gene_name=extra)) 49 | } 50 | rowdata = rowdata[match(rownames(txi[[1]]), as.character(rowdata[["tx"]])),] 51 | rownames(rowdata) = rowdata[["tx"]] 52 | se = SummarizedExperiment(assays = list(counts = txi[["counts"]], 53 | abundance = txi[["abundance"]], 54 | length = txi[["length"]]), 55 | colData = DataFrame(coldata), 56 | rowData = rowdata) 57 | if (!is.null(tx2gene)){ 58 | gi = summarizeToGene(txi, tx2gene = tx2gene) 59 | growdata = unique(rowdata[,2:3]) 60 | growdata = growdata[match(rownames(gi[[1]]), growdata[["gene_id"]]),] 61 | rownames(growdata) = growdata[["gene_id"]] 62 | gse = SummarizedExperiment(assays = list(counts = gi[["counts"]], 63 | abundance = gi[["abundance"]], 64 | length = gi[["length"]]), 65 | colData = DataFrame(coldata), 66 | rowData = growdata) 67 | } 68 | 69 | if(exists("gse")){ 70 | write.csv(assays(gse)[["abundance"]],
paste(c(prefix, "gene_tpm.csv"), collapse="_"), quote=FALSE) 71 | write.csv(assays(gse)[["counts"]], paste(c(prefix, "gene_counts.csv"), collapse="_"), quote=FALSE) 72 | } 73 | 74 | write.csv(assays(se)[["abundance"]], paste(c(prefix, "transcript_tpm.csv"), collapse="_"), quote=FALSE) 75 | write.csv(assays(se)[["counts"]], paste(c(prefix, "transcript_counts.csv"), collapse="_"), quote=FALSE) 76 | 77 | # Print sessioninfo to standard out 78 | citation("tximport") 79 | sessionInfo() 80 | -------------------------------------------------------------------------------- /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # nf-core/rnaseq: Contributing Guidelines 2 | 3 | Hi there! Many thanks for taking an interest in improving nf-core/rnaseq. 4 | 5 | We try to manage the required tasks for nf-core/rnaseq using GitHub issues; you probably came to this page when creating one. Please use the pre-filled template to save time. 6 | 7 | However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;) 8 | 9 | > If you need help using or modifying nf-core/rnaseq then the best place to ask is on the pipeline channel on [Slack](https://nf-co.re/join/slack/). 10 | 11 | ## Contribution workflow 12 | 13 | If you'd like to write some code for nf-core/rnaseq, the standard workflow 14 | is as follows: 15 | 16 | 1. Check that there isn't already an issue about your idea in the 17 | [nf-core/rnaseq issues](https://github.com/nf-core/rnaseq/issues) to avoid 18 | duplicating work. 19 | * If there isn't one already, please create one so that others know you're working on this 20 | 2. Fork the [nf-core/rnaseq repository](https://github.com/nf-core/rnaseq) to your GitHub account 21 | 3. Make the necessary changes / additions within your forked repository 22 | 4.
Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged. 23 | 24 | If you're not used to this workflow with git, you can start with some [basic docs from GitHub](https://help.github.com/articles/fork-a-repo/) or even their [excellent interactive tutorial](https://try.github.io/). 25 | 26 | ## Tests 27 | 28 | When you create a pull request with changes, [Travis CI](https://travis-ci.org/) will run automatic tests. 29 | Typically, pull requests are only fully reviewed when these tests are passing, though of course we can help out before then. 30 | 31 | There are typically two types of tests that run: 32 | 33 | ### Lint Tests 34 | 35 | nf-core has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. 36 | To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command. 37 | 38 | If any failures or warnings are encountered, please follow the listed URL for more documentation. 39 | 40 | ### Pipeline Tests 41 | 42 | Each nf-core pipeline should be set up with a minimal set of test data. 43 | Travis CI then runs the pipeline on this data to ensure that it exits successfully. 44 | If there are any failures then the automated tests fail. 45 | These tests are run both with the latest available version of Nextflow and also the minimum required version that is stated in the pipeline code. 46 | 47 | ## Getting help 48 | 49 | For further information/help, please consult the [nf-core/rnaseq documentation](https://github.com/nf-core/rnaseq#documentation) and don't hesitate to get in touch on the [nf-core/rnaseq pipeline channel](https://nfcore.slack.com/channels/rnaseq) on [Slack](https://nf-co.re/join/slack/).
50 | -------------------------------------------------------------------------------- /bin/parse_gtf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from __future__ import print_function 3 | from collections import OrderedDict, defaultdict, Counter 4 | import logging 5 | import argparse 6 | import glob 7 | import os 8 | 9 | # Create a logger 10 | logging.basicConfig(format='%(name)s - %(asctime)s %(levelname)s: %(message)s') 11 | logger = logging.getLogger(__file__) 12 | logger.setLevel(logging.INFO) 13 | 14 | 15 | def read_top_transcript(salmon): 16 | txs = set() 17 | fn = glob.glob(os.path.join(salmon, "*", "quant.sf"))[0] 18 | with open(fn) as inh: 19 | for line in inh: 20 | if line.startswith("Name"): 21 | continue 22 | txs.add(line.split()[0]) 23 | if len(txs) > 100: 24 | break 25 | logger.info("Transcripts found in quantification: %s" % txs) 26 | return txs 27 | 28 | 29 | def tx2gene(gtf, salmon, gene_id, extra, out): 30 | txs = read_top_transcript(salmon) 31 | votes = Counter() 32 | gene_dict = defaultdict(list) 33 | with open(gtf) as inh: 34 | for line in inh: 35 | if line.startswith("#"): 36 | continue 37 | cols = line.split("\t") 38 | attr_dict = OrderedDict() 39 | for gff_item in cols[8].split(";"): 40 | item_pair = gff_item.strip().split(" ") 41 | if len(item_pair) > 1: 42 | value = item_pair[1].strip().replace("\"", "") 43 | if value in txs: 44 | votes[item_pair[0].strip()] += 1 45 | 46 | attr_dict[item_pair[0].strip()] = value 47 | gene_dict[attr_dict[gene_id]].append(attr_dict) 48 | 49 | if not votes: 50 | logger.warning("No attribute in GTF matching transcripts") 51 | return None 52 | 53 | txid = votes.most_common(1)[0][0] 54 | logger.info("Attribute found to be transcript: %s" % txid) 55 | seen = set() 56 | with open(out, 'w') as outh: 57 | for gene in gene_dict: 58 | for row in gene_dict[gene]: 59 | if txid not in row: 60 | continue 61 | if (gene, row[txid]) not in seen: 62 |
seen.add((gene, row[txid])) 63 | if extra not in row: 64 | extra_id = gene 65 | else: 66 | extra_id = row[extra] 67 | print("%s,%s,%s" % (row[txid], gene, extra_id), file=outh) 68 | 69 | 70 | if __name__ == "__main__": 71 | parser = argparse.ArgumentParser(description="""Get tx to gene names for tximport""") 72 | parser.add_argument("--gtf", type=str, help="GTF file") 73 | parser.add_argument("--salmon", type=str, help="output of salmon") 74 | parser.add_argument("--id", type=str, help="gene id in the gtf file") 75 | parser.add_argument("--extra", type=str, help="extra id in the gtf file") 76 | parser.add_argument("-o", "--output", dest='output', default='tx2gene.csv', type=str, help="file with output") 77 | 78 | args = parser.parse_args() 79 | tx2gene(args.gtf, args.salmon, args.id, args.extra, args.output) 80 | -------------------------------------------------------------------------------- /bin/edgeR_heatmap_MDS.r: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | # Command line argument processing 4 | args <- commandArgs(trailingOnly=TRUE) 5 | 6 | if (length(args) < 3) { 7 | stop("Usage: edgeR_heatmap_MDS.r <counts_file_1> <counts_file_2> <counts_file_3> (more bam files optional)", call.=FALSE) 8 | } 9 | 10 | # Load / install required packages 11 | if (!require("limma")){ 12 | source("http://bioconductor.org/biocLite.R") 13 | biocLite("limma", suppressUpdates=TRUE) 14 | library("limma") 15 | } 16 | if (!require("edgeR")){ 17 | source("http://bioconductor.org/biocLite.R") 18 | biocLite("edgeR", suppressUpdates=TRUE) 19 | library("edgeR") 20 | } 21 | if (!require("data.table")){ 22 | install.packages("data.table", dependencies=TRUE, repos='http://cloud.r-project.org/') 23 | library("data.table") 24 | } 25 | if (!require("gplots")) { 26 | install.packages("gplots", dependencies=TRUE, repos='http://cloud.r-project.org/') 27 | library("gplots") 28 | } 29 | 30 | # Load count column from all files into a list of data frames 31 | # Use data.table's
fread as it is much faster than read.table 32 | # Row names are GeneIDs 33 | temp <- lapply(lapply(args, fread, skip="Geneid", header=TRUE), function(x){return(as.data.frame(x)[,c(1, ncol(x))])}) 34 | 35 | # Merge into a single data frame 36 | merge.all <- function(x, y) { 37 | merge(x, y, all=TRUE, by="Geneid") 38 | } 39 | data <- data.frame(Reduce(merge.all, temp)) 40 | 41 | # Clean sample name headers 42 | colnames(data) <- gsub("Aligned.sortedByCoord.out.bam", "", colnames(data)) 43 | 44 | # Set GeneID as row name 45 | rownames(data) <- data[,1] 46 | data[,1] <- NULL 47 | 48 | # Convert data frame to edgeR DGE object 49 | dataDGE <- DGEList( counts=data.matrix(data) ) 50 | 51 | # Normalise counts 52 | dataNorm <- calcNormFactors(dataDGE) 53 | 54 | # Make MDS plot 55 | pdf('edgeR_MDS_plot.pdf') 56 | MDSdata <- plotMDS(dataNorm) 57 | dev.off() 58 | 59 | # Print distance matrix to file 60 | write.csv(MDSdata$distance.matrix, 'edgeR_MDS_distance_matrix.csv', quote=FALSE) 61 | 62 | # Print plot x,y co-ordinates to file 63 | MDSxy = MDSdata$cmdscale.out 64 | colnames(MDSxy) = c(paste(MDSdata$axislabel, '1'), paste(MDSdata$axislabel, '2')) 65 | write.csv(MDSxy, 'edgeR_MDS_Aplot_coordinates_mqc.csv', quote=FALSE) 66 | 67 | # Get the log counts per million values 68 | logcpm <- cpm(dataNorm, prior.count=2, log=TRUE) 69 | 70 | # Calculate the Pearson's correlation between samples 71 | # Plot a heatmap of correlations 72 | pdf('log2CPM_sample_correlation_heatmap.pdf') 73 | hmap <- heatmap.2(as.matrix(cor(logcpm, method="pearson")), 74 | key.title="Pearson's Correlation", trace="none", 75 | dendrogram="row", margin=c(9, 9) 76 | ) 77 | dev.off() 78 | 79 | # Write correlation values to file 80 | write.csv(hmap$carpet, 'log2CPM_sample_correlation_mqc.csv', quote=FALSE) 81 | 82 | # Plot the heatmap dendrogram 83 | pdf('log2CPM_sample_distances_dendrogram.pdf') 84 | hmap <- heatmap.2(as.matrix(dist(t(logcpm)))) 85 | plot(hmap$rowDendrogram,
main="Sample Pearson's Correlation Clustering") 86 | dev.off() 87 | 88 | file.create("corr.done") 89 | 90 | # Printing sessioninfo to standard out 91 | print("Sample correlation info:") 92 | sessionInfo() 93 | -------------------------------------------------------------------------------- /assets/email_template.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | nf-core/rnaseq Pipeline Report 10 | 11 | 12 |
13 | <div>
14 | 
15 | <h1>nf-core/rnaseq v${version}</h1>
16 | <h2>Run Name: $runName</h2>
17 | 
18 | <% if (!success){
19 |     out << """
20 |     <div>
21 |         <h4>nf-core/rnaseq execution completed unsuccessfully!</h4>
22 |         <p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
23 |         <p>The full error message was:</p>
24 |         <pre>${errorReport}</pre>
25 |     </div>
26 |     """
27 | } else if(skipped_poor_alignment.size() > 0) {
28 |     out << """
29 |     <div>
30 |         <h4>nf-core/rnaseq execution completed with warnings!</h4>
31 |         <p>The pipeline finished successfully, but the following samples were skipped due to very low alignment (&lt; 5%):</p>
32 |         <ul>
33 |             <li>${skipped_poor_alignment.join('</li><li>')}</li>
34 |         </ul>
35 |     </div>
36 |     """
37 | } else {
38 |     out << """
39 |     <div>
40 |         nf-core/rnaseq execution completed successfully!
41 |     </div>
42 |     """
43 | }
44 | %>
45 | 
46 | <p>The workflow was completed at $dateComplete (duration: $duration)</p>
47 | 
48 | <p>The command used to launch the workflow was as follows:</p>
49 | <pre>$commandLine</pre>
50 | 
51 | <h3>Pipeline Configuration:</h3>
52 | <table>
53 |     <% out << summary.collect{ k,v -> "<tr><th>$k</th><td>$v</td></tr>" }.join("\n") %>
54 | </table>
55 | 
56 | <p>nf-core/rnaseq</p>
57 | <p><a href="https://github.com/nf-core/rnaseq">https://github.com/nf-core/rnaseq</a></p>
58 | 
59 | </div>
60 | 
61 | 
62 | 
63 | 64 | 65 | 66 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 
28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack/). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 
40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /bin/scrape_software_versions.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from __future__ import print_function 3 | from collections import OrderedDict 4 | import re 5 | 6 | regexes = { 7 | 'nf-core/rnaseq': ['v_ngi_rnaseq.txt', r"(\S+)"], 8 | 'Nextflow': ['v_nextflow.txt', r"(\S+)"], 9 | 'FastQC': ['v_fastqc.txt', r"FastQC v(\S+)"], 10 | 'Cutadapt': ['v_cutadapt.txt', r"(\S+)"], 11 | 'Trim Galore!': ['v_trim_galore.txt', r"version (\S+)"], 12 | 'SortMeRNA': ['v_sortmerna.txt', r"SortMeRNA version (\S+),"], 13 | 'STAR': ['v_star.txt', r"(\S+)"], 14 | 'HISAT2': ['v_hisat2.txt', r"version (\S+)"], 15 | 'Picard MarkDuplicates': ['v_markduplicates.txt', r"([\d\.]+)-SNAPSHOT"], 16 | 'Samtools': ['v_samtools.txt', r"samtools (\S+)"], 17 | 'featureCounts': ['v_featurecounts.txt', r"featureCounts v(\S+)"], 18 | 'Salmon': ['v_salmon.txt', r"salmon (\S+)"], 19 | 'deepTools': ['v_deeptools.txt', r"bamCoverage (\S+)"], 20 | 'StringTie': ['v_stringtie.txt', r"(\S+)"], 21 | 'Preseq': ['v_preseq.txt', r"Version: (\S+)"], 22 | 'RSeQC': ['v_rseqc.txt', r"read_duplication.py ([\d\.]+)"], 23 | 'Qualimap': ['v_qualimap.txt', r"QualiMap v(\S+)"], 24 | 'dupRadar': ['v_dupRadar.txt', r"(\S+)"], 25 | 'edgeR': ['v_edgeR.txt', r"(\S+)"], 26 | 'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"], 27 | } 28 | results = OrderedDict() 29 | results['nf-core/rnaseq'] = 'N/A' 30 | results['Nextflow'] = 'N/A' 31 | results['FastQC'] = 'N/A' 32 | results['Cutadapt'] = 'N/A' 33 | results['Trim Galore!'] = 'N/A' 34 | results['SortMeRNA'] = 'N/A' 
35 | results['STAR'] = False 36 | results['HISAT2'] = False 37 | results['Picard MarkDuplicates'] = 'N/A' 38 | results['Samtools'] = 'N/A' 39 | results['featureCounts'] = 'N/A' 40 | results['Salmon'] = 'N/A' 41 | results['StringTie'] = 'N/A' 42 | results['Preseq'] = 'N/A' 43 | results['deepTools'] = 'N/A' 44 | results['RSeQC'] = 'N/A' 45 | results['dupRadar'] = 'N/A' 46 | results['edgeR'] = 'N/A' 47 | results['Qualimap'] = 'N/A' 48 | results['MultiQC'] = 'N/A' 49 | 50 | # Search each file using its regex 51 | for k, v in regexes.items(): 52 | try: 53 | with open(v[0]) as x: 54 | versions = x.read() 55 | match = re.search(v[1], versions) 56 | if match: 57 | results[k] = "v{}".format(match.group(1)) 58 | except IOError: 59 | results[k] = False 60 | 61 | # Strip out the aligner that was not used (STAR or HISAT2, whichever is still False) 62 | for k in list(results):  # iterate over a copy so entries can be deleted without a RuntimeError 63 | if not results[k]: 64 | del results[k] 65 | 66 | # Dump to YAML 67 | print (''' 68 | id: 'software_versions' 69 | section_name: 'nf-core/rnaseq Software Versions' 70 | section_href: 'https://github.com/nf-core/rnaseq' 71 | plot_type: 'html' 72 | description: 'are collected at run time from the software output.' 73 | data: | 74 |     <dl class="dl-horizontal">
75 | ''') 76 | for k,v in results.items(): 77 |     print("        <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v)) 78 | print ("    </dl>
") 79 | 80 | # Write out regexes as csv file: 81 | with open('software_versions.csv', 'w') as f: 82 | for k,v in results.items(): 83 | f.write("{}\t{}\n".format(k,v)) 84 | -------------------------------------------------------------------------------- /bin/gtf2bed: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | # Copyright (c) 2011 Erik Aronesty (erik@q32.com) 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy 6 | # of this software and associated documentation files (the "Software"), to deal 7 | # in the Software without restriction, including without limitation the rights 8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | # copies of the Software, and to permit persons to whom the Software is 10 | # furnished to do so, subject to the following conditions: 11 | # 12 | # The above copyright notice and this permission notice shall be included in 13 | # all copies or substantial portions of the Software. 14 | # 15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | # THE SOFTWARE. 22 | # 23 | # ALSO, IT WOULD BE NICE IF YOU LET ME KNOW YOU USED IT. 24 | 25 | use Getopt::Long; 26 | 27 | my $extended; 28 | GetOptions("x"=>\$extended); 29 | 30 | $in = shift @ARGV; 31 | 32 | my $in_cmd =($in =~ /\.gz$/ ? "gunzip -c $in|" : $in =~ /\.zip$/ ? 
"unzip -p $in|" : "$in") || die "Can't open $in: $!\n"; 33 | open IN, $in_cmd; 34 | 35 | while (<IN>) { 36 | $gff = 2 if /^##gff-version 2/; 37 | $gff = 3 if /^##gff-version 3/; 38 | next if /^#/ && $gff; 39 | 40 | s/\s+$//; 41 | # 0-chr 1-src 2-feat 3-beg 4-end 5-scor 6-dir 7-fram 8-attr 42 | my @f = split /\t/; 43 | if ($gff) { 44 | # most ver 2's stick gene names in the id field 45 | ($id) = $f[8]=~ /\bID="([^"]+)"/; 46 | # most ver 3's stick unquoted names in the name field 47 | ($id) = $f[8]=~ /\bName=([^";]+)/ if !$id && $gff == 3; 48 | } else { 49 | ($id) = $f[8]=~ /transcript_id "([^"]+)"/; 50 | } 51 | 52 | next unless $id && $f[0]; 53 | 54 | if ($f[2] eq 'exon') { 55 | die "no position at exon on line $." if ! $f[3]; 56 | # gff3 puts :\d in exons sometimes 57 | $id =~ s/:\d+$// if $gff == 3; 58 | push @{$exons{$id}}, \@f; 59 | # save lowest start 60 | $trans{$id} = \@f if !$trans{$id}; 61 | } elsif ($f[2] eq 'start_codon') { 62 | #optional, output codon start/stop as "thick" region in bed 63 | $sc{$id}->[0] = $f[3]; 64 | } elsif ($f[2] eq 'stop_codon') { 65 | $sc{$id}->[1] = $f[4]; 66 | } elsif ($f[2] eq 'miRNA' ) { 67 | $trans{$id} = \@f if !$trans{$id}; 68 | push @{$exons{$id}}, \@f; 69 | } 70 | } 71 | 72 | for $id ( 73 | # sort by chr then pos 74 | sort { 75 | $trans{$a}->[0] eq $trans{$b}->[0] ?
76 | $trans{$a}->[3] <=> $trans{$b}->[3] : 77 | $trans{$a}->[0] cmp $trans{$b}->[0] 78 | } (keys(%trans)) ) { 79 | my ($chr, undef, undef, undef, undef, undef, $dir, undef, $attr, undef, $cds, $cde) = @{$trans{$id}}; 80 | my ($cds, $cde); 81 | ($cds, $cde) = @{$sc{$id}} if $sc{$id}; 82 | 83 | # sort by pos 84 | my @ex = sort { 85 | $a->[3] <=> $b->[3] 86 | } @{$exons{$id}}; 87 | 88 | my $beg = $ex[0][3]; 89 | my $end = $ex[-1][4]; 90 | 91 | if ($dir eq '-') { 92 | # swap 93 | $tmp=$cds; 94 | $cds=$cde; 95 | $cde=$tmp; 96 | $cds -= 2 if $cds; 97 | $cde += 2 if $cde; 98 | } 99 | 100 | # not specified, just use exons 101 | $cds = $beg if !$cds; 102 | $cde = $end if !$cde; 103 | 104 | # adjust start for bed 105 | --$beg; --$cds; 106 | 107 | my $exn = @ex; # exon count 108 | my $exst = join ",", map {$_->[3]-$beg-1} @ex; # exon start 109 | my $exsz = join ",", map {$_->[4]-$_->[3]+1} @ex; # exon size 110 | 111 | my $gene_id; 112 | my $extend = ""; 113 | if ($extended) { 114 | ($gene_id) = $attr =~ /gene_name "([^"]+)"/; 115 | ($gene_id) = $attr =~ /gene_id "([^"]+)"/ unless $gene_id; 116 | $extend="\t$gene_id"; 117 | } 118 | # added an extra comma to make it look exactly like ucsc's beds 119 | print "$chr\t$beg\t$end\t$id\t0\t$dir\t$cds\t$cde\t0\t$exn\t$exsz,\t$exst,$extend\n"; 120 | } 121 | 122 | 123 | close IN; 124 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ![nf-core/rnaseq](docs/images/nf-core-rnaseq_logo.png) 2 | 3 | [![Build Status](https://travis-ci.org/nf-core/rnaseq.svg?branch=master)](https://travis-ci.org/nf-core/rnaseq) 4 | [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.04.0-brightgreen.svg)](https://www.nextflow.io/) 5 | [![DOI](https://zenodo.org/badge/127293091.svg)](https://zenodo.org/badge/latestdoi/127293091) 6 | 7 | [![install with 
bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/) 8 | [![Docker](https://img.shields.io/docker/automated/nfcore/rnaseq.svg)](https://hub.docker.com/r/nfcore/rnaseq/) 9 | 10 | ### Introduction 11 | 12 | **nf-core/rnaseq** is a bioinformatics analysis pipeline used for RNA sequencing data. 13 | 14 | The workflow processes raw data from 15 | FastQ inputs ([FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), 16 | [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)), 17 | aligns the reads 18 | ([STAR](https://github.com/alexdobin/STAR) or 19 | [HiSAT2](https://ccb.jhu.edu/software/hisat2/index.shtml)), 20 | generates counts relative to genes 21 | ([featureCounts](http://bioinf.wehi.edu.au/featureCounts/), 22 | [StringTie](https://ccb.jhu.edu/software/stringtie/)) or transcripts 23 | ([Salmon](https://combine-lab.github.io/salmon/), 24 | [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html)) and performs extensive quality-control on the results 25 | ([RSeQC](http://rseqc.sourceforge.net/), 26 | [Qualimap](http://qualimap.bioinfo.cipf.es/), 27 | [dupRadar](https://bioconductor.org/packages/release/bioc/html/dupRadar.html), 28 | [Preseq](http://smithlabresearch.org/software/preseq/), 29 | [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html), 30 | [MultiQC](http://multiqc.info/)). See the [output documentation](docs/output.md) for more details of the results. 31 | 32 | The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible. 33 | 34 | ## Quick Start 35 | 36 | i. Install [`nextflow`](https://nf-co.re/usage/installation) 37 | 38 | ii. 
Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html) 39 | 40 | iii. Download the pipeline and test it on a minimal dataset with a single command 41 | 42 | ```bash 43 | nextflow run nf-core/rnaseq -profile test,<docker/singularity/conda/institute> 44 | ``` 45 | 46 | iv. Start running your own analysis! 47 | 48 | ```bash 49 | nextflow run nf-core/rnaseq -profile <docker/singularity/conda/institute> --reads '*_R{1,2}.fastq.gz' --genome GRCh37 50 | ``` 51 | 52 | See [usage docs](docs/usage.md) for all of the available options when running the pipeline. 53 | 54 | ### Documentation 55 | 56 | The nf-core/rnaseq pipeline comes with documentation about the pipeline, found in the `docs/` directory: 57 | 58 | 1. [Installation](https://nf-co.re/usage/installation) 59 | 2. Pipeline configuration 60 | * [Local installation](https://nf-co.re/usage/local_installation) 61 | * [Adding your own system config](https://nf-co.re/usage/adding_own_config) 62 | * [Reference genomes](https://nf-co.re/usage/reference_genomes) 63 | 3. [Running the pipeline](docs/usage.md) 64 | 4. [Output and how to interpret the results](docs/output.md) 65 | 5. [Troubleshooting](https://nf-co.re/usage/troubleshooting) 66 | 67 | ### Credits 68 | 69 | These scripts were originally written for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/), part of [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden, by Phil Ewels ([@ewels](https://github.com/ewels)) and Rickard Hammarén ([@Hammarn](https://github.com/Hammarn)).
70 | 71 | Many thanks to others who have helped out along the way too, including (but not limited to): 72 | [@Galithil](https://github.com/Galithil), 73 | [@pditommaso](https://github.com/pditommaso), 74 | [@orzechoj](https://github.com/orzechoj), 75 | [@apeltzer](https://github.com/apeltzer), 76 | [@colindaven](https://github.com/colindaven), 77 | [@lpantano](https://github.com/lpantano), 78 | [@olgabot](https://github.com/olgabot), 79 | [@jburos](https://github.com/jburos), 80 | [@drpatelh](https://github.com/drpatelh). 81 | 82 | ## Contributions and Support 83 | 84 | If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). 85 | 86 | For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/rnaseq) (you can join with [this invite](https://nf-co.re/join/slack)). 87 | 88 | ## Citation 89 | 90 | If you use nf-core/rnaseq for your analysis, please cite it using the following doi: [10.5281/zenodo.1400710](https://doi.org/10.5281/zenodo.1400710) 91 | 92 | You can cite the `nf-core` pre-print as follows: 93 | 94 | > Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1). 95 | -------------------------------------------------------------------------------- /nextflow.config: -------------------------------------------------------------------------------- 1 | /* 2 | * ------------------------------------------------- 3 | * nf-core/rnaseq Nextflow config file 4 | * ------------------------------------------------- 5 | * Default config options for all environments.
6 | */ 7 | 8 | // Global default params, used in configs 9 | params { 10 | 11 | // Pipeline Options 12 | // Workflow flags 13 | genome = false 14 | reads = "data/*{1,2}.fastq.gz" 15 | singleEnd = false 16 | 17 | // References 18 | genome = false 19 | salmon_index = false 20 | transcript_fasta = false 21 | splicesites = false 22 | saveReference = false 23 | gencode = false 24 | compressedReference = false 25 | 26 | // Strandedness 27 | forwardStranded = false 28 | reverseStranded = false 29 | unStranded = false 30 | 31 | // Trimming 32 | skipTrimming = false 33 | clip_r1 = 0 34 | clip_r2 = 0 35 | three_prime_clip_r1 = 0 36 | three_prime_clip_r2 = 0 37 | trim_nextseq = 0 38 | pico = false 39 | saveTrimmed = false 40 | 41 | // Ribosomal RNA removal 42 | removeRiboRNA = false 43 | save_nonrRNA_reads = false 44 | rRNA_database_manifest = false 45 | 46 | // Alignment 47 | aligner = 'star' 48 | pseudo_aligner = false 49 | stringTieIgnoreGTF = false 50 | seq_center = false 51 | saveAlignedIntermediates = false 52 | skipAlignment = false 53 | saveUnaligned = false 54 | 55 | // Read Counting 56 | fc_extra_attributes = 'gene_name' 57 | fc_group_features = 'gene_id' 58 | fc_count_type = 'exon' 59 | fc_group_features_type = 'gene_biotype' 60 | sampleLevel = false 61 | skipBiotypeQC = false 62 | 63 | // QC 64 | skipQC = false 65 | skipFastQC = false 66 | skipPreseq = false 67 | skipDupRadar = false 68 | skipQualimap = false 69 | skipRseQC = false 70 | skipEdgeR = false 71 | skipMultiQC = false 72 | 73 | // Defaults 74 | project = false 75 | markdup_java_options = '"-Xms4000m -Xmx7g"' //Established values for markDuplicate memory consumption, see issue PR #689 (in Sarek) for details 76 | hisat_build_memory = 200 // Required amount of memory in GB to build HISAT2 index with splice sites 77 | readPaths = null 78 | star_memory = false // Cluster specific param required for hebbe 79 | rRNA_database_manifest = "$baseDir/assets/rrna-db-defaults.txt" 80 | 81 | // Boilerplate options 82 
| clusterOptions = false 83 | outdir = './results' 84 | name = false 85 | multiqc_config = "$baseDir/assets/multiqc_config.yaml" 86 | email = false 87 | email_on_fail = false 88 | max_multiqc_email_size = 25.MB 89 | plaintext_email = false 90 | monochrome_logs = false 91 | help = false 92 | igenomes_base = "./iGenomes" 93 | tracedir = "${params.outdir}/pipeline_info" 94 | awsqueue = false 95 | awsregion = 'eu-west-1' 96 | igenomesIgnore = false 97 | custom_config_version = 'master' 98 | custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" 99 | hostnames = false 100 | config_profile_description = false 101 | config_profile_contact = false 102 | config_profile_url = false 103 | } 104 | 105 | // Container slug. Stable releases should specify release tag! 106 | // Developmental code should specify :dev 107 | process.container = 'nfcore/rnaseq:1.4.2' 108 | 109 | // Load base.config by default for all pipelines 110 | includeConfig 'conf/base.config' 111 | 112 | // Load nf-core custom profiles from different Institutions 113 | try { 114 | includeConfig "${params.custom_config_base}/nfcore_custom.config" 115 | } catch (Exception e) { 116 | System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") 117 | } 118 | 119 | profiles { 120 | awsbatch { includeConfig 'conf/awsbatch.config' } 121 | conda { process.conda = "$baseDir/environment.yml" } 122 | debug { process.beforeScript = 'echo $HOSTNAME' } 123 | docker { docker.enabled = true } 124 | singularity { singularity.enabled = true 125 | singularity.autoMounts = true } 126 | test { includeConfig 'conf/test.config' } 127 | test_gz { includeConfig 'conf/test_gz.config' } 128 | } 129 | 130 | // Avoid this error: 131 | // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. 
132 | // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351, once this is established and works well, nextflow might implement this behavior as new default. 133 | docker.runOptions = '-u \$(id -u):\$(id -g)' 134 | 135 | // Load igenomes.config if required 136 | if (!params.igenomesIgnore) { 137 | includeConfig 'conf/igenomes.config' 138 | } 139 | 140 | // Capture exit codes from upstream processes when piping 141 | process.shell = ['/bin/bash', '-euo', 'pipefail'] 142 | 143 | timeline { 144 | enabled = true 145 | file = "${params.tracedir}/execution_timeline.html" 146 | } 147 | report { 148 | enabled = true 149 | file = "${params.tracedir}/execution_report.html" 150 | } 151 | trace { 152 | enabled = true 153 | file = "${params.tracedir}/execution_trace.txt" 154 | } 155 | dag { 156 | enabled = true 157 | file = "${params.tracedir}/pipeline_dag.svg" 158 | } 159 | 160 | manifest { 161 | name = 'nf-core/rnaseq' 162 | author = 'Phil Ewels, Rickard Hammarén' 163 | homePage = 'https://github.com/nf-core/rnaseq' 164 | description = 'Nextflow RNA-Seq analysis pipeline, part of the nf-core community.' 165 | mainScript = 'main.nf' 166 | nextflowVersion = '>=19.04.0' 167 | version = '1.4.2' 168 | } 169 | 170 | // Function to ensure that resource requirements don't go beyond 171 | // a maximum limit 172 | def check_max(obj, type) { 173 | if (type == 'memory') { 174 | try { 175 | if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) 176 | return params.max_memory as nextflow.util.MemoryUnit 177 | else 178 | return obj 179 | } catch (all) { 180 | println " ### ERROR ### Max memory '${params.max_memory}' is not valid! 
Using default value: $obj" 181 | return obj 182 | } 183 | } else if (type == 'time') { 184 | try { 185 | if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) 186 | return params.max_time as nextflow.util.Duration 187 | else 188 | return obj 189 | } catch (all) { 190 | println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" 191 | return obj 192 | } 193 | } else if (type == 'cpus') { 194 | try { 195 | return Math.min( obj, params.max_cpus as int ) 196 | } catch (all) { 197 | println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" 198 | return obj 199 | } 200 | } 201 | } 202 | -------------------------------------------------------------------------------- /bin/dupRadar.r: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env Rscript 2 | 3 | # Command line argument processing 4 | args = commandArgs(trailingOnly=TRUE) 5 | if (length(args) < 5) { 6 | stop("Usage: dupRadar.r <input.bam> <annotation.gtf> <strandDirection:0(unstranded)/1(forward)/2(reverse)> <paired/single> <nbThreads> <R-package-location (optional)>", call.=FALSE) 7 | } 8 | input_bam <- args[1] 9 | annotation_gtf <- args[2] 10 | stranded <- as.numeric(args[3]) 11 | paired_end <- if(args[4]=='paired') TRUE else FALSE 12 | threads <- as.numeric(args[5]) 13 | 14 | bamRegex <- "(.+)\\.bam$" 15 | 16 | if(!(grepl(bamRegex, input_bam) && file.exists(input_bam) && (!file.info(input_bam)$isdir))) stop("First argument '<input.bam>' must be an existing file (not a directory) with '.bam' extension...") 17 | if(!(file.exists(annotation_gtf) && (!file.info(annotation_gtf)$isdir))) stop("Second argument '<annotation.gtf>' must be an existing file (and not a directory)...") 18 | if(is.na(stranded) || (!(stranded %in% (0:2)))) stop("Third argument must be a numeric value in 0(unstranded)/1(forward)/2(reverse)...") 19 | if(is.na(threads) || (threads<=0)) stop("Fifth argument must be a strictly positive numeric value...") 20 | 21 | # Remove bam file extension to generate basename 22 | input_bam_basename <- gsub(bamRegex, "\\1", input_bam) 23 |
input_bam_basename <- gsub("_subsamp.*", "", input_bam_basename) 24 | input_bam_basename <- gsub("\\.sorted.*", "", input_bam_basename) 25 | input_bam_basename <- gsub("Aligned.*", "", input_bam_basename) 26 | 27 | # Debug messages (stderr) 28 | message("Input bam (Arg 1): ", input_bam) 29 | message("Input gtf (Arg 2): ", annotation_gtf) 30 | message("Strandness (Arg 3): ", c("unstranded", "forward", "reverse")[stranded+1]) 31 | message("paired/single (Arg 4): ", ifelse(paired_end, 'paired', 'single')) 32 | message("Nb threads (Arg 5): ", threads) 33 | message("R package loc. (Arg 6): ", ifelse(length(args) > 5, args[6], "Not specified")) 34 | message("Output basename : ", input_bam_basename) 35 | 36 | 37 | # Load / install packages 38 | if (length(args) > 5) { .libPaths( c( args[6], .libPaths() ) ) } 39 | if (!require("dupRadar")){ 40 | source("http://bioconductor.org/biocLite.R") 41 | biocLite("dupRadar", suppressUpdates=TRUE) 42 | library("dupRadar") 43 | } 44 | if (!require("parallel")) { 45 | install.packages("parallel", dependencies=TRUE, repos='http://cloud.r-project.org/') 46 | library("parallel") 47 | } 48 | 49 | # Duplicate stats 50 | dm <- analyzeDuprates(input_bam, annotation_gtf, stranded, paired_end, threads) 51 | write.table(dm, file=paste(input_bam_basename, "_dupMatrix.txt", sep=""), quote=F, row.name=F, sep="\t") 52 | 53 | # 2D density scatter plot 54 | pdf(paste0(input_bam_basename, "_duprateExpDens.pdf")) 55 | duprateExpDensPlot(DupMat=dm) 56 | title("Density scatter plot") 57 | mtext(input_bam_basename, side=3) 58 | dev.off() 59 | fit <- duprateExpFit(DupMat=dm) 60 | cat( 61 | paste("- dupRadar Int (duprate at low read counts):", fit$intercept), 62 | paste("- dupRadar Sl (progression of the duplication rate):", fit$slope), 63 | fill=TRUE, labels=input_bam_basename, 64 | file=paste0(input_bam_basename, "_intercept_slope.txt"), append=FALSE 65 | ) 66 | 67 | # Create a multiqc file dupInt 68 | sample_name <-
gsub("Aligned.sortedByCoord.out.markDups", "", input_bam_basename) 69 | line="#id: DupInt 70 | #plot_type: 'generalstats' 71 | #pconfig: 72 | # dupRadar_intercept: 73 | # title: 'dupInt' 74 | # namespace: 'DupRadar' 75 | # description: 'Intercept value from DupRadar' 76 | # max: 100 77 | # min: 0 78 | # scale: 'RdYlGn-rev' 79 | # format: '{:.2f}%' 80 | Sample dupRadar_intercept" 81 | 82 | write(line,file=paste0(input_bam_basename, "_dup_intercept_mqc.txt"),append=TRUE) 83 | write(paste(sample_name, fit$intercept),file=paste0(input_bam_basename, "_dup_intercept_mqc.txt"),append=TRUE) 84 | 85 | # Get numbers from dupRadar GLM 86 | curve_x <- sort(log10(dm$RPK)) 87 | curve_y = 100*predict(fit$glm, data.frame(x=curve_x), type="response") 88 | # Remove all of the infinite values 89 | infs = which(curve_x %in% c(-Inf,Inf)) 90 | curve_x = curve_x[-infs] 91 | curve_y = curve_y[-infs] 92 | # Reduce number of data points 93 | curve_x <- curve_x[seq(1, length(curve_x), 10)] 94 | curve_y <- curve_y[seq(1, length(curve_y), 10)] 95 | # Convert x values back to real counts 96 | curve_x = 10^curve_x 97 | # Write to file 98 | line="#id: DupRadar 99 | #section_name: 'DupRadar' 100 | #section_href: 'bioconductor.org/packages/release/bioc/html/dupRadar.html' 101 | #description: \"provides duplication rate quality control for RNA-Seq datasets. Highly expressed genes can be expected to have a lot of duplicate reads, but high numbers of duplicates at low read counts can indicate low library complexity with technical duplication. 102 | # This plot shows the general linear models - a summary of the gene duplication distributions. 
\" 103 | #pconfig: 104 | # title: 'DupRadar General Linear Model' 105 | # xLog: True 106 | # xlab: 'expression (reads/kbp)' 107 | # ylab: '% duplicate reads' 108 | # ymax: 100 109 | # ymin: 0 110 | # tt_label: '{point.x:.1f} reads/kbp: {point.y:,.2f}% duplicates' 111 | # xPlotLines: 112 | # - color: 'green' 113 | # dashStyle: 'LongDash' 114 | # label: 115 | # style: {color: 'green'} 116 | # text: '0.5 RPKM' 117 | # verticalAlign: 'bottom' 118 | # y: -65 119 | # value: 0.5 120 | # width: 1 121 | # - color: 'red' 122 | # dashStyle: 'LongDash' 123 | # label: 124 | # style: {color: 'red'} 125 | # text: '1 read/bp' 126 | # verticalAlign: 'bottom' 127 | # y: -65 128 | # value: 1000 129 | # width: 1" 130 | 131 | write(line,file=paste0(input_bam_basename, "_duprateExpDensCurve_mqc.txt"),append=TRUE) 132 | write.table( 133 | cbind(curve_x, curve_y), 134 | file=paste0(input_bam_basename, "_duprateExpDensCurve_mqc.txt"), 135 | quote=FALSE, row.names=FALSE, col.names=FALSE, append=TRUE, 136 | ) 137 | 138 | # Distribution of expression box plot 139 | pdf(paste0(input_bam_basename, "_duprateExpBoxplot.pdf")) 140 | duprateExpBoxplot(DupMat=dm) 141 | title("Percent Duplication by Expression") 142 | mtext(input_bam_basename, side=3) 143 | dev.off() 144 | 145 | # Distribution of RPK values per gene 146 | pdf(paste0(input_bam_basename, "_expressionHist.pdf")) 147 | expressionHist(DupMat=dm) 148 | title("Distribution of RPK values per gene") 149 | mtext(input_bam_basename, side=3) 150 | dev.off() 151 | 152 | # Print sessioninfo to standard out 153 | print(input_bam_basename) 154 | citation("dupRadar") 155 | sessionInfo() 156 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # nf-core/rnaseq: Changelog 2 | 3 | ## Version 1.4.2 4 | 5 | * Minor version release for keeping Git History in sync 6 | * No changes with respect to 1.4.1 on pipeline level 7 | 8 | 
## Version 1.4.1 9 | 10 | Major novel changes include: 11 | 12 | * Update `igenomes.config` with NCBI `GRCh38` and most recent UCSC genomes 13 | * Set `autoMounts = true` by default for `singularity` profile 14 | 15 | ### Pipeline enhancements & fixes 16 | 17 | * Fixed parameter warnings [#316](https://github.com/nf-core/rnaseq/issues/316) and [318](https://github.com/nf-core/rnaseq/issues/318) 18 | * Fixed [#307](https://github.com/nf-core/rnaseq/issues/307) - Confusing Info Printout about GFF and GTF 19 | 20 | ## Version 1.4 21 | 22 | Major novel changes include: 23 | 24 | * Support for Salmon as an alternative method to STAR and HISAT2 25 | * Several improvements in `featureCounts` handling of types other than `exon`. It is possible now to handle nuclearRNAseq data. Nuclear RNA has un-spliced RNA, and the whole transcript, including the introns, needs to be counted, e.g. by specifying `--fc_count_type transcript`. 26 | * Support for [outputting unaligned data](https://github.com/nf-core/rnaseq/issues/277) to results folders. 
27 | * Added options to skip several steps 28 | 29 | * Skip trimming using `--skipTrimming` 30 | * Skip BiotypeQC using `--skipBiotypeQC` 31 | * Skip Alignment using `--skipAlignment` to only use pseudo-alignment using Salmon 32 | 33 | ### Documentation updates 34 | 35 | * Adjust wording of skipped samples [in pipeline output](https://github.com/nf-core/rnaseq/issues/290) 36 | * Fixed link to guidelines [#203](https://github.com/nf-core/rnaseq/issues/203) 37 | * Add `Citation` and `Quick Start` section to `README.md` 38 | * Add in documentation of the `--gff` parameter 39 | 40 | ### Reporting Updates 41 | 42 | * Generate MultiQC plots in the results directory [#200](https://github.com/nf-core/rnaseq/issues/200) 43 | * Get MultiQC to save plots as [standalone files](https://github.com/nf-core/rnaseq/issues/183) 44 | * Get MultiQC to write out the software versions in a `.csv` file [#185](https://github.com/nf-core/rnaseq/issues/185) 45 | * Use `file` instead of `new File` to create `pipeline_report.{html,txt}` files, and properly create subfolders 46 | 47 | ### Pipeline enhancements & fixes 48 | 49 | * Restore `SummarizedExperiment` object creation in the `salmon_merge` process, avoiding memory usage that increases with sample size.
50 | * Fix sample names in featureCounts and dupRadar to remove suffixes added in other processes 51 | * Removed `genebody_coverage` process [#195](https://github.com/nf-core/rnaseq/issues/195) 52 | * Implemented Pearson's correlation instead of Euclidean distance [#146](https://github.com/nf-core/rnaseq/issues/146) 53 | * Add `--stringTieIgnoreGTF` parameter [#206](https://github.com/nf-core/rnaseq/issues/206) 54 | * Removed unused `stringtie` channels for `MultiQC` 55 | * Integrate changes in `nf-core/tools v1.6` template which resolved [#90](https://github.com/nf-core/rnaseq/issues/90) 56 | * Moved process `convertGFFtoGTF` before `makeSTARindex` [#215](https://github.com/nf-core/rnaseq/issues/215) 57 | * Change all boolean parameters from `snake_case` to `camelCase` and vice versa for value parameters 58 | * Add SM ReadGroup info for QualiMap compatibility [#238](https://github.com/nf-core/rnaseq/issues/238) 59 | * Obtain edgeR + dupRadar version information [#198](https://github.com/nf-core/rnaseq/issues/198) and [#112](https://github.com/nf-core/rnaseq/issues/112) 60 | * Add `--gencode` option for compatibility of Salmon and featureCounts biotypes with GENCODE gene annotations 61 | * Added functionality to accept compressed reference data in the pipeline 62 | * Check that gtf features are on chromosomes that exist in the genome fasta file [#274](https://github.com/nf-core/rnaseq/pull/274) 63 | * Maintain all gff features upon gtf conversion (keeps `gene_biotype` or `gene_type` to make `featureCounts` happy) 64 | * Add SortMeRNA as an optional step to allow rRNA removal [#280](https://github.com/nf-core/rnaseq/issues/280) 65 | * Minimal adjustment of memory and CPU constraints for clusters with locked memory / CPU relation 66 | * Cleaned up usage, `parameters.settings.json` and the `nextflow.config` 67 | 68 | ### Dependency Updates 69 | 70 | * Dependency list is now sorted appropriately 71 | * Force matplotlib=3.0.3 72 | 73 | #### Updated Packages 74 | 75 |
* Picard 2.20.0 -> 2.21.1 76 | * bioconductor-dupradar 1.12.1 -> 1.14.0 77 | * bioconductor-edger 3.24.3 -> 3.26.5 78 | * gffread 0.9.12 -> 0.11.4 79 | * trim-galore 0.6.1 -> 0.6.4 80 | 81 | * rseqc 3.0.0 -> 3.0.1 82 | * R-Base 3.5 -> 3.6.1 83 | 84 | #### Added / Removed Packages 85 | 86 | * Dropped CSVtk in favor of Unix's simple `cut` and `paste` utilities 87 | * Added Salmon 0.14.2 88 | * Added TXIMeta 1.2.2 89 | * Added SummarizedExperiment 1.14.0 90 | * Added SortMeRNA 2.1b 91 | * Add tximport and summarizedexperiment dependency [#171](https://github.com/nf-core/rnaseq/issues/171) 92 | * Add Qualimap dependency [#202](https://github.com/nf-core/rnaseq/issues/202) 93 | 94 | ## [Version 1.3](https://github.com/nf-core/rnaseq/releases/tag/1.3) - 2019-03-26 95 | 96 | ### Pipeline Updates 97 | 98 | * Added configurable options to specify group attributes for featureCounts [#144](https://github.com/nf-core/rnaseq/issues/144) 99 | * Added support for RSeQC 3.0 [#148](https://github.com/nf-core/rnaseq/issues/148) 100 | * Added a `parameters.settings.json` file for use with the new `nf-core launch` helper tool.
101 | * Centralized all configuration profiles using [nf-core/configs](https://github.com/nf-core/configs) 102 | * Fixed all centralized configs [for offline usage](https://github.com/nf-core/rnaseq/issues/163) 103 | * Hide %dup in [multiqc report](https://github.com/nf-core/rnaseq/issues/150) 104 | * Add option for Trimming NextSeq data properly ([@jburos work](https://github.com/jburos)) 105 | 106 | ### Bug fixes 107 | 108 | * Fixing HISAT2 Index Building for large reference genomes [#153](https://github.com/nf-core/rnaseq/issues/153) 109 | * Fixing HISAT2 BAM sorting using more memory than available on the system 110 | * Fixing MarkDuplicates memory consumption issues following [#179](https://github.com/nf-core/rnaseq/pull/179) 111 | * Use `file` instead of `new File` to create the `pipeline_report.{html,txt}` files to avoid creating local directories when outputting to AWS S3 folders 112 | 113 | ### Dependency Updates 114 | 115 | * RSeQC 2.6.4 -> 3.0.0 116 | * Picard 2.18.15 -> 2.20.0 117 | * r-data.table 1.11.4 -> 1.12.2 118 | * bioconductor-edger 3.24.1 -> 3.24.3 119 | * r-markdown 0.8 -> 0.9 120 | * csvtk 0.15.0 -> 0.17.0 121 | * stringtie 1.3.4 -> 1.3.6 122 | * subread 1.6.2 -> 1.6.4 123 | * gffread 0.9.9 -> 0.9.12 124 | * multiqc 1.6 -> 1.7 125 | * deeptools 3.2.0 -> 3.2.1 126 | * trim-galore 0.5.0 -> 0.6.1 127 | * qualimap 2.2.2b 128 | * matplotlib 3.0.3 129 | * r-base 3.5.1 130 | 131 | ## [Version 1.2](https://github.com/nf-core/rnaseq/releases/tag/1.2) - 2018-12-12 132 | 133 | ### Pipeline updates 134 | 135 | * Removed some outdated documentation about non-existent features 136 | * Config refactoring and code cleaning 137 | * Added a `--fcExtraAttributes` option to specify more than ENSEMBL gene names in `featureCounts` 138 | * Remove legacy rseqc `strandRule` config code. 
[#119](https://github.com/nf-core/rnaseq/issues/119) 139 | * Added STRINGTIE ballgown output to results folder [#125](https://github.com/nf-core/rnaseq/issues/125) 140 | * HISAT2 index build now requests `200GB` memory, enough to use the exons / splice junction option for building. 141 | * Added documentation about the `--hisatBuildMemory` option. 142 | * BAM indices are stored and re-used between processes [#71](https://github.com/nf-core/rnaseq/issues/71) 143 | 144 | ### Bug Fixes 145 | 146 | * Fixed conda bug which caused problems with environment resolution due to changes in bioconda [#113](https://github.com/nf-core/rnaseq/issues/113) 147 | * Fixed wrong gffread command line [#117](https://github.com/nf-core/rnaseq/issues/117) 148 | * Added `cpus = 1` to `workflow summary process` [#130](https://github.com/nf-core/rnaseq/issues/130) 149 | 150 | ## [Version 1.1](https://github.com/nf-core/rnaseq/releases/tag/1.1) - 2018-10-05 151 | 152 | ### Pipeline updates 153 | 154 | * Wrote docs and made minor tweaks to the `--skip_qc` and associated options 155 | * Removed the deprecated `uppmax-modules` config profile 156 | * Updated the `hebbe` config profile to use the new `withName` syntax too 157 | * Use new `workflow.manifest` variables in the pipeline script 158 | * Updated minimum nextflow version to `0.32.0` 159 | 160 | ### Bug Fixes 161 | 162 | * [#77](https://github.com/nf-core/rnaseq/issues/77): Added back `executor = 'local'` for the `workflow_summary_mqc` 163 | * [#95](https://github.com/nf-core/rnaseq/issues/95): Check if task.memory is false instead of null 164 | * [#97](https://github.com/nf-core/rnaseq/issues/97): Resolved edge-case where numeric sample IDs are parsed as numbers causing some samples to be incorrectly overwritten.
165 | 166 | ## [Version 1.0](https://github.com/nf-core/rnaseq/releases/tag/1.0) - 2018-08-20 167 | 168 | This release marks the point where the pipeline was moved from [SciLifeLab/NGI-RNAseq](https://github.com/SciLifeLab/NGI-RNAseq) 169 | over to the new [nf-core](http://nf-co.re/) community, at [nf-core/rnaseq](https://github.com/nf-core/rnaseq). 170 | 171 | View the previous changelog at [SciLifeLab/NGI-RNAseq/CHANGELOG.md](https://github.com/SciLifeLab/NGI-RNAseq/blob/master/CHANGELOG.md) 172 | 173 | In addition to porting to the new nf-core community, the pipeline has had a number of major changes in this version. 174 | There have been 157 commits by 16 different contributors covering 70 different files in the pipeline: 7,357 additions and 8,236 deletions! 175 | 176 | In summary, the main changes are: 177 | 178 | * Rebranding and renaming throughout the pipeline to nf-core 179 | * Updating many parts of the pipeline config and style to meet nf-core standards 180 | * Support for GFF files in addition to GTF files 181 | * Just use `--gff` instead of `--gtf` when specifying a file path 182 | * New command line options to skip various quality control steps 183 | * More safety checks when launching a pipeline 184 | * Several new sanity checks - for example, that the specified reference genome exists 185 | * Improved performance with memory usage (especially STAR and Picard) 186 | * New BigWig file outputs for plotting coverage across the genome 187 | * Refactored gene body coverage calculation, now much faster and using much less memory 188 | * Bugfixes in the MultiQC process to avoid edge cases where it wouldn't run 189 | * MultiQC report now automatically attached to the email sent when the pipeline completes 190 | * New testing method, with data on GitHub 191 | * Now run pipeline with `-profile test` instead of using bash scripts 192 | * Rewritten continuous integration tests with Travis CI 193 | * New explicit support for Singularity containers 194 | * 
Improved MultiQC support for DupRadar and featureCounts 195 | * Now works for all users instead of just NGI Stockholm 196 | * New configuration for use on AWS batch 197 | * Updated config syntax to support latest versions of Nextflow 198 | * Built-in support for a number of new local HPC systems 199 | * CCGA, GIS, UCT HEX, updates to UPPMAX, CFC, BINAC, Hebbe, c3se 200 | * Slightly improved documentation (more updates to come) 201 | * Updated software packages 202 | 203 | ...and many more minor tweaks. 204 | 205 | Thanks to everyone who has worked on this release! 206 | -------------------------------------------------------------------------------- /docs/output.md: -------------------------------------------------------------------------------- 1 | # nf-core/rnaseq: Output 2 | 3 | This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. 4 | 5 | ## Pipeline overview 6 | The pipeline is built using [Nextflow](https://www.nextflow.io/) 7 | and processes data using the following steps: 8 | 9 | * [FastQC](#fastqc) - read quality control 10 | * [TrimGalore](#trimgalore) - adapter trimming 11 | * [SortMeRNA](#sortmerna) - ribosomal RNA removal 12 | * [STAR](#star) - alignment 13 | * [RSeQC](#rseqc) - RNA quality control metrics 14 | * [BAM stat](#bam-stat) 15 | * [Infer experiment](#infer-experiment) 16 | * [Junction saturation](#junction-saturation) 17 | * [RPKM saturation](#rpkm-saturation) 18 | * [Read duplication](#read-duplication) 19 | * [Inner distance](#inner-distance) 20 | * [Read distribution](#read-distribution) 21 | * [Junction annotation](#junction-annotation) 22 | * [Qualimap](#qualimap) - RNA quality control metrics 23 | * [dupRadar](#dupradar) - technical / biological read duplication 24 | * [Preseq](#preseq) - library complexity 25 | * [featureCounts](#featurecounts) - gene counts, biotype counts, rRNA estimation. 
26 | * [Salmon](#salmon) - gene counts, transcript counts. 27 | * [tximport](#tximport) - gene counts, transcript counts, SummarizedExperiment object. 28 | * [StringTie](#stringtie) - FPKMs for genes and transcripts 29 | * [Sample_correlation](#Sample_correlation) - create MDS plot and sample pairwise distance heatmap / dendrogram 30 | * [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline 31 | 32 | ## FastQC 33 | [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads and the per-base sequence content (%T/A/G/C). You also get information about adapter contamination and other overrepresented sequences. 34 | 35 | For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). 36 | 37 | > **NB:** The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. To see how your reads look after trimming, look at the FastQC reports in the `trim_galore` directory. 38 | 39 | **Output directory: `results/fastqc`** 40 | 41 | * `sample_fastqc.html` 42 | * FastQC report, containing quality metrics for your untrimmed raw fastq files 43 | * `zips/sample_fastqc.zip` 44 | * zip file containing the FastQC report, tab-delimited data file and plot images 45 | 46 | ## TrimGalore 47 | The nf-core/rnaseq pipeline uses [TrimGalore](http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) for removal of adapter contamination and trimming of low-quality regions. TrimGalore uses [Cutadapt](https://github.com/marcelm/cutadapt) for adapter trimming and runs FastQC after it finishes. 48 | 49 | MultiQC reports the percentage of bases removed by TrimGalore in the _General Statistics_ table, along with a line plot showing where reads were trimmed.
50 | 51 | **Output directory: `results/trim_galore`** 52 | 53 | Contains FastQ files with quality and adapter trimmed reads for each sample, along with a log file describing the trimming. 54 | 55 | * `sample_val_1.fq.gz`, `sample_val_2.fq.gz` 56 | * Trimmed FastQ data, reads 1 and 2. 57 | * NB: Only saved if `--saveTrimmed` has been specified. 58 | * `logs/sample_val_1.fq.gz_trimming_report.txt` 59 | * Trimming report (describes the parameters that were used) 60 | * `FastQC/sample_val_1_fastqc.zip` 61 | * FastQC report for trimmed reads 62 | 63 | Single-end data will have slightly different file names and only one FastQ file per sample. 64 | 65 | ## SortMeRNA 66 | 67 | When `--removeRiboRNA` is specified, the nf-core/rnaseq pipeline uses [SortMeRNA](https://github.com/biocore/sortmerna) for removal of rRNA. SortMeRNA requires reference sequences, which by default come from the [SILVA database](https://www.arb-silva.de/). 68 | 69 | **Output directory: `results/SortMeRNA`** 70 | 71 | Contains FastQ files with rRNA-depleted reads for each sample, along with a log file describing the filtering. 72 | 73 | * `reads/sample-fw.fq.gz`, `reads/sample-rv.fq.gz` 74 | * Trimmed and rRNA-depleted FastQ data, forward and reverse reads. 75 | * NB: Only saved if `--save_nonrRNA_reads` has been specified. 76 | * `logs/sample_rRNA_report.txt` 77 | * Reports how many reads were removed due to matches to the reference database(s). 78 | 79 | Single-end data will have slightly different file names (`reads/sample.fq.gz`) and only one FastQ file per sample. 80 | 81 | ## STAR 82 | STAR is a read aligner designed for RNA sequencing. STAR stands for Spliced Transcripts Alignment to a Reference; it produces results comparable to TopHat (the aligner previously used by NGI for RNA alignments) but is much faster. 83 | 84 | The STAR section of the MultiQC report shows a bar plot with alignment rates: good samples should have most reads as _Uniquely mapped_ and few _Unmapped_ reads.
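These headline numbers come from STAR's `Log.final.out` report and can also be pulled out with a short script. The sketch below is purely illustrative: the `name |<tab>value` layout is assumed from a typical STAR report, and the example values are made up, not output from this pipeline.

```python
# Minimal sketch: extract alignment rates from a STAR Log.final.out report.
# The "name |<tab>value" layout and the example values below are assumed
# from a typical STAR run, not taken from this pipeline's output.

def parse_star_log(text):
    """Return a dict of metric name -> value for lines shaped 'name | value'."""
    metrics = {}
    for line in text.splitlines():
        if "|" in line:
            name, _, value = line.partition("|")
            metrics[name.strip()] = value.strip()
    return metrics

example_log = (
    "                   Uniquely mapped reads number |\t28000000\n"
    "                        Uniquely mapped reads % |\t92.46%\n"
    "             % of reads mapped to multiple loci |\t4.10%\n"
)

metrics = parse_star_log(example_log)
unique_pct = float(metrics["Uniquely mapped reads %"].rstrip("%"))
print(unique_pct)  # 92.46
```

This is the same information MultiQC aggregates across samples for the bar plot shown below.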
85 | 86 | ![STAR](images/star_alignment_plot.png) 87 | 88 | **Output directory: `results/STAR`** 89 | 90 | * `Sample_Aligned.sortedByCoord.out.bam` 91 | * The aligned BAM file 92 | * `Sample_Log.final.out` 93 | * The STAR alignment report, containing a summary of the mapping results 94 | * `Sample_Log.out` and `Sample_Log.progress.out` 95 | * STAR log files, containing a lot of detailed information about the run. Typically only useful for debugging purposes. 96 | * `Sample_SJ.out.tab` 97 | * Filtered splice junctions detected in the mapping 98 | * `unaligned/...` 99 | * Contains the reads that could not be mapped to the chosen reference genome. This is only available when the user specifically asks for `--saveUnaligned` output. 100 | 101 | ## RSeQC 102 | 103 | RSeQC is a package of scripts designed to evaluate the quality of RNA-seq data. You can find out more about the package at the [RSeQC website](http://rseqc.sourceforge.net/). 104 | 105 | This pipeline runs several, but not all, RSeQC scripts. All of these results are summarised within the MultiQC report and described below. 106 | 107 | **Output directory: `results/rseqc`** 108 | 109 | These are all quality metrics files and contain the raw data used for the plots in the MultiQC report. In general, the `.r` files are R scripts for generating the figures, the `.txt` are summary files, the `.xls` are data tables and the `.pdf` files are summary figures. 110 | 111 | ### BAM stat 112 | **Output: `Sample_bam_stat.txt`** 113 | 114 | This script gives numerous statistics about the aligned BAM files produced by STAR.
A typical output looks as follows: 115 | 116 | ```txt 117 | #Output (all numbers are read count) 118 | #================================================== 119 | Total records: 41465027 120 | QC failed: 0 121 | Optical/PCR duplicate: 0 122 | Non Primary Hits 8720455 123 | Unmapped reads: 0 124 | 125 | mapq < mapq_cut (non-unique): 3127757 126 | mapq >= mapq_cut (unique): 29616815 127 | Read-1: 14841738 128 | Read-2: 14775077 129 | Reads map to '+': 14805391 130 | Reads map to '-': 14811424 131 | Non-splice reads: 25455360 132 | Splice reads: 4161455 133 | Reads mapped in proper pairs: 21856264 134 | Proper-paired reads map to different chrom: 7648 135 | ``` 136 | 137 | MultiQC plots each of these statistics in a dot plot. Each sample in the project is a dot - hover to see the sample highlighted across all fields. 138 | 139 | RSeQC documentation: [bam_stat.py](http://rseqc.sourceforge.net/#bam-stat-py) 140 | 141 | ### Infer experiment 142 | **Output: `Sample_infer_experiment.txt`** 143 | 144 | This script predicts the mode of library preparation (sense-stranded or antisense-stranded) according to how aligned reads overlay gene features in the reference genome. 
145 | Example output from an unstranded (~50% sense/antisense) library of paired-end data: 146 | 147 | **From MultiQC report:** 148 | ![infer_experiment](images/rseqc_infer_experiment_plot.png) 149 | 150 | **From the `infer_experiment.txt` file:** 151 | 152 | ```txt 153 | This is PairEnd Data 154 | Fraction of reads failed to determine: 0.0409 155 | Fraction of reads explained by "1++,1--,2+-,2-+": 0.4839 156 | Fraction of reads explained by "1+-,1-+,2++,2--": 0.4752 157 | ``` 158 | 159 | RSeQC documentation: [infer_experiment.py](http://rseqc.sourceforge.net/#infer-experiment-py) 160 | 161 | 162 | ### Junction saturation 163 | **Output:** 164 | * `Sample_rseqc.junctionSaturation_plot.pdf` 165 | * `Sample_rseqc.junctionSaturation_plot.r` 166 | 167 | This script shows the number of splice sites detected in the data at various levels of subsampling. A sample that reaches a plateau before getting to 100% data indicates that all junctions in the library have been detected, and that further sequencing will not yield more observations. A good sample should approach such a plateau of _Known junctions_; very deep sequencing is typically required to saturate all _Novel Junctions_ in a sample. 168 | 169 | None of the lines in this example have plateaued and thus these samples could reveal more alternative splicing information if they were sequenced deeper. 170 | 171 | ![Junction saturation](images/rseqc_junction_saturation_plot.png) 172 | 173 | RSeQC documentation: [junction_saturation.py](http://rseqc.sourceforge.net/#junction-saturation-py) 174 | 175 | ### RPKM saturation 176 | **Output:** 177 | 178 | * `Sample_RPKM_saturation.eRPKM.xls` 179 | * `Sample_RPKM_saturation.rawCount.xls` 180 | * `Sample_RPKM_saturation.saturation.pdf` 181 | * `Sample_RPKM_saturation.saturation.r` 182 | 183 | This tool resamples a subset of the total RNA reads and calculates the RPKM value for each subset. We use the default subsets of every 5% of the total reads.
184 | A percent relative error is then calculated based on the subsamples; this is the y-axis in the graph. A typical PDF figure looks as follows: 185 | 186 | ![RPKM saturation](images/saturation.png) 187 | 188 | A complex library will have low resampling error in well expressed genes. 189 | 190 | This data is not currently reported in the MultiQC report. 191 | 192 | RSeQC documentation: [RPKM_saturation.py](http://rseqc.sourceforge.net/#rpkm-saturation-py) 193 | 194 | 195 | ### Read duplication 196 | **Output:** 197 | 198 | * `Sample_read_duplication.DupRate_plot.pdf` 199 | * `Sample_read_duplication.DupRate_plot.r` 200 | * `Sample_read_duplication.pos.DupRate.xls` 201 | * `Sample_read_duplication.seq.DupRate.xls` 202 | 203 | This plot shows the number of reads (y-axis) with a given number of exact duplicates (x-axis). Most reads in an RNA-seq library should have a low number of exact duplicates. Samples which have many reads with many duplicates (a large area under the curve) may be suffering excessive technical duplication. 204 | 205 | ![Read duplication](images/rseqc_read_dups_plot.png) 206 | 207 | RSeQC documentation: [read_duplication.py](http://rseqc.sourceforge.net/#read-duplication-py) 208 | 209 | ### Inner distance 210 | **Output:** 211 | 212 | * `Sample_rseqc.inner_distance.txt` 213 | * `Sample_rseqc.inner_distance_freq.txt` 214 | * `Sample_rseqc.inner_distance_plot.r` 215 | 216 | The inner distance script tries to calculate the inner distance between two paired RNA reads. It is the distance between the end of read 1 to the start of read 2, 217 | and it is sometimes confused with the insert size (see [this blog post](http://thegenomefactory.blogspot.com.au/2013/08/paired-end-read-confusion-library.html) for disambiguation): 218 | ![inner distance concept](images/inner_distance_concept.png) 219 | > _Credit: modified from RSeQC documentation._ 220 | 221 | Note that values can be negative if the reads overlap. 
A typical set of samples may look like this: 222 | ![Inner distance](images/rseqc_inner_distance_plot.png) 223 | 224 | This plot will not be generated for single-end data. Very short inner distances are often seen in old or degraded samples (_eg._ FFPE). 225 | 226 | RSeQC documentation: [inner_distance.py](http://rseqc.sourceforge.net/#inner-distance-py) 227 | 228 | ### Read distribution 229 | **Output: `Sample_read_distribution.txt`** 230 | 231 | This tool calculates how mapped reads are distributed over genomic features. A good result for a standard RNA seq experiments is generally to have as many exonic reads as possible (`CDS_Exons`). A large amount of intronic reads could be indicative of DNA contamination in your sample or some other problem. 232 | 233 | ![Read distribution](images/rseqc_read_distribution_plot.png) 234 | 235 | RSeQC documentation: [read_distribution.py](http://rseqc.sourceforge.net/#read-distribution-py) 236 | 237 | 238 | ### Junction annotation 239 | **Output:** 240 | 241 | * `Sample_junction_annotation_log.txt` 242 | * `Sample_rseqc.junction.xls` 243 | * `Sample_rseqc.junction_plot.r` 244 | * `Sample_rseqc.splice_events.pdf` 245 | * `Sample_rseqc.splice_junction.pdf` 246 | 247 | Junction annotation compares detected splice junctions to a reference gene model. An RNA read can be spliced 2 or more times, each time is called a splicing event. 248 | 249 | ![Junction annotation](images/rseqc_junction_annotation_junctions_plot.png) 250 | 251 | RSeQC documentation: [junction_annotation.py](http://rseqc.sourceforge.net/#junction-annotation-py) 252 | 253 | ## Qualimap 254 | [Qualimap](http://qualimap.bioinfo.cipf.es/) is a standalone package written in java. It calculates read alignment assignment, transcript coverage, read genomic origin, junction analysis and 3'-5' bias. 
255 | 256 | **Output directory: `results/qualimap`** 257 | 258 | * `rnaseq_qc_results.txt` 259 | * `qualimapReport.html` 260 | * `css` 261 | * `raw_data_qualimapReport` 262 | * `images_qualimapReport` 263 | 264 | Qualimap RNAseq documentation: [Qualimap docs](http://qualimap.bioinfo.cipf.es/doc_html/analysis.html#rna-seq-qc). 265 | 266 | ## dupRadar 267 | [dupRadar](https://www.bioconductor.org/packages/release/bioc/html/dupRadar.html) is a Bioconductor library for R. It plots the duplication rate against expression (RPKM) for every gene. A good sample with little technical duplication will only show high numbers of duplicates for highly expressed genes. Samples with technical duplication will have high duplication for all genes, irrespective of transcription level. 268 | 269 | ![dupRadar](images/dupRadar_plot.png) 270 | > _Credit: [dupRadar documentation](https://www.bioconductor.org/packages/devel/bioc/vignettes/dupRadar/inst/doc/dupRadar.html)_ 271 | 272 | **Output directory: `results/dupRadar`** 273 | 274 | * `Sample_markDups.bam_duprateExpDens.pdf` 275 | * `Sample_markDups.bam_duprateExpBoxplot.pdf` 276 | * `Sample_markDups.bam_expressionHist.pdf` 277 | * `Sample_markDups.bam_dupMatrix.txt` 278 | * `Sample_markDups.bam_duprateExpDensCurve.txt` 279 | * `Sample_markDups.bam_intercept_slope.txt` 280 | 281 | DupRadar documentation: [dupRadar docs](https://www.bioconductor.org/packages/devel/bioc/vignettes/dupRadar/inst/doc/dupRadar.html) 282 | 283 | ## Preseq 284 | [Preseq](http://smithlabresearch.org/software/preseq/) estimates the complexity of a library, showing how many additional unique reads are sequenced for increasing the total read count. A shallow curve indicates that the library has reached complexity saturation and further sequencing would likely not add further unique reads. The dashed line shows a perfectly complex library where total reads = unique reads. 285 | 286 | Note that these are predictive numbers only, not absolute. 
The MultiQC plot can sometimes give extreme sequencing depth on the X axis - click and drag from the left side of the plot to zoom in on more realistic numbers. 287 | 288 | ![preseq](images/preseq_plot.png) 289 | 290 | **Output directory: `results/preseq`** 291 | 292 | * `sample_ccurve.txt` 293 | * This file contains plot values for the complexity curve, plotted in the MultiQC report. 294 | 295 | ## featureCounts 296 | [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) from the subread package summarises the read distribution over genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations. 297 | RNA reads should mostly overlap genes, and so be assigned to them. 298 | 299 | ![featureCounts](images/featureCounts_assignment_plot.png) 300 | 301 | We also use featureCounts to count overlaps with different classes of features. This gives a good idea of where aligned reads are ending up and can show potential problems such as rRNA contamination. 302 | ![biotypes](images/featureCounts_biotype_plot.png) 303 | 304 | **Output directory: `results/featureCounts`** 305 | 306 | * `Sample.bam_biotype_counts.txt` 307 | * Read counts for the different gene biotypes that featureCounts distinguishes. 308 | * `Sample.featureCounts.txt` 309 | * Read counts for each gene provided in the reference `gtf` file 310 | * `Sample.featureCounts.txt.summary` 311 | * Summary file, containing statistics about the counts 312 | 313 | ## Salmon 314 | [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) from [Ocean Genomics](https://oceangenomics.com/) quasi-maps and quantifies expression relative to the transcriptome. 315 | 316 | **Output directory: `results/salmon`** 317 | 318 | * `Sample/quant.sf` 319 | * Read counts for the different transcripts.
320 | * `Sample/quant.genes.sf` 321 | * Read counts for each gene provided in the reference `gtf` file 322 | * `Sample/logs` 323 | * Summary file with information about the process 324 | * `unaligned/` 325 | * Contains a list of unmapped reads that can be used to generate a FastQ of unmapped reads for downstream analysis. 326 | 327 | ## tximport 328 | [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) imports transcript-level abundance, estimated counts and transcript lengths, and summarizes them into matrices for use with downstream gene-level analysis packages. The average transcript length, weighted by sample-specific transcript abundance estimates, is provided as a matrix which can be used as an offset for differential expression analysis of gene-level counts. 329 | 330 | **Output directory: `results/salmon`** 331 | 332 | * `salmon_merged_transcript_tpm.csv` 333 | * TPM counts for the different transcripts. 334 | * `salmon_merged_gene_tpm.csv` 335 | * TPM counts for the different genes. 336 | * `salmon_merged_transcript_counts.csv` 337 | * Estimated counts for the different transcripts. 338 | * `salmon_merged_gene_counts.csv` 339 | * Estimated counts for the different genes. 340 | * `tx2gene.csv` 341 | * CSV file with transcript IDs, gene IDs (`params.fc_group_features`) and extra names (`params.fc_extra_attributes`) in each column. 342 | * `se.rds` 343 | * RDS object to be loaded in R that contains a [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) with the TPM (`abundance`), estimated counts (`counts`) and transcript length (`length`) in the assays slot for transcripts. 344 | * `gse.rds` 345 | * RDS object to be loaded in R that contains a [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) with the TPM (`abundance`), estimated counts (`counts`) and transcript length (`length`) in the assays slot for genes.
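At its core, the transcript-to-gene summarization described above amounts to grouping transcript rows by the `tx2gene` mapping and summing per gene; the real package additionally derives the average-transcript-length offset matrix. A stdlib-only sketch with made-up IDs and counts:

```python
# Sketch of tximport's transcript -> gene count summarization: group estimated
# transcript counts by a tx2gene mapping and sum per gene. The IDs and counts
# below are made up; the real package also computes per-gene length offsets.
from collections import defaultdict

tx2gene = {"ENST01": "ENSG01", "ENST02": "ENSG01", "ENST03": "ENSG02"}
tx_counts = {"ENST01": 10.0, "ENST02": 5.0, "ENST03": 7.0}

gene_counts = defaultdict(float)
for tx, count in tx_counts.items():
    gene_counts[tx2gene[tx]] += count

print(dict(gene_counts))  # {'ENSG01': 15.0, 'ENSG02': 7.0}
```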
346 | 347 | 348 | ### Index files 349 | 350 | **Output directory: `results/reference_genome/salmon_index`** 351 | 352 | * `duplicate_clusters.tsv` 353 | * Stores which transcripts are duplicates of one another 354 | * `hash.bin` 355 | * `header.json` 356 | * Information about k-mer size, uniquely identifying hashes for the reference 357 | * `indexing.log` 358 | * Time log for creating transcriptome index 359 | * `quasi_index.log` 360 | * Step-by-step log for making transcriptome index 361 | * `refInfo.json` 362 | * Information about the file used for the reference 363 | * `rsd.bin` 364 | * `sa.bin` 365 | * `txpInfo.bin` 366 | * `versionInfo.json` 367 | * Salmon and indexing versions used to make the index 368 | 369 | ### Quantification output 370 | 371 | **Output directory: `results/salmon`** 372 | 373 | * `aux_info/` 374 | * Auxiliary info e.g. versions and number of mapped reads 375 | * `cmd_info.json` 376 | * Information about the Salmon quantification command, version, and options 377 | * `lib_format_counts.json` 378 | * Number of fragments assigned, unassigned and incompatible 379 | * `libParams/` 380 | * Contains the file `flenDist.txt` for the fragment length distribution 381 | * `logs/` 382 | * Contains the file `salmon_quant.log` giving a record of Salmon's quantification 383 | * `quant.sf` 384 | * *Transcript*-level quantification of the sample, including gene length, effective length, TPM, and number of reads 385 | * `quant.genes.sf` 386 | * *Gene*-level quantification of the sample, including gene length, effective length, TPM, and number of reads 387 | * `Sample.transcript.tpm.txt` 388 | * Subset of `quant.sf`, only containing the transcript id and TPM values 389 | * `Sample.gene.tpm.txt` 390 | * Subset of `quant.genes.sf`, only containing the gene id and TPM values 391 | 392 | ## StringTie 393 | [StringTie](https://ccb.jhu.edu/software/stringtie/) assembles RNA-Seq alignments into potential transcripts.
It assembles and quantitates full-length transcripts representing multiple splice variants for each gene locus. 394 | 395 | StringTie outputs FPKM metrics for genes and transcripts as well as the transcript features that it generates. 396 | 397 | **Output directory: `results/stringtie`** 398 | 399 | * `_Aligned.sortedByCoord.out.bam.gene_abund.txt` 400 | * Gene abundances, FPKM values 401 | * `_Aligned.sortedByCoord.out.bam_transcripts.gtf` 402 | * This `.gtf` file contains all of the assembled transcripts from StringTie 403 | * `_Aligned.sortedByCoord.out.bam.cov_refs.gtf` 404 | * This `.gtf` file contains the transcripts that are fully covered by reads. 405 | 406 | ## Sample Correlation 407 | [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html) is a Bioconductor package for R used for RNA-seq data analysis. The script included in the pipeline uses edgeR to normalise read counts and create a heatmap showing Pearson's correlation and a dendrogram showing pairwise Euclidean distances between the samples in the experiment. It also creates a 2D MDS scatter plot showing sample grouping. These help to show sample similarity and can reveal batch effects and sample groupings.
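The idea behind the log2 CPM correlation can be sketched without edgeR: scale each sample's counts to counts-per-million, log-transform, then correlate samples. This toy version uses plain library-size scaling (edgeR's real normalisation also applies TMM factors, omitted here) and made-up counts:

```python
# Rough sketch of the sample-correlation step: counts -> log2 CPM (plain
# library-size scaling; edgeR's real normalisation also applies TMM factors),
# then Pearson correlation between two samples. All counts are made up.
import math

def log2_cpm(counts, prior=0.5):
    """Counts-per-million on log2 scale, with a small prior to avoid log(0)."""
    total = sum(counts)
    return [math.log2((c + prior) / total * 1e6) for c in counts]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

sample_a = log2_cpm([100, 200, 50, 400])
sample_b = log2_cpm([110, 190, 55, 380])
r = pearson(sample_a, sample_b)
print(round(r, 3))  # close to 1.0 for these two similar samples
```

In the pipeline these pairwise correlations fill the heatmap below, while Euclidean distances on the same log2 CPM values drive the dendrogram.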
408 | 409 | **Heatmap:** 410 | 411 | ![heatmap](images/mqc_hcplot_hocmzpdjsq.png) 412 | 413 | **MDS plot:** 414 | 415 | ![mds_plot](images/mqc_hcplot_ltqchiyxfz.png) 416 | 417 | **Output directory: `results/sample_correlation`** 418 | 419 | * `edgeR_MDS_plot.pdf` 420 | * MDS scatter plot showing sample similarity 421 | * `edgeR_MDS_distance_matrix.csv` 422 | * Distance matrix containing raw data from MDS analysis 423 | * `edgeR_MDS_Aplot_coordinates_mqc.csv` 424 | * Scatter plot coordinates from MDS plot, used for MultiQC report 425 | * `log2CPM_sample_distances_dendrogram.pdf` 426 | * Dendrogram showing the Euclidean distance between your samples 427 | * `log2CPM_sample_correlation_heatmap.pdf` 428 | * Heatmap showing the Pearson's correlation between your samples 429 | * `log2CPM_sample_correlation_mqc.csv` 430 | * Raw data from Pearson's correlation heatmap, used for MultiQC report 431 | 432 | ## MultiQC 433 | [MultiQC](http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory. 434 | 435 | The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability.
436 | 437 | **Output directory: `results/multiqc`** 438 | 439 | * `Project_multiqc_report.html` 440 | * MultiQC report - a standalone HTML file that can be viewed in your web browser 441 | * `Project_multiqc_data/` 442 | * Directory containing parsed statistics from the different tools used in the pipeline 443 | 444 | For more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info) 445 | -------------------------------------------------------------------------------- /parameters.settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "parameters": [ 3 | { 4 | "name": "reads", 5 | "label": "Input files", 6 | "usage": "Specify the location of your input FastQ files.", 7 | "group": "Main options", 8 | "default_value": "'data/*{1,2}.fastq.gz'", 9 | "render": "textfield", 10 | "pattern": ".*\\*.*", 11 | "type": "string" 12 | }, 13 | { 14 | "name": "singleEnd", 15 | "label": "Single-end sequencing input", 16 | "usage": "Use single-end sequencing inputs instead of paired-end.", 17 | "group": "Main options", 18 | "render": "check-box", 19 | "default_value": false, 20 | "type": "boolean" 21 | }, 22 | { 23 | "name": "genome", 24 | "label": "Alignment reference iGenomes key", 25 | "usage": "Ref. 
genome key for iGenomes", 26 | "group": "Alignment", 27 | "render": "drop-down", 28 | "type": "string", 29 | "choices": [ 30 | "", 31 | "GRCh37", 32 | "GRCm38", 33 | "TAIR10", 34 | "EB2", 35 | "UMD3.1", 36 | "WBcel235", 37 | "CanFam3.1", 38 | "GRCz10", 39 | "BDGP6", 40 | "EquCab2", 41 | "EB1", 42 | "Galgal4", 43 | "Gm01", 44 | "Mmul_1", 45 | "IRGSP-1.0", 46 | "CHIMP2.1.4", 47 | "Rnor_6.0", 48 | "R64-1-1", 49 | "EF2", 50 | "Sbi1", 51 | "Sscrofa10.2", 52 | "AGPv3" 53 | ], 54 | "default_value": "" 55 | }, 56 | { 57 | "name": "star_index", 58 | "label": "STAR index", 59 | "usage": "Path to STAR index", 60 | "group": "Alignment", 61 | "render": "file", 62 | "type": "string", 63 | "pattern": ".*", 64 | "default_value": "" 65 | }, 66 | { 67 | "name": "hisat2_index", 68 | "label": "HISAT2 index", 69 | "usage": "Path to HiSAT2 index", 70 | "group": "Alignment", 71 | "render": "file", 72 | "type": "string", 73 | "pattern": ".*", 74 | "default_value": "" 75 | }, 76 | { 77 | "name": "salmon_index", 78 | "label": "Salmon index", 79 | "usage": "Path to Salmon index", 80 | "group": "Alignment", 81 | "render": "file", 82 | "type": "string", 83 | "pattern": ".*", 84 | "default_value": "" 85 | }, 86 | { 87 | "name": "fasta", 88 | "label": "FASTA", 89 | "usage": "Path to Fasta reference", 90 | "group": "Alignment", 91 | "render": "file", 92 | "type": "string", 93 | "pattern": ".*", 94 | "default_value": "" 95 | }, 96 | { 97 | "name": "transcript_fasta", 98 | "label": "FASTA", 99 | "usage": "Path to transcript fasta file", 100 | "group": "Alignment", 101 | "render": "file", 102 | "type": "string", 103 | "pattern": ".*", 104 | "default_value": "" 105 | }, 106 | { 107 | "name": "splicesites", 108 | "label": "HISAT2 splice sites file", 109 | "usage": "Optional splice-sites file for building a HISAT2 alignment index", 110 | "group": "Alignment", 111 | "render": "file", 112 | "type": "string", 113 | "pattern": ".*", 114 | "default_value": "" 115 | }, 116 | { 117 | "name": "gtf", 118 | 
"label": "GTF", 119 |         "usage": "Path to GTF file", 120 |         "group": "Alignment", 121 |         "render": "file", 122 |         "type": "string", 123 |         "pattern": ".*", 124 |         "default_value": "" 125 |     }, 126 |     { 127 |         "name": "gff", 128 |         "label": "GFF", 129 |         "usage": "Path to GFF3 file", 130 |         "group": "Alignment", 131 |         "render": "file", 132 |         "type": "string", 133 |         "pattern": ".*", 134 |         "default_value": "" 135 |     }, 136 |     { 137 |         "name": "bed12", 138 |         "label": "BED12", 139 |         "usage": "Path to BED12 file", 140 |         "group": "Alignment", 141 |         "render": "file", 142 |         "type": "string", 143 |         "pattern": ".*", 144 |         "default_value": "" 145 |     }, 146 |     { 147 |         "name": "saveReference", 148 |         "label": "Save reference genome index", 149 |         "usage": "Save the generated reference files to the results directory.", 150 |         "group": "Pipeline defaults", 151 |         "render": "check-box", 152 |         "default_value": false, 153 |         "type": "boolean" 154 |     }, 155 |     { 156 |         "name": "forwardStranded", 157 |         "label": "Forward stranded", 158 |         "usage": "Samples were prepared using a forward-stranded library type.", 159 |         "group": "Main options", 160 |         "render": "check-box", 161 |         "default_value": false, 162 |         "type": "boolean" 163 |     }, 164 |     { 165 |         "name": "reverseStranded", 166 |         "label": "Reverse stranded", 167 |         "usage": "Samples were prepared using a reverse-stranded library type.", 168 |         "group": "Main options", 169 |         "render": "check-box", 170 |         "default_value": false, 171 |         "type": "boolean" 172 |     }, 173 |     { 174 |         "name": "unStranded", 175 |         "label": "Unstranded", 176 |         "usage": "Force the library strandedness to be unstranded", 177 |         "render": "none", 178 |         "default_value": false, 179 |         "type": "boolean", 180 |         "group": "Advanced" 181 |     }, 182 |     { 183 |         "name": "clip_r1", 184 |         "label": "Read Clipping: 5' R1", 185 |         "usage": "Instructs Trim Galore to remove bp from the 5' end of read 1 (or single-end reads).", 186 |         "group": "Read trimming", 187 |         "render": "textfield", 188 |         "pattern": "\\d*", 189 |         "type": "integer", 
190 |         "default_value": 0 191 |     }, 192 |     { 193 |         "name": "clip_r2", 194 |         "label": "Read Clipping: 5' R2", 195 |         "usage": "Instructs Trim Galore to remove bp from the 5' end of read 2 (paired-end reads only).", 196 |         "group": "Read trimming", 197 |         "render": "textfield", 198 |         "pattern": "\\d*", 199 |         "type": "integer", 200 |         "default_value": 0 201 |     }, 202 |     { 203 |         "name": "three_prime_clip_r1", 204 |         "label": "Read Clipping: 3' R1", 205 |         "usage": "Instructs Trim Galore to remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed.", 206 |         "group": "Read trimming", 207 |         "render": "textfield", 208 |         "pattern": "\\d*", 209 |         "type": "integer", 210 |         "default_value": 0 211 |     }, 212 |     { 213 |         "name": "three_prime_clip_r2", 214 |         "label": "Read Clipping: 3' R2", 215 |         "usage": "Instructs Trim Galore to remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed.", 216 |         "group": "Read trimming", 217 |         "render": "textfield", 218 |         "pattern": "\\d*", 219 |         "type": "integer", 220 |         "default_value": 0 221 |     }, 222 |     { 223 |         "name": "trim_nextseq", 224 |         "label": "NextSeq Trimming", 225 |         "usage": "This enables the option --nextseq-trim=3'CUTOFF within Cutadapt in Trim Galore, which will set a quality cutoff (that is normally given with -q instead), but qualities of G bases are ignored. 
This type of trimming is common on the NextSeq and NovaSeq platforms, where basecalls without any signal are called as high-quality G bases.", 226 |         "group": "Read trimming", 227 |         "render": "textfield", 228 |         "pattern": "\\d*", 229 |         "type": "integer", 230 |         "default_value": 0 231 |     }, 232 |     { 233 |         "name": "pico", 234 |         "label": "Library type: Pico", 235 |         "usage": "Set trimming and strandedness settings for the SMARTer Stranded Total RNA-Seq Kit - Pico Input kit.", 236 |         "group": "Main options", 237 |         "render": "check-box", 238 |         "default_value": false, 239 |         "type": "boolean" 240 |     }, 241 |     { 242 |         "name": "saveTrimmed", 243 |         "label": "Save Trimmed FastQ files", 244 |         "usage": "Save the trimmed FastQ files to the results directory.", 245 |         "group": "Pipeline defaults", 246 |         "render": "check-box", 247 |         "default_value": false, 248 |         "type": "boolean" 249 |     }, 250 |     { 251 |         "name": "aligner", 252 |         "label": "Alignment tool", 253 |         "usage": "Choose whether to align reads with STAR or HISAT2", 254 |         "type": "string", 255 |         "render": "radio-button", 256 |         "choices": [ 257 |             "star", 258 |             "hisat2" 259 |         ], 260 |         "default_value": "star", 261 |         "group": "Alignment" 262 |     }, 263 |     { 264 |         "name": "removeRiboRNA", 265 |         "label": "Remove ribosomal RNA", 266 |         "usage": "Choose whether to remove rRNA or not", 267 |         "type": "boolean", 268 |         "render": "check-box", 269 |         "default_value": false, 270 |         "group": "rRNA Removal Settings" 271 |     }, 272 |     { 273 |         "name": "saveNonRiboRNAReads", 274 |         "label": "Save non-ribosomal RNA reads as FastQ to results", 275 |         "usage": "By default, the pipeline doesn't save non-rRNA FastQ files.", 276 |         "default_value": false, 277 |         "type": "boolean", 278 |         "render": "check-box", 279 |         "group": "rRNA Removal Settings" 280 |     }, 281 |     { 282 |         "name": "rRNA_database_manifest", 283 |         "label": "Specify path to rRNA database manifest file", 284 |         "usage": "By default, the pipeline uses a predefined SILVA list. 
Users can specify their own if necessary.", 285 |         "pattern": ".*", 286 |         "type": "string", 287 |         "default_value": "", 288 |         "group": "rRNA Removal Settings" 289 |     }, 290 |     { 291 |         "name": "pseudo_aligner", 292 |         "label": "Pseudo alignment tool", 293 |         "usage": "Choose whether to pseudo-align reads with Salmon", 294 |         "type": "string", 295 |         "render": "radio-button", 296 |         "choices": [ "salmon" ], 297 |         "default_value": "", 298 |         "group": "Alignment" 299 |     }, 300 |     { 301 |         "name": "stringTieIgnoreGTF", 302 |         "label": "Alignment options", 303 |         "usage": "Perform reference-guided de novo assembly of transcripts using StringTie, i.e. don't restrict to transcripts in the GTF file.", 304 |         "group": "Alignment", 305 |         "render": "check-box", 306 |         "default_value": false, 307 |         "type": "boolean" 308 |     }, 309 |     { 310 |         "name": "seq_center", 311 |         "label": "Sequencing center", 312 |         "usage": "Add the sequencing center to the @RG line of the output BAM header", 313 |         "group": "Advanced", 314 |         "render": "textfield", 315 |         "pattern": ".*", 316 |         "type": "string", 317 |         "default_value": "" 318 |     }, 319 |     { 320 |         "name": "saveAlignedIntermediates", 321 |         "label": "Save Aligned Intermediate BAM files", 322 |         "usage": "Save intermediate BAM files to the results directory.", 323 |         "group": "Pipeline defaults", 324 |         "render": "check-box", 325 |         "default_value": false, 326 |         "type": "boolean" 327 |     }, 328 |     { 329 |         "name": "fc_group_features", 330 |         "label": "FeatureCounts Group Features", 331 |         "usage": "By default, the pipeline uses `gene_id` as the gene identifier group. 
Specifying `--fc_group_features` uses a different category present in your provided GTF file.", 332 |         "default_value": "gene_id", 333 |         "render": "textfield", 334 |         "pattern": ".*", 335 |         "type": "string", 336 |         "group": "FeatureCount settings" 337 |     }, 338 |     { 339 |         "name": "fc_group_features_type", 340 |         "label": "FeatureCounts Group Features Biotype", 341 |         "usage": "GTF attribute name that gives the biotype of a feature.", 342 |         "group": "FeatureCount settings", 343 |         "default_value": "gene_biotype", 344 |         "render": "textfield", 345 |         "pattern": ".*", 346 |         "type": "string" 347 |     }, 348 |     { 349 |         "name": "fc_extra_attributes", 350 |         "label": "FeatureCounts Extra Gene Names", 351 |         "usage": "By default, the pipeline uses `gene_name` as an additional gene identifier apart from Ensembl identifiers. The value of `--fc_extra_attributes` is passed to featureCounts as the `--extraAttributes` parameter", 352 |         "render": "textfield", 353 |         "pattern": ".*", 354 |         "type": "string", 355 |         "default_value": "", 356 |         "group": "FeatureCount settings" 357 |     }, 358 |     { 359 |         "name": "skipQC", 360 |         "label": "Skip all QC steps, apart from MultiQC", 361 |         "render": "check-box", 362 |         "default_value": false, 363 |         "type": "boolean", 364 |         "group": "Skip pipeline steps" 365 |     }, 366 |     { 367 |         "name": "skipFastQC", 368 |         "label": "Skip FastQC", 369 |         "render": "check-box", 370 |         "default_value": false, 371 |         "type": "boolean", 372 |         "group": "Skip pipeline steps" 373 |     }, 374 |     { 375 |         "name": "skipPreseq", 376 |         "label": "Skip Preseq analysis", 377 |         "render": "check-box", 378 |         "default_value": false, 379 |         "type": "boolean", 380 |         "group": "Skip pipeline steps" 381 |     }, 382 |     { 383 |         "name": "skipDupRadar", 384 |         "label": "Skip DupRadar QC", 385 |         "render": "check-box", 386 |         "default_value": false, 387 |         "type": "boolean", 388 |         "group": "Skip pipeline steps" 389 |     }, 390 |     { 391 |         "name": "skipQualimap", 392 |         "label": "Skip Qualimap step", 393 |         "render": "check-box", 394 |         "default_value": 
false, 395 |         "type": "boolean", 396 |         "group": "Skip pipeline steps" 397 |     }, 398 |     { 399 |         "name": "skipRseQC", 400 |         "label": "Skip RSeQC steps, apart from gene body coverage", 401 |         "render": "check-box", 402 |         "default_value": false, 403 |         "type": "boolean", 404 |         "group": "Skip pipeline steps" 405 |     }, 406 |     { 407 |         "name": "skipEdgeR", 408 |         "label": "Skip edgeR QC analysis", 409 |         "render": "check-box", 410 |         "default_value": false, 411 |         "type": "boolean", 412 |         "group": "Skip pipeline steps" 413 |     }, 414 |     { 415 |         "name": "skipMultiQC", 416 |         "label": "Skip MultiQC", 417 |         "render": "check-box", 418 |         "default_value": false, 419 |         "type": "boolean", 420 |         "group": "Skip pipeline steps" 421 |     }, 422 |     { 423 |         "name": "sampleLevel", 424 |         "label": "sampleLevel", 425 |         "usage": "Turn off project-level analysis (edgeR MDS plot and heatmap).", 426 |         "group": "Pipeline defaults", 427 |         "render": "check-box", 428 |         "default_value": false, 429 |         "type": "boolean" 430 |     }, 431 |     { 432 |         "name": "outdir", 433 |         "label": "Output directory", 434 |         "usage": "Set where to save the results from the pipeline", 435 |         "group": "Main options", 436 |         "default_value": "./results", 437 |         "render": "textfield", 438 |         "pattern": ".*", 439 |         "type": "string" 440 |     }, 441 |     { 442 |         "name": "email", 443 |         "label": "Your email address", 444 |         "usage": "Your email address, required to receive a completion notification.", 445 |         "group": "Pipeline defaults", 446 |         "render": "textfield", 447 |         "pattern": "^$|(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)", 448 |         "type": "string", 449 |         "default_value": "" 450 |     }, 451 |     { 452 |         "name": "max_multiqc_email_size", 453 |         "label": "Maximum MultiQC email file size", 454 |         "usage": "Threshold size for the MultiQC report to be attached in the notification email. 
If the file generated by the pipeline exceeds the threshold, it will not be attached.", 455 |         "group": "Pipeline defaults", 456 |         "default_value": "25.MB", 457 |         "render": "textfield", 458 |         "pattern": "\\d+\\.[KMGT]?B", 459 |         "type": "string" 460 |     }, 461 |     { 462 |         "name": "name", 463 |         "label": "Custom run name", 464 |         "usage": "Helper variable. Do not set, use -name instead.", 465 |         "group": "Advanced", 466 |         "render": "none", 467 |         "pattern": ".*", 468 |         "type": "string", 469 |         "default_value": "" 470 |     }, 471 |     { 472 |         "name": "awsregion", 473 |         "label": "AWS Region", 474 |         "usage": "The AWS region to run your job in.", 475 |         "group": "AWS cloud usage", 476 |         "default_value": "eu-west-1", 477 |         "render": "textfield", 478 |         "pattern": ".*", 479 |         "type": "string" 480 |     }, 481 |     { 482 |         "name": "awsqueue", 483 |         "label": "AWS job queue", 484 |         "usage": "The JobQueue that you intend to use on AWS Batch.", 485 |         "group": "AWS cloud usage", 486 |         "render": "textfield", 487 |         "pattern": ".*", 488 |         "type": "string", 489 |         "default_value": "" 490 |     }, 491 |     { 492 |         "name": "hisat_build_memory", 493 |         "label": "HISAT2 indexing: required memory for splice sites in GB", 494 |         "usage": "HISAT2 needs a very large amount of memory to build an index with splice sites. 
If the available memory is below this threshold, the index build will proceed without splicing information.", 495 |         "group": "Advanced", 496 |         "default_value": 200, 497 |         "render": "textfield", 498 |         "type": "integer" 499 |     }, 500 |     { 501 |         "name": "star_memory", 502 |         "label": "STAR memory", 503 |         "usage": "Instead of using the default amount available, force STAR to use a given amount of memory", 504 |         "group": "Advanced", 505 |         "render": "textfield", 506 |         "pattern": "^$|\\d+\\.[KMGT]?B", 507 |         "type": "string", 508 |         "default_value": "" 509 |     }, 510 |     { 511 |         "name": "multiqc_config", 512 |         "label": "MultiQC Config", 513 |         "usage": "Path to a config file for MultiQC", 514 |         "group": "Advanced", 515 |         "default_value": "/Users/ewels/GitHub/nf-core/rnaseq/assets/multiqc_config.yaml", 516 |         "render": "file", 517 |         "pattern": ".*\\.yaml", 518 |         "type": "string" 519 |     }, 520 |     { 521 |         "name": "project", 522 |         "label": "Cluster project", 523 |         "usage": "For use on HPC systems where a project ID is required for job submission", 524 |         "group": "Cluster job submission", 525 |         "render": "textfield", 526 |         "pattern": ".*", 527 |         "type": "string", 528 |         "default_value": "" 529 |     }, 530 |     { 531 |         "name": "igenomes_base", 532 |         "label": "iGenomes base path", 533 |         "usage": "Base path for iGenomes reference files", 534 |         "group": "Alignment", 535 |         "default_value": "s3://ngi-igenomes/igenomes/", 536 |         "render": "textfield", 537 |         "pattern": ".*", 538 |         "type": "string" 539 |     }, 540 |     { 541 |         "name": "container", 542 |         "label": "Software container", 543 |         "usage": "Docker Hub address for the pipeline container", 544 |         "default_value": "nfcore/rnaseq:latest", 545 |         "render": "textfield", 546 |         "pattern": ".*", 547 |         "type": "string", 548 |         "group": "Pipeline defaults" 549 |     }, 550 |     { 551 |         "name": "plaintext_email", 552 |         "label": "Plain text email", 553 |         "usage": "Set to receive plain-text emails instead of HTML-formatted ones.", 554 |         "group": "Pipeline defaults", 555 |         "render": 
"check-box", 556 | "default_value": false, 557 | "type": "boolean" 558 | }, 559 | { 560 | "name": "help", 561 | "label": "Help", 562 | "usage": "Specify to show the pipeline help text.", 563 | "group": "Pipeline defaults", 564 | "render": "none", 565 | "default_value": false, 566 | "type": "boolean" 567 | }, 568 | { 569 | "name": "max_cpus", 570 | "label": "Maximum available CPUs", 571 | "usage": "Use to set a top-limit for the default CPUs requirement for each process.", 572 | "group": "Pipeline defaults", 573 | "default_value": 16, 574 | "render": "textfield", 575 | "type": "integer" 576 | }, 577 | { 578 | "name": "max_time", 579 | "label": "Maximum available time", 580 | "usage": "Use to set a top-limit for the default time requirement for each process.", 581 | "group": "Pipeline defaults", 582 | "default_value": "10d", 583 | "render": "textfield", 584 | "pattern": "\\d+[smhd]", 585 | "type": "string" 586 | }, 587 | { 588 | "name": "max_memory", 589 | "label": "Maximum available memory", 590 | "usage": "Use to set a top-limit for the default memory requirement for each process.", 591 | "group": "Pipeline defaults", 592 | "default_value": "128.GB", 593 | "render": "textfield", 594 | "pattern": "\\d+\\.[KMGT]?B", 595 | "type": "string" 596 | }, 597 | { 598 | "name": "tracedir", 599 | "label": "Trace directory", 600 | "usage": "Set to where the pipeline trace should be saved. 
Set to a local path when running on AWS with results stored on S3.", 601 |         "group": "AWS cloud usage", 602 |         "default_value": "./results/pipeline_info", 603 |         "render": "textfield", 604 |         "pattern": ".*", 605 |         "type": "string" 606 |     }, 607 |     { 608 |         "name": "readPaths", 609 |         "label": "Read Paths", 610 |         "usage": "For use with Nextflow config files only", 611 |         "group": "Advanced", 612 |         "render": "none", 613 |         "pattern": ".*", 614 |         "type": "string", 615 |         "default_value": "" 616 |     } 617 | ] 618 | } 619 | 
--------------------------------------------------------------------------------
/conf/igenomes.config:
--------------------------------------------------------------------------------
  1 | /* 2 |  * ------------------------------------------------- 3 |  *  Nextflow config file for iGenomes paths 4 |  * ------------------------------------------------- 5 |  * Defines reference genomes, using iGenomes paths 6 |  * Can be used by any config that customises the base 7 |  * path using $params.igenomes_base / --igenomes_base 8 |  */ 9 | 10 | params { 11 |   // Illumina iGenomes reference file paths 12 |   genomes { 13 |     'GRCh37' { 14 |       fasta       = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" 15 |       bwa         = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" 16 |       bowtie2     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" 17 |       star        = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" 18 |       bismark     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" 19 |       gtf         = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" 20 |       bed12       = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" 21 |       mito_name   = "MT" 22 |       macs_gsize  = "2.7e9" 23 |       blacklist   = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed" 24 |     } 25 |     'GRCh38' { 26 |       fasta       = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" 
27 | bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" 28 | bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" 29 | star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" 30 | bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" 31 | gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" 32 | bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" 33 | mito_name = "chrM" 34 | macs_gsize = "2.7e9" 35 | blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" 36 | } 37 | 'GRCm38' { 38 | fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" 39 | bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" 40 | bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" 41 | star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" 42 | bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" 43 | gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" 44 | bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" 45 | mito_name = "MT" 46 | macs_gsize = "1.87e9" 47 | blacklist = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed" 48 | } 49 | 'TAIR10' { 50 | fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" 51 | bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" 52 | bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" 53 | star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" 54 | bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" 55 | gtf = 
"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" 56 | bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" 57 | mito_name = "Mt" 58 | } 59 | 'EB2' { 60 | fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" 61 | bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" 62 | bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" 63 | star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" 64 | bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" 65 | gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" 66 | bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" 67 | } 68 | 'UMD3.1' { 69 | fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" 70 | bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" 71 | bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" 72 | star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" 73 | bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" 74 | gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" 75 | bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" 76 | mito_name = "MT" 77 | } 78 | 'WBcel235' { 79 | fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" 80 | bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" 81 | bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" 82 | star = 
"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" 83 | bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" 84 | gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" 85 | bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" 86 | mito_name = "MtDNA" 87 | macs_gsize = "9e7" 88 | } 89 | 'CanFam3.1' { 90 | fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" 91 | bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" 92 | bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" 93 | star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" 94 | bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" 95 | gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" 96 | bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" 97 | mito_name = "MT" 98 | } 99 | 'GRCz10' { 100 | fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" 101 | bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" 102 | bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" 103 | star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" 104 | bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" 105 | gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" 106 | bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" 107 | mito_name = "MT" 108 | } 109 | 'BDGP6' { 110 | fasta = 
"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" 111 | bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" 112 | bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" 113 | star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" 114 | bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" 115 | gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" 116 | bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" 117 | mito_name = "M" 118 | macs_gsize = "1.2e8" 119 | } 120 | 'EquCab2' { 121 | fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" 122 | bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" 123 | bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" 124 | star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" 125 | bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" 126 | gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" 127 | bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" 128 | mito_name = "MT" 129 | } 130 | 'EB1' { 131 | fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" 132 | bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" 133 | bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" 134 | star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" 135 | bismark = 
"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" 136 | gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" 137 | bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" 138 | } 139 | 'Galgal4' { 140 | fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" 141 | bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" 142 | bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" 143 | star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" 144 | bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" 145 | gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" 146 | bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" 147 | mito_name = "MT" 148 | } 149 | 'Gm01' { 150 | fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" 151 | bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" 152 | bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" 153 | star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" 154 | bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" 155 | gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" 156 | bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" 157 | } 158 | 'Mmul_1' { 159 | fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" 160 | bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" 161 | bowtie2 = 
"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" 162 | star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" 163 | bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" 164 | gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" 165 | bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" 166 | mito_name = "MT" 167 | } 168 | 'IRGSP-1.0' { 169 | fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" 170 | bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" 171 | bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" 172 | star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" 173 | bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" 174 | gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" 175 | bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" 176 | mito_name = "Mt" 177 | } 178 | 'CHIMP2.1.4' { 179 | fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" 180 | bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" 181 | bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" 182 | star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" 183 | bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" 184 | gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" 185 | bed12 = 
"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" 186 | mito_name = "MT" 187 | } 188 | 'Rnor_6.0' { 189 | fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" 190 | bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" 191 | bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" 192 | star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" 193 | bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" 194 | gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" 195 | bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" 196 | mito_name = "MT" 197 | } 198 | 'R64-1-1' { 199 | fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" 200 | bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" 201 | bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" 202 | star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" 203 | bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" 204 | gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" 205 | bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" 206 | mito_name = "MT" 207 | macs_gsize = "1.2e7" 208 | } 209 | 'EF2' { 210 | fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" 211 | bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" 212 | bowtie2 = 
"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" 213 | star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" 214 | bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" 215 | gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" 216 | bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" 217 | mito_name = "MT" 218 | macs_gsize = "1.21e7" 219 | } 220 | 'Sbi1' { 221 | fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" 222 | bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" 223 | bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" 224 | star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" 225 | bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" 226 | gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" 227 | bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" 228 | } 229 | 'Sscrofa10.2' { 230 | fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" 231 | bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" 232 | bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" 233 | star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" 234 | bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" 235 | gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" 236 | bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" 237 | mito_name = "MT" 238 | } 239 | 'AGPv3' { 240 | 
fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" 241 | bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" 242 | bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" 243 | star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" 244 | bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" 245 | gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" 246 | bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" 247 | mito_name = "Mt" 248 | } 249 | 'hg38' { 250 | fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" 251 | bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" 252 | bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" 253 | star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" 254 | bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" 255 | gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" 256 | bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" 257 | mito_name = "chrM" 258 | macs_gsize = "2.7e9" 259 | blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" 260 | } 261 | 'hg19' { 262 | fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" 263 | bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" 264 | bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" 265 | star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" 266 | bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" 267 | gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" 268 | bed12 = 
"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" 269 | mito_name = "chrM" 270 | macs_gsize = "2.7e9" 271 | blacklist = "${baseDir}/assets/blacklists/hg19-blacklist.bed" 272 | } 273 | 'mm10' { 274 | fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" 275 | bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" 276 | bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" 277 | star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" 278 | bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" 279 | gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" 280 | bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" 281 | mito_name = "chrM" 282 | macs_gsize = "1.87e9" 283 | blacklist = "${baseDir}/assets/blacklists/mm10-blacklist.bed" 284 | } 285 | 'bosTau8' { 286 | fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" 287 | bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" 288 | bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" 289 | star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" 290 | bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" 291 | gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" 292 | bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" 293 | mito_name = "chrM" 294 | } 295 | 'ce10' { 296 | fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" 297 | bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" 298 | bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" 299 | star = 
"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" 300 | bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" 301 | gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" 302 | bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" 303 | mito_name = "chrM" 304 | macs_gsize = "9e7" 305 | } 306 | 'canFam3' { 307 | fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" 308 | bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" 309 | bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" 310 | star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" 311 | bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" 312 | gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" 313 | bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" 314 | mito_name = "chrM" 315 | } 316 | 'danRer10' { 317 | fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" 318 | bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" 319 | bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" 320 | star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" 321 | bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" 322 | gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" 323 | bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" 324 | mito_name = "chrM" 325 | } 326 | 'dm6' { 327 | fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" 328 | bwa = 
"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" 329 | bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" 330 | star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" 331 | bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" 332 | gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" 333 | bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" 334 | mito_name = "chrM" 335 | macs_gsize = "1.2e8" 336 | } 337 | 'equCab2' { 338 | fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" 339 | bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" 340 | bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" 341 | star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" 342 | bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" 343 | gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" 344 | bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" 345 | mito_name = "chrM" 346 | } 347 | 'galGal4' { 348 | fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" 349 | bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" 350 | bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" 351 | star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" 352 | bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" 353 | gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" 354 | bed12 = 
"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" 355 | mito_name = "chrM" 356 | } 357 | 'panTro4' { 358 | fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" 359 | bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" 360 | bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" 361 | star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" 362 | bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" 363 | gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" 364 | bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" 365 | mito_name = "chrM" 366 | } 367 | 'rn6' { 368 | fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" 369 | bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" 370 | bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" 371 | star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" 372 | bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" 373 | gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" 374 | bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" 375 | mito_name = "chrM" 376 | } 377 | 'sacCer3' { 378 | fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" 379 | bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" 380 | bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" 381 | star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" 382 | bismark = 
"${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" 383 | mito_name = "chrM" 384 | macs_gsize = "1.2e7" 385 | } 386 | 'susScr3' { 387 | fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" 388 | bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" 389 | bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" 390 | star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" 391 | bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" 392 | gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" 393 | bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" 394 | mito_name = "chrM" 395 | } 396 | } 397 | } 398 | -------------------------------------------------------------------------------- /docs/usage.md: -------------------------------------------------------------------------------- 1 | # nf-core/rnaseq: Usage 2 | 3 | ## Table of contents 4 | 5 | 6 | 7 | * [Table of contents](#table-of-contents) 8 | * [Introduction](#introduction) 9 | * [Running the pipeline](#running-the-pipeline) 10 | * [Updating the pipeline](#updating-the-pipeline) 11 | * [Reproducibility](#reproducibility) 12 | * [Main arguments](#main-arguments) 13 | * [`-profile`](#-profile) 14 | * [`--reads`](#--reads) 15 | * [`--singleEnd`](#--singleend) 16 | * [Library strandedness](#library-strandedness) 17 | * [FeatureCounts Extra Gene Names](#featurecounts-extra-gene-names) 18 | * [Default "`gene_name`" Attribute Type](#default-attribute-type) 19 | * [Extra Gene Names or IDs](#extra-gene-names-or-ids) 20 | * [Default "`exon`" Attribute](#default-exon-type) 21 | * [Transcriptome mapping with Salmon](#transcriptome-mapping-with-salmon) 22 | * [Alignment tool](#alignment-tool) 23 | * [Reference genomes](#reference-genomes) 24 | * [`--genome` (using 
iGenomes)](#--genome-using-igenomes) 25 | * [`--star_index`, `--hisat2_index`, `--fasta`, `--gtf`, `--bed12`](#--star_index---hisat2_index---fasta---gtf---bed12) 26 | * [`--saveReference`](#--savereference) 27 | * [`--saveTrimmed`](#--savetrimmed) 28 | * [`--saveAlignedIntermediates`](#--savealignedintermediates) 29 | * [`--gencode`](#--gencode) 30 | * ["Type" of gene](#type-of-gene) 31 | * [Transcript IDs in FASTA files](#transcript-ids-in-fasta-files) 32 | * [`--skipAlignment`](#--skipalignment) 33 | * [`--compressedReference`](#--compressedreference) 34 | * [Create compressed (tar.gz) STAR indices](#create-compressed-tar-gz-star-indices) 35 | * [Create compressed (tar.gz) HiSat2 indices](#create-compressed-tar-gz-hisat2-indices) 36 | * [Create compressed (tar.gz) Salmon indices](#create-compressed-tar-gz-salmon-indices) 37 | * [Adapter Trimming](#adapter-trimming) 38 | * [`--clip_r1 [int]`](#--clip_r1-int) 39 | * [`--clip_r2 [int]`](#--clip_r2-int) 40 | * [`--three_prime_clip_r1 [int]`](#--three_prime_clip_r1-int) 41 | * [`--three_prime_clip_r2 [int]`](#--three_prime_clip_r2-int) 42 | * [`--trim_nextseq [int]`](#--trim_nextseq-int) 43 | * [`--skipTrimming`](#--skiptrimming) 44 | * [Ribosomal RNA removal](#ribosomal-rna-removal) 45 | * [`--removeRiboRNA`](#--removeriborna) 46 | * [`--save_nonrRNA_reads`](#--save_nonrrna_reads) 47 | * [`--rRNA_database_manifest`](#--rrna_database_manifest) 48 | * [Library Prep Presets](#library-prep-presets) 49 | * [`--pico`](#--pico) 50 | * [Skipping QC steps](#skipping-qc-steps) 51 | * [Job resources](#job-resources) 52 | * [Automatic resubmission](#automatic-resubmission) 53 | * [Custom resource requests](#custom-resource-requests) 54 | * [AWS Batch specific parameters](#aws-batch-specific-parameters) 55 | * [`--awsqueue`](#--awsqueue) 56 | * [`--awsregion`](#--awsregion) 57 | * [Other command line parameters](#other-command-line-parameters) 58 | * [`--outdir`](#--outdir) 59 | * [`--email`](#--email) 60 | * 
[`--email_on_fail`](#--email_on_fail) 61 | * [`-name`](#-name) 62 | * [`-resume`](#-resume) 63 | * [`-c`](#-c) 64 | * [`--custom_config_version`](#--custom_config_version) 65 | * [`--custom_config_base`](#--custom_config_base) 66 | * [`--max_memory`](#--max_memory) 67 | * [`--max_time`](#--max_time) 68 | * [`--max_cpus`](#--max_cpus) 69 | * [`--hisat_build_memory`](#--hisat_build_memory) 70 | * [`--sampleLevel`](#--samplelevel) 71 | * [`--plaintext_email`](#--plaintext_email) 72 | * [`--monochrome_logs`](#--monochrome_logs) 73 | * [`--multiqc_config`](#--multiqc_config) 74 | * [Stand-alone scripts](#stand-alone-scripts) 75 | 76 | 77 | ## Introduction 78 | 79 | Nextflow handles job submissions on SLURM or other environments and supervises the running jobs, so the Nextflow process must keep running until the pipeline is finished. We recommend running the process in the background through `screen` / `tmux` or a similar tool. Alternatively, you can run Nextflow within a cluster job submitted to your job scheduler. 80 | 81 | It is a good idea to limit the memory used by the Nextflow Java virtual machine. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~/.bash_profile`): 82 | 83 | ```bash 84 | NXF_OPTS='-Xms1g -Xmx4g' 85 | ``` 86 | 87 | ## Running the pipeline 88 | 89 | The typical command for running the pipeline is as follows: 90 | 91 | ```bash 92 | nextflow run nf-core/rnaseq --reads '*_R{1,2}.fastq.gz' --genome GRCh37 -profile docker 93 | ``` 94 | 95 | This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. 96 | 97 | Note that the pipeline will create the following files in your working directory: 98 | 99 | ```bash 100 | work # Directory containing the nextflow working files 101 | results # Finished results (configurable, see below) 102 | .nextflow.log # Log file from Nextflow 103 | # Other Nextflow hidden files, e.g. history of pipeline runs and old logs. 
104 | ``` 105 | 106 | ### Updating the pipeline 107 | 108 | When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. Subsequent runs will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, update the cached version regularly: 109 | 110 | ```bash 111 | nextflow pull nf-core/rnaseq 112 | ``` 113 | 114 | ### Reproducibility 115 | 116 | It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. 117 | 118 | First, go to the [nf-core/rnaseq releases page](https://github.com/nf-core/rnaseq/releases) and find the latest version number - numeric only (e.g. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - e.g. `-r 1.3.1`. 119 | 120 | This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. 121 | 122 | ## Main arguments 123 | 124 | ### `-profile` 125 | 126 | Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded in a comma-separated list, for example: `-profile test,docker` - the order of arguments is important! 127 | 128 | If `-profile` is not specified at all, the pipeline will run locally and expect all software to be installed and available on the `PATH`. 129 | 130 | * `awsbatch` 131 | * A generic configuration profile to be used with AWS Batch. 
132 | * `conda` 133 | * A generic configuration profile to be used with [conda](https://conda.io/docs/) 134 | * Pulls most software from [Bioconda](https://bioconda.github.io/) 135 | * `docker` 136 | * A generic configuration profile to be used with [Docker](http://docker.com/) 137 | * Pulls software from Docker Hub: [`nfcore/rnaseq`](http://hub.docker.com/r/nfcore/rnaseq/) 138 | * `singularity` 139 | * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) 140 | * Pulls software from Docker Hub: [`nfcore/rnaseq`](http://hub.docker.com/r/nfcore/rnaseq/) 141 | * `test` 142 | * A profile with a complete configuration for automated testing 143 | * Includes links to test data so needs no other parameters 144 | 145 | ### `--reads` 146 | 147 | Use this to specify the location of your input FastQ files. For example: 148 | 149 | ```bash 150 | --reads 'path/to/data/sample_*_{1,2}.fastq' 151 | ``` 152 | 153 | Please note the following requirements: 154 | 155 | 1. The path must be enclosed in quotes 156 | 2. The path must have at least one `*` wildcard character 157 | 3. When using the pipeline with paired-end data, the path must use `{1,2}` notation to specify read pairs. 158 | 159 | If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` 160 | 161 | ### `--singleEnd` 162 | 163 | By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--singleEnd` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--reads`. For example: 164 | 165 | ```bash 166 | --singleEnd --reads '*.fastq' 167 | ``` 168 | 169 | It is not possible to run a mixture of single-end and paired-end files in one run.
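The `{1,2}` pairing in `--reads` follows standard shell glob/brace semantics. A small sketch of which files a pattern like `'*_R{1,2}.fastq.gz'` picks up (all sample names below are invented, and the demo assumes a bash shell):

```bash
# Toy demo of the --reads glob pattern (file names are made up).
mkdir -p reads_demo && cd reads_demo
touch sampleA_R1.fastq.gz sampleA_R2.fastq.gz \
      sampleB_R1.fastq.gz sampleB_R2.fastq.gz
# The brace expands to *_R1.fastq.gz and *_R2.fastq.gz; Nextflow groups the
# matching files into one read pair per sample.
ls *_R{1,2}.fastq.gz
```

Here Nextflow would form two read pairs, one for `sampleA` and one for `sampleB`.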
170 | 171 | ### Library strandedness 172 | 173 | Three command line flags / config parameters set the library strandedness for a run: 174 | 175 | * `--forwardStranded` 176 | * `--reverseStranded` 177 | * `--unStranded` 178 | 179 | If none is set, the pipeline will be run as unstranded. Specifying `--pico` makes the pipeline run in `forwardStranded` mode. 180 | 181 | You can set a default in a custom Nextflow configuration file such as one saved in `~/.nextflow/config` (see the [Nextflow docs](https://www.nextflow.io/docs/latest/config.html) for more). For example: 182 | 183 | ```nextflow 184 | params { 185 | reverseStranded = true 186 | } 187 | ``` 188 | 189 | If you have a default strandedness set in your personal config file, you can use `--unStranded` to override it for a given run. 190 | 191 | These flags affect the commands used for several steps in the pipeline - namely HISAT2, featureCounts, RSeQC (`RPKM_saturation.py`), Qualimap and StringTie: 192 | 193 | * `--forwardStranded` 194 | * HISAT2: `--rna-strandness F` / `--rna-strandness FR` 195 | * featureCounts: `-s 1` 196 | * RSeQC: `-d ++,--` / `-d 1++,1--,2+-,2-+` 197 | * Qualimap: `-pe strand-specific-forward` 198 | * StringTie: `--fr` 199 | * `--reverseStranded` 200 | * HISAT2: `--rna-strandness R` / `--rna-strandness RF` 201 | * featureCounts: `-s 2` 202 | * RSeQC: `-d +-,-+` / `-d 1+-,1-+,2++,2--` 203 | * Qualimap: `-pe strand-specific-reverse` 204 | * StringTie: `--rf` 205 | 206 | ## FeatureCounts Extra Gene Names 207 | 208 | ### Default "`gene_name`" Attribute Type 209 | 210 | By default, the pipeline uses `gene_name` as the gene identifier group. If you need to adjust this, use the option `--fc_group_features` to select a different attribute present in your provided GTF file. Please also take care to use a suitable attribute to categorise the `biotype` of the selected features in your GTF, using the option `--fc_group_features_type` (default: `gene_biotype`).
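If you are unsure which attribute names your GTF actually provides for `--fc_group_features` / `--fc_group_features_type`, a quick one-liner lists them. The one-record `genes.gtf` created below is a made-up stand-in; point the final command at your real annotation:

```bash
# Create a toy one-record GTF (all values are purely illustrative).
printf '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000000001"; gene_name "EXAMPLE1"; gene_biotype "protein_coding";\n' > genes.gtf
# List the attribute keys: take column 9, split on ';', keep the key names.
grep -v '^#' genes.gtf | cut -f9 | tr ';' '\n' | awk 'NF {print $1}' | sort -u
```

For the toy record this prints `gene_biotype`, `gene_id` and `gene_name`, i.e. the candidate values for the two options.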
211 | 212 | ### Extra Gene Names or IDs 213 | 214 | By default, the pipeline uses `gene_name` as an additional gene identifier apart from ENSEMBL identifiers in the pipeline. 215 | This behaviour can be modified by specifying `--fc_extra_attributes` when running the pipeline, which is passed on to featureCounts as its `--extraAttributes` parameter. 216 | See the user guide of the [Subread package](http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf) for details. 217 | Note that you can also specify more than one desired value, separated by a comma: 218 | `--fc_extra_attributes gene_id,...` 219 | 220 | ### Default "`exon`" Type 221 | 222 | By default, the pipeline uses `exon` as the feature type used to assign reads. If you need to adjust this, use the option `--fc_count_type` to select a different category present in your provided GTF file (3rd column). For example, for nuclear RNA-seq, one could count reads in introns in addition to exons using `--fc_count_type transcript`. 223 | 224 | ## Transcriptome mapping with Salmon 225 | 226 | Use the `--pseudo_aligner salmon` option to perform additional quantification at the transcript and gene level using [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html). This is run in addition to either STAR or HiSat2 and cannot be run in isolation, mainly because it allows you to obtain QC metrics with respect to the genomic alignments. By default, the pipeline will use the genome fasta and gtf file to generate the transcript fasta file, and then build the Salmon index. You can override these parameters using `--transcript_fasta` and `--salmon_index`, respectively. 227 | 228 | The default Salmon parameters and a k-mer size of 31 are used to create the index. As [discussed here](https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode), a k-mer size of 31 works well with reads that are 75bp or longer.
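As a quick sanity check before relying on the default index, you can compute the mean read length directly from a FASTQ file. The single-read `sample.fastq` below is generated on the fly as a stand-in for real data (which would usually be gzipped):

```bash
# Make a toy single-record FASTQ with a 100 bp read (stand-in for real data).
seq=$(head -c 100 /dev/zero | tr '\0' 'A')
qual=$(head -c 100 /dev/zero | tr '\0' 'I')
printf '@read1\n%s\n+\n%s\n' "$seq" "$qual" > sample.fastq
# Mean read length: sequence lines are every 4th line, starting at line 2.
awk 'NR % 4 == 2 { sum += length($0); n++ } END { print int(sum / n) }' sample.fastq
```

If the reported length is well below 75 bp, a smaller k-mer size (via a custom `--salmon_index`) may be worth considering.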
229 | 230 | ## Alignment tool 231 | 232 | By default, the pipeline uses [STAR](https://github.com/alexdobin/STAR) to align the raw FastQ reads to the reference genome. STAR is fast and widely used, but requires a lot of memory to run, typically around 38GB for the human GRCh37 reference genome. 233 | 234 | If you prefer, you can use [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) as the alignment tool instead. Developed by the same group behind the popular TopHat aligner, HISAT2 has a much smaller memory footprint. 235 | 236 | To use HISAT2, use the parameter `--aligner hisat2` or set `params.aligner = 'hisat2'` in your config file. Alternatively, you can use `--skipAlignment --pseudo_aligner salmon` if you just want to perform a fast mapping to the transcriptome with Salmon (you will also have to supply either `--transcript_fasta` or both a `--fasta` and `--gtf`/`--gff`). 237 | 238 | ## Reference genomes 239 | 240 | The pipeline config files come bundled with paths to the Illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. 241 | 242 | ### `--genome` (using iGenomes) 243 | 244 | There are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag. 245 | 246 | You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are: 247 | 248 | * Human 249 | * `--genome GRCh37` 250 | * Mouse 251 | * `--genome GRCm38` 252 | * _Drosophila_ 253 | * `--genome BDGP6` 254 | * _S. cerevisiae_ 255 | * `--genome 'R64-1-1'` 256 | 257 | > There are numerous others - check the config file for more. 258 | 259 | Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. 
See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. 260 | 261 | The syntax for this reference configuration is as follows: 262 | 263 | ```nextflow 264 | params { 265 | genomes { 266 | 'GRCh37' { 267 | star = '' 268 | fasta = '' // Used if no star index given 269 | gtf = '' 270 | bed12 = '' // Generated from GTF if not given 271 | } 272 | // Any number of additional genomes, key is used with --genome 273 | } 274 | } 275 | ``` 276 | 277 | ### `--star_index`, `--hisat2_index`, `--fasta`, `--gtf`, `--bed12` 278 | 279 | If you prefer, you can specify the full path to your reference genome when you run the pipeline: 280 | 281 | ```bash 282 | --star_index '/path/to/STAR/index' \ 283 | --hisat2_index '/path/to/HISAT2/index' \ 284 | --fasta '/path/to/reference.fasta' \ 285 | --gtf '/path/to/gene_annotation.gtf' \ 286 | --gff '/path/to/gene_annotation.gff' \ 287 | --bed12 '/path/to/gene_annotation.bed' 288 | ``` 289 | 290 | Note that only one of `--star_index` / `--hisat2_index` is needed, depending on which aligner you are using (see below). 291 | 292 | The minimum requirements are a FASTA file and a GTF file. Note that `--gff` and `--bed12` are auto-derived from the `--gtf` where needed and are not required. If only the FASTA and GTF are provided, all other reference files will be generated automatically by the pipeline. If you specify a `--gff` file, it will be converted to GTF format automatically by the pipeline. If you specify both, the GTF is preferred over the GFF by the pipeline. 293 | 294 | ### `--saveReference` 295 | 296 | Supply this parameter to save any generated reference genome files to your results folder. 297 | These can then be used for future pipeline runs, reducing processing times. 298 | 299 | ### `--saveTrimmed` 300 | 301 | By default, trimmed FastQ files will not be saved to the results directory. 
Specify this 302 | flag (or set to true in your config file) to copy these files when complete. 303 | 304 | ### `--saveUnaligned` 305 | 306 | By default, the pipeline doesn't save unaligned/unmapped reads to a separate file. With this option, STAR / HISAT2 and Salmon will write the reads that were not aligned (as a separate BAM file or a list of reads) to a separate output directory. 307 | 308 | ### `--saveAlignedIntermediates` 309 | 310 | As above, by default intermediate BAM files from the alignment will not be saved. The final BAM files created after the Picard MarkDuplicates step are always saved. Set to true to also copy out BAM files from the STAR / HISAT2 and sorting steps. 311 | 312 | ### `--gencode` 313 | 314 | If your `--gtf` file is in GENCODE format and you would like to run Salmon (`--pseudo_aligner salmon`), you will need to provide this parameter in order to build the Salmon index appropriately. The parameter `fc_group_features_type` will also be set to `gene_type`, as explained below. 315 | 316 | [GENCODE](https://www.gencodegenes.org) gene annotations are slightly different from ENSEMBL or iGenomes annotations in two ways. 317 | 318 | #### "Type" of gene 319 | 320 | The `gene_biotype` field which is typically found in Ensembl GTF files contains a keyword description of the type of gene, e.g. `protein_coding`, `lincRNA`, `rRNA`. In GENCODE GTF files this field has been renamed to `gene_type`. 321 | 322 | ENSEMBL version: 323 | 324 | ```bash 325 | 8 havana transcript 70635318 70669174 . - . gene_id "ENSG00000147592"; gene_version "9"; transcript_id "ENST00000522447"; transcript_version "5"; gene_name "LACTB2"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "LACTB2-203"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS6208"; tag "basic"; transcript_support_level "2"; 326 | ``` 327 | 328 | GENCODE version: 329 | 330 | ```bash 331 | chr8 HAVANA transcript 70635318 70669174 . - . 
gene_id "ENSG00000147592.9"; transcript_id "ENST00000522447.5"; gene_type "protein_coding"; gene_name "LACTB2"; transcript_type "protein_coding"; transcript_name "LACTB2-203"; level 2; protein_id "ENSP00000428801.1"; transcript_support_level "2"; tag "alternative_3_UTR"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS6208.1"; havana_gene "OTTHUMG00000164430.2"; havana_transcript "OTTHUMT00000378747.1"; 332 | ``` 333 | 334 | Therefore, for `featureCounts` to correctly count the different biotypes when using a GENCODE annotation, `fc_group_features_type` is automatically set to `gene_type` when the `--gencode` flag is specified. 335 | 336 | #### Transcript IDs in FASTA files 337 | 338 | The transcript IDs in GENCODE fasta files are separated by vertical pipes (`|`) rather than spaces. 339 | 340 | ENSEMBL version: 341 | 342 | ```bash 343 | >ENST00000522447.5 cds chromosome:GRCh38:8:70635318:70669174:-1 gene:ENSG00000147592.9 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:LACTB2 description:lactamase beta 2 [Source:HGNC Symbol;Acc:HGNC:18512] 344 | ``` 345 | 346 | GENCODE version: 347 | 348 | ```bash 349 | >ENST00000522447.5|ENSG00000147592.9|OTTHUMG00000164430.2|OTTHUMT00000378747.1|LACTB2-203|LACTB2|1034|protein_coding| 350 | ``` 351 | 352 | This [issue](https://github.com/COMBINE-lab/salmon/issues/15) can be overcome by specifying the `--gencode` flag when building the Salmon index. 353 | 354 | ### `--skipBiotypeQC` 355 | 356 | This skips the biotype QC step in the `featureCounts` process, which is particularly useful when the available GTF/GFF has no `biotype` (or similar) attribute that could be used. 357 | 358 | ### `--skipAlignment` 359 | 360 | By default, the pipeline aligns the input reads to the genome using either HISAT2 or STAR and counts gene expression using featureCounts. If you prefer to skip alignment altogether and only get transcript/gene expression counts with pseudo-alignment, use this flag. 
Note that you will also need to specify `--pseudo_aligner salmon`. If you have a custom transcriptome, supply that with `--transcript_fasta`. 361 | 362 | ### Compressed Reference File Input 363 | 364 | By default, the pipeline assumes that the reference genome files are all uncompressed, i.e. plain fasta or gtf files. If instead you intend to use compressed or gzipped references, e.g. as downloaded directly from Ensembl: 365 | 366 | ```bash 367 | nextflow run nf-core/rnaseq --reads 'data/{R1,R2}*.fastq.gz' \ 368 | --fasta ftp://ftp.ensembl.org/pub/release-97/fasta/microcebus_murinus/dna_index/Microcebus_murinus.Mmur_3.0.dna.toplevel.fa.gz \ 369 | --gtf ftp://ftp.ensembl.org/pub/release-97/gtf/microcebus_murinus/Microcebus_murinus.Mmur_3.0.97.gtf.gz 370 | ``` 371 | 372 | This assumes that ALL of the reference files are compressed, including the reference indices, e.g. for STAR, HiSat2 or Salmon. For instructions on how to create your own compressed reference files, see below. This also includes any files specified with `--additional_fasta`, which are assumed to be compressed as well when the `--fasta` file is compressed. The pipeline auto-detects `gz` input for reference files. Mixing of `gz` and non-compressed input is not possible!
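For locally stored references, the equivalent preparation is simply gzipping every reference file before launch, so that each input carries the `.gz` suffix the pipeline detects. A toy sketch (file names and contents below are placeholders):

```bash
# Toy FASTA and GTF (placeholder contents), compressed the way the pipeline
# expects when any compressed reference input is used.
printf '>chr1\nACGTACGT\n' > genome.fa
printf '1\tsrc\texon\t1\t8\t.\t+\t.\tgene_id "g1";\n' > genes.gtf
gzip -f genome.fa genes.gtf
# Both inputs now carry the .gz suffix.
ls genome.fa.gz genes.gtf.gz
```

These `.gz` files can then be passed to `--fasta` and `--gtf` directly.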
#### Create compressed (tar.gz) STAR indices

STAR indices can be created by running the pipeline with `--saveReference` and then compressing the output with `tar`:

```bash
cd results/reference_genome
tar -zcvf star.tar.gz star
```

#### Create compressed (tar.gz) HISAT2 indices

HISAT2 indices can be created by running the pipeline with `--saveReference` and then compressing the output with `tar`:

```bash
cd results/reference_genome
tar -zcvf hisat2.tar.gz *.hisat2_*
```

#### Create compressed (tar.gz) Salmon index

A Salmon index can be created by running the pipeline with `--saveReference` and then compressing the output with `tar`:

```bash
cd results/reference_genome
tar -zcvf salmon_index.tar.gz salmon_index
```

## Adapter Trimming

If specific additional trimming is required (for example, to remove additional tags), you can use any of the following command line parameters. These affect the command used to launch Trim Galore!.

### `--clip_r1 [int]`

Instructs Trim Galore to remove the given number of bp from the 5' end of read 1 (or single-end reads).

### `--clip_r2 [int]`

Instructs Trim Galore to remove the given number of bp from the 5' end of read 2 (paired-end reads only).

### `--three_prime_clip_r1 [int]`

Instructs Trim Galore to remove the given number of bp from the 3' end of read 1 _AFTER_ adapter/quality trimming has been performed.

### `--three_prime_clip_r2 [int]`

Instructs Trim Galore to remove the given number of bp from the 3' end of read 2 _AFTER_ adapter/quality trimming has been performed.

### `--trim_nextseq [int]`

This enables the `--nextseq-trim=3'CUTOFF` option in Cutadapt (via Trim Galore), which sets a quality cutoff (normally given with `-q`) but ignores the quality scores of G bases. This trimming is common to the NextSeq and NovaSeq platforms, where basecalls without any signal are reported as high-quality G bases.
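As an example, the clipping options above can be combined in a single run. The command below is a sketch only; the profile, read paths and genome are illustrative:

```bash
# Hypothetical run removing 3 bp from the 5' end of both reads of a
# paired-end library (profile, paths and genome are illustrative)
nextflow run nf-core/rnaseq -profile docker \
    --reads 'data/*_{R1,R2}.fastq.gz' \
    --genome GRCh38 \
    --clip_r1 3 \
    --clip_r2 3
```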
### `--skipTrimming`

Allows you to skip the trimming step, saving time when re-analysing data that has already been trimmed.

## Ribosomal RNA removal

If rRNA removal is desired (for example, for metatranscriptomics), add the following command line parameters.

### `--removeRiboRNA`

Instructs the pipeline to use SortMeRNA to remove reads related to ribosomal RNA (or any other patterns found in the sequences defined by `--rRNA_database_manifest`).

### `--saveNonRiboRNAReads`

By default, non-rRNA FastQ files will not be saved to the results directory. Specify this flag (or set it to true in your config file) to copy these files when complete.

### `--rRNA_database_manifest`

By default, the rRNA databases from GitHub at [`biocore/sortmerna/rRNA_databases`](https://github.com/biocore/sortmerna/tree/master/rRNA_databases) are used. Alternatively, you can provide the path to a text file containing paths to FASTA files (one per line, with no `'` or `"` around file names) that will be used to build the SortMeRNA databases instead of the defaults; reads similar to these sequences will then be removed. You can see an example in `assets/rrna-db-defaults.txt`.

## Library Prep Presets

Some command line options are available to automatically set parameters for common RNA-seq library preparation kits.

> Note that these presets override other command line arguments. So if you specify `--pico --clip_r1 0`, the `--clip_r1` option will be ignored.

If you have a kit that you'd like a preset added for, please let us know!

### `--pico`

Sets trimming and strandedness settings for the _SMARTer Stranded Total RNA-Seq Kit - Pico Input_ kit.
Equivalent to: `--forwardStranded` `--clip_r1 3` `--three_prime_clip_r2 3`

## Skipping QC steps

The pipeline contains a large number of quality control steps. Sometimes, it may not be desirable to run all of them if time and compute resources are limited. The following options make this easy:

* `--skipQC` - Skip **all QC steps**, apart from MultiQC
* `--skipFastQC` - Skip FastQC
* `--skipRseQC` - Skip RSeQC
* `--skipQualimap` - Skip Qualimap
* `--skipPreseq` - Skip Preseq
* `--skipDupRadar` - Skip dupRadar (and Picard MarkDuplicates)
* `--skipEdgeR` - Skip edgeR MDS plot and heatmap
* `--skipMultiQC` - Skip MultiQC

## Job resources

### Automatic resubmission

Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline is stopped.

### Custom resource requests

Wherever process-specific requirements are set in the pipeline, the default value can be changed by creating a custom config file. See the files hosted at [`nf-core/configs`](https://github.com/nf-core/configs/tree/master/conf) for examples.

If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter (see definition below).
You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, an associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and an amendment to [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.

If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack/).

## AWS Batch specific parameters

Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use the `awsbatch` profile (`-profile awsbatch`) and then specify all of the following parameters.

### `--awsqueue`

The JobQueue that you intend to use on AWS Batch.

### `--awsregion`

The AWS region in which to run your job. Default is set to `eu-west-1` but can be adjusted to your needs.

Please make sure to also set the `-w/-work-dir` and `--outdir` parameters to an S3 storage bucket of your choice - you'll get an error message notifying you if you didn't.

## Other command line parameters

### `--outdir`

The output directory where the results will be saved.

### `--email`

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.

### `--email_on_fail`

This works exactly as with `--email`, except emails are only sent if the workflow is not successful.

### `-name`

Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic.

This is used in the MultiQC report (if not default) and in the summary HTML / e-mail (always).
**NB:** Single hyphen (core Nextflow option)

### `-resume`

Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously.

You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.

**NB:** Single hyphen (core Nextflow option)

### `-c`

Specify the path to a specific config file (this is a core Nextflow option).

**NB:** Single hyphen (core Nextflow option)

Note - you can use this to override pipeline defaults.

### `--custom_config_version`

Provide a git commit id for the custom institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default is set to `master`.

```bash
## Download and use config files with the following git commit id
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96
```

### `--custom_config_base`

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with the `--custom_config_base` option.
For example:

```bash
## Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip

## Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/
```

> Note that the nf-core/tools helper package has a `download` command that fetches all required pipeline files, Singularity containers and institutional configs in one go, to make this process easier.

### `--max_memory`

Use to set a top-limit for the default memory requirement for each process. Should be a string in the format integer-unit, e.g. `--max_memory '8.GB'`.

### `--max_time`

Use to set a top-limit for the default time requirement for each process. Should be a string in the format integer-unit, e.g. `--max_time '2.h'`.

### `--max_cpus`

Use to set a top-limit for the default CPU requirement for each process. Should be an integer, e.g. `--max_cpus 1`.

### `--hisat_build_memory`

Required amount of memory in GB to build the HISAT2 index with splice sites. The HISAT2 index build can proceed with or without exon / splice-junction information, but including it requires a very large amount of memory. If this much memory is not available, the index is built without splicing information. The `--hisat_build_memory` option changes this threshold; by default it is `200GB`. If your system's `--max_memory` is set to `128GB` but your genome is small enough to build within this, you can allow the exon-aware build to proceed by supplying e.g. `--hisat_build_memory 100GB`.

### `--sampleLevel`

Used to turn off the edgeR MDS plot and heatmap. Set automatically when running on fewer than 3 samples.
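The `--max_*` caps described above can be combined on a single command line. The run below is a sketch only; the profile, read paths and cap values are illustrative:

```bash
# Hypothetical run capping per-process resources (values are examples)
nextflow run nf-core/rnaseq -profile docker \
    --reads 'data/*_{R1,R2}.fastq.gz' \
    --max_memory '64.GB' \
    --max_cpus 16 \
    --max_time '24.h'
```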
### `--plaintext_email`

Set to receive plain-text e-mails instead of HTML-formatted ones.

### `--monochrome_logs`

Set to disable colourful command line output and live life in monochrome.

### `--multiqc_config`

Specify a path to a custom MultiQC configuration file.

## Stand-alone scripts

The `bin` directory contains some scripts used by the pipeline which may also be run manually:

* `gtf2bed`
  * Script used to generate the BED12 reference files used by RSeQC. Takes a `.gtf` file as input
* `dupRadar.r`
  * dupRadar script used in the _dupRadar_ pipeline process
* `edgeR_heatmap_MDS.r`
  * edgeR script used in the _Sample Correlation_ process