├── CODE_OF_CONDUCT.md ├── LICENSE ├── README.md ├── bin ├── QC_summary_stats.py ├── RGI_aro_hits.py ├── RGI_long_combine.py ├── amr_long_to_wide.py ├── kraken2_long_to_wide.py ├── kraken_long_to_wide.py ├── rarefaction ├── resistome ├── samtools_idxstats.py └── trimmomatic_stats.py ├── config ├── MEG_AMI.config ├── local.config ├── local_MSI.config ├── local_angus.config ├── singularity.config └── singularity_slurm.config ├── containers ├── Singularity └── Singularity.RGI ├── data ├── HMM.tar.xz ├── adapters │ └── nextera.fa ├── amr │ ├── megares_annotations_v1.01.csv │ ├── megares_annotations_v1.02.csv │ ├── megares_database_v1.01.fasta │ ├── megares_database_v1.02.fasta │ ├── megares_drugs_annotations_v2.00.csv │ ├── megares_drugs_database_v2.00.fasta │ ├── megares_modified_annotations_v2.00.csv │ ├── megares_modified_database_v2.00.fasta │ ├── megares_to_external_header_mappings_v1.01.tsv │ └── snp_location_metadata.csv ├── host │ └── chr21.fasta.gz └── raw │ ├── S1_test_R1.fastq.gz │ ├── S1_test_R2.fastq.gz │ ├── S2_test_R1.fastq.gz │ ├── S2_test_R2.fastq.gz │ ├── S3_test_R1.fastq.gz │ └── S3_test_R2.fastq.gz ├── docs ├── AmrPlusPlus_Pipeline_workflow.pdf ├── CHANGELOG.md ├── FAQs.md ├── accessing_AMR++.md ├── configuration.md ├── contact.md ├── dependencies.md ├── installation.md ├── output.md ├── requirements.md └── usage.md ├── download_minikraken.sh ├── launch_mpi_slurm.sh ├── main_AmrPlusPlus_v2.nf ├── main_AmrPlusPlus_v2_withKraken.nf ├── main_AmrPlusPlus_v2_withRGI.nf ├── main_AmrPlusPlus_v2_withRGI_Kraken.nf ├── nextflow.config └── previous_versions └── main_amr_plus_plus_v1.nf /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 
28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at meg.metagenomics@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Chris Dean 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Overview 2 | -------- 3 | [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) 4 | [![Nextflow](https://img.shields.io/badge/Nextflow-%E2%89%A50.25.1-brightgreen.svg)](https://www.nextflow.io/) 5 | 6 | 7 | ## [AMR++ v3 now available!](https://github.com/Microbial-Ecology-Group/AMRplusplus) 8 | We have migrated GitHub repositories to a [new location](https://github.com/Microbial-Ecology-Group/AMRplusplus) (to make it a group repository), and this repository will be deprecated. We apologize for any inconvenience and hope you find v3 useful for your research needs. Of note, version 3 includes: 9 | * SNP confirmation using a custom database and [SNP verification software](https://github.com/Isabella136/AmrPlusPlus_SNP) 10 | * improved modularity to optimize a personalized workflow 11 | 12 | 13 | ### 2022-08-22 : AMR++ update coming soon 14 | Hello AMR++ users, we would like to sincerely apologize for the delay in addressing your concerns and updating AMR++. As many of you likely experienced, COVID was challenging and we were not able to dedicate the resources to AMR++ that it deserves. We are happy to announce that we have assembled a team for another major update to AMR++ and the MEGARes database in the next few months! 15 | 16 | A few notes: 17 | * We are aware of the issues with integrating RGI results with the AMR++ pipeline. Unfortunately, we are discontinuing our support for integrating AMR++ results with the RGI software. 18 | * We are attempting to remedy the issues that AMR++ users have reported, but we would also like to hear any other suggestions you might have. Please send any suggestions to enriquedoster@gmail.com with the subject line, "AMR++ update". 19 | * A few upcoming updates: easy control over the number of intermediate files that are stored, the option to re-arrange pipeline processes, better sample summary statistics, and improved functionality through nextflow profiles. 20 | 21 | 22 | ### 2020-03-21 : AMR++ v2.0.2 update. 23 | We identified issues in running RGI with the full AMR++ pipeline thanks to GitHub users AroArz and DiegoBrambilla. We are releasing v2.0.1 to continue AMR++ functionality, but we are planning further updates for the next stable release. As of this update, RGI developers are focused on contributing to the COVID-19 response, so we plan to reconvene with them when their schedule opens up.
24 | * Please view the [CHANGELOG](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/CHANGELOG.md) for more details on changes included in AMR++ v2.0.1 25 | * To run the AMR++ pipeline with RGI, you'll have to download the CARD database locally and specify its location using the "--card_db" flag like this: 26 | 27 | ``` 28 | # If you want to include RGI in your analysis, first download CARD with this command: 29 | # We tested AMR++ v2.0.2 with the CARD database v3.0.8, but we recommend using the command below to get the latest CARD db 30 | wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 31 | 32 | # In case the latest CARD database is causing issues, you can download the version we used for testing, v3.0.8: 33 | wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/download/0/broadstreet-v3.0.8.tar.bz2 && tar xfvj card-data.tar.bz2 34 | 35 | 36 | # If you run into an error regarding "Issued certificate has expired.", try this command: 37 | wget --no-check-certificate -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 38 | 39 | 40 | # Run the AMR++ pipeline with the "--card_db" flag 41 | nextflow run main_AmrPlusPlus_v2_withRGI.nf -profile singularity --card_db /path/to/card.json --reads '/path/to/reads/*R{1,2}_001.R1.fastq.gz' --output AMR++_results -w work_dir 42 | ``` 43 | 44 | 45 | # Microbial Ecology Group (MEG) 46 | (https://megares.meglab.org/) 47 | 48 | Our international multidisciplinary group of scientists and educators is addressing the issues of antimicrobial resistance (AMR) and microbial ecology in agriculture through research, outreach, and education. By characterizing risks related to AMR and microbial ecology, our center will identify agricultural production practices that are harmful and can be avoided, while also identifying and promoting production practices and interventions that are beneficial or do no harm to the ecosystem or public health. This will allow society to realize “sustainable intensification” of agriculture. 49 | 50 | # MEGARes and the AMR++ bioinformatic pipeline 51 | (http://megares.meglab.org/amrplusplus/latest/html/v2/) 52 | 53 | The MEGARes database contains sequence data for approximately 8,000 hand-curated antimicrobial resistance genes accompanied by an annotation structure that is optimized for use with high throughput sequencing and metagenomic analysis. The acyclical annotation graph of MEGARes allows for accurate, count-based, hierarchical statistical analysis of resistance at the population level, much like microbiome analysis, and is also designed to be used as a training database for the creation of statistical classifiers. 54 | 55 | The goal of many metagenomics studies is to characterize the content and relative abundance of sequences of interest from the DNA of a given sample or set of samples. You may want to know what is contained within your sample or how abundant a given sequence is relative to another. 56 | 57 | Often, metagenomics is performed when the answer to these questions must be obtained for a large number of targets where techniques like multiplex PCR and other targeted methods would be too cumbersome to perform. AmrPlusPlus can process the raw data from the sequencer, identify the fragments of DNA, and count them. It also provides a count of the polymorphisms that occur in each DNA fragment with respect to the reference database.
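For comparison with the RGI example above, a first pass with the core workflow alone (no Kraken, no RGI) can be as simple as the sketch below. Treat it as a minimal, illustrative invocation: the script name and the singularity profile come from this repository, the flags mirror the RGI command shown earlier, and the reads glob is an assumption you should adapt to your own file naming.

```
# Hypothetical minimal run of the core AMR++ pipeline on the bundled test reads
nextflow run main_AmrPlusPlus_v2.nf -profile singularity \
    --reads 'data/raw/*_R{1,2}.fastq.gz' \
    --output AMR++_results \
    -w work_dir
```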
58 | 59 | Additionally, you may want to know if the depth of your sequencing (how many reads you obtain that are on target) is high enough to identify rare organisms (organisms with low abundance relative to others) in your population. This is referred to as rarefaction and is calculated by randomly subsampling your sequence data at intervals between 0% and 100% in order to determine how many targets are found at each depth. 60 | 61 | With AMR++, you will obtain alignment count files for each sample that are combined into a count matrix that can be analyzed using any statistical and mathematical techniques that can operate on a matrix of observations. 62 | 63 | More Information 64 | ---------------- 65 | 66 | - [Installation](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/installation.md) 67 | - [Usage](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/usage.md) 68 | - [Configuration](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/configuration.md) 69 | - [Accessing AMR++](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/accessing_AMR++.md) 70 | - [Output](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/output.md) 71 | - [Dependencies](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/dependencies.md) 72 | - [Software Requirements](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/requirements.md) 73 | - [FAQs](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/FAQs.md) 74 | - [Details on AMR++ updates](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/update_details.md) 75 | - [Contact](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/contact.md) 76 | -------------------------------------------------------------------------------- /bin/QC_summary_stats.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import gzip 4 | import argparse 5 | import glob 6 | 7 | 8 | import csv 9 | import pandas as pd 10 | import numpy 11 | 12 | 13 | 14 | 15 | def parse_cmdline_params(cmdline_params): 16 | info = "Computes mean Phred score, mean per-base error probability, total reads, and mean read length for FASTQ files" 17 | parser = argparse.ArgumentParser(description=info) 18 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 19 | help='Use globstar to pass a list of sequence files, (Ex: *.fastq.gz)') 20 | return parser.parse_args(cmdline_params) 21 | 22 | 23 | def pull_Phred(fastq_files): 24 | 25 | for f in fastq_files: # iterate through each fastq file 26 | Plist=[] 27 | Qlist=[] 28 | Seqlen_list=[] 29 | num_reads = 0 30 | 31 | fp = gzip.open(f, 'rt') if f.endswith('.gz') else open(f, 'r') # open each fastq file, reading gzipped input transparently 32 | for line in fp: # iterate through lines of fastq file 33 | Ordqual=[] 34 | Q=[] 35 | P=[] 36 | read_id = line 37 | seq = next(fp) 38 | 39 | #seq = seq[10:len(seq)] # Let's not chop off the umi here since we would be checking the quality after UMI removal and not all samples have UMIs 40 | 41 | Seqlen_list.append(len(seq)) 42 | #newseq = seq + spacesep + UMI 43 | plus = next(fp) 44 | qual = next(fp) 45 | 46 | 47 | 48 | for i in range(len(qual)-1): #Exclude the return character 49 | Ordqual.append(ord(qual[i])) 50 | Q.append(Ordqual[i]-33) 51 | P.append(10**(-Q[i]/10)) 52 | 53 | Qlist.append(numpy.mean(Q)) 54 | Plist.append(numpy.mean(P)) 55 | 56 | num_reads += 1 57 | 58 | print(f,"mean_probability_nucleotide_error",numpy.mean(Plist)) 59 |
print(f,"mean_phred_score",numpy.mean(Qlist)) 60 | print(f,"total_reads",num_reads) 61 | print(f,"mean_read_length",numpy.mean(Seqlen_list)) 62 | 63 | fp.close() 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | def print_dict(dict): 73 | # iterate through UMIs and repeat counts and print those 74 | dups= "Repeat UMI Error" 75 | for k, v in dict.items(): 76 | if v != 1: 77 | print k, dups, v 78 | else: 79 | print k, v 80 | 81 | def print_Rep(dict): 82 | # iterate through UMIs and repeat counts and print those 83 | dups= "Repeat UMI Error" 84 | for k, v in dict.items(): 85 | if len(v) != 1: 86 | print k, dups, v 87 | else: 88 | print k, v 89 | 90 | 91 | #Apply the previous functions; Print UMIs and counts 92 | if __name__ == "__main__": 93 | opts = parse_cmdline_params(sys.argv[1:]) 94 | fastq_files = opts.input_files 95 | #read_dic = pull_UMI(fastq_files) 96 | #print_dict(read_dic[0]) 97 | #print_Rep(read_dic[1]) 98 | #print ReWriter(read_dic[1]) 99 | pull_Phred(fastq_files) 100 | -------------------------------------------------------------------------------- /bin/RGI_aro_hits.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import csv 5 | 6 | 7 | def rgi_output(rgi_file): 8 | # Get the desired information from the RGI file 9 | with open(rgi_file, 'r') as rgifile: 10 | 11 | # Get the first line of the file and put it into a list 12 | header = rgifile.readline().split('\t') 13 | 14 | # Get the index of all the elements we want 15 | category = header.index('Cut_Off') 16 | best_hit_aro = header.index('Best_Hit_ARO') 17 | aro = header.index('ARO') 18 | model_type = header.index('Model_type') 19 | 20 | # Define a dictionary where each best_hit_aro is a key and the items are the rest of the desired elements 21 | rgi_dict = {} 22 | 23 | # Go through the file and fill out the dictionary without repeats 24 | reader = csv.reader(rgifile, delimiter="\t") 25 | for row in reader: 26 | aro_name = row[best_hit_aro] 27 | if aro_name not in rgi_dict.keys(): 28 | rgi_dict[aro_name] = [row[aro], row[category], 1, row[model_type]] 29 | else: 30 | rgi_dict[aro_name][2] += 1 31 | 32 | # Close the file 33 | rgifile.close() 34 | 35 | # Get the name of the file to use for the three outputs 36 | sample_name = rgi_file.split(".") 37 | 38 | perf_sample_name = sample_name 39 | perf_sample_name.pop() 40 | perf_sample_name.pop() 41 | perf_sample_name.insert(1, '_rgi_perfect_hits.csv') 42 | perf_file_name = ''.join(perf_sample_name) 43 | 44 | strict_sample_name = sample_name 45 | strict_sample_name.pop() 46 | strict_sample_name.insert(1, '_rgi_strict_hits.csv') 47 | strict_file_name = ''.join(strict_sample_name) 48 | 49 | loose_sample_name = sample_name 50 | loose_sample_name.pop() 51 | loose_sample_name.insert(1, '_rgi_loose_hits.csv') 52 | loose_file_name = ''.join(loose_sample_name) 53 | 54 | # Search the dictionary to see which of the three Cut_Offs exist 55 | perf_in_dict = False 56 | strict_in_dict = False 57 | loose_in_dict = False 58 | 59 | for x in rgi_dict.values(): 60 | if x[1] == "Perfect": 61 | perf_in_dict = True 62 | if x[1] == "Strict": 63 | strict_in_dict = True 64 | if x[1] == "Loose": 65 | loose_in_dict = True 66 | # Stop checking if we already know we need to make all three files 67 | if perf_in_dict and strict_in_dict and loose_in_dict: 68 | break 69 | 70 | # Write the dictionary to each of the files if cut_off values exist 71 | # I.e if the file has no perfects, we don't write a file for it 72 | if perf_in_dict: 73 | 
with open(perf_file_name, 'w', newline='\n') as perf_file: 74 | perf_write = csv.writer(perf_file, delimiter=',') 75 | perf_write.writerow(["Best_Hit_ARO", "ARO", "Cut_Off", "Sum_Hits", "Model_Type"]) 76 | for perf_key, perf_item in rgi_dict.items(): 77 | if perf_item[1] == "Perfect": 78 | temp_perf_write = perf_item.copy() 79 | temp_perf_write.insert(0, perf_key) 80 | perf_write.writerow(temp_perf_write) 81 | 82 | if strict_in_dict: 83 | with open(strict_file_name, 'w', newline='\n') as strict_file: 84 | strict_write = csv.writer(strict_file, delimiter=',') 85 | strict_write.writerow(["Best_Hit_ARO", "ARO", "Cut_Off", "Sum_Hits", "Model_Type"]) 86 | for strict_key, strict_item in rgi_dict.items(): 87 | if strict_item[1] == "Strict": 88 | temp_strict_write = strict_item.copy() 89 | temp_strict_write.insert(0, strict_key) 90 | strict_write.writerow(temp_strict_write) 91 | 92 | if loose_in_dict: 93 | with open(loose_file_name, 'w', newline='\n') as loose_file: 94 | loose_write = csv.writer(loose_file, delimiter=',') 95 | loose_write.writerow(["Best_Hit_ARO", "ARO", "Cut_Off", "Sum_Hits", "Model_Type"]) 96 | for loose_key, loose_item in rgi_dict.items(): 97 | if loose_item[1] == "Loose": 98 | temp_loose_write = loose_item.copy() 99 | temp_loose_write.insert(0, loose_key) 100 | loose_write.writerow(temp_loose_write) 101 | 102 | 103 | if __name__ == '__main__': 104 | rgi_output(sys.argv[1]) 105 | -------------------------------------------------------------------------------- /bin/RGI_long_combine.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import csv 5 | 6 | 7 | def rgi_long_combine(rgi_perf_file, long_file, combined_output): 8 | # Get the desired information from the RGI file 9 | with open(rgi_perf_file, 'r') as rgifile: 10 | 11 | # Define a dictionary where desired formatted rgi entries (gene column format) are keys and the items are the number of hits and default gene fraction percentage 12 | rgi_perf_dict = {} 13 | 14 | # Go through the file and fill out the dictionary using long format 15 | reader = csv.reader(rgifile, delimiter=',') 16 | next(reader) 17 | for row in reader: 18 | aro_name = "RGI|" + row[2] + "|" + row[0] 19 | rgi_perf_dict[aro_name] = [row[3], 80] 20 | 21 | # Close the file 22 | rgifile.close() 23 | 24 | # Get the name of the sample 25 | sample_name = rgi_perf_file.split("_")[0] 26 | 27 | # Get counts from the provided long_file 28 | with open(long_file, 'r') as long_file: 29 | 30 | # Define a dictionary where genes are keys and the items are the sample name, number of hits, and gene fraction percentage 31 | long_dict = {} 32 | 33 | # Go through the file and fill out the dictionary using long format lines 34 | long_reader = csv.reader(long_file, delimiter=',') 35 | header = next(long_reader) 36 | for long_row in long_reader: 37 | split_gene_name = long_row[1].split("|") 38 | if split_gene_name[len(split_gene_name)-1] != "RequiresSNPConfirmation": 39 | long_dict[long_row[1]] = [long_row[0], long_row[2], long_row[3]] 40 | 41 | # Close the file 42 | long_file.close() 43 | 44 | 45 | # Write combined output to the provided output file 46 | with open(combined_output, 'w', newline='\n') as combined_file: 47 | combined_write = csv.writer(combined_file, delimiter=',') 48 | combined_write.writerow(header) 49 | # Write to the combined file using the dictionaries we created previously 50 | for rgi_key, rgi_item in rgi_perf_dict.items(): 51 | temp_rgi_write = rgi_item.copy() 52 | 
temp_rgi_write.insert(0, rgi_key) 53 | temp_rgi_write.insert(0, sample_name) 54 | combined_write.writerow(temp_rgi_write) 55 | 56 | for long_key, long_item in long_dict.items(): 57 | temp_long_write = long_item.copy() 58 | temp_long_write.insert(1, long_key) 59 | combined_write.writerow(temp_long_write) 60 | 61 | 62 | if __name__ == '__main__': 63 | rgi_long_combine(sys.argv[1], sys.argv[2], sys.argv[3]) 64 | -------------------------------------------------------------------------------- /bin/amr_long_to_wide.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | __author__ = "Steven Lakin" 4 | __copyright__ = "" 5 | __credits__ = ["Steven Lakin"] 6 | __version__ = "" 7 | __maintainer__ = "lakinsm" 8 | __email__ = "lakinsm@colostate.edu" 9 | __status__ = "Cows go moo." 10 | 11 | import argparse 12 | import sys 13 | 14 | amr_level_names = {0: 'Class', 1: 'Mechanism', 2: 'Group'} 15 | 16 | def parse_cmdline_params(cmdline_params): 17 | info = "" 18 | parser = argparse.ArgumentParser(description=info) 19 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 20 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 21 | parser.add_argument('-o', '--output_file', required=True, 22 | help='Output file name for writing the AMR_analytic_matrix.csv file') 23 | return parser.parse_args(cmdline_params) 24 | 25 | def amr_load_data(file_name_list): 26 | samples = {} 27 | labels = set() 28 | for file in file_name_list: 29 | with open(file, 'r') as f: 30 | data = f.read().split('\n')[1:] 31 | for entry in data: 32 | if not entry: 33 | continue 34 | entry = entry.split('\t') 35 | sample = entry[0].split('.')[0] 36 | count = float(entry[2]) 37 | gene_name = entry[1] 38 | try: 39 | samples[sample][gene_name] = count 40 | except KeyError: 41 | try: 42 | samples[sample].setdefault(gene_name, count) 43 | except KeyError: 44 | samples.setdefault(sample, {gene_name: count}) 45 | labels.add(gene_name) 46 | return samples, labels 47 | 48 | def output_amr_analytic_data(outfile, S, L): 49 | with open(outfile, 'w') as amr: 50 | local_sample_names = [] 51 | for sample, dat in S.items(): 52 | local_sample_names.append(sample) 53 | amr.write(','.join(local_sample_names) + '\n') 54 | for label in L: 55 | local_counts = [] 56 | amr.write(label + ',') 57 | for local_sample in local_sample_names: 58 | if label in S[local_sample]: 59 | local_counts.append(str(S[local_sample][label])) 60 | else: 61 | local_counts.append(str(0)) 62 | amr.write(','.join(local_counts) + '\n') 63 | 64 | if __name__ == '__main__': 65 | opts = parse_cmdline_params(sys.argv[1:]) 66 | S, L = amr_load_data(opts.input_files) 67 | output_amr_analytic_data(opts.output_file, S, L) 68 | -------------------------------------------------------------------------------- /bin/kraken2_long_to_wide.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import argparse 5 | import numpy as np 6 | 7 | __author__ = 'Steven Lakin' 8 | __maintainer__ = 'lakinsm' 9 | __email__ = 'Steven.Lakin@colostate.edu' 10 | 11 | 12 | taxa_levels = { 13 | 'D': 0, 14 | 'K': 1, 15 | 'P': 2, 16 | 'C': 3, 17 | 'O': 4, 18 | 'F': 5, 19 | 'G': 6, 20 | 'S': 7, 21 | 'U': 8 22 | } 23 | 24 | taxa_level_names = { 25 | 0: 'Domain', 26 | 1: 'Kingdom', 27 | 2: 'Phylum', 28 | 3: 'Class', 29 | 4: 'Order', 30 | 5: 'Family', 31 | 6: 'Genus', 32 | 7: 'Species', 33 | 8: 'Unclassified' 34 | } 35 | 36 | 37 | def 
parse_cmdline_params(cmdline_params): 38 | info = "" 39 | parser = argparse.ArgumentParser(description=info) 40 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 41 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 42 | parser.add_argument('-o', '--output_file', required=True, 43 | help='Output file name for writing the kraken_analytic_matrix.csv file') 44 | return parser.parse_args(cmdline_params) 45 | 46 | 47 | def dict_to_matrix(D): 48 | ncol = len(D.keys()) 49 | unique_nodes = [] 50 | samples = [] 51 | for sample, tdict in D.items(): 52 | for taxon in tdict.keys(): 53 | if taxon not in unique_nodes: 54 | unique_nodes.append(taxon) 55 | nrow = len(unique_nodes) 56 | return_values = np.zeros((nrow, ncol), dtype=np.float) 57 | for j, (sample, tdict) in enumerate(D.items()): 58 | samples.append(sample) 59 | for i, taxon in enumerate(unique_nodes): 60 | if taxon in tdict: 61 | return_values[i, j] = np.float(tdict[taxon]) 62 | return return_values, unique_nodes, samples 63 | 64 | 65 | def kraken2_load_analytic_data(file_name_list): 66 | return_values = {} 67 | unclassifieds = {} # { sample: [unclassified, total, percent] } 68 | for file in file_name_list: 69 | sample_id = file.split('/')[-1].replace('.kraken.report', '') 70 | unclassifieds.setdefault(sample_id, [0, 0, 0]) 71 | with open(file, 'r') as f: 72 | data = f.read().split('\n') 73 | taxon_list = ['NA'] * 8 74 | previous_taxon_level = 0 75 | for line in data: 76 | if not line: 77 | continue 78 | entries = line.split('\t') 79 | node_count = int(entries[2]) 80 | node_level = entries[3] 81 | node_name = entries[5].strip() 82 | if node_level == 'U': 83 | unclassifieds[sample_id][0] = node_count 84 | unclassifieds[sample_id][1] += node_count 85 | unclassifieds[sample_id][2] = float(entries[0]) 86 | continue 87 | elif node_level == 'R': 88 | unclassifieds[sample_id][1] += int(entries[1]) 89 | continue 90 | if len(node_level) > 1: 91 | if node_level[0] in ('U', 'R'): 92 | continue 93 | parent_node_level = node_level[0] 94 | else: 95 | parent_node_level = node_level 96 | this_taxon_level = taxa_levels[parent_node_level] 97 | if len(node_level) == 1: 98 | taxon_list[this_taxon_level] = node_name 99 | if this_taxon_level < previous_taxon_level: 100 | taxon_list[this_taxon_level + 1:] = ['NA'] * (7 - this_taxon_level) 101 | previous_taxon_level = this_taxon_level 102 | if node_count == 0: 103 | continue 104 | this_taxonomy_string = '|'.join(taxon_list[:this_taxon_level + 1]) 105 | try: 106 | return_values[sample_id][this_taxonomy_string] += node_count 107 | except KeyError: 108 | try: 109 | return_values[sample_id].setdefault(this_taxonomy_string, node_count) 110 | except KeyError: 111 | return_values.setdefault(sample_id, {this_taxonomy_string: node_count}) 112 | return dict_to_matrix(return_values), unclassifieds 113 | 114 | 115 | def output_kraken2_analytic_data(outfile, M, m_names, n_names, unclassifieds): 116 | with open(outfile, 'w') as out, \ 117 | open('kraken_unclassifieds.csv', 'w') as u_out: 118 | out.write(','.join(n_names) + '\n') 119 | for i, row in enumerate(M): 120 | out.write('\"{}\",'.format( 121 | m_names[i].replace(',', '') 122 | )) 123 | out.write(','.join([str(x) for x in row]) + '\n') 124 | u_out.write('SampleID,NumberUnclassified,Total,PercentUnclassified\n') 125 | for sample, numbers in unclassifieds.items(): 126 | u_out.write('{},{}\n'.format( 127 | sample, 128 | ','.join([str(x) for x in numbers]) 129 | )) 130 | 131 | 132 | if __name__ == '__main__': 133 | opts = 
parse_cmdline_params(sys.argv[1:]) 134 | 135 | (K, m, n), u = kraken2_load_analytic_data(opts.input_files) 136 | output_kraken2_analytic_data(opts.output_file, K, m, n, u) 137 | -------------------------------------------------------------------------------- /bin/kraken_long_to_wide.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import argparse 4 | import numpy as np 5 | import sys 6 | 7 | __author__ = "Steven Lakin" 8 | __copyright__ = "" 9 | __credits__ = ["Steven Lakin"] 10 | __version__ = "" 11 | __maintainer__ = "lakinsm" 12 | __email__ = "lakinsm@colostate.edu" 13 | __status__ = "Cows go moo." 14 | 15 | taxa_level = {'D': 0, 'P': 1, 'C': 2, 'O': 3, 'F': 4, 'G': 5, 'S': 6} 16 | taxa_level_names = {1: 'Domain', 2: 'Phylum', 3: 'Class', 4: 'Order', 17 | 5: 'Family', 6: 'Genus', 7: 'Species', 8: 'Unclassified'} 18 | 19 | def parse_cmdline_params(cmdline_params): 20 | info = "" 21 | parser = argparse.ArgumentParser(description=info) 22 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 23 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 24 | parser.add_argument('-o', '--output_directory', required=True, 25 | help='Output directory for writing the kraken_analytic_matrix.csv file') 26 | return parser.parse_args(cmdline_params) 27 | 28 | def dict_to_matrix(D): 29 | ncol = len(D.keys()) 30 | unique_nodes = [] 31 | samples = [] 32 | for sample, tdict in D.items(): 33 | for taxon in tdict.keys(): 34 | if taxon not in unique_nodes: 35 | unique_nodes.append(taxon) 36 | nrow = len(unique_nodes) 37 | ret = np.zeros((nrow, ncol), dtype=np.float) 38 | for j, (sample, tdict) in enumerate(D.items()): 39 | samples.append(sample) 40 | for i, taxon in enumerate(unique_nodes): 41 | if taxon in tdict: 42 | ret[i, j] = np.float(tdict[taxon]) 43 | return ret, unique_nodes, samples 44 | 45 | def kraken_load_analytic_data(file_name_list): 46 | ret = {} 47 | for file in file_name_list: 48 | sample_id = file.split('/')[-1].replace('.kraken.report', '') 49 | with open(file, 'r') as f: 50 | data = f.read().split('\n') 51 | assignment_list = [''] * 15 52 | taxon_list = ['NA'] * 7 53 | for entry in data: 54 | if not entry: 55 | continue 56 | temp_name = entry.split('\t')[5] 57 | space_level = int((len(temp_name) - len(temp_name.lstrip(' '))) / 2) - 1 58 | if (space_level <= 0) and (''.join(entry.split()[5:]) not in ('Viruses', 'Bacteria', 'Archaea')): 59 | continue 60 | if space_level < 0: 61 | space_level = 0 62 | entry = entry.split() 63 | if entry[3] == 'U': 64 | continue 65 | node_name = ' '.join(entry[5:]) 66 | assignment_list[space_level] = node_name 67 | assignment_len = len(assignment_list) - assignment_list.count('') 68 | if (space_level + 1) < assignment_len: 69 | assignment_list = assignment_list[:space_level + 1] + [''] * (14 - space_level) 70 | if entry[3] != '-': 71 | node_level = taxa_level[entry[3]] 72 | taxon_list[node_level] = node_name 73 | taxon_name = '|'.join(taxon_list[:node_level+1]) 74 | taxon_len = len(taxon_list) - taxon_list.count('NA') 75 | if (node_level + 1) < taxon_len: 76 | taxon_list = taxon_list[:node_level + 1] + ['NA'] * (6 - node_level) 77 | temp_list = [x for x in taxon_list] 78 | if entry[3] == '-': 79 | temp_list = [x for x in taxon_list] 80 | while temp_list and temp_list[-1] == 'NA': 81 | temp_list.pop() 82 | if (space_level + 1) < assignment_len: 83 | iter_loc = space_level + 1 84 | while True: 85 | if iter_loc == 0: 86 | break
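# Best reading of the backtracking below: walk iter_loc toward the
# root until the indentation-derived ancestor name also appears in the
# ranked lineage temp_list, so that counts on unranked ('-') nodes are
# credited to their nearest named ancestor.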
87 | try: 88 | iter_loc = temp_list.index(assignment_list[iter_loc]) 89 | break 90 | except ValueError: 91 | iter_loc -= 1 92 | temp_list = [x for x in taxon_list[:iter_loc + 1]] 93 | taxon_list = taxon_list[:iter_loc + 1] + ['NA'] * (6 - iter_loc) 94 | taxon_name = '|'.join(temp_list) 95 | if float(entry[2]) == 0.0: 96 | continue 97 | try: 98 | ret[sample_id][taxon_name] += float(entry[2]) 99 | except KeyError: 100 | try: 101 | ret[sample_id].setdefault(taxon_name, float(entry[2])) 102 | except KeyError: 103 | ret.setdefault(sample_id, {taxon_name: float(entry[2])}) 104 | return dict_to_matrix(ret) 105 | 106 | def output_kraken_analytic_data(outdir, M, m_names, n_names): 107 | with open(outdir + '/kraken_analytic_matrix.csv', 'w') as out: 108 | out.write(','.join(n_names) + '\n') 109 | for i, row in enumerate(M): 110 | out.write('\"{}\",'.format(m_names[i].replace(',', ''))) 111 | out.write(','.join([str(x) for x in row]) + '\n') 112 | 113 | if __name__ == '__main__': 114 | opts = parse_cmdline_params(sys.argv[1:]) 115 | K, m, n = kraken_load_analytic_data(opts.input_files) 116 | output_kraken_analytic_data(opts.output_directory, K, m, n) 117 | -------------------------------------------------------------------------------- /bin/rarefaction: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/bin/rarefaction -------------------------------------------------------------------------------- /bin/resistome: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/bin/resistome -------------------------------------------------------------------------------- /bin/samtools_idxstats.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | __author__ = "Chris Dean" 4 | __copyright__ = "" 5 | __credits__ = ["Chris Dean"] 6 | __version__ = "" 7 | __maintainer__ = "cdeanj" 8 | __email__ = "cdean11@colostate.edu" 9 | __status__ = "Cows go moo." 
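# Format note: samtools idxstats emits one tab-separated row per reference
# sequence: reference name, sequence length, number of mapped read segments,
# number of unmapped read segments. mapping_stats() below totals the mapped
# and unmapped columns across all references in each file.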
10 | 11 | import argparse 12 | import glob 13 | import os 14 | import sys 15 | 16 | def parse_cmdline_params(cmdline_params): 17 | info = "Parses a Samtools idxstats file to obtain the total number of mapped, unmapped, and total reads" 18 | parser = argparse.ArgumentParser(description=info) 19 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 20 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 21 | parser.add_argument('-o', '--output_file', required=True, 22 | help='Output file to write mapping results to') 23 | return parser.parse_args(cmdline_params) 24 | 25 | def header(output_file): 26 | with open(output_file, 'a') as o: 27 | o.write('Sample\tNumberOfInputReads\tMapped\tUnmapped\n') 28 | o.close() 29 | 30 | def mapping_stats(input_list, output_file): 31 | for f in input_list: 32 | mapped = 0 33 | unmapped = 0 34 | number_of_reads = 0 35 | with open(f, 'r') as fp: 36 | sample_name = os.path.basename(str(fp.name)).split('.', 1)[0] 37 | for line in fp: 38 | columns = line.strip().split('\t') 39 | mapped += int(columns[2]) 40 | unmapped += int(columns[3]) 41 | number_of_reads += int(columns[2]) + int(columns[3]) # add this reference's own counts; 'mapped'/'unmapped' are running totals, so summing those here would overcount 42 | fp.close() 43 | with open(output_file, 'a') as o: 44 | o.write(sample_name + '\t' + str(number_of_reads) + '\t' + str(mapped) + '\t' + str(unmapped) + '\n') 45 | o.close() 46 | 47 | if __name__ == "__main__": 48 | opts = parse_cmdline_params(sys.argv[1:]) 49 | header(opts.output_file) 50 | mapping_stats(opts.input_files, opts.output_file) 51 | -------------------------------------------------------------------------------- /bin/trimmomatic_stats.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | __author__ = "Chris Dean" 4 | __copyright__ = "" 5 | __credits__ = ["Chris Dean"] 6 | __version__ = "" 7 | __maintainer__ = "cdeanj" 8 | __email__ = "cdean11@colostate.edu" 9 | __status__ = "Cows go moo."
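# Format note: a Trimmomatic paired-end run ends its log with one summary line,
# e.g. "Input Read Pairs: 1000 Both Surviving: 900 (90.00%) Forward Only
# Surviving: 50 (5.00%) Reverse Only Surviving: 30 (3.00%) Dropped: 20 (2.00%)"
# (counts here are illustrative). qc_stats() below pulls the integer fields
# out of that line with re.search.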
10 | 11 | import argparse 12 | import glob 13 | import os 14 | import re 15 | import sys 16 | 17 | def parse_cmdline_params(cmdline_params): 18 | info = "Parses a Trimmomatic log file to obtain the total number of input reads and dropped reads" 19 | parser = argparse.ArgumentParser(description=info) 20 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 21 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 22 | parser.add_argument('-o', '--output_file', required=True, 23 | help='Output file to write mapping results to') 24 | return parser.parse_args(cmdline_params) 25 | 26 | def header(output_file): 27 | with open(output_file, 'a') as o: 28 | o.write('Sample\tNumberOfInputReads\tForwardOnlySurviving\tReverseOnlySurviving\tDropped\n') 29 | o.close() 30 | 31 | def qc_stats(input_list, output_file): 32 | for f in input_list: 33 | total = 0 34 | forward_surviving = 0 35 | reverse_surviving = 0 36 | dropped = 0 37 | with open(f, 'r') as fp: 38 | sample_name = os.path.basename(str(fp.name)).split('.', 1)[0] 39 | for line in fp: 40 | total = re.search(r'Input Read Pairs: (\d+)', line) 41 | forward_surviving = re.search(r'Forward Only Surviving: (\d+)', line) 42 | reverse_surviving = re.search(r'Reverse Only Surviving: (\d+)', line) 43 | dropped = re.search(r'Dropped: (\d+)', line) 44 | if total: 45 | break 46 | fp.close() 47 | with open(output_file, 'a') as ofp: 48 | ofp.write(sample_name + '\t' + total.group(1) + '\t' + forward_surviving.group(1) + '\t' + reverse_surviving.group(1) + '\t' + dropped.group(1) + '\n') 49 | ofp.close() 50 | 51 | if __name__ == "__main__": 52 | opts = parse_cmdline_params(sys.argv[1:]) 53 | header(opts.output_file) 54 | qc_stats(opts.input_files, opts.output_file) 55 | -------------------------------------------------------------------------------- /config/MEG_AMI.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here. 2 | // The examples listed below assume the tools are already in your $PATH; however, 3 | // the absolute path to each tool can be entered individually. 4 | env { 5 | /* These following tools are required to run AmrPlusPlus*/ 6 | JAVA = "java" 7 | TRIMMOMATIC = "~/.conda/envs/AmrPlusPlus/share/trimmomatic/trimmomatic.jar" 8 | PYTHON3 = "python3" 9 | BWA = "bwa" 10 | SAMTOOLS = "samtools" 11 | BEDTOOLS = "bedtools" 12 | RESISTOME = "resistome" 13 | RAREFACTION = "rarefaction" 14 | SNPFINDER = "snpfinder" 15 | FREEBAYES = "freebayes" 16 | /* These next tools are optional depending on which analyses you want to run */ 17 | KRAKEN2 = "kraken2" 18 | RGI = "rgi" 19 | DIAMOND = "diamond" 20 | } 21 | 22 | process { 23 | cpus = 1 // The maximum number of CPUs to use 24 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 25 | //errorStrategy = 'ignore' // Ignore process errors 26 | executor = 'local' // The type of system the processes are being run on (do not modify this) 27 | maxForks = 1 // The maximum number of forks a single process is allowed to spawn 28 | memory = '8 GB' // The maximum amount of memory a single process is allowed to use 29 | } 30 | -------------------------------------------------------------------------------- /config/local.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here.
2 | // The examples listed below assume the tools are already in your $PATH; however, 3 | // the absolute path to each tool can be entered individually. 4 | env { 5 | /* These following tools are required to run AmrPlusPlus*/ 6 | JAVA = "java" 7 | TRIMMOMATIC = "trimmomatic-0.36.jar" 8 | PYTHON3 = "python3" 9 | BWA = "bwa" 10 | SAMTOOLS = "samtools" 11 | BEDTOOLS = "bedtools" 12 | RESISTOME = "resistome" 13 | RAREFACTION = "rarefaction" 14 | SNPFINDER = "snpfinder" 15 | FREEBAYES = "freebayes" 16 | /* These next tools are optional depending on which analyses you want to run */ 17 | KRAKEN2 = "kraken2" 18 | RGI = "rgi" 19 | DIAMOND = "diamond" 20 | } 21 | 22 | process { 23 | cpus = 4 // The maximum number of CPUs to use 24 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 25 | //errorStrategy = 'ignore' // Ignore process errors 26 | executor = 'local' // The type of system the processes are being run on (do not modify this) 27 | maxForks = 1 // The maximum number of forks a single process is allowed to spawn 28 | memory = '8 GB' // The maximum amount of memory a single process is allowed to use 29 | } 30 | -------------------------------------------------------------------------------- /config/local_MSI.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here. 2 | // The paths listed below are just examples; however, I recommend 3 | // following a similar format. 4 | 5 | env { 6 | /* These following tools are required to run AmrPlusPlus*/ 7 | JAVA = "/panfs/roc/msisoft/java/openjdk-8_202/bin/java" 8 | TRIMMOMATIC = "/panfs/roc/msisoft/trimmomatic/0.33/trimmomatic.jar" 9 | PYTHON3 = "/panfs/roc/msisoft/anaconda/anaconda3-2018.12/bin/python" 10 | BWA = "/panfs/roc/msisoft/bwa/0.7.17_gcc-7.2.0_haswell/bwa" 11 | SAMTOOLS = "/panfs/roc/msisoft/samtools/1.9_gcc-7.2.0_haswell/bin/samtools" 12 | BEDTOOLS = "/panfs/roc/msisoft/bedtools/2.27.1/bin/bedtools" 13 | RESISTOME = "/home/noyes046/shared/tools/resistomeanalyzer_v2/resistome" 14 | RAREFACTION = "/home/noyes046/shared/tools/rarefaction" 15 | SNPFINDER = "/home/noyes046/shared/tools/snpfinder" 16 | FREEBAYES = "/soft/freebayes/1.2.0/bin/freebayes" 17 | /* These next tools are optional depending on which analyses you want to run */ 18 | KRAKEN2 = "/panfs/roc/msisoft/kraken/2.0.7beta/kraken2" 19 | RGI = "/panfs/roc/groups/11/noyes046/edoster/.conda/envs/AmrPlusPlus_env/bin/rgi" 20 | } 21 | 22 | process { 23 | maxForks = 3 24 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 25 | /* errorStrategy = 'ignore' // Ignore process errors */ 26 | executor = 'local' // The type of system the processes are being run on (do not modify this) 27 | } 28 | -------------------------------------------------------------------------------- /config/local_angus.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here. 2 | // The examples listed below assume the tools are already in your $PATH; however, 3 | // the absolute path to each tool can be entered individually.
4 | 5 | env { 6 | /* These following tools are required to run AmrPlusPlus*/ 7 | JAVA = "/usr/bin/java" 8 | TRIMMOMATIC = "/s/angus/index/common/tools/Trimmomatic-0.36/trimmomatic-0.36.jar" 9 | PYTHON3 = "/usr/bin/python3" 10 | BWA = "/usr/bin/bwa" 11 | SAMTOOLS = "/usr/local/bin/samtools" 12 | BEDTOOLS = "/usr/bin/bedtools" 13 | RESISTOME = "/s/angus/index/common/tools/resistome" 14 | RAREFACTION = "/s/angus/index/common/tools/rarefaction" 15 | SNPFINDER = "/s/angus/index/common/tools/snpfinder" 16 | FREEBAYES = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/freebayes" 17 | /* These next tools are optional depending on which analyses you want to run */ 18 | KRAKEN2 = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/kraken2" 19 | RGI = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/rgi" 20 | DIAMOND = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/diamond" 21 | } 22 | 23 | 24 | 25 | 26 | process { 27 | maxForks = 3 28 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 29 | executor = 'local' // The type of system the processes are being run on (do not modify this) 30 | } 31 | -------------------------------------------------------------------------------- /config/singularity.config: -------------------------------------------------------------------------------- 1 | singularity { 2 | /* Enables Singularity container execution by default */ 3 | enabled = true 4 | cacheDir = "$PWD" 5 | /* Enable auto-mounting of host paths (requires user bind control feature enabled */ 6 | autoMounts = true 7 | } 8 | 9 | env { 10 | /* These following tools are required to run AmrPlusPlus*/ 11 | JAVA = '/usr/local/envs/AmrPlusPlus_env/bin/java' 12 | TRIMMOMATIC = '/usr/local/envs/AmrPlusPlus_env/share/trimmomatic/trimmomatic.jar' 13 | PYTHON3 = "python3" 14 | BWA = "bwa" 15 | SAMTOOLS = "samtools" 16 | BEDTOOLS = "bedtools" 17 | RESISTOME = "resistome" 18 | RAREFACTION = "rarefaction" 19 | SNPFINDER = "snpfinder" 20 | FREEBAYES = "freebayes" 21 | /* These next tools are optional depending on which analyses you want to run */ 22 | KRAKEN2 = "kraken2" 23 | RGI = "rgi" 24 | DIAMOND = "diamond" 25 | } 26 | 27 | 28 | process { 29 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 30 | maxForks = 10 // The maximum number of forks a single process is allowed to spawn 31 | withName:RunRGI { 32 | container = 'shub://meglab-metagenomics/amrplusplus_v2:rgi' 33 | } 34 | withName:RunDedupRGI { 35 | container = 'shub://meglab-metagenomics/amrplusplus_v2:rgi' 36 | } 37 | } 38 | -------------------------------------------------------------------------------- /config/singularity_slurm.config: -------------------------------------------------------------------------------- 1 | singularity { 2 | /* Enables Singularity container execution by default */ 3 | enabled = true 4 | cacheDir = "$PWD" 5 | /* Enable auto-mounting of host paths (requires user bind control feature enabled */ 6 | autoMounts = true 7 | } 8 | 9 | env { 10 | /* These following tools are required to run AmrPlusPlus*/ 11 | JAVA = '/usr/local/envs/AmrPlusPlus_env/bin//java' 12 | TRIMMOMATIC = '/usr/local/envs/AmrPlusPlus_env/share/trimmomatic/trimmomatic.jar' 13 | PYTHON3 = "python3" 14 | BWA = "bwa" 15 | SAMTOOLS = "samtools" 16 | BEDTOOLS = "bedtools" 17 | RESISTOME = "resistome" 18 | RAREFACTION = "rarefaction" 19 | SNPFINDER = "snpfinder" 20 | FREEBAYES = "freebayes" 21 | /* These next tools are optional depending on which analyses you 
want to run */ 22 | KRAKEN2 = "kraken2" 23 | RGI = "/opt/conda/envs/PI_env/bin/rgi" 24 | DIAMOND = "diamond" 25 | } 26 | 27 | 28 | process { 29 | process.executor='slurm' 30 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 31 | maxForks = 10 // The maximum number of forks a single process is allowed to spawn 32 | withName:RunQC { 33 | process.qos='normal' 34 | clusterOptions='--job-name=QC%j --qos=normal --time=23:59:00' 35 | } 36 | withName:QCStats { 37 | process.qos='normal' 38 | clusterOptions='--job-name=QCstats%j --qos=normal --time=05:00:00' 39 | } 40 | withName:BuildHostIndex { 41 | process.qos='normal' 42 | clusterOptions='--job-name=hostindex%j --qos=normal --ntasks-per-node=12 --time=23:59:00' 43 | } 44 | withName:BuildAMRIndex { 45 | process.qos='normal' 46 | clusterOptions='--job-name=AMRindex%j --qos=normal --ntasks-per-node=12 --time=23:59:00' 47 | } 48 | withName:DedupReads { 49 | process.qos='normal' 50 | clusterOptions='--job-name=dedup%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 51 | } 52 | withName:AlignReadsToHost { 53 | process.time = '20:00:00' 54 | process.qos='normal' 55 | clusterOptions='--job-name=AlignHost%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 56 | } 57 | withName:RemoveHostDNA { 58 | process.qos='normal' 59 | clusterOptions='--job-name=RMHost%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 60 | } 61 | withName:HostRemovalStats { 62 | process.qos='normal' 63 | clusterOptions='--job-name=hoststats%j --qos=normal --time=05:00:00' 64 | } 65 | withName:NonHostReads { 66 | process.qos='normal' 67 | clusterOptions='--job-name=BAMFastq%j --qos=normal --time=23:59:00' 68 | } 69 | withName:AlignDedupSNPToAMR { 70 | process.qos='normal' 71 | clusterOptions='--job-name=alignAMR%j --qos=normal --time=23:59:00' 72 | } 73 | withName:AlignToAMR { 74 | process.qos='normal' 75 | clusterOptions='--job-name=alignAMR%j --qos=normal --time=23:59:00' 76 | } 77 | withName:DedupRunResistome { 78 | process.qos='normal' 79 | clusterOptions='--job-name=resistome%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 80 | } 81 | withName:RunResistome { 82 | process.qos='normal' 83 | clusterOptions='--job-name=resistome%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 84 | } 85 | withName:RunFreebayes { 86 | process.qos='normal' 87 | clusterOptions='--job-name=freebayes%j --qos=normal --time=23:59:00' 88 | } 89 | withName:RunRarefaction { 90 | process.qos='normal' 91 | clusterOptions='--job-name=rarefaction%j --qos=normal --time=23:59:00 --ntasks-per-node=12' 92 | } 93 | withName:RunSNPFinder { 94 | process.qos='normal' 95 | clusterOptions='--job-name=SNPfinder%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 96 | } 97 | withName:ResistomeResults { 98 | process.qos='normal' 99 | clusterOptions='--job-name=LtoWide%j --qos=normal --time=05:00:00' 100 | } 101 | withName:SNPAlignToAMR { 102 | process.qos='normal' 103 | clusterOptions='--job-name=SNPAlignToAMR%j --qos=normal --time=23:59:00' 104 | } 105 | withName:SNPRunResistome { 106 | process.qos='normal' 107 | clusterOptions='--job-name=SNPresistome%j --qos=normal --time=23:59:00' 108 | } 109 | withName:SNPRunRarefaction { 110 | process.qos='normal' 111 | clusterOptions='--job-name=SNPrarefaction%j --qos=normal --time=23:59:00' 112 | } 113 | withName:SNPconfirmation { 114 | process.qos='normal' 115 | clusterOptions='--job-name=SNPconfirmation%j --qos=normal --time=23:59:00' 116 | 
module='jdk/1.8.0:singularity/2.5.2' 117 | } 118 | withName:SNPgene_alignment { 119 | process.qos='normal' 120 | clusterOptions='--job-name=SNPalignment%j --qos=normal --time=23:59:00' 121 | } 122 | withName:SNPRunFreebayes { 123 | process.qos='normal' 124 | clusterOptions='--job-name=SNPfreebayes%j --qos=normal --time=23:59:00' 125 | } 126 | withName:SNPRunSNPFinder { 127 | process.qos='normal' 128 | clusterOptions='--job-name=SNPsnpfinder%j --qos=normal --time=23:59:00' 129 | } 130 | withName:SNPResistomeResults { 131 | process.qos='normal' 132 | clusterOptions='--job-name=SNPLongToWide --qos=normal --time=5:00:00' 133 | } 134 | withName:DedupNonSNPResistomeResults { 135 | process.qos='normal' 136 | clusterOptions='--job-name=DedupNonSNPLongToWide --qos=normal --time=5:00:00' 137 | } 138 | withName:HMMResistomeResults { 139 | process.qos='normal' 140 | clusterOptions='--job-name=HMM_LongToWide --qos=normal --time=5:00:00' 141 | } 142 | withName:SamDedupRunResistome { 143 | process.qos='normal' 144 | clusterOptions='--job-name=SamDedupSNPresistome%j --qos=normal --time=23:59:00' 145 | } 146 | withName:SamDedupResistomeResults { 147 | process.qos='normal' 148 | clusterOptions='--job-name=SamDedup_LongToWide --qos=normal --time=5:00:00' 149 | } 150 | withName:Samtools_dedup_HMMcontig_count { 151 | process.qos='normal' 152 | clusterOptions='--job-name=SamDedupSNPresistome%j --qos=normal --time=23:59:00' 153 | } 154 | withName:Samtools_dedup_HMMResistomeResults { 155 | process.qos='normal' 156 | clusterOptions='--job-name=SamDedup_LongToWide --qos=normal --time=5:00:00' 157 | } 158 | withName:ExtractSNP { 159 | process.qos='normal' 160 | clusterOptions='--job-name=ExtractSNP%j --qos=normal --time=23:59:00' 161 | } 162 | withName:RunRGI { 163 | process.qos='normal' 164 | container = 'shub://EnriqueDoster/bioinformatic-nextflow-pipelines:rgi' 165 | clusterOptions='--job-name=RunRGI%j --qos=normal --time=23:59:00' 166 | } 167 | withName:SNPconfirmation { 168 | process.qos='normal' 169 | clusterOptions='--job-name=SNPconfirmation --qos=normal --time=23:59:00' 170 | } 171 | withName:Confirmed_AMR_hits { 172 | process.qos='normal' 173 | clusterOptions='--job-name=Confirmed_AMR_hits --qos=normal --time=23:59:00' 174 | } 175 | withName:Confirmed_ResistomeResults { 176 | process.qos='normal' 177 | clusterOptions='--job-name=Confirmed_ResistomeResults --qos=normal --time=23:59:00' 178 | } 179 | withName:ExtractDedupSNP { 180 | process.qos='normal' 181 | clusterOptions='--job-name=ExtractDedupSNP --qos=normal --time=23:59:00' 182 | } 183 | withName:RunDedupRGI { 184 | process.qos='normal' 185 | container = 'shub://EnriqueDoster/bioinformatic-nextflow-pipelines:rgi' 186 | clusterOptions='--job-name=RunDedupRGI --qos=normal --time=23:59:00' 187 | } 188 | withName:DedupSNPconfirmation { 189 | process.qos='normal' 190 | clusterOptions='--job-name=DedupSNPconfirmation --qos=normal --time=23:59:00' 191 | } 192 | withName:ConfirmDedupAMRHits { 193 | process.qos='normal' 194 | clusterOptions='--job-name=ConfirmDedupAMRHits --qos=normal --time=23:59:00' 195 | } 196 | withName:DedupSNPConfirmed_ResistomeResults { 197 | process.qos='normal' 198 | clusterOptions='--job-name=DedupSNPConfirmed_ResistomeResults --qos=normal --time=23:59:00' 199 | } 200 | } 201 | -------------------------------------------------------------------------------- /containers/Singularity: -------------------------------------------------------------------------------- 1 | Bootstrap: docker 2 | From: debian:jessie-slim 3 | 4 | #Includes 
trimmomatic, samtools, bwa, bedtools, vcftools, htslib, kraken2, SNPfinder, freebayes, bbmap 5 | 6 | %environment 7 | export LC_ALL=C 8 | 9 | %post 10 | apt update \ 11 | && apt install -y --no-install-recommends \ 12 | build-essential ca-certificates sudo tcsh\ 13 | git make automake autoconf openjdk-7-jre wget gzip unzip sed\ 14 | zlib1g-dev curl libbz2-dev locales libncurses5-dev liblzma-dev libcurl4-openssl-dev software-properties-common apt-transport-https\ 15 | python3-pip python3-docopt python3-pytest python-dev python3-dev\ 16 | libcurl4-openssl-dev libssl-dev zlib1g-dev fonts-texgyre \ 17 | gcc g++ gfortran libblas-dev liblapack-dev dos2unix libstdc++6\ 18 | r-base-core r-recommended hmmer\ 19 | && rm -rf /var/lib/apt/lists/* 20 | 21 | 22 | wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh 23 | sh Anaconda3-2020.02-Linux-x86_64.sh -bfp /usr/local 24 | 25 | # add bioconda channels 26 | conda config --add channels defaults 27 | conda config --add channels conda-forge 28 | conda config --add channels bioconda 29 | 30 | # install bulk of bioinformatic tools using conda 31 | conda create -n AmrPlusPlus_env python=3 trimmomatic bwa samtools bedtools freebayes bbmap vcftools htslib kraken2 32 | 33 | . /usr/local/bin/activate AmrPlusPlus_env 34 | 35 | #ln -s /usr/local/envs/AmrPlusPlus_env/bin/* /usr/local/bin/ 36 | 37 | #Still experimenting with how to change $PATH location. 38 | echo 'export PATH=$PATH:/usr/local/envs/AmrPlusPlus_env/bin/' >> $SINGULARITY_ENVIRONMENT 39 | 40 | # SNPfinder 41 | cd /usr/local 42 | git clone https://github.com/cdeanj/snpfinder.git 43 | cd snpfinder 44 | make 45 | cp snpfinder /usr/local/bin 46 | cd / 47 | 48 | # Make sure all the tools have the right permissions to use the tools 49 | chmod -R 777 /usr/local/ 50 | 51 | %test 52 | 53 | -------------------------------------------------------------------------------- /containers/Singularity.RGI: -------------------------------------------------------------------------------- 1 | Bootstrap: docker 2 | From: debian:jessie-slim 3 | 4 | #Includes Resistance Gene Identifier (RGI) 5 | 6 | %environment 7 | export LC_ALL=C 8 | 9 | %post 10 | apt update \ 11 | && apt install -y --no-install-recommends \ 12 | build-essential ca-certificates sudo tcsh\ 13 | git make automake autoconf openjdk-7-jre wget gzip unzip sed\ 14 | zlib1g-dev curl libbz2-dev locales libncurses5-dev liblzma-dev libcurl4-openssl-dev software-properties-common apt-transport-https\ 15 | python3-pip python3-docopt python3-pytest python-dev python3-dev\ 16 | libcurl4-openssl-dev libssl-dev zlib1g-dev fonts-texgyre \ 17 | gcc g++ gfortran libblas-dev liblapack-dev dos2unix libstdc++6\ 18 | && rm -rf /var/lib/apt/lists/* 19 | 20 | 21 | wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh 22 | sh Anaconda3-2020.02-Linux-x86_64.sh -bfp /usr/local 23 | 24 | # add bioconda channels 25 | conda config --add channels defaults 26 | conda config --add channels conda-forge 27 | conda config --add channels bioconda 28 | 29 | # install bulk of bioinformatic tools using conda 30 | conda create -n AmrPlusPlus_env rgi 31 | 32 | . 
/usr/local/bin/activate AmrPlusPlus_env 33 | 34 | #change $PATH 35 | echo 'export PATH=/usr/local/envs/AmrPlusPlus_env/bin/:$PATH' >> $SINGULARITY_ENVIRONMENT 36 | 37 | 38 | # Make sure all the tools have the right permissions to use the tools 39 | chmod -R 777 /usr/local/ 40 | 41 | # This downloads the latest CARD database and attempts to load it for RGI 42 | # Doesn't seem to work due to the github [RGI issue #60](https://github.com/arpcard/rgi/issues/60) 43 | #wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 44 | #/usr/local/envs/AmrPlusPlus_env/bin/rgi load -i card.json 45 | 46 | %test 47 | 48 | 49 | -------------------------------------------------------------------------------- /data/HMM.tar.xz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/HMM.tar.xz -------------------------------------------------------------------------------- /data/adapters/nextera.fa: -------------------------------------------------------------------------------- 1 | >TruSeqR2_nextera 2 | CTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTGAAAA 3 | >nextera_R2_right_side_adapter 4 | CTGTCTCTTATACACATCTGACGCTGCCGACGAGCGATCTAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA 5 | >I5_Nextera_Transposase_1 6 | CTGTCTCTTATACACATCTGACGCTGCCGACGA 7 | >I7_Nextera_Transposase_1 8 | CTGTCTCTTATACACATCTCCGAGCCCACGAGAC 9 | >I5_Nextera_Transposase_2 10 | CTGTCTCTTATACACATCTCTGATGGCGCGAGGGAGGC 11 | >I7_Nextera_Transposase_2 12 | CTGTCTCTTATACACATCTCTGAGCGGGCTGGCAAGGC 13 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]501 14 | GACGCTGCCGACGAGCGATCTAGTGTAGATCTCGGTGGTCGCCGTATCATT 15 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]502 16 | GACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATT 17 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]503 18 | GACGCTGCCGACGAAGAGGATAGTGTAGATCTCGGTGGTCGCCGTATCATT 19 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]504 20 | GACGCTGCCGACGATCTACTCTGTGTAGATCTCGGTGGTCGCCGTATCATT 21 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]505 22 | GACGCTGCCGACGACTCCTTACGTGTAGATCTCGGTGGTCGCCGTATCATT 23 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]506 24 | GACGCTGCCGACGATATGCAGTGTGTAGATCTCGGTGGTCGCCGTATCATT 25 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]507 26 | GACGCTGCCGACGATACTCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT 27 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]508 28 | GACGCTGCCGACGAAGGCTTAGGTGTAGATCTCGGTGGTCGCCGTATCATT 29 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]517 30 | GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT 31 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N701 32 | CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG 33 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N702 34 | CCGAGCCCACGAGACCGTACTAGATCTCGTATGCCGTCTTCTGCTTG 35 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N703 36 | CCGAGCCCACGAGACAGGCAGAAATCTCGTATGCCGTCTTCTGCTTG 37 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N704 38 | CCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTG 39 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N705 40 | CCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTG 41 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N706 42 | CCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCTTCTGCTTG 43 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N707 44 | CCGAGCCCACGAGACCTCTCTACATCTCGTATGCCGTCTTCTGCTTG 45 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N708 46 | 
CCGAGCCCACGAGACCAGAGAGGATCTCGTATGCCGTCTTCTGCTTG 47 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N709 48 | CCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTG 49 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N710 50 | CCGAGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTG 51 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N711 52 | CCGAGCCCACGAGACAAGAGGCAATCTCGTATGCCGTCTTCTGCTTG 53 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N712 54 | CCGAGCCCACGAGACGTAGAGGAATCTCGTATGCCGTCTTCTGCTTG 55 | >I5_Primer_Nextera_XT_Index_Kit_v2_S502 56 | GACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATT 57 | >I5_Primer_Nextera_XT_Index_Kit_v2_S503 58 | GACGCTGCCGACGAAGAGGATAGTGTAGATCTCGGTGGTCGCCGTATCATT 59 | >I5_Primer_Nextera_XT_Index_Kit_v2_S505 60 | GACGCTGCCGACGACTCCTTACGTGTAGATCTCGGTGGTCGCCGTATCATT 61 | >I5_Primer_Nextera_XT_Index_Kit_v2_S506 62 | GACGCTGCCGACGATATGCAGTGTGTAGATCTCGGTGGTCGCCGTATCATT 63 | >I5_Primer_Nextera_XT_Index_Kit_v2_S507 64 | GACGCTGCCGACGATACTCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT 65 | >I5_Primer_Nextera_XT_Index_Kit_v2_S508 66 | GACGCTGCCGACGAAGGCTTAGGTGTAGATCTCGGTGGTCGCCGTATCATT 67 | >I5_Primer_Nextera_XT_Index_Kit_v2_S510 68 | GACGCTGCCGACGAATTAGACGGTGTAGATCTCGGTGGTCGCCGTATCATT 69 | >I5_Primer_Nextera_XT_Index_Kit_v2_S511 70 | GACGCTGCCGACGACGGAGAGAGTGTAGATCTCGGTGGTCGCCGTATCATT 71 | >I5_Primer_Nextera_XT_Index_Kit_v2_S513 72 | GACGCTGCCGACGACTAGTCGAGTGTAGATCTCGGTGGTCGCCGTATCATT 73 | >I5_Primer_Nextera_XT_Index_Kit_v2_S515 74 | GACGCTGCCGACGAAGCTAGAAGTGTAGATCTCGGTGGTCGCCGTATCATT 75 | >I5_Primer_Nextera_XT_Index_Kit_v2_S516 76 | GACGCTGCCGACGAACTCTAGGGTGTAGATCTCGGTGGTCGCCGTATCATT 77 | >I5_Primer_Nextera_XT_Index_Kit_v2_S517 78 | GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT 79 | >I5_Primer_Nextera_XT_Index_Kit_v2_S518 80 | GACGCTGCCGACGACTTAATAGGTGTAGATCTCGGTGGTCGCCGTATCATT 81 | >I5_Primer_Nextera_XT_Index_Kit_v2_S520 82 | GACGCTGCCGACGAATAGCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT 83 | >I5_Primer_Nextera_XT_Index_Kit_v2_S521 84 | GACGCTGCCGACGATAAGGCTCGTGTAGATCTCGGTGGTCGCCGTATCATT 85 | >I5_Primer_Nextera_XT_Index_Kit_v2_S522 86 | GACGCTGCCGACGATCGCATAAGTGTAGATCTCGGTGGTCGCCGTATCATT 87 | >I7_Primer_Nextera_XT_Index_Kit_v2_N701 88 | CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG 89 | >I7_Primer_Nextera_XT_Index_Kit_v2_N702 90 | CCGAGCCCACGAGACCGTACTAGATCTCGTATGCCGTCTTCTGCTTG 91 | >I7_Primer_Nextera_XT_Index_Kit_v2_N703 92 | CCGAGCCCACGAGACAGGCAGAAATCTCGTATGCCGTCTTCTGCTTG 93 | >I7_Primer_Nextera_XT_Index_Kit_v2_N704 94 | CCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTG 95 | >I7_Primer_Nextera_XT_Index_Kit_v2_N705 96 | CCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTG 97 | >I7_Primer_Nextera_XT_Index_Kit_v2_N706 98 | CCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCTTCTGCTTG 99 | >I7_Primer_Nextera_XT_Index_Kit_v2_N707 100 | CCGAGCCCACGAGACCTCTCTACATCTCGTATGCCGTCTTCTGCTTG 101 | >I7_Primer_Nextera_XT_Index_Kit_v2_N710 102 | CCGAGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTG 103 | >I7_Primer_Nextera_XT_Index_Kit_v2_N711 104 | CCGAGCCCACGAGACAAGAGGCAATCTCGTATGCCGTCTTCTGCTTG 105 | >I7_Primer_Nextera_XT_Index_Kit_v2_N712 106 | CCGAGCCCACGAGACGTAGAGGAATCTCGTATGCCGTCTTCTGCTTG 107 | >I7_Primer_Nextera_XT_Index_Kit_v2_N714 108 | CCGAGCCCACGAGACGCTCATGAATCTCGTATGCCGTCTTCTGCTTG 109 | >I7_Primer_Nextera_XT_Index_Kit_v2_N715 110 | CCGAGCCCACGAGACATCTCAGGATCTCGTATGCCGTCTTCTGCTTG 111 | >I7_Primer_Nextera_XT_Index_Kit_v2_N716 112 | CCGAGCCCACGAGACACTCGCTAATCTCGTATGCCGTCTTCTGCTTG 113 | >I7_Primer_Nextera_XT_Index_Kit_v2_N718 114 | CCGAGCCCACGAGACGGAGCTACATCTCGTATGCCGTCTTCTGCTTG 115 | >I7_Primer_Nextera_XT_Index_Kit_v2_N719 116 | 
CCGAGCCCACGAGACGCGTAGTAATCTCGTATGCCGTCTTCTGCTTG 117 | >I7_Primer_Nextera_XT_Index_Kit_v2_N720 118 | CCGAGCCCACGAGACCGGAGCCTATCTCGTATGCCGTCTTCTGCTTG 119 | >I7_Primer_Nextera_XT_Index_Kit_v2_N721 120 | CCGAGCCCACGAGACTACGCTGCATCTCGTATGCCGTCTTCTGCTTG 121 | >I7_Primer_Nextera_XT_Index_Kit_v2_N722 122 | CCGAGCCCACGAGACATGCGCAGATCTCGTATGCCGTCTTCTGCTTG 123 | >I7_Primer_Nextera_XT_Index_Kit_v2_N723 124 | CCGAGCCCACGAGACTAGCGCTCATCTCGTATGCCGTCTTCTGCTTG 125 | >I7_Primer_Nextera_XT_Index_Kit_v2_N724 126 | CCGAGCCCACGAGACACTGAGCGATCTCGTATGCCGTCTTCTGCTTG 127 | >I7_Primer_Nextera_XT_Index_Kit_v2_N726 128 | CCGAGCCCACGAGACCCTAAGACATCTCGTATGCCGTCTTCTGCTTG 129 | >I7_Primer_Nextera_XT_Index_Kit_v2_N727 130 | CCGAGCCCACGAGACCGATCAGTATCTCGTATGCCGTCTTCTGCTTG 131 | >I7_Primer_Nextera_XT_Index_Kit_v2_N728 132 | CCGAGCCCACGAGACTGCAGCTAATCTCGTATGCCGTCTTCTGCTTG 133 | >I7_Primer_Nextera_XT_Index_Kit_v2_N729 134 | CCGAGCCCACGAGACTCGACGTCATCTCGTATGCCGTCTTCTGCTTG 135 | >I5_Adapter_Nextera 136 | CTGATGGCGCGAGGGAGGCGTGTAGATCTCGGTGGTCGCCGTATCATT 137 | >I7_Adapter_Nextera_No_Barcode 138 | CTGAGCGGGCTGGCAAGGCAGACCGATCTCGTATGCCGTCTTCTGCTTG 139 | >Nextera_LMP_Read1_External_Adapter 140 | GATCGGAAGAGCACACGTCTGAACTCCAGTCAC 141 | >Nextera_LMP_Read2_External_Adapter 142 | GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT 143 | -------------------------------------------------------------------------------- /data/host/chr21.fasta.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/host/chr21.fasta.gz -------------------------------------------------------------------------------- /data/raw/S1_test_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S1_test_R1.fastq.gz -------------------------------------------------------------------------------- /data/raw/S1_test_R2.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S1_test_R2.fastq.gz -------------------------------------------------------------------------------- /data/raw/S2_test_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S2_test_R1.fastq.gz -------------------------------------------------------------------------------- /data/raw/S2_test_R2.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S2_test_R2.fastq.gz -------------------------------------------------------------------------------- /data/raw/S3_test_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S3_test_R1.fastq.gz -------------------------------------------------------------------------------- /data/raw/S3_test_R2.fastq.gz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S3_test_R2.fastq.gz -------------------------------------------------------------------------------- /docs/AmrPlusPlus_Pipeline_workflow.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/docs/AmrPlusPlus_Pipeline_workflow.pdf -------------------------------------------------------------------------------- /docs/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | Details on AMR++ updates 2 | ------------ 3 | 4 | ## 2020-05-21 : AMR++ v2.0.2 update 5 | Fixed a mistake with the config/singularity.config file to correctly call the singularity container anytime that RGI is run. 6 | 7 | ## 2020-05-21 : AMR++ v2.0.1 update 8 | We identified issues with running RGI thanks to github users AroArz and DiegoBrambilla. As of this update, RGI developers are focused on contributing to the COVID-19 response, so we plan to reconvene with them when their schedule opens up. In the meantime, we are releasing updates to keep AMR++ functional. 9 | We found that the errors were associated with RGI bugs that were previously reported: 10 | * In [RGI issue #93](https://github.com/arpcard/rgi/issues/93), the github user mahesh-panchal reported that you need to run the "rgi main" command twice on the same dataset for it to successfully complete the analysis. The RGI component of AMR++ has been updated to work for now, but we plan further changes to clean up the code. 11 | * In [RGI issue #60](https://github.com/arpcard/rgi/issues/60), caspargross reported issues with containerizing RGI due to its requirement for a "writable file system". As a temporary fix, we updated the AMR++ code so that the user has to download the CARD database locally and use an additional flag to specify the location of the local database. 12 | * Errors in running RGI will now be "ignored" so that the pipeline continues running but still provides any temporary files created with RGI. This should allow you to troubleshoot on your own or run any additional analysis using the reads aligning to gene accessions that "RequireSNPConfirmation". 13 | 14 | Other updates: 15 | * minor fixes to singularity/slurm configuration files 16 | * updated the "resistome" and "rarefaction" code, which is now included in the "bin" directory 17 | * updated the script for downloading the latest version of mini-kraken 18 | * created a new singularity container just for the RGI software 19 | * output zipped nonhost files directly -------------------------------------------------------------------------------- /docs/FAQs.md: -------------------------------------------------------------------------------- 1 | Troubleshooting and frequently asked questions (FAQs) 2 | ------------ 3 | 4 | Many of the errors you may encounter are ultimately the result of user error. If you encounter an error message at any time while using this pipeline, carefully check the command you used for any spelling errors. Additionally, many of these error messages give some detail as to where the code is wrong. Here are a few common errors and our suggestions for basic troubleshooting. 5 | 6 | * Are you using the correct "profile" to run AmrPlusPlus? 7 | * We provide many examples of profile configurations, and choosing the correct one depends on your computing environment. 
8 | * If you have singularity installed on your server, we recommend using the "singularity" profile to avoid having to install any additional tools. 9 | * If you already have the tools installed on your server, the best option is to configure the local.config file to point to the absolute PATH of each tool. 10 | 11 | * Are the right user permissions granted for the file/directory/server in which you are going to run the pipeline? 12 | * On servers with multiple users, certain directories often give some users more editing privileges than others. Start by navigating to the directory in which you will be working. Next, type `ls -lha` or `ls -l`. This produces a list of all files in that directory along with the permissions each user class has, shown using the "-rwxrwxrwx" scheme (r = read permission, w = write permission, and x = execute permission). 13 | * Permission errors could be due to the directories chosen for the pipeline output, or to individual bioinformatic tools having been installed by other users, for example. 14 | * Review this tutorial for more information regarding file permissions: https://www.guru99.com/file-permissions.html 15 | 16 | 17 | -------------------------------------------------------------------------------- /docs/accessing_AMR++.md: -------------------------------------------------------------------------------- 1 | Accessing AMR++ 2 | ------------ 3 | 4 | This section will help you get access to all the bioinformatic tools required for metagenomic analysis with AMR++. 5 | 6 | Amazon Web Services 7 | ----- 8 | 9 | In order to facilitate evaluation of the MEGARes 2.0 database and the functionality of the AMR++ 2.0 pipeline, we have provided free access to an Amazon Machine Image (AMI) with example files for analysis. AMR++ 2.0 is pre-installed and fully integrated with all necessary bioinformatic tools and dependencies within an AMI named "Microbial_Ecology_Group_AMR_AMI", allowing users to easily employ the AMR++ v2.0 pipeline within the Amazon Web Services (AWS) ecosystem. Please follow the instructions from Amazon Web Services for details on creating your own EC2 instance (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html). With this approach, users pay for the cost of a suitable AWS EC2 instance without the challenge of accessing large computing clusters and individually installing each piece of software necessary to run the pipeline (including all dependencies). Integration within AWS also allows users to scale the computing resources to fit the needs of any project size. 10 | 11 | Singularity container 12 | ----------------- 13 | 14 | Singularity containers allow the packaging of multiple bioinformatic tools. While Singularity is a popular tool and likely to be supported by many computing clusters, please contact your system administrator for help with installing Singularity. Installation on a local computer is also an option and can be performed by following these instructions: https://sylabs.io/guides/3.0/user-guide/installation.html 15 | 16 | We provide AMR++ with a singularity container that is automatically accessed when the AMR++ pipeline is run with the flag "-profile singularity". Additionally, the singularity container is hosted on singularity-hub.org and can be used locally for custom analysis (https://singularity-hub.org/collections/3418).
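For example, once the image has been pulled (as shown in the block below), individual tools inside the container can also be run non-interactively with `singularity exec`. This is a minimal sketch, assuming the pulled image file is named amrplusplus_v2.sif as in the example below:

```bash
# Run a single containerized tool without opening an interactive shell,
# e.g. print the version of the samtools binary packaged in the container
$ singularity exec amrplusplus_v2.sif samtools --version
```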
17 | 18 | ```bash 19 | # Choose your preference to pull the container from Singularity Hub (once) 20 | $ singularity pull shub://meglab-metagenomics/amrplusplus_v2 21 | 22 | # Then interact with it (enter "exit" to leave the singularity container): 23 | $ singularity shell amrplusplus_v2.sif 24 | 25 | ``` 26 | 27 | 28 | -------------------------------------------------------------------------------- /docs/configuration.md: -------------------------------------------------------------------------------- 1 | Configuration 2 | ------------- 3 | 4 | The pipeline source code comes with a configuration file that can be used to set environment variables or default command-line options. Setting these variables beforehand may be useful in situations where you do not want to specify a long list of options from the command line. This configuration file can be found in the root source code directory and is called **nextflow.config**. You can modify this file, save the changes, and run the pipeline directly. 5 | 6 | 7 | Customize Environment Variables using profiles 8 | ---------------------------------------------- 9 | 10 | The **nextflow.config** contains a section that allows the use of environment "profiles" when running AmrPlusPlus. Further information for each profile can be found within the /config directory. In brief, profiles allow control over how the pipeline is run on different computing clusters. We recommend the "singularity" profile, which employs a Singularity container with all the required bioinformatic tools. 11 | 12 | 13 | ```bash 14 | profiles { 15 | local { 16 | includeConfig "config/local.config" 17 | } 18 | local_angus { 19 | includeConfig "config/local_angus.config" 20 | } 21 | local_MSI { 22 | includeConfig "config/local_MSI.config" 23 | } 24 | slurm { 25 | process.executor = 'slurm' 26 | includeConfig "config/slurm.config" 27 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 28 | } 29 | singularity { 30 | includeConfig "config/singularity.config" 31 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 32 | } 33 | } 34 | ``` 35 | 36 | Customize Command-line Options 37 | ------------------------------ 38 | 39 | The params section allows you to set the different command-line options that can be used within the pipeline. Here, you can specify input/output options, trimming options, and algorithm options. 40 | 41 | If you intend to run multiple samples in parallel, you must specify a glob pattern for your sequence data as shown for the **reads** parameter. For more information on globs, please see this related [article](https://en.wikipedia.org/wiki/Glob_(programming)). 42 | 43 | 44 | By default, the pipeline uses the minikraken database (~4GB) to classify and assign taxonomic labels to your sequences. As Kraken loads this database into memory, this mini database is particularly useful for people who do not have access to large-memory servers. We provide a script to easily download the minikraken database. 45 | 46 | > sh download_minikraken.sh 47 | 48 | If you would like to use a custom database or the standard Kraken database (~160GB), you will need to build it yourself and modify the **kraken_db** environment variable in the nextflow.config file to point to its location on your machine.
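For example, building the standard Kraken2 database and pointing the pipeline at it might look like the following. This is a minimal sketch rather than the pipeline's documented procedure: it assumes `kraken2-build` is installed and on your PATH, that roughly 160GB of disk space is available, and it uses a placeholder database directory:

```bash
# Build the standard Kraken2 database (downloads RefSeq data; can take hours)
kraken2-build --standard --threads 10 --db /path/to/my_kraken2_db

# Then point the pipeline at the new database, either by editing the
# kraken_db parameter in nextflow.config or by overriding it on the command line:
nextflow run main_AmrPlusPlus_v2_withKraken.nf --kraken_db "/path/to/my_kraken2_db"
```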
49 | 50 | 51 | ```bash 52 | params { 53 | /* Location of forward and reverse read pairs */ 54 | reads = "data/raw/*_{1,2}.fastq.gz" 55 | 56 | /* Location of adapter sequences */ 57 | adapters = "data/adapters/nextera.fa" 58 | 59 | /* Location of tab delimited adapter sequences */ 60 | fqc_adapters = "data/adapters/nextera.tab" 61 | 62 | /* Location of host genome index files */ 63 | host_index = "" 64 | 65 | /* Location of host genome */ 66 | host = "data/host/chr21.fasta.gz" 67 | 68 | /* Kraken database location, default is "none" */ 69 | kraken_db = "minikraken2_v2_8GB_201904_UPDATE" 70 | 71 | /* Location of amr index files */ 72 | amr_index = "" 73 | 74 | /* Location of antimicrobial resistance (MEGARes) database */ 75 | amr = "data/amr/megares_database_v1.02.fasta" 76 | 77 | /* Location of amr annotation file */ 78 | annotation = "data/amr/megares_annotations_v1.02.csv" 79 | 80 | /* Location of SNP metadata */ 81 | snp_annotation = "data/amr/snp_location_metadata.csv" 82 | 83 | /* Location of SNP confirmation script */ 84 | snp_confirmation = "bin/snp_confirmation.py" 85 | 86 | /* Output directory */ 87 | output = "test_results" 88 | 89 | /* Number of threads */ 90 | threads = 10 91 | smem_threads = 12 92 | 93 | /* Trimmomatic trimming parameters */ 94 | leading = 10 95 | trailing = 3 96 | slidingwindow = "4:15" 97 | minlen = 36 98 | 99 | /* Resistome threshold */ 100 | threshold = 80 101 | 102 | /* Starting rarefaction level */ 103 | min = 5 104 | 105 | /* Ending rarefaction level */ 106 | max = 100 107 | 108 | /* Number of levels to skip */ 109 | skip = 5 110 | 111 | /* Number of iterations to sample at */ 112 | samples = 1 113 | 114 | /* Display help message */ 115 | help = false 116 | } 117 | ``` 118 | -------------------------------------------------------------------------------- /docs/contact.md: -------------------------------------------------------------------------------- 1 | Contact 2 | ------- 3 | 4 | Questions, comments, or feature requests can be directed to meglab.metagenomics@gmail.com. 5 | 6 | View our website for further information: 7 | http://megares.meglab.org/ 8 | -------------------------------------------------------------------------------- /docs/dependencies.md: -------------------------------------------------------------------------------- 1 | Dependencies 2 | ------------ 3 | 4 | AmrPlusPlus uses a variety of open-source tools. The tools used, descriptions, and version specifics are provided below. 5 | 6 | ### Bedtools 7 | - Description: Bedtools is a suite of tools that can be used to compute and extract useful information from BAM, BED, and BCF files. 8 | - Version: 2.28.0 9 | - DOI: https://doi.org/10.1093/bioinformatics/btq033 10 | 11 | ### BWA 12 | - Description: BWA is a short and long read sequence aligner for aligning raw sequence data to a reference genome. 13 | - Version: 0.7.17 14 | - DOI: https://doi.org/10.1093/bioinformatics/btp324 15 | 16 | ### Kraken2 17 | - Description: Kraken is a fast taxonomic sequence classifier that assigns taxonomy labels to short-reads. 18 | - Version: 2.0.8 19 | - DOI: https://doi.org/10.1186/gb-2014-15-3-r46 20 | 21 | ### RarefactionAnalyzer 22 | - Description: RarefactionAnalyzer is a tool that can be used for performing rarefaction analysis. 23 | - Version: 0.0.0 24 | - DOI: https://doi.org/10.1093/nar/gkw1009 25 | 26 | ### ResistomeAnalyzer 27 | - Description: ResistomeAnalyzer is a tool for analyzing the resistome of large metagenomic datasets. 
28 | - Version: 0.0.0 29 | - DOI: https://doi.org/10.1093/nar/gkw1009 30 | 31 | ### Samtools 32 | - Description: Samtools is a program for manipulating and extracting useful information from alignment files in SAM or BAM format. 33 | - Version: 1.9 34 | - DOI: https://doi.org/10.1093/bioinformatics/btp352 35 | 36 | ### SNPFinder 37 | - Description: SNPFinder is a haplotype variant caller that can be used for metagenomics datasets. 38 | - Version: 0.0.0 39 | - DOI: https://doi.org/10.1093/nar/gkw1009 40 | 41 | ### Trimmomatic 42 | - Description: Trimmomatic is a tool for removing low quality base pairs (bps) and adapter sequences from raw sequence data. 43 | - Version: 0.39 44 | - DOI: https://doi.org/10.1093/bioinformatics/btu170 45 | 46 | ### Freebayes 47 | - Description: Freebayes is a Bayesian, haplotype-based variant caller for finding SNPs and other small variants in sequence alignments. 48 | - Version: 1.3.1 49 | - https://arxiv.org/abs/1207.3907v2 50 | 51 | ### Resistance Gene Identifier 52 | - Description: The Resistance Gene Identifier (RGI) predicts antimicrobial resistance genes from sequence data using the CARD database. 53 | - Version: 0.39 54 | - https://card.mcmaster.ca/analyze/rgi 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /docs/installation.md: -------------------------------------------------------------------------------- 1 | Installation 2 | ------------ 3 | 4 | This section will help you get started with running the AmrPlusPlus pipeline with Nextflow and Singularity. This tutorial assumes you will be running the pipeline from a POSIX compatible system such as Linux, Solaris, or OS X. 5 | 6 | Setup 7 | ----- 8 | 9 | We will go over a typical pipeline setup scenario in which you connect to a remote server, install Nextflow, and download the pipeline source code. For the easiest use of AmrPlusPlus, make sure that Singularity is installed and in your $PATH variable. 10 | Visit this website for further information: 11 | https://singularity.lbl.gov/docs-installation 12 | 13 | If Singularity cannot be installed, configure the "config/local.config" file to specify the absolute PATH to each required bioinformatic tool. Then, change the flag after "-profile" to "local" when running the pipeline. 14 | 15 | ```bash 16 | # username and host address 17 | $ ssh [USER]@[HOST] 18 | 19 | # Check if you have nextflow installed 20 | $ nextflow -h 21 | 22 | # If not available, install Nextflow 23 | $ curl -s https://get.nextflow.io | bash 24 | # If you do not have curl installed, try wget 25 | # $ wget -qO- https://get.nextflow.io | bash 26 | 27 | # give the user execute permissions on the nextflow binary 28 | $ chmod u+x nextflow 29 | 30 | # move nextflow executable to a folder in your PATH environment variable 31 | $ mv nextflow $HOME/bin 32 | 33 | # create a test directory and change into it 34 | $ mkdir amr_test && cd amr_test 35 | 36 | # clone pipeline source code 37 | $ git clone https://github.com/meglab-metagenomics/amrplusplus_v2.git . 38 | ``` 39 | 40 | Run a Simple Test 41 | ----------------- 42 | 43 | We will run a small sample dataset that comes with the pipeline source code. As such, we will not be specifying any input paths as they have already been included. During the program's execution, the required tool dependencies will be accessed using a Singularity container. As there are many tool dependencies, downloading the container could take some time depending on your connection speed.
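If you would rather not wait during the first run, you can fetch the container ahead of time so that later runs start immediately. A minimal sketch, assuming Singularity is installed and Singularity Hub is reachable:

```bash
# Pre-fetch the AMR++ container image once; subsequent pipeline runs
# will reuse the cached image instead of downloading it again
$ singularity pull shub://meglab-metagenomics/amrplusplus_v2
```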
 44 | 45 | ```bash 46 | # navigate into the directory containing the pipeline source code 47 | $ cd amr_test/ 48 | 49 | # command to run the amrplusplus pipeline 50 | $ nextflow run main_AmrPlusPlus_v2.nf -profile singularity --output test_results 51 | 52 | # change directories to view pipeline outputs 53 | $ cd test_results/ 54 | ``` 55 | 56 | 57 | -------------------------------------------------------------------------------- /docs/output.md: -------------------------------------------------------------------------------- 1 | Output 2 | ------ 3 | 4 | All intermediate outputs produced from each module of this pipeline are provided as flat files that can be viewed in a text editor. These files are copied from the root **work/** directory created by Nextflow, so if disk space is a concern, the **work/** directory should be deleted as it can get quite large. 5 | 6 | Directory Structure 7 | ------------------- 8 | 9 | The output directories created by the pipeline are named after the module that produced them. Each file output is prefixed with the sample name and suffixed with a short product description. 10 | 11 | Files without sample prefixes are a result of aggregation. For example, the files **host.removal.stats** and **trimmomatic.stats** provide count matrices for the number of reads discarded as a result of host-DNA removal and the number of trimmed reads for each sample. 12 | 13 | ```bash 14 | ├── RunQC 15 | │   ├── Paired 16 | │   │   ├── SRR532663.1P.fastq 17 | │   │   └── SRR532663.2P.fastq 18 | │   ├── Stats 19 | │   │   └── trimmomatic.stats 20 | │   └── Unpaired 21 | │   ├── SRR532663.1U.fastq 22 | │   └── SRR532663.2U.fastq 23 | ├── BuildHostIndex 24 | │   ├── chr21.fasta.amb 25 | │   ├── chr21.fasta.ann 26 | │   ├── chr21.fasta.bwt 27 | │   ├── chr21.fasta.pac 28 | │   └── chr21.fasta.sa 29 | ├── AlignReadsToHost 30 | │   └── SRR532663.host.sam 31 | ├── NonHostReads 32 | │   ├── SRR532663.non.host.R1.fastq 33 | │   └── SRR532663.non.host.R2.fastq 34 | ├── RemoveHostDNA 35 | │   ├── HostRemovalStats 36 | │   │   └── host.removal.stats 37 | │   └── NonHostBAM 38 | │   └── SRR532663.host.sorted.removed.bam 39 | ├── AlignToAMR 40 | │   └── SRR532663.amr.alignment.sam 41 | ├── RunResistome 42 | │   └── SRR532663.gene.tsv 43 | ├── ResistomeResults 44 | │   └── AMR_analytic_matrix.csv 45 | ├── RunRarefaction 46 | │   ├── SRR532663.class.tsv 47 | │   ├── SRR532663.gene.tsv 48 | │   ├── SRR532663.group.tsv 49 | │   └── SRR532663.mech.tsv 50 | ├── RunKraken 51 | │   ├── SRR532663.kraken.report 52 | │   └── SRR532663.kraken.filtered.report 53 | ├── KrakenResults 54 | │   └── kraken_analytic_matrix.csv 55 | ├── FilteredKrakenResults 56 | │   └── filtered_kraken_analytic_matrix.csv 57 | 58 | 59 | 60 | ``` 61 | -------------------------------------------------------------------------------- /docs/requirements.md: -------------------------------------------------------------------------------- 1 | Software Requirements 2 | --------------------- 3 | To run AmrPlusPlus, you will need the following libraries and tools installed on your server or local machine. 4 | 5 | - Singularity 6 | - Visit this website for further information: https://singularity.lbl.gov/docs-installation 7 | - Java 7+ (Required) 8 | - Nextflow (Required) 9 | 10 | NOTE: If you choose not to install Singularity, you will need to download each of the required dependencies and add the executable paths to your .bashrc file to run the pipeline. A list of these dependencies can be found in the [Dependencies](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/dependencies.md) section of this document.
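For example, appending each tool's directory to your PATH in ~/.bashrc might look like the following. This is a minimal sketch with placeholder install locations; substitute the directories where each dependency is actually installed on your system:

```bash
# Add the directories containing each tool's executable to PATH
# (the paths below are examples only, not locations the pipeline expects)
echo 'export PATH=$PATH:$HOME/tools/bwa:$HOME/tools/samtools/bin' >> ~/.bashrc

# Reload the shell configuration so the change takes effect
source ~/.bashrc
```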
 11 | -------------------------------------------------------------------------------- /docs/usage.md: -------------------------------------------------------------------------------- 1 | Usage 2 | ----- 3 | 4 | ### Display Help Message 5 | 6 | The `help` parameter displays the available options and commands. 7 | ``` 8 | $ nextflow run main_AmrPlusPlus_v2.nf --help 9 | ``` 10 | 11 | ### File Inputs 12 | 13 | #### Set custom sequence data 14 | 15 | The `reads` parameter accepts sequence files in standard fastq and gz format. 16 | ``` 17 | $ nextflow run main_AmrPlusPlus_v2.nf --reads "data/raw/*_R{1,2}.fastq" 18 | ``` 19 | 20 | #### Set host genome 21 | 22 | The `host` parameter accepts a fasta formatted host genome. 23 | ``` 24 | $ nextflow run main_AmrPlusPlus_v2.nf --host "data/host/chr21.fasta.gz" 25 | ``` 26 | 27 | #### Set host index 28 | 29 | The `host_index` parameter allows you to supply pre-built host indexes produced by BWA. 30 | ``` 31 | $ nextflow run main_AmrPlusPlus_v2.nf --host "data/host/chr21.fasta.gz" --host_index "data/index/*" 32 | ``` 33 | 34 | #### Set resistance database 35 | 36 | The `amr` parameter accepts a fasta formatted resistance database. 37 | ``` 38 | $ nextflow run main_AmrPlusPlus_v2.nf --amr "data/amr/megares_database_v1.02.fasta" 39 | ``` 40 | 41 | #### Set annotation database 42 | 43 | The `annotation` parameter accepts a csv formatted annotation database. 44 | ``` 45 | $ nextflow run main_AmrPlusPlus_v2.nf --annotation "data/amr/megares_annotations_v1.02.csv" 46 | ``` 47 | 48 | #### Set adapter file 49 | 50 | The `adapters` parameter accepts a fasta formatted adapter file. 51 | ``` 52 | $ nextflow run main_AmrPlusPlus_v2.nf --adapters "data/adapters/adapters.fa" 53 | ``` 54 | 55 | ### File Outputs 56 | 57 | #### Set output and work directories 58 | 59 | The `output` parameter writes the results to the specified directory. The work directory is set with `-w` (note the single dash, since it is a Nextflow option rather than a pipeline parameter) and determines where the temporary files will be written. Upon completing the run, you can delete the temporary file directory. 60 | ``` 61 | $ nextflow run main_AmrPlusPlus_v2.nf --output "test/" -w "work_dir/" 62 | ``` 63 | 64 | ### Resume a pipeline run 65 | 66 | If the pipeline run is cancelled or stopped for whatever reason, using the same command with the addition of the `-resume` flag will attempt to pick up where the pipeline stopped.
 67 | ``` 68 | $ nextflow run main_AmrPlusPlus_v2.nf --output "test/" -w "work_dir/" -resume 69 | ``` 70 | 71 | ### Trimming Options 72 | 73 | #### Set custom trimming parameters 74 | 75 | ``` 76 | $ nextflow run main_AmrPlusPlus_v2.nf \ 77 | --reads "data/raw/*_R{1,2}.fastq" \ 78 | --leading 3 \ 79 | --trailing 3 \ 80 | --minlen 36 \ 81 | --slidingwindow "4:15" \ 82 | --adapters "data/adapters/nextera.fa" \ 83 | --output "test/" 84 | ``` 85 | 86 | ### Algorithm Options 87 | 88 | #### Set custom algorithm options 89 | 90 | ``` 91 | $ nextflow run main_AmrPlusPlus_v2.nf \ 92 | --reads "data/raw/*_R{1,2}.fastq" \ 93 | --threshold 80 \ 94 | --min 1 \ 95 | --max 100 \ 96 | --samples 5 \ 97 | --skip 5 \ 98 | --output "test/" 99 | ``` 100 | 101 | #### Set number of threads to use for each process 102 | 103 | ``` 104 | $ nextflow run main_AmrPlusPlus_v2.nf --threads 8 105 | ``` 106 | -------------------------------------------------------------------------------- /download_minikraken.sh: -------------------------------------------------------------------------------- 1 | # Install nextflow 2 | # curl -s https://get.nextflow.io | bash 3 | 4 | 5 | # Download minikraken database and unzip 6 | wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken_8GB_202003.tgz 7 | tar -xvzf minikraken_8GB_202003.tgz 8 | -------------------------------------------------------------------------------- /launch_mpi_slurm.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #SBATCH --job-name=AMRPlusPlus 3 | #SBATCH --partition=shas 4 | #SBATCH --ntasks=1 5 | #SBATCH --qos=long 6 | #SBATCH --cpus-per-task=1 7 | #SBATCH --time=100:00:00 8 | #SBATCH --export=ALL 9 | #SBATCH --mail-user=enriquedoster@gmail.com 10 | #SBATCH --mail-type=ALL 11 | 12 | module purge 13 | module load jdk/1.8.0 14 | module load singularity/2.5.2 15 | module spider openmpi/4.0.0 16 | 17 | mpirun --pernode ./nextflow run main_AmrPlusPlus_v2.nf -resume -profile msi_pbs \ 18 | -w /work_dir --threads 15 \ 19 | --output output_results --host /PATH/TO/HOST/GENOME \ 20 | --reads "RAWREADS/*_R{1,2}.fastq.gz" -with-mpi 21 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = Channel.fromPath(params.host_index).toSortedList() 13 | //if( host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | 32 | if(params.kraken_db) { 33 | kraken_db = file(params.kraken_db) 34 | } 35 | 36 | threads = params.threads 37 | 38 | threshold = params.threshold 39 | 40 | min = params.min 41 | max = params.max 42 | skip = params.skip 43 | samples = params.samples 44 | 45 | leading = params.leading 46 | trailing = params.trailing 47 | slidingwindow = params.slidingwindow 48 | minlen = params.minlen 49 | 50
| Channel 51 | .fromFilePairs( params.reads, flat: true ) 52 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 53 | .set { reads } 54 | 55 | process RunQC { 56 | tag { sample_id } 57 | 58 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 59 | saveAs: { filename -> 60 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 61 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 62 | else {} 63 | } 64 | 65 | input: 66 | set sample_id, file(forward), file(reverse) from reads 67 | 68 | output: 69 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 70 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 71 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 72 | 73 | """ 74 | ${JAVA} -jar ${TRIMMOMATIC} \ 75 | PE \ 76 | -threads ${threads} \ 77 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 78 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 79 | LEADING:${leading} \ 80 | TRAILING:${trailing} \ 81 | SLIDINGWINDOW:${slidingwindow} \ 82 | MINLEN:${minlen} \ 83 | 2> ${sample_id}.trimmomatic.stats.log 84 | """ 85 | } 86 | 87 | trimmomatic_stats.toSortedList().set { trim_stats } 88 | 89 | process QCStats { 90 | tag { sample_id } 91 | 92 | publishDir "${params.output}/RunQC", mode: 'copy', 93 | saveAs: { filename -> 94 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 95 | else {} 96 | } 97 | 98 | input: 99 | file(stats) from trim_stats 100 | 101 | output: 102 | file("trimmomatic.stats") 103 | 104 | """ 105 | ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 106 | """ 107 | } 108 | 109 | if( !params.host_index ) { 110 | process BuildHostIndex { 111 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 112 | 113 | tag { host.baseName } 114 | 115 | input: 116 | file(host) 117 | 118 | output: 119 | file '*' into (host_index) 120 | 121 | """ 122 | ${BWA} index ${host} 123 | """ 124 | } 125 | } 126 | 127 | process AlignReadsToHost { 128 | tag { sample_id } 129 | 130 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 131 | 132 | input: 133 | set sample_id, file(forward), file(reverse) from paired_fastq 134 | file index from host_index 135 | file host 136 | 137 | output: 138 | set sample_id, file("${sample_id}.host.sorted.bam") into (host_bam) 139 | 140 | """ 141 | ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > ${sample_id}.host.sam 142 | ${SAMTOOLS} view -bS ${sample_id}.host.sam | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 143 | rm ${sample_id}.host.sam 144 | """ 145 | } 146 | 147 | process RemoveHostDNA { 148 | tag { sample_id } 149 | 150 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 151 | saveAs: { filename -> 152 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 153 | } 154 | 155 | input: 156 | set sample_id, file(bam) from host_bam 157 | 158 | output: 159 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 160 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 161 | 162 | """ 163 | ${SAMTOOLS} index ${bam} && ${SAMTOOLS} idxstats ${bam} > ${sample_id}.samtools.idxstats 164 | ${SAMTOOLS} view -h -f 4 -b ${bam} -o ${sample_id}.host.sorted.removed.bam 165 | """ 166 | } 167 | 168 | idxstats_logs.toSortedList().set { host_removal_stats } 169 | 170 | process HostRemovalStats { 171 | tag 
{ sample_id } 172 | 173 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 174 | saveAs: { filename -> 175 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 176 | } 177 | 178 | input: 179 | file(stats) from host_removal_stats 180 | 181 | output: 182 | file("host.removal.stats") 183 | 184 | """ 185 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 186 | """ 187 | } 188 | 189 | process NonHostReads { 190 | tag { sample_id } 191 | 192 | publishDir "${params.output}/NonHostReads", mode: "copy" 193 | 194 | input: 195 | set sample_id, file(bam) from non_host_bam 196 | 197 | output: 198 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 199 | 200 | """ 201 | ${BEDTOOLS} \ 202 | bamtofastq \ 203 | -i ${bam} \ 204 | -fq ${sample_id}.non.host.R1.fastq.gz \ 205 | -fq2 ${sample_id}.non.host.R2.fastq.gz 206 | """ 207 | } 208 | 209 | /* 210 | - 211 | -- 212 | --- 213 | ---- nonhost reads for megares and kraken2 214 | --- 215 | -- 216 | - 217 | */ 218 | 219 | 220 | 221 | /* 222 | ---- Run alignment to MEGAres 223 | */ 224 | 225 | if( !params.amr_index ) { 226 | process BuildAMRIndex { 227 | tag { amr.baseName } 228 | 229 | input: 230 | file(amr) 231 | 232 | output: 233 | file '*' into (amr_index) 234 | 235 | """ 236 | ${BWA} index ${amr} 237 | """ 238 | } 239 | } 240 | 241 | process AlignToAMR { 242 | tag { sample_id } 243 | 244 | publishDir "${params.output}/AlignToAMR", mode: "copy" 245 | 246 | input: 247 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 248 | file index from amr_index 249 | file amr 250 | 251 | output: 252 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam) 253 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam) 254 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 255 | 256 | 257 | """ 258 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 259 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 260 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 261 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam ${sample_id}.amr.alignment.sorted.fix.bam 262 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 263 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 264 | ${SAMTOOLS} view -h -o ${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 265 | rm ${sample_id}.amr.alignment.bam 266 | rm ${sample_id}.amr.alignment.sorted*.bam 267 | """ 268 | } 269 | 270 | process RunResistome { 271 | tag { sample_id } 272 | 273 | publishDir "${params.output}/RunResistome", mode: "copy" 274 | 275 | input: 276 | set sample_id, file(sam) from megares_resistome_sam 277 | file annotation 278 | file amr 279 | 280 | output: 281 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 282 | file("${sample_id}.group.tsv") into (megares_group_counts) 283 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 284 | file("${sample_id}.class.tsv") into (megares_class_counts) 285 | 
file("${sample_id}.type.tsv") into (megares_type_counts) 286 | 287 | """ 288 | $baseDir/bin/resistome -ref_fp ${amr} \ 289 | -annot_fp ${annotation} \ 290 | -sam_fp ${sam} \ 291 | -gene_fp ${sample_id}.gene.tsv \ 292 | -group_fp ${sample_id}.group.tsv \ 293 | -mech_fp ${sample_id}.mechanism.tsv \ 294 | -class_fp ${sample_id}.class.tsv \ 295 | -type_fp ${sample_id}.type.tsv \ 296 | -t ${threshold} 297 | """ 298 | } 299 | 300 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 301 | 302 | process ResistomeResults { 303 | tag { } 304 | 305 | publishDir "${params.output}/ResistomeResults", mode: "copy" 306 | 307 | input: 308 | file(resistomes) from megares_amr_l_to_w 309 | 310 | output: 311 | file("AMR_analytic_matrix.csv") into amr_master_matrix 312 | 313 | """ 314 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 315 | """ 316 | } 317 | 318 | 319 | /* samtools deduplication of megares alignment */ 320 | process SamDedupRunResistome { 321 | tag { sample_id } 322 | 323 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 324 | 325 | input: 326 | set sample_id, file(sam) from megares_dedup_resistome_sam 327 | file annotation 328 | file amr 329 | 330 | output: 331 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 332 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 333 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 334 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 335 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 336 | 337 | """ 338 | $baseDir/bin/resistome -ref_fp ${amr} \ 339 | -annot_fp ${annotation} \ 340 | -sam_fp ${sam} \ 341 | -gene_fp ${sample_id}.gene.tsv \ 342 | -group_fp ${sample_id}.group.tsv \ 343 | -mech_fp ${sample_id}.mechanism.tsv \ 344 | -class_fp ${sample_id}.class.tsv \ 345 | -type_fp ${sample_id}.type.tsv \ 346 | -t ${threshold} 347 | """ 348 | } 349 | 350 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 351 | 352 | process SamDedupResistomeResults { 353 | tag { } 354 | 355 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 356 | 357 | input: 358 | file(resistomes) from megares_dedup_amr_l_to_w 359 | 360 | output: 361 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 362 | 363 | """ 364 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 365 | """ 366 | } 367 | 368 | process RunRarefaction { 369 | tag { sample_id } 370 | 371 | publishDir "${params.output}/RunRarefaction", mode: "copy" 372 | 373 | input: 374 | set sample_id, file(sam) from megares_rarefaction_sam 375 | file annotation 376 | file amr 377 | 378 | output: 379 | set sample_id, file("*.tsv") into (rarefaction) 380 | 381 | """ 382 | $baseDir/bin/rarefaction \ 383 | -ref_fp ${amr} \ 384 | -sam_fp ${sam} \ 385 | -annot_fp ${annotation} \ 386 | -gene_fp ${sample_id}.gene.tsv \ 387 | -group_fp ${sample_id}.group.tsv \ 388 | -mech_fp ${sample_id}.mech.tsv \ 389 | -class_fp ${sample_id}.class.tsv \ 390 | -type_fp ${sample_id}.type.tsv \ 391 | -min ${min} \ 392 | -max ${max} \ 393 | -skip ${skip} \ 394 | -samples ${samples} \ 395 | -t ${threshold} 396 | """ 397 | } 398 | 399 | 400 | 401 | 402 | 403 | 404 | 405 | def nextflow_version_error() { 406 | println "" 407 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 408 | println "Run ./nextflow self-update to update 
Nextflow to the latest available version." 409 | println "" 410 | return 1 411 | } 412 | 413 | def adapter_error(def input) { 414 | println "" 415 | println "[params.adapters] fail to open: '" + input + "' : No such file or directory" 416 | println "" 417 | return 1 418 | } 419 | 420 | def amr_error(def input) { 421 | println "" 422 | println "[params.amr] fail to open: '" + input + "' : No such file or directory" 423 | println "" 424 | return 1 425 | } 426 | 427 | def annotation_error(def input) { 428 | println "" 429 | println "[params.annotation] fail to open: '" + input + "' : No such file or directory" 430 | println "" 431 | return 1 432 | } 433 | 434 | def fastq_error(def input) { 435 | println "" 436 | println "[params.reads] fail to open: '" + input + "' : No such file or directory" 437 | println "" 438 | return 1 439 | } 440 | 441 | def host_error(def input) { 442 | println "" 443 | println "[params.host] fail to open: '" + input + "' : No such file or directory" 444 | println "" 445 | return 1 446 | } 447 | 448 | def index_error(def input) { 449 | println "" 450 | println "[params.host_index] fail to open: '" + input + "' : No such file or directory" 451 | println "" 452 | return 1 453 | } 454 | 455 | def help() { 456 | println "" 457 | println "Program: AmrPlusPlus" 458 | println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md" 459 | println "Contact: Christopher Dean " 460 | println "" 461 | println "Usage: nextflow run main.nf [options]" 462 | println "" 463 | println "Input/output options:" 464 | println "" 465 | println " --reads STR path to FASTQ formatted input sequences" 466 | println " --adapters STR path to FASTA formatted adapter sequences" 467 | println " --host STR path to FASTA formatted host genome" 468 | println " --host_index STR path to BWA generated index files" 469 | println " --amr STR path to AMR resistance database" 470 | println " --annotation STR path to AMR annotation file" 471 | println " --output STR directory to write process outputs to" 472 | println " --KRAKENDB STR path to kraken database" 473 | println "" 474 | println "Trimming options:" 475 | println "" 476 | println " --leading INT cut bases off the start of a read, if below a threshold quality" 477 | println " --minlen INT drop the read if it is below a specified length" 478 | println " --slidingwindow INT perform sw trimming, cutting once the average quality within the window falls below a threshold" 479 | println " --trailing INT cut bases off the end of a read, if below a threshold quality" 480 | println "" 481 | println "Algorithm options:" 482 | println "" 483 | println " --threads INT number of threads to use for each process" 484 | println " --threshold INT gene fraction threshold" 485 | println " --min INT starting sample level" 486 | println " --max INT ending sample level" 487 | println " --samples INT number of sampling iterations to perform" 488 | println " --skip INT number of levels to skip" 489 | println "" 490 | println "Help options:" 491 | println "" 492 | println " --help display this message" 493 | println "" 494 | return 1 495 | } 496 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2_withKraken.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = 
Channel.fromPath(params.host_index).toSortedList() 13 | //if( host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | if(params.kraken_db) { 32 | kraken_db = file(params.kraken_db) 33 | } 34 | 35 | threads = params.threads 36 | 37 | threshold = params.threshold 38 | 39 | min = params.min 40 | max = params.max 41 | skip = params.skip 42 | samples = params.samples 43 | 44 | leading = params.leading 45 | trailing = params.trailing 46 | slidingwindow = params.slidingwindow 47 | minlen = params.minlen 48 | 49 | Channel 50 | .fromFilePairs( params.reads, flat: true ) 51 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 52 | .set { reads } 53 | 54 | process RunQC { 55 | tag { sample_id } 56 | 57 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 58 | saveAs: { filename -> 59 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 60 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 61 | else {} 62 | } 63 | 64 | input: 65 | set sample_id, file(forward), file(reverse) from reads 66 | 67 | output: 68 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 69 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 70 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 71 | 72 | """ 73 | ${JAVA} -jar ${TRIMMOMATIC} \ 74 | PE \ 75 | -threads ${threads} \ 76 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 77 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 78 | LEADING:${leading} \ 79 | TRAILING:${trailing} \ 80 | SLIDINGWINDOW:${slidingwindow} \ 81 | MINLEN:${minlen} \ 82 | 2> ${sample_id}.trimmomatic.stats.log 83 | """ 84 | } 85 | 86 | trimmomatic_stats.toSortedList().set { trim_stats } 87 | 88 | process QCStats { 89 | tag { sample_id } 90 | 91 | publishDir "${params.output}/RunQC", mode: 'copy', 92 | saveAs: { filename -> 93 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 94 | else {} 95 | } 96 | 97 | input: 98 | file(stats) from trim_stats 99 | 100 | output: 101 | file("trimmomatic.stats") 102 | 103 | """ 104 | ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 105 | """ 106 | } 107 | 108 | if( !params.host_index ) { 109 | process BuildHostIndex { 110 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 111 | 112 | tag { host.baseName } 113 | 114 | input: 115 | file(host) 116 | 117 | output: 118 | file '*' into (host_index) 119 | 120 | """ 121 | ${BWA} index ${host} 122 | """ 123 | } 124 | } 125 | 126 | process AlignReadsToHost { 127 | tag { sample_id } 128 | 129 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 130 | 131 | input: 132 | set sample_id, file(forward), file(reverse) from paired_fastq 133 | file index from host_index 134 | file host 135 | 136 | output: 137 | set sample_id, file("${sample_id}.host.sam") into (host_sam) 138 | 139 | """ 140 | ${BWA} mem ${host} ${forward} 
${reverse} -t ${threads} > ${sample_id}.host.sam 141 | """ 142 | } 143 | 144 | process RemoveHostDNA { 145 | tag { sample_id } 146 | 147 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 148 | saveAs: { filename -> 149 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 150 | } 151 | 152 | input: 153 | set sample_id, file(sam) from host_sam 154 | 155 | output: 156 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 157 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 158 | 159 | """ 160 | ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 161 | ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats 162 | ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam 163 | """ 164 | } 165 | 166 | idxstats_logs.toSortedList().set { host_removal_stats } 167 | 168 | process HostRemovalStats { 169 | tag { sample_id } 170 | 171 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 172 | saveAs: { filename -> 173 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 174 | } 175 | 176 | input: 177 | file(stats) from host_removal_stats 178 | 179 | output: 180 | file("host.removal.stats") 181 | 182 | """ 183 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 184 | """ 185 | } 186 | 187 | process NonHostReads { 188 | tag { sample_id } 189 | 190 | publishDir "${params.output}/NonHostReads", mode: "copy" 191 | 192 | input: 193 | set sample_id, file(bam) from non_host_bam 194 | 195 | output: 196 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 197 | 198 | """ 199 | ${BEDTOOLS} \ 200 | bamtofastq \ 201 | -i ${bam} \ 202 | -fq ${sample_id}.non.host.R1.fastq.gz \ 203 | -fq2 ${sample_id}.non.host.R2.fastq.gz 204 | """ 205 | } 206 | 207 | /* 208 | - 209 | -- 210 | --- 211 | ---- nonhost reads for megares and kraken2 212 | --- 213 | -- 214 | - 215 | */ 216 | 217 | 218 | /* 219 | ---- Run Kraken2 220 | */ 221 | 222 | process RunKraken { 223 | tag { sample_id } 224 | 225 | publishDir "${params.output}/RunKraken", mode: 'copy', 226 | saveAs: { filename -> 227 | if(filename.indexOf(".kraken.raw") > 0) "Standard/$filename" 228 | else if(filename.indexOf(".kraken.report") > 0) "Standard_report/$filename" 229 | else if(filename.indexOf(".kraken.filtered.report") > 0) "Filtered_report/$filename" 230 | else if(filename.indexOf(".kraken.filtered.raw") > 0) "Filtered/$filename" 231 | else {} 232 | } 233 | 234 | input: 235 | set sample_id, file(forward), file(reverse) from non_host_fastq_kraken 236 | 237 | 238 | output: 239 | file("${sample_id}.kraken.report") into (kraken_report,kraken_extract_taxa) 240 | set sample_id, file("${sample_id}.kraken.raw") into kraken_raw 241 | file("${sample_id}.kraken.filtered.report") into kraken_filter_report 242 | file("${sample_id}.kraken.filtered.raw") into kraken_filter_raw 243 | 244 | 245 | """ 246 | ${KRAKEN2} --db ${kraken_db} --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.report > ${sample_id}.kraken.raw 247 | ${KRAKEN2} --db ${kraken_db} --confidence 1 --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.filtered.report > ${sample_id}.kraken.filtered.raw 248 | """ 249 | } 250 | 251 | 252 | 
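/*
  Note: the RunKraken process above classifies each sample twice, once with
  default Kraken2 settings and once with "--confidence 1" (the most stringent
  confidence-score threshold), producing an unfiltered and a filtered report
  per sample. The toSortedList() calls below collect the per-sample reports
  into single sorted lists so that KrakenResults and FilteredKrakenResults
  each run once, combining the long-format reports into one wide
  taxa-by-sample count matrix.
*/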
kraken_report.toSortedList().set { kraken_l_to_w } 253 | kraken_filter_report.toSortedList().set { kraken_filter_l_to_w } 254 | 255 | process KrakenResults { 256 | tag { } 257 | 258 | publishDir "${params.output}/KrakenResults", mode: "copy" 259 | 260 | input: 261 | file(kraken_reports) from kraken_l_to_w 262 | 263 | output: 264 | file("kraken_analytic_matrix.csv") into kraken_master_matrix 265 | 266 | """ 267 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o kraken_analytic_matrix.csv 268 | """ 269 | } 270 | 271 | process FilteredKrakenResults { 272 | tag { sample_id } 273 | 274 | publishDir "${params.output}/FilteredKrakenResults", mode: "copy" 275 | 276 | input: 277 | file(kraken_reports) from kraken_filter_l_to_w 278 | 279 | output: 280 | file("filtered_kraken_analytic_matrix.csv") into filter_kraken_master_matrix 281 | 282 | """ 283 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o filtered_kraken_analytic_matrix.csv 284 | """ 285 | } 286 | 287 | /* 288 | ---- Run alignment to MEGAres 289 | */ 290 | 291 | if( !params.amr_index ) { 292 | process BuildAMRIndex { 293 | tag { amr.baseName } 294 | 295 | input: 296 | file(amr) 297 | 298 | output: 299 | file '*' into (amr_index) 300 | 301 | """ 302 | ${BWA} index ${amr} 303 | """ 304 | } 305 | } 306 | 307 | process AlignToAMR { 308 | tag { sample_id } 309 | 310 | publishDir "${params.output}/AlignToAMR", mode: "copy" 311 | 312 | input: 313 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 314 | file index from amr_index 315 | file amr 316 | 317 | output: 318 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam) 319 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam) 320 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 321 | 322 | 323 | """ 324 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 325 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 326 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 327 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam ${sample_id}.amr.alignment.sorted.fix.bam 328 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 329 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 330 | ${SAMTOOLS} view -h -o ${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 331 | rm ${sample_id}.amr.alignment.bam 332 | rm ${sample_id}.amr.alignment.sorted*.bam 333 | """ 334 | } 335 | 336 | process RunResistome { 337 | tag { sample_id } 338 | 339 | publishDir "${params.output}/RunResistome", mode: "copy" 340 | 341 | input: 342 | set sample_id, file(sam) from megares_resistome_sam 343 | file annotation 344 | file amr 345 | 346 | output: 347 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 348 | file("${sample_id}.group.tsv") into (megares_group_counts) 349 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 350 | file("${sample_id}.class.tsv") into (megares_class_counts) 351 | file("${sample_id}.type.tsv") into (megares_type_counts) 352 | 353 | """ 354 | $baseDir/bin/resistome -ref_fp ${amr} \ 355 | -annot_fp ${annotation} 
\ 356 | -sam_fp ${sam} \ 357 | -gene_fp ${sample_id}.gene.tsv \ 358 | -group_fp ${sample_id}.group.tsv \ 359 | -mech_fp ${sample_id}.mechanism.tsv \ 360 | -class_fp ${sample_id}.class.tsv \ 361 | -type_fp ${sample_id}.type.tsv \ 362 | -t ${threshold} 363 | """ 364 | } 365 | 366 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 367 | 368 | process ResistomeResults { 369 | tag { } 370 | 371 | publishDir "${params.output}/ResistomeResults", mode: "copy" 372 | 373 | input: 374 | file(resistomes) from megares_amr_l_to_w 375 | 376 | output: 377 | file("AMR_analytic_matrix.csv") into amr_master_matrix 378 | 379 | """ 380 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 381 | """ 382 | } 383 | 384 | 385 | /* samtools deduplication of megares alignment */ 386 | process SamDedupRunResistome { 387 | tag { sample_id } 388 | 389 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 390 | 391 | input: 392 | set sample_id, file(sam) from megares_dedup_resistome_sam 393 | file annotation 394 | file amr 395 | 396 | output: 397 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 398 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 399 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 400 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 401 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 402 | 403 | """ 404 | $baseDir/bin/resistome -ref_fp ${amr} \ 405 | -annot_fp ${annotation} \ 406 | -sam_fp ${sam} \ 407 | -gene_fp ${sample_id}.gene.tsv \ 408 | -group_fp ${sample_id}.group.tsv \ 409 | -mech_fp ${sample_id}.mechanism.tsv \ 410 | -class_fp ${sample_id}.class.tsv \ 411 | -type_fp ${sample_id}.type.tsv \ 412 | -t ${threshold} 413 | """ 414 | } 415 | 416 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 417 | 418 | process SamDedupResistomeResults { 419 | tag { } 420 | 421 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 422 | 423 | input: 424 | file(resistomes) from megares_dedup_amr_l_to_w 425 | 426 | output: 427 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 428 | 429 | """ 430 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 431 | """ 432 | } 433 | 434 | process RunRarefaction { 435 | tag { sample_id } 436 | 437 | publishDir "${params.output}/RunRarefaction", mode: "copy" 438 | 439 | input: 440 | set sample_id, file(sam) from megares_rarefaction_sam 441 | file annotation 442 | file amr 443 | 444 | output: 445 | set sample_id, file("*.tsv") into (rarefaction) 446 | 447 | """ 448 | $baseDir/bin/rarefaction \ 449 | -ref_fp ${amr} \ 450 | -sam_fp ${sam} \ 451 | -annot_fp ${annotation} \ 452 | -gene_fp ${sample_id}.gene.tsv \ 453 | -group_fp ${sample_id}.group.tsv \ 454 | -mech_fp ${sample_id}.mech.tsv \ 455 | -class_fp ${sample_id}.class.tsv \ 456 | -type_fp ${sample_id}.type.tsv \ 457 | -min ${min} \ 458 | -max ${max} \ 459 | -skip ${skip} \ 460 | -samples ${samples} \ 461 | -t ${threshold} 462 | """ 463 | } 464 | 465 | 466 | 467 | 468 | def nextflow_version_error() { 469 | println "" 470 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 471 | println "Run ./nextflow self-update to update Nextflow to the latest available version." 
472 | println "" 473 | return 1 474 | } 475 | 476 | def adapter_error(def input) { 477 | println "" 478 | println "[params.adapters] failed to open: '" + input + "' : No such file or directory" 479 | println "" 480 | return 1 481 | } 482 | 483 | def amr_error(def input) { 484 | println "" 485 | println "[params.amr] failed to open: '" + input + "' : No such file or directory" 486 | println "" 487 | return 1 488 | } 489 | 490 | def annotation_error(def input) { 491 | println "" 492 | println "[params.annotation] failed to open: '" + input + "' : No such file or directory" 493 | println "" 494 | return 1 495 | } 496 | 497 | def fastq_error(def input) { 498 | println "" 499 | println "[params.reads] failed to open: '" + input + "' : No such file or directory" 500 | println "" 501 | return 1 502 | } 503 | 504 | def host_error(def input) { 505 | println "" 506 | println "[params.host] failed to open: '" + input + "' : No such file or directory" 507 | println "" 508 | return 1 509 | } 510 | 511 | def index_error(def input) { 512 | println "" 513 | println "[params.host_index] failed to open: '" + input + "' : No such file or directory" 514 | println "" 515 | return 1 516 | } 517 | 518 | def help() { 519 | println "" 520 | println "Program: AmrPlusPlus" 521 | println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md" 522 | println "Contact: Christopher Dean " 523 | println "" 524 | println "Usage: nextflow run main.nf [options]" 525 | println "" 526 | println "Input/output options:" 527 | println "" 528 | println " --reads STR path to FASTQ formatted input sequences" 529 | println " --adapters STR path to FASTA formatted adapter sequences" 530 | println " --host STR path to FASTA formatted host genome" 531 | println " --host_index STR path to BWA generated index files" 532 | println " --amr STR path to AMR resistance database" 533 | println " --annotation STR path to AMR annotation file" 534 | println " --output STR directory to write process outputs to" 535 | println " --kraken_db STR path to kraken database" 536 | println "" 537 | println "Trimming options:" 538 | println "" 539 | println " --leading INT cut bases off the start of a read, if below a threshold quality" 540 | println " --minlen INT drop the read if it is below a specified length" 541 | println " --slidingwindow INT perform sliding-window trimming, cutting once the average quality within the window falls below a threshold" 542 | println " --trailing INT cut bases off the end of a read, if below a threshold quality" 543 | println "" 544 | println "Algorithm options:" 545 | println "" 546 | println " --threads INT number of threads to use for each process" 547 | println " --threshold INT gene fraction threshold" 548 | println " --min INT starting sample level" 549 | println " --max INT ending sample level" 550 | println " --samples INT number of sampling iterations to perform" 551 | println " --skip INT number of levels to skip" 552 | println "" 553 | println "Help options:" 554 | println "" 555 | println " --help display this message" 556 | println "" 557 | return 1 558 | } 559 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2_withRGI.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = Channel.fromPath(params.host_index).toSortedList() 13 | //if( 
host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | if(params.kraken_db) { 32 | kraken_db = file(params.kraken_db) 33 | } 34 | 35 | card_db = file(params.card_db) 36 | 37 | threads = params.threads 38 | 39 | threshold = params.threshold 40 | 41 | min = params.min 42 | max = params.max 43 | skip = params.skip 44 | samples = params.samples 45 | 46 | leading = params.leading 47 | trailing = params.trailing 48 | slidingwindow = params.slidingwindow 49 | minlen = params.minlen 50 | 51 | Channel 52 | .fromFilePairs( params.reads, flat: true ) 53 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 54 | .set { reads } 55 | 56 | process RunQC { 57 | tag { sample_id } 58 | 59 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 60 | saveAs: { filename -> 61 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 62 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 63 | else {} 64 | } 65 | 66 | input: 67 | set sample_id, file(forward), file(reverse) from reads 68 | 69 | output: 70 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 71 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 72 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 73 | 74 | """ 75 | ${JAVA} -jar ${TRIMMOMATIC} \ 76 | PE \ 77 | -threads ${threads} \ 78 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 79 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 80 | LEADING:${leading} \ 81 | TRAILING:${trailing} \ 82 | SLIDINGWINDOW:${slidingwindow} \ 83 | MINLEN:${minlen} \ 84 | 2> ${sample_id}.trimmomatic.stats.log 85 | """ 86 | } 87 | 88 | trimmomatic_stats.toSortedList().set { trim_stats } 89 | 90 | process QCStats { 91 | tag { sample_id } 92 | 93 | publishDir "${params.output}/RunQC", mode: 'copy', 94 | saveAs: { filename -> 95 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 96 | else {} 97 | } 98 | 99 | input: 100 | file(stats) from trim_stats 101 | 102 | output: 103 | file("trimmomatic.stats") 104 | 105 | """ 106 | ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 107 | """ 108 | } 109 | 110 | if( !params.host_index ) { 111 | process BuildHostIndex { 112 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 113 | 114 | tag { host.baseName } 115 | 116 | input: 117 | file(host) 118 | 119 | output: 120 | file '*' into (host_index) 121 | 122 | """ 123 | ${BWA} index ${host} 124 | """ 125 | } 126 | } 127 | 128 | process AlignReadsToHost { 129 | tag { sample_id } 130 | 131 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 132 | 133 | input: 134 | set sample_id, file(forward), file(reverse) from paired_fastq 135 | file index from host_index 136 | file host 137 | 138 | output: 139 | set sample_id, file("${sample_id}.host.sam") into (host_sam) 140 | 141 | """ 142 | ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > 
${sample_id}.host.sam 143 | """ 144 | } 145 | 146 | process RemoveHostDNA { 147 | tag { sample_id } 148 | 149 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 150 | saveAs: { filename -> 151 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 152 | } 153 | 154 | input: 155 | set sample_id, file(sam) from host_sam 156 | 157 | output: 158 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 159 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 160 | 161 | """ 162 | ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 163 | ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats 164 | ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam 165 | """ 166 | } 167 | 168 | idxstats_logs.toSortedList().set { host_removal_stats } 169 | 170 | process HostRemovalStats { 171 | tag { sample_id } 172 | 173 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 174 | saveAs: { filename -> 175 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 176 | } 177 | 178 | input: 179 | file(stats) from host_removal_stats 180 | 181 | output: 182 | file("host.removal.stats") 183 | 184 | """ 185 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 186 | """ 187 | } 188 | 189 | process NonHostReads { 190 | tag { sample_id } 191 | 192 | publishDir "${params.output}/NonHostReads", mode: "copy" 193 | 194 | input: 195 | set sample_id, file(bam) from non_host_bam 196 | 197 | output: 198 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 199 | 200 | """ 201 | ${BEDTOOLS} \ 202 | bamtofastq \ 203 | -i ${bam} \ 204 | -fq ${sample_id}.non.host.R1.fastq.gz \ 205 | -fq2 ${sample_id}.non.host.R2.fastq.gz 206 | """ 207 | } 208 | 209 | /* 210 | - 211 | -- 212 | --- 213 | ---- nonhost reads for megares 214 | --- 215 | -- 216 | - 217 | */ 218 | 219 | 220 | /* 221 | ---- Run alignment to MEGAres 222 | */ 223 | 224 | if( !params.amr_index ) { 225 | process BuildAMRIndex { 226 | tag { amr.baseName } 227 | 228 | input: 229 | file(amr) 230 | 231 | output: 232 | file '*' into (amr_index) 233 | 234 | """ 235 | ${BWA} index ${amr} 236 | """ 237 | } 238 | } 239 | 240 | process AlignToAMR { 241 | tag { sample_id } 242 | 243 | publishDir "${params.output}/AlignToAMR", mode: "copy" 244 | 245 | input: 246 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 247 | file index from amr_index 248 | file amr 249 | 250 | output: 251 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam, megares_RGI_sam) 252 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam,megares_dedup_RGI_sam) 253 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 254 | 255 | 256 | """ 257 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 258 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 259 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 260 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam 
${sample_id}.amr.alignment.sorted.fix.bam 261 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 262 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 263 | ${SAMTOOLS} view -h -o ${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 264 | rm ${sample_id}.amr.alignment.bam 265 | rm ${sample_id}.amr.alignment.sorted*.bam 266 | """ 267 | } 268 | 269 | process RunResistome { 270 | tag { sample_id } 271 | 272 | publishDir "${params.output}/RunResistome", mode: "copy" 273 | 274 | input: 275 | set sample_id, file(sam) from megares_resistome_sam 276 | file annotation 277 | file amr 278 | 279 | output: 280 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 281 | file("${sample_id}.group.tsv") into (megares_group_counts) 282 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 283 | file("${sample_id}.class.tsv") into (megares_class_counts) 284 | file("${sample_id}.type.tsv") into (megares_type_counts) 285 | 286 | """ 287 | $baseDir/bin/resistome -ref_fp ${amr} \ 288 | -annot_fp ${annotation} \ 289 | -sam_fp ${sam} \ 290 | -gene_fp ${sample_id}.gene.tsv \ 291 | -group_fp ${sample_id}.group.tsv \ 292 | -mech_fp ${sample_id}.mechanism.tsv \ 293 | -class_fp ${sample_id}.class.tsv \ 294 | -type_fp ${sample_id}.type.tsv \ 295 | -t ${threshold} 296 | """ 297 | } 298 | 299 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 300 | 301 | process ResistomeResults { 302 | tag { } 303 | 304 | publishDir "${params.output}/ResistomeResults", mode: "copy" 305 | 306 | input: 307 | file(resistomes) from megares_amr_l_to_w 308 | 309 | output: 310 | file("AMR_analytic_matrix.csv") into amr_master_matrix 311 | 312 | """ 313 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 314 | """ 315 | } 316 | 317 | 318 | /* samtools deduplication of megares alignment */ 319 | process SamDedupRunResistome { 320 | tag { sample_id } 321 | 322 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 323 | 324 | input: 325 | set sample_id, file(sam) from megares_dedup_resistome_sam 326 | file annotation 327 | file amr 328 | 329 | output: 330 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 331 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 332 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 333 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 334 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 335 | 336 | """ 337 | $baseDir/bin/resistome -ref_fp ${amr} \ 338 | -annot_fp ${annotation} \ 339 | -sam_fp ${sam} \ 340 | -gene_fp ${sample_id}.gene.tsv \ 341 | -group_fp ${sample_id}.group.tsv \ 342 | -mech_fp ${sample_id}.mechanism.tsv \ 343 | -class_fp ${sample_id}.class.tsv \ 344 | -type_fp ${sample_id}.type.tsv \ 345 | -t ${threshold} 346 | """ 347 | } 348 | 349 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 350 | 351 | process SamDedupResistomeResults { 352 | tag { } 353 | 354 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 355 | 356 | input: 357 | file(resistomes) from megares_dedup_amr_l_to_w 358 | 359 | output: 360 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 361 | 362 | """ 363 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 364 | """ 365 | } 366 | 367 
| process RunRarefaction { 368 | tag { sample_id } 369 | 370 | publishDir "${params.output}/RunRarefaction", mode: "copy" 371 | 372 | input: 373 | set sample_id, file(sam) from megares_rarefaction_sam 374 | file annotation 375 | file amr 376 | 377 | output: 378 | set sample_id, file("*.tsv") into (rarefaction) 379 | 380 | """ 381 | $baseDir/bin/rarefaction \ 382 | -ref_fp ${amr} \ 383 | -sam_fp ${sam} \ 384 | -annot_fp ${annotation} \ 385 | -gene_fp ${sample_id}.gene.tsv \ 386 | -group_fp ${sample_id}.group.tsv \ 387 | -mech_fp ${sample_id}.mech.tsv \ 388 | -class_fp ${sample_id}.class.tsv \ 389 | -type_fp ${sample_id}.type.tsv \ 390 | -min ${min} \ 391 | -max ${max} \ 392 | -skip ${skip} \ 393 | -samples ${samples} \ 394 | -t ${threshold} 395 | """ 396 | } 397 | 398 | 399 | 400 | /* 401 | ---- Confirmation of alignments to genes that require SNP confirmation with RGI 402 | */ 403 | 404 | process ExtractSNP { 405 | tag { sample_id } 406 | 407 | publishDir "${params.output}/ExtractMegaresSNPs", mode: "copy", 408 | saveAs: { filename -> 409 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 410 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 411 | else {} 412 | } 413 | 414 | input: 415 | set sample_id, file(sam) from megares_RGI_sam 416 | file annotation 417 | file amr 418 | 419 | output: 420 | set sample_id, file("*.snp.fasta") into megares_snp_fasta 421 | set sample_id, file("${sample_id}*.gene.tsv") into (resistome_hits) 422 | 423 | """ 424 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 425 | $baseDir/bin/resistome -ref_fp ${amr} \ 426 | -annot_fp ${annotation} \ 427 | -sam_fp ${sam} \ 428 | -gene_fp ${sample_id}.gene.tsv \ 429 | -group_fp ${sample_id}.group.tsv \ 430 | -mech_fp ${sample_id}.mechanism.tsv \ 431 | -class_fp ${sample_id}.class.tsv \ 432 | -type_fp ${sample_id}.type.tsv \ 433 | -t ${threshold} 434 | """ 435 | } 436 | 437 | 438 | /* This doesn't work with the singularity container, so I'll just leave the CARD download as a manual option. 
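   As a manual alternative (using the same URL as the commented-out process below), download and extract the CARD data before launching the pipeline, then point --card_db at the extracted card.json:

       wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data
       tar xfvj card-data.tar.bz2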
439 | process DL_CARD_db { 440 | 441 | publishDir "${params.output}/CARD_db", mode: "symlink" 442 | 443 | output: 444 | file("card.json") into card_db 445 | 446 | """ 447 | 448 | chmod -R 777 /usr/local/ 449 | wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 450 | """ 451 | } 452 | 453 | */ 454 | 455 | 456 | process RunRGI { 457 | tag { sample_id } 458 | errorStrategy 'ignore' 459 | 460 | 461 | publishDir "${params.output}/RunRGI", mode: "symlink" 462 | 463 | input: 464 | set sample_id, file(fasta) from megares_snp_fasta 465 | file card_db 466 | 467 | output: 468 | set sample_id, file("${sample_id}*rgi_output.txt") into rgi_results 469 | 470 | """ 471 | ${RGI} load --local -i ${card_db} --debug 472 | 473 | # We are using the code provided in the following RGI github issue https://github.com/arpcard/rgi/issues/93 474 | set +e 475 | echo "Run RGI the first time" 476 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 477 | set -e 478 | echo "Run RGI again" 479 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 480 | 481 | 482 | """ 483 | } 484 | 485 | 486 | 487 | process SNPconfirmation { 488 | tag { sample_id } 489 | errorStrategy 'ignore' 490 | 491 | publishDir "${params.output}/SNPConfirmation", mode: "copy", 492 | saveAs: { filename -> 493 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 494 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 495 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 496 | else {} 497 | } 498 | 499 | input: 500 | set sample_id, file(rgi) from rgi_results 501 | 502 | output: 503 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into perfect_snp_long_hits 504 | """ 505 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 506 | """ 507 | } 508 | 509 | process Confirmed_AMR_hits { 510 | tag { sample_id } 511 | errorStrategy 'ignore' 512 | 513 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 514 | 515 | input: 516 | set sample_id, file(megares_counts) from resistome_hits 517 | set sample_id, file(perfect_rgi_counts) from perfect_snp_long_hits 518 | 519 | output: 520 | file("${sample_id}*perfect_SNP_confirmed_counts") into perfect_confirmed_counts 521 | 522 | """ 523 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 524 | """ 525 | } 526 | 527 | 528 | perfect_confirmed_counts.toSortedList().set { perfect_confirmed_amr_l_to_w } 529 | 530 | process Confirmed_ResistomeResults { 531 | tag {} 532 | errorStrategy 'ignore' 533 | 534 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 535 | 536 | input: 537 | file(perfect_confirmed_resistomes) from perfect_confirmed_amr_l_to_w 538 | 539 | output: 540 | file("perfect_SNP_confirmed_AMR_analytic_matrix.csv") into perfect_confirmed_matrix 541 | 542 | """ 543 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_AMR_analytic_matrix.csv 544 | """ 545 | } 546 | 547 | /* 548 | ---- Confirmation of deduped alignments to genes that require SNP confirmation with RGI. 
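---- These steps mirror ExtractSNP, RunRGI, and SNPconfirmation above, but run on the samtools rmdup alignments so duplicate-removed counts can be compared with the raw counts.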
549 | */ 550 | 551 | 552 | process ExtractDedupSNP { 553 | tag { sample_id } 554 | 555 | errorStrategy 'ignore' 556 | publishDir "${params.output}/ExtractDedupMegaresSNPs", mode: "copy", 557 | saveAs: { filename -> 558 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 559 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 560 | else {} 561 | } 562 | 563 | input: 564 | set sample_id, file(sam) from megares_dedup_RGI_sam 565 | file annotation 566 | file amr 567 | 568 | output: 569 | set sample_id, file("*.snp.fasta") into dedup_megares_snp_fasta 570 | set sample_id, file("${sample_id}*.gene.tsv") into (dedup_resistome_hits) 571 | 572 | """ 573 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 574 | $baseDir/bin/resistome -ref_fp ${amr} \ 575 | -annot_fp ${annotation} \ 576 | -sam_fp ${sam} \ 577 | -gene_fp ${sample_id}.gene.tsv \ 578 | -group_fp ${sample_id}.group.tsv \ 579 | -mech_fp ${sample_id}.mechanism.tsv \ 580 | -class_fp ${sample_id}.class.tsv \ 581 | -type_fp ${sample_id}.type.tsv \ 582 | -t ${threshold} 583 | """ 584 | } 585 | 586 | process RunDedupRGI { 587 | tag { sample_id } 588 | errorStrategy 'ignore' 589 | 590 | publishDir "${params.output}/RunDedupRGI", mode: "copy" 591 | 592 | input: 593 | set sample_id, file(fasta) from dedup_megares_snp_fasta 594 | file card_db 595 | 596 | output: 597 | set sample_id, file("${sample_id}_rgi_output.txt") into dedup_rgi_results 598 | 599 | """ 600 | ${RGI} load --local -i ${card_db} --debug 601 | 602 | # We are using the code provided in the following RGI GitHub issue: https://github.com/arpcard/rgi/issues/93 603 | set +e 604 | echo "Run RGI the first time" 605 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 606 | set -e 607 | echo "Run RGI again" 608 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 609 | 610 | """ 611 | } 612 | 613 | 614 | process DedupSNPconfirmation { 615 | tag { sample_id } 616 | errorStrategy 'ignore' 617 | publishDir "${params.output}/DedupSNPConfirmation", mode: "copy", 618 | saveAs: { filename -> 619 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 620 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 621 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 622 | else {} 623 | } 624 | 625 | input: 626 | set sample_id, file(rgi) from dedup_rgi_results 627 | 628 | output: 629 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into dedup_perfect_snp_long_hits 630 | """ 631 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 632 | """ 633 | } 634 | 635 | process ConfirmDedupAMRHits { 636 | tag { sample_id } 637 | 638 | errorStrategy 'ignore' 639 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 640 | 641 | input: 642 | set sample_id, file(megares_counts) from dedup_resistome_hits 643 | set sample_id, file(perfect_rgi_counts) from dedup_perfect_snp_long_hits 644 | 645 | output: 646 | file("${sample_id}*perfect_SNP_confirmed_counts") into dedup_perfect_confirmed_counts 647 | 648 | """ 649 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 650 | """ 651 | } 652 | 653 | 654 | dedup_perfect_confirmed_counts.toSortedList().set { dedup_perfect_confirmed_amr_l_to_w } 655 | 656 | 
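/* Merge the per-sample deduplicated, SNP-confirmed counts into a single wide-format matrix. */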
process DedupSNPConfirmed_ResistomeResults { 657 | tag {} 658 | errorStrategy 'ignore' 659 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 660 | 661 | input: 662 | file(perfect_confirmed_resistomes) from dedup_perfect_confirmed_amr_l_to_w 663 | 664 | output: 665 | file("perfect_SNP_confirmed_dedup_AMR_analytic_matrix.csv") into dedup_perfect_confirmed_matrix 666 | 667 | """ 668 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_dedup_AMR_analytic_matrix.csv 669 | """ 670 | } 671 | 672 | 673 | 674 | 675 | 676 | def nextflow_version_error() { 677 | println "" 678 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 679 | println "Run ./nextflow self-update to update Nextflow to the latest available version." 680 | println "" 681 | return 1 682 | } 683 | 684 | def adapter_error(def input) { 685 | println "" 686 | println "[params.adapters] failed to open: '" + input + "' : No such file or directory" 687 | println "" 688 | return 1 689 | } 690 | 691 | def amr_error(def input) { 692 | println "" 693 | println "[params.amr] failed to open: '" + input + "' : No such file or directory" 694 | println "" 695 | return 1 696 | } 697 | 698 | def annotation_error(def input) { 699 | println "" 700 | println "[params.annotation] failed to open: '" + input + "' : No such file or directory" 701 | println "" 702 | return 1 703 | } 704 | 705 | def fastq_error(def input) { 706 | println "" 707 | println "[params.reads] failed to open: '" + input + "' : No such file or directory" 708 | println "" 709 | return 1 710 | } 711 | 712 | def host_error(def input) { 713 | println "" 714 | println "[params.host] failed to open: '" + input + "' : No such file or directory" 715 | println "" 716 | return 1 717 | } 718 | 719 | def index_error(def input) { 720 | println "" 721 | println "[params.host_index] failed to open: '" + input + "' : No such file or directory" 722 | println "" 723 | return 1 724 | } 725 | 726 | def help() { 727 | println "" 728 | println "Program: AmrPlusPlus" 729 | println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md" 730 | println "Contact: Christopher Dean " 731 | println "" 732 | println "Usage: nextflow run main.nf [options]" 733 | println "" 734 | println "Input/output options:" 735 | println "" 736 | println " --reads STR path to FASTQ formatted input sequences" 737 | println " --adapters STR path to FASTA formatted adapter sequences" 738 | println " --host STR path to FASTA formatted host genome" 739 | println " --host_index STR path to BWA generated index files" 740 | println " --amr STR path to AMR resistance database" 741 | println " --annotation STR path to AMR annotation file" 742 | println " --output STR directory to write process outputs to" 743 | println " --kraken_db STR path to kraken database" 744 | println "" 745 | println "Trimming options:" 746 | println "" 747 | println " --leading INT cut bases off the start of a read, if below a threshold quality" 748 | println " --minlen INT drop the read if it is below a specified length" 749 | println " --slidingwindow INT perform sliding-window trimming, cutting once the average quality within the window falls below a threshold" 750 | println " --trailing INT cut bases off the end of a read, if below a threshold quality" 751 | println "" 752 | println "Algorithm options:" 753 | println "" 754 | println " --threads INT number of threads to use for each process" 755 | println " --threshold INT gene 
fraction threshold" 756 | println " --min INT starting sample level" 757 | println " --max INT ending sample level" 758 | println " --samples INT number of sampling iterations to perform" 759 | println " --skip INT number of levels to skip" 760 | println "" 761 | println "Help options:" 762 | println "" 763 | println " --help display this message" 764 | println "" 765 | return 1 766 | } 767 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2_withRGI_Kraken.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = Channel.fromPath(params.host_index).toSortedList() 13 | //if( host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | if(params.kraken_db) { 32 | kraken_db = file(params.kraken_db) 33 | } 34 | 35 | card_db = file(params.card_db) 36 | 37 | threads = params.threads 38 | 39 | threshold = params.threshold 40 | 41 | min = params.min 42 | max = params.max 43 | skip = params.skip 44 | samples = params.samples 45 | 46 | leading = params.leading 47 | trailing = params.trailing 48 | slidingwindow = params.slidingwindow 49 | minlen = params.minlen 50 | 51 | Channel 52 | .fromFilePairs( params.reads, flat: true ) 53 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 54 | .set { reads } 55 | 56 | process RunQC { 57 | tag { sample_id } 58 | 59 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 60 | saveAs: { filename -> 61 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 62 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 63 | else {} 64 | } 65 | 66 | input: 67 | set sample_id, file(forward), file(reverse) from reads 68 | 69 | output: 70 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 71 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 72 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 73 | 74 | """ 75 | ${JAVA} -jar ${TRIMMOMATIC} \ 76 | PE \ 77 | -threads ${threads} \ 78 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 79 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 80 | LEADING:${leading} \ 81 | TRAILING:${trailing} \ 82 | SLIDINGWINDOW:${slidingwindow} \ 83 | MINLEN:${minlen} \ 84 | 2> ${sample_id}.trimmomatic.stats.log 85 | """ 86 | } 87 | 88 | trimmomatic_stats.toSortedList().set { trim_stats } 89 | 90 | process QCStats { 91 | tag { sample_id } 92 | 93 | publishDir "${params.output}/RunQC", mode: 'copy', 94 | saveAs: { filename -> 95 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 96 | else {} 97 | } 98 | 99 | input: 100 | file(stats) from trim_stats 101 | 102 | output: 103 | file("trimmomatic.stats") 104 | 105 | """ 106 | 
${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 107 | """ 108 | } 109 | 110 | if( !params.host_index ) { 111 | process BuildHostIndex { 112 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 113 | 114 | tag { host.baseName } 115 | 116 | input: 117 | file(host) 118 | 119 | output: 120 | file '*' into (host_index) 121 | 122 | """ 123 | ${BWA} index ${host} 124 | """ 125 | } 126 | } 127 | 128 | process AlignReadsToHost { 129 | tag { sample_id } 130 | 131 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 132 | 133 | input: 134 | set sample_id, file(forward), file(reverse) from paired_fastq 135 | file index from host_index 136 | file host 137 | 138 | output: 139 | set sample_id, file("${sample_id}.host.sam") into (host_sam) 140 | 141 | """ 142 | ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > ${sample_id}.host.sam 143 | """ 144 | } 145 | 146 | process RemoveHostDNA { 147 | tag { sample_id } 148 | 149 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 150 | saveAs: { filename -> 151 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 152 | } 153 | 154 | input: 155 | set sample_id, file(sam) from host_sam 156 | 157 | output: 158 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 159 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 160 | 161 | """ 162 | ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 163 | ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats 164 | ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam 165 | """ 166 | } 167 | 168 | idxstats_logs.toSortedList().set { host_removal_stats } 169 | 170 | process HostRemovalStats { 171 | tag { sample_id } 172 | 173 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 174 | saveAs: { filename -> 175 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 176 | } 177 | 178 | input: 179 | file(stats) from host_removal_stats 180 | 181 | output: 182 | file("host.removal.stats") 183 | 184 | """ 185 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 186 | """ 187 | } 188 | 189 | process NonHostReads { 190 | tag { sample_id } 191 | 192 | publishDir "${params.output}/NonHostReads", mode: "copy" 193 | 194 | input: 195 | set sample_id, file(bam) from non_host_bam 196 | 197 | output: 198 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 199 | 200 | """ 201 | ${BEDTOOLS} \ 202 | bamtofastq \ 203 | -i ${bam} \ 204 | -fq ${sample_id}.non.host.R1.fastq.gz \ 205 | -fq2 ${sample_id}.non.host.R2.fastq.gz 206 | """ 207 | } 208 | 209 | 210 | /* 211 | - 212 | -- 213 | --- 214 | ---- nonhost reads for megares and kraken2 215 | --- 216 | -- 217 | - 218 | */ 219 | 220 | 221 | /* 222 | ---- Run Kraken2 223 | */ 224 | 225 | 226 | 227 | process RunKraken { 228 | tag { sample_id } 229 | 230 | publishDir "${params.output}/RunKraken", mode: 'copy', 231 | saveAs: { filename -> 232 | if(filename.indexOf(".kraken.raw") > 0) "Standard/$filename" 233 | else if(filename.indexOf(".kraken.report") > 0) "Standard_report/$filename" 234 | else if(filename.indexOf(".kraken.filtered.report") > 0) "Filtered_report/$filename" 235 | else if(filename.indexOf(".kraken.filtered.raw") > 0) "Filtered/$filename" 236 | 
else {} 237 | } 238 | 239 | input: 240 | set sample_id, file(forward), file(reverse) from non_host_fastq_kraken 241 | 242 | output: 243 | file("${sample_id}.kraken.report") into (kraken_report,kraken_extract_taxa) 244 | set sample_id, file("${sample_id}.kraken.raw") into kraken_raw 245 | file("${sample_id}.kraken.filtered.report") into kraken_filter_report 246 | file("${sample_id}.kraken.filtered.raw") into kraken_filter_raw 247 | 248 | """ 249 | ${KRAKEN2} --db ${kraken_db} --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.report > ${sample_id}.kraken.raw 250 | ${KRAKEN2} --db ${kraken_db} --confidence 1 --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.filtered.report > ${sample_id}.kraken.filtered.raw 251 | """ 252 | } 253 | 254 | kraken_report.toSortedList().set { kraken_l_to_w } 255 | kraken_filter_report.toSortedList().set { kraken_filter_l_to_w } 256 | 257 | process KrakenResults { 258 | tag { } 259 | 260 | publishDir "${params.output}/KrakenResults", mode: "copy" 261 | 262 | input: 263 | file(kraken_reports) from kraken_l_to_w 264 | 265 | output: 266 | file("kraken_analytic_matrix.csv") into kraken_master_matrix 267 | 268 | """ 269 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o kraken_analytic_matrix.csv 270 | """ 271 | } 272 | 273 | process FilteredKrakenResults { 274 | tag { sample_id } 275 | 276 | publishDir "${params.output}/FilteredKrakenResults", mode: "copy" 277 | 278 | input: 279 | file(kraken_reports) from kraken_filter_l_to_w 280 | 281 | output: 282 | file("filtered_kraken_analytic_matrix.csv") into filter_kraken_master_matrix 283 | 284 | """ 285 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o filtered_kraken_analytic_matrix.csv 286 | """ 287 | } 288 | 289 | 290 | 291 | /* 292 | ---- Run alignment to MEGAres 293 | */ 294 | 295 | if( !params.amr_index ) { 296 | process BuildAMRIndex { 297 | tag { amr.baseName } 298 | 299 | input: 300 | file(amr) 301 | 302 | output: 303 | file '*' into (amr_index) 304 | 305 | """ 306 | ${BWA} index ${amr} 307 | """ 308 | } 309 | } 310 | 311 | process AlignToAMR { 312 | tag { sample_id } 313 | 314 | 315 | 316 | 317 | 318 | publishDir "${params.output}/AlignToAMR", mode: "copy" 319 | 320 | input: 321 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 322 | file index from amr_index 323 | file amr 324 | 325 | output: 326 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam, megares_RGI_sam) 327 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam,megares_dedup_RGI_sam) 328 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 329 | 330 | 331 | """ 332 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 333 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 334 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 335 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam ${sample_id}.amr.alignment.sorted.fix.bam 336 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 337 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 338 | ${SAMTOOLS} view -h -o 
${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 339 | rm ${sample_id}.amr.alignment.bam 340 | rm ${sample_id}.amr.alignment.sorted*.bam 341 | """ 342 | } 343 | 344 | process RunResistome { 345 | tag { sample_id } 346 | 347 | publishDir "${params.output}/RunResistome", mode: "copy" 348 | 349 | input: 350 | set sample_id, file(sam) from megares_resistome_sam 351 | file annotation 352 | file amr 353 | 354 | output: 355 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 356 | file("${sample_id}.group.tsv") into (megares_group_counts) 357 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 358 | file("${sample_id}.class.tsv") into (megares_class_counts) 359 | file("${sample_id}.type.tsv") into (megares_type_counts) 360 | 361 | """ 362 | $baseDir/bin/resistome -ref_fp ${amr} \ 363 | -annot_fp ${annotation} \ 364 | -sam_fp ${sam} \ 365 | -gene_fp ${sample_id}.gene.tsv \ 366 | -group_fp ${sample_id}.group.tsv \ 367 | -mech_fp ${sample_id}.mechanism.tsv \ 368 | -class_fp ${sample_id}.class.tsv \ 369 | -type_fp ${sample_id}.type.tsv \ 370 | -t ${threshold} 371 | """ 372 | } 373 | 374 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 375 | 376 | process ResistomeResults { 377 | tag { } 378 | 379 | publishDir "${params.output}/ResistomeResults", mode: "copy" 380 | 381 | input: 382 | file(resistomes) from megares_amr_l_to_w 383 | 384 | output: 385 | file("AMR_analytic_matrix.csv") into amr_master_matrix 386 | 387 | """ 388 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 389 | """ 390 | } 391 | 392 | 393 | /* samtools deduplication of megares alignment */ 394 | process SamDedupRunResistome { 395 | tag { sample_id } 396 | 397 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 398 | 399 | input: 400 | set sample_id, file(sam) from megares_dedup_resistome_sam 401 | file annotation 402 | file amr 403 | 404 | output: 405 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 406 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 407 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 408 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 409 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 410 | 411 | """ 412 | $baseDir/bin/resistome -ref_fp ${amr} \ 413 | -annot_fp ${annotation} \ 414 | -sam_fp ${sam} \ 415 | -gene_fp ${sample_id}.gene.tsv \ 416 | -group_fp ${sample_id}.group.tsv \ 417 | -mech_fp ${sample_id}.mechanism.tsv \ 418 | -class_fp ${sample_id}.class.tsv \ 419 | -type_fp ${sample_id}.type.tsv \ 420 | -t ${threshold} 421 | """ 422 | } 423 | 424 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 425 | 426 | process SamDedupResistomeResults { 427 | tag { } 428 | 429 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 430 | 431 | input: 432 | file(resistomes) from megares_dedup_amr_l_to_w 433 | 434 | output: 435 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 436 | 437 | """ 438 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 439 | """ 440 | } 441 | 442 | process RunRarefaction { 443 | tag { sample_id } 444 | 445 | publishDir "${params.output}/RunRarefaction", mode: "copy" 446 | 447 | input: 448 | set sample_id, file(sam) from megares_rarefaction_sam 449 | file annotation 450 | file amr 451 | 452 | output: 453 | set sample_id, file("*.tsv") into 
(rarefaction) 454 | 455 | """ 456 | $baseDir/bin/rarefaction \ 457 | -ref_fp ${amr} \ 458 | -sam_fp ${sam} \ 459 | -annot_fp ${annotation} \ 460 | -gene_fp ${sample_id}.gene.tsv \ 461 | -group_fp ${sample_id}.group.tsv \ 462 | -mech_fp ${sample_id}.mech.tsv \ 463 | -class_fp ${sample_id}.class.tsv \ 464 | -type_fp ${sample_id}.type.tsv \ 465 | -min ${min} \ 466 | -max ${max} \ 467 | -skip ${skip} \ 468 | -samples ${samples} \ 469 | -t ${threshold} 470 | """ 471 | } 472 | 473 | 474 | 475 | /* 476 | ---- Confirmation of alignments to genes that require SNP confirmation with RGI 477 | */ 478 | 479 | process ExtractSNP { 480 | tag { sample_id } 481 | 482 | errorStrategy 'ignore' 483 | 484 | publishDir "${params.output}/ExtractMegaresSNPs", mode: "copy", 485 | saveAs: { filename -> 486 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 487 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 488 | else {} 489 | } 490 | 491 | input: 492 | set sample_id, file(sam) from megares_RGI_sam 493 | file annotation 494 | file amr 495 | 496 | output: 497 | set sample_id, file("*.snp.fasta") into megares_snp_fasta 498 | set sample_id, file("${sample_id}*.gene.tsv") into (resistome_hits) 499 | 500 | """ 501 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 502 | $baseDir/bin/resistome -ref_fp ${amr} \ 503 | -annot_fp ${annotation} \ 504 | -sam_fp ${sam} \ 505 | -gene_fp ${sample_id}.gene.tsv \ 506 | -group_fp ${sample_id}.group.tsv \ 507 | -mech_fp ${sample_id}.mechanism.tsv \ 508 | -class_fp ${sample_id}.class.tsv \ 509 | -type_fp ${sample_id}.type.tsv \ 510 | -t ${threshold} 511 | """ 512 | } 513 | 514 | 515 | 516 | 517 | process RunRGI { 518 | tag { sample_id } 519 | errorStrategy 'ignore' 520 | 521 | 522 | publishDir "${params.output}/RunRGI", mode: "symlink" 523 | 524 | input: 525 | set sample_id, file(fasta) from megares_snp_fasta 526 | file card_db 527 | 528 | output: 529 | set sample_id, file("${sample_id}*rgi_output.txt") into rgi_results 530 | 531 | """ 532 | ${RGI} load --local -i ${card_db} --debug 533 | 534 | # We are using the code provided in the following RGI GitHub issue: https://github.com/arpcard/rgi/issues/93 535 | set +e 536 | echo "Run RGI the first time" 537 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 538 | set -e 539 | echo "Run RGI again" 540 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 541 | 542 | 543 | """ 544 | } 545 | 546 | 547 | process SNPconfirmation { 548 | tag { sample_id } 549 | errorStrategy 'ignore' 550 | 551 | publishDir "${params.output}/SNPConfirmation", mode: "copy", 552 | saveAs: { filename -> 553 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 554 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 555 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 556 | else {} 557 | } 558 | 559 | input: 560 | set sample_id, file(rgi) from rgi_results 561 | 562 | output: 563 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into perfect_snp_long_hits 564 | """ 565 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 566 | """ 567 | } 568 | 569 | process Confirmed_AMR_hits { 570 | tag { sample_id } 571 | 572 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 573 | 574 | input: 575 | set sample_id, 
file(megares_counts) from resistome_hits 576 | set sample_id, file(perfect_rgi_counts) from perfect_snp_long_hits 577 | 578 | output: 579 | file("${sample_id}*perfect_SNP_confirmed_counts") into perfect_confirmed_counts 580 | 581 | """ 582 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 583 | """ 584 | } 585 | 586 | 587 | perfect_confirmed_counts.toSortedList().set { perfect_confirmed_amr_l_to_w } 588 | 589 | process Confirmed_ResistomeResults { 590 | tag {} 591 | 592 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 593 | 594 | input: 595 | file(perfect_confirmed_resistomes) from perfect_confirmed_amr_l_to_w 596 | 597 | output: 598 | file("perfect_SNP_confirmed_AMR_analytic_matrix.csv") into perfect_confirmed_matrix 599 | 600 | """ 601 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_AMR_analytic_matrix.csv 602 | """ 603 | } 604 | 605 | /* 606 | ---- Confirmation of deduped alignments to genes that require SNP confirmation with RGI. 607 | */ 608 | 609 | 610 | process ExtractDedupSNP { 611 | tag { sample_id } 612 | 613 | publishDir "${params.output}/ExtractDedupMegaresSNPs", mode: "copy", 614 | saveAs: { filename -> 615 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 616 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 617 | else {} 618 | } 619 | 620 | input: 621 | set sample_id, file(sam) from megares_dedup_RGI_sam 622 | file annotation 623 | file amr 624 | 625 | output: 626 | set sample_id, file("*.snp.fasta") into dedup_megares_snp_fasta 627 | set sample_id, file("${sample_id}*.gene.tsv") into (dedup_resistome_hits) 628 | 629 | """ 630 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 631 | $baseDir/bin/resistome -ref_fp ${amr} \ 632 | -annot_fp ${annotation} \ 633 | -sam_fp ${sam} \ 634 | -gene_fp ${sample_id}.gene.tsv \ 635 | -group_fp ${sample_id}.group.tsv \ 636 | -mech_fp ${sample_id}.mechanism.tsv \ 637 | -class_fp ${sample_id}.class.tsv \ 638 | -type_fp ${sample_id}.type.tsv \ 639 | -t ${threshold} 640 | """ 641 | } 642 | 643 | process RunDedupRGI { 644 | tag { sample_id } 645 | errorStrategy 'ignore' 646 | publishDir "${params.output}/RunDedupRGI", mode: "copy" 647 | 648 | input: 649 | set sample_id, file(fasta) from dedup_megares_snp_fasta 650 | file card_db 651 | 652 | output: 653 | set sample_id, file("${sample_id}_rgi_output.txt") into dedup_rgi_results 654 | 655 | """ 656 | ${RGI} load --local -i ${card_db} --debug 657 | 658 | # We are using the code provided in the following RGI GitHub issue: https://github.com/arpcard/rgi/issues/93 659 | set +e 660 | echo "Run RGI the first time" 661 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 662 | set -e 663 | echo "Run RGI again" 664 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 665 | 666 | """ 667 | } 668 | 669 | 670 | process DedupSNPconfirmation { 671 | tag { sample_id } 672 | errorStrategy 'ignore' 673 | 674 | publishDir "${params.output}/DedupSNPConfirmation", mode: "copy", 675 | saveAs: { filename -> 676 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 677 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 678 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 679 
| else {} 680 | } 681 | 682 | input: 683 | set sample_id, file(rgi) from dedup_rgi_results 684 | 685 | output: 686 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into dedup_perfect_snp_long_hits 687 | """ 688 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 689 | """ 690 | } 691 | 692 | process ConfirmDedupAMRHits { 693 | tag { sample_id } 694 | 695 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 696 | 697 | input: 698 | set sample_id, file(megares_counts) from dedup_resistome_hits 699 | set sample_id, file(perfect_rgi_counts) from dedup_perfect_snp_long_hits 700 | 701 | output: 702 | file("${sample_id}*perfect_SNP_confirmed_counts") into dedup_perfect_confirmed_counts 703 | 704 | """ 705 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 706 | """ 707 | } 708 | 709 | 710 | dedup_perfect_confirmed_counts.toSortedList().set { dedup_perfect_confirmed_amr_l_to_w } 711 | 712 | process DedupSNPConfirmed_ResistomeResults { 713 | tag {} 714 | 715 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 716 | 717 | input: 718 | file(perfect_confirmed_resistomes) from dedup_perfect_confirmed_amr_l_to_w 719 | 720 | output: 721 | file("perfect_SNP_confirmed_AMR_analytic_matrix.csv") into dedup_perfect_confirmed_matrix 722 | 723 | """ 724 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_dedup_AMR_analytic_matrix.csv 725 | """ 726 | } 727 | 728 | 729 | 730 | 731 | 732 | def nextflow_version_error() { 733 | println "" 734 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 735 | println "Run ./nextflow self-update to update Nextflow to the latest available version." 
def nextflow_version_error() {
    println ""
    println "This workflow requires Nextflow version 0.25 or greater -- you are running version $nextflow.version"
    println "Run ./nextflow self-update to update Nextflow to the latest available version."
    println ""
    return 1
}

def adapter_error(def input) {
    println ""
    println "[params.adapters] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def amr_error(def input) {
    println ""
    println "[params.amr] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def annotation_error(def input) {
    println ""
    println "[params.annotation] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def fastq_error(def input) {
    println ""
    println "[params.reads] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def host_error(def input) {
    println ""
    println "[params.host] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def index_error(def input) {
    println ""
    println "[params.host_index] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def help() {
    println ""
    println "Program: AmrPlusPlus"
    println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md"
    println "Contact: Christopher Dean"
    println ""
    println "Usage: nextflow run main.nf [options]"
    println ""
    println "Input/output options:"
    println ""
    println "    --reads          STR      path to FASTQ formatted input sequences"
    println "    --adapters       STR      path to FASTA formatted adapter sequences"
    println "    --host           STR      path to FASTA formatted host genome"
    println "    --host_index     STR      path to BWA generated index files"
    println "    --amr            STR      path to AMR resistance database"
    println "    --annotation     STR      path to AMR annotation file"
    println "    --output         STR      directory to write process outputs to"
    println "    --kraken_db      STR      path to Kraken database"
    println ""
    println "Trimming options:"
    println ""
    println "    --leading        INT      cut bases off the start of a read, if below a threshold quality"
    println "    --minlen         INT      drop the read if it is below a specified length"
    println "    --slidingwindow  INT      perform sliding-window trimming, cutting once the average quality within the window falls below a threshold"
    println "    --trailing       INT      cut bases off the end of a read, if below a threshold quality"
    println ""
    println "Algorithm options:"
    println ""
    println "    --threads        INT      number of threads to use for each process"
    println "    --threshold      INT      gene fraction threshold"
    println "    --min            INT      starting sample level"
    println "    --max            INT      ending sample level"
    println "    --samples        INT      number of sampling iterations to perform"
    println "    --skip           INT      number of levels to skip"
    println ""
    println "Help options:"
    println ""
    println "    --help                    display this message"
    println ""
    return 1
}
--------------------------------------------------------------------------------
/nextflow.config:
--------------------------------------------------------------------------------
manifest {
    /* Homepage of project */
    homePage = 'https://github.com/meglab-metagenomics/amrplusplus_v2'

    /* Description of project */
    description = 'AmrPlusPlus: A bioinformatic pipeline for characterizing the resistome with the MEGARes database and the microbiome using Kraken.'

    /* Main pipeline script */
    mainScript = 'main_AmrPlusPlus_v2.nf'

    /* Default repository branch */
    defaultBranch = 'master'
}
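/*
    To inspect how this manifest, the params below, and a chosen profile
    resolve together before launching anything, Nextflow's built-in config
    inspector can be used (the profile name is one of those defined at the
    bottom of this file):

        nextflow config -profile local
*/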
params {
    /* Location of forward and reverse read pairs */
    reads = "data/raw/*_R{1,2}.fastq.gz"

    /* Location of adapter sequences */
    adapters = "data/adapters/nextera.fa"

    /* Location of host genome index files */
    host_index = ""

    /* Location of host genome */
    host = "data/host/chr21.fasta.gz"

    /* Location of the Kraken database */
    kraken_db = "minikraken2_v2_8GB_201904_UPDATE"

    /* Location of amr index files */
    amr_index = ""

    /* Location of antimicrobial resistance (MEGARes) database */
    amr = "data/amr/megares_modified_database_v2.00.fasta"

    /* Location of amr annotation file */
    annotation = "data/amr/megares_modified_annotations_v2.00.csv"

    /* Location of SNP confirmation script */
    snp_confirmation = "bin/snp_confirmation.py"

    /* Output directory */
    output = "test_results"

    /* Number of threads */
    threads = 10
    smem_threads = 12

    /* Trimmomatic trimming parameters */
    leading = 3
    trailing = 3
    slidingwindow = "4:15"
    minlen = 36

    /* Resistome threshold */
    threshold = 80

    /* Starting rarefaction level */
    min = 5

    /* Ending rarefaction level */
    max = 100

    /* Number of levels to skip */
    skip = 5

    /* Number of iterations to sample at */
    samples = 1

    /* Display help message */
    help = false
}
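/*
    Any value in the params block above can be overridden at launch time with
    a double-dash flag, while single-dash options (such as -profile) belong
    to Nextflow itself. A typical invocation against the bundled test data;
    the profile is one of those defined below and the other values are
    illustrative:

        nextflow run main_AmrPlusPlus_v2.nf -profile local \
            --reads "data/raw/*_R{1,2}.fastq.gz" \
            --output test_results \
            --threads 4 \
            --threshold 80
*/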
/*
    These profiles point to configuration files that can be edited to best
    suit your computing environment.
*/
profiles {
    local {
        includeConfig "config/local.config"
    }
    MEG_AMI {
        includeConfig "config/MEG_AMI.config"
    }
    local_angus {
        includeConfig "config/local_angus.config"
    }
    local_MSI {
        includeConfig "config/local_MSI.config"
    }
    singularity_slurm {
        process.executor = 'slurm'
        includeConfig "config/singularity_slurm.config"
        process.container = 'shub://meglab-metagenomics/amrplusplus_v2'
    }
    singularity {
        includeConfig "config/singularity.config"
        process.container = 'shub://meglab-metagenomics/amrplusplus_v2'
    }
}
--------------------------------------------------------------------------------
/previous_versions/main_amr_plus_plus_v1.nf:
--------------------------------------------------------------------------------
#!/usr/bin/env nextflow

/*
vim: syntax=groovy
-*- mode: groovy;-*-
*/

if( params.help ) {
    return help()
}
if( params.host_index ) {
    host_index = Channel.fromPath(params.host_index).toSortedList()
    //if( host_index.isEmpty() ) return index_error(host_index)
}
if( params.host ) {
    host = file(params.host)
    if( !host.exists() ) return host_error(host)
}
if( params.amr ) {
    amr = file(params.amr)
    if( !amr.exists() ) return amr_error(amr)
}
if( params.adapters ) {
    adapters = file(params.adapters)
    if( !adapters.exists() ) return adapter_error(adapters)
}
if( params.annotation ) {
    annotation = file(params.annotation)
    if( !annotation.exists() ) return annotation_error(annotation)
}

kraken_db = params.kraken_db
threads = params.threads

threshold = params.threshold

min = params.min
max = params.max
skip = params.skip
samples = params.samples

leading = params.leading
trailing = params.trailing
slidingwindow = params.slidingwindow
minlen = params.minlen

Channel
    .fromFilePairs( params.reads, flat: true )
    .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" }
    .set { reads }

process RunQC {
    tag { sample_id }

    publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz',
        saveAs: { filename ->
            if(filename.indexOf("P.fastq") > 0) "Paired/$filename"
            else if(filename.indexOf("U.fastq") > 0) "Unpaired/$filename"
            else {}
        }

    input:
        set sample_id, file(forward), file(reverse) from reads

    output:
        set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq)
        set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq)
        file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats)

    """
    ${JAVA} -jar ${TRIMMOMATIC}/trimmomatic.jar \
        PE \
        -threads ${threads} \
        $forward $reverse ${sample_id}.1P.fastq ${sample_id}.1U.fastq ${sample_id}.2P.fastq ${sample_id}.2U.fastq \
        ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \
        LEADING:${leading} \
        TRAILING:${trailing} \
        SLIDINGWINDOW:${slidingwindow} \
        MINLEN:${minlen} \
        2> ${sample_id}.trimmomatic.stats.log

    # Trimmomatic writes plain FASTQ; compress so the files match the
    # declared .fastq.gz outputs.
    gzip *fastq
    """
}

trimmomatic_stats.toSortedList().set { trim_stats }
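/*
    fromFilePairs groups files that differ only in the R1/R2 token and uses
    the shared prefix as sample_id; with flat: true each channel item is a
    flat [sample_id, forward, reverse] tuple. As a quick illustration with
    hypothetical file names:

        # Given these files in data/raw/ ...
        #   sampleA_R1.fastq.gz  sampleA_R2.fastq.gz
        #   sampleB_R1.fastq.gz  sampleB_R2.fastq.gz
        ls data/raw/*_R{1,2}.fastq.gz

        # ... the glob "data/raw/*_R{1,2}.fastq.gz" yields:
        #   [sampleA, sampleA_R1.fastq.gz, sampleA_R2.fastq.gz]
        #   [sampleB, sampleB_R1.fastq.gz, sampleB_R2.fastq.gz]
*/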
process QCStats {
    publishDir "${params.output}/RunQC", mode: 'copy',
        saveAs: { filename ->
            if(filename.indexOf(".stats") > 0) "Stats/$filename"
            else {}
        }

    input:
        file(stats) from trim_stats

    output:
        file("trimmomatic.stats")

    """
    ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats
    """
}

if( !params.host_index ) {
    process BuildHostIndex {
        publishDir "${params.output}/BuildHostIndex", mode: "copy"

        tag { host.baseName }

        input:
            file(host)

        output:
            file '*' into (host_index)

        """
        ${BWA} index ${host}
        """
    }
}

process AlignReadsToHost {
    tag { sample_id }

    publishDir "${params.output}/AlignReadsToHost", mode: "copy"

    input:
        set sample_id, file(forward), file(reverse) from paired_fastq
        file index from host_index
        file host

    output:
        set sample_id, file("${sample_id}.host.sam") into (host_sam)

    """
    ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > ${sample_id}.host.sam
    """
}

process RemoveHostDNA {
    tag { sample_id }

    publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam',
        saveAs: { filename ->
            if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename"
        }

    input:
        set sample_id, file(sam) from host_sam

    output:
        set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam)
        file("${sample_id}.samtools.idxstats") into (idxstats_logs)

    """
    # Sort the host alignment and record per-reference mapping counts.
    ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam
    ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats
    # Keep only reads that did not map to the host (-f 4 selects unmapped reads).
    ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam
    """
}

idxstats_logs.toSortedList().set { host_removal_stats }

process HostRemovalStats {
    publishDir "${params.output}/RemoveHostDNA", mode: "copy",
        saveAs: { filename ->
            if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename"
        }

    input:
        file(stats) from host_removal_stats

    output:
        file("host.removal.stats")

    """
    ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats
    """
}

process BAMToFASTQ {
    tag { sample_id }

    publishDir "${params.output}/BAMToFASTQ", mode: "copy"

    input:
        set sample_id, file(bam) from non_host_bam

    output:
        set sample_id, file("${sample_id}.non.host.R1.fastq"), file("${sample_id}.non.host.R2.fastq") into (non_host_fastq, non_host_fastq_kraken)

    """
    ${BEDTOOLS} \
        bamtofastq \
        -i ${bam} \
        -fq ${sample_id}.non.host.R1.fastq \
        -fq2 ${sample_id}.non.host.R2.fastq
    """
}

if( !params.amr_index ) {
    process BuildAMRIndex {
        tag { amr.baseName }

        input:
            file(amr)

        output:
            file '*' into (amr_index)

        """
        ${BWA} index ${amr}
        """
    }
}
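/*
    Both index-building processes rely on bwa index writing its companion
    files next to the input FASTA, which is why the bare glob file '*' is
    enough to capture the whole index. A quick illustration using the
    repository's default host reference:

        bwa index chr21.fasta.gz
        ls chr21.fasta.gz.*
        # chr21.fasta.gz.amb  chr21.fasta.gz.ann  chr21.fasta.gz.bwt
        # chr21.fasta.gz.pac  chr21.fasta.gz.sa
*/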
process AlignToAMR {
    tag { sample_id }

    publishDir "${params.output}/AlignToAMR", mode: "copy"

    input:
        set sample_id, file(forward), file(reverse) from non_host_fastq
        file index from amr_index
        file amr

    output:
        set sample_id, file("${sample_id}.amr.alignment.sam") into (resistome_sam, rarefaction_sam, snp_sam)

    """
    ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} > ${sample_id}.amr.alignment.sam
    """
}

process RunResistome {
    tag { sample_id }

    publishDir "${params.output}/RunResistome", mode: "copy"

    input:
        set sample_id, file(sam) from resistome_sam
        file annotation
        file amr

    output:
        file("${sample_id}.gene.tsv") into (resistome)

    """
    # Count alignments against MEGARes at the gene, group, class, and
    # mechanism levels; -t is the gene fraction threshold (the percentage of
    # a gene's bases that must be covered before it is reported).
    ${RESISTOME} -ref_fp ${amr} \
        -annot_fp ${annotation} \
        -sam_fp ${sam} \
        -gene_fp ${sample_id}.gene.tsv \
        -group_fp ${sample_id}.group.tsv \
        -class_fp ${sample_id}.class.tsv \
        -mech_fp ${sample_id}.mechanism.tsv \
        -t ${threshold}
    """
}

process RunRarefaction {
    tag { sample_id }

    publishDir "${params.output}/RunRarefaction", mode: "copy"

    input:
        set sample_id, file(sam) from rarefaction_sam
        file annotation
        file amr

    output:
        set sample_id, file("*.tsv") into (rarefaction)

    """
    ${RAREFACTION} \
        -ref_fp ${amr} \
        -sam_fp ${sam} \
        -annot_fp ${annotation} \
        -gene_fp ${sample_id}.gene.tsv \
        -group_fp ${sample_id}.group.tsv \
        -class_fp ${sample_id}.class.tsv \
        -mech_fp ${sample_id}.mech.tsv \
        -min ${min} \
        -max ${max} \
        -skip ${skip} \
        -samples ${samples} \
        -t ${threshold}
    """
}

process RunSNPFinder {
    tag { sample_id }

    publishDir "${params.output}/RunSNPFinder", mode: "copy"

    input:
        set sample_id, file(sam) from snp_sam
        file amr

    output:
        set sample_id, file("*.tsv") into (snp)

    """
    ${SNPFINDER} \
        -amr_fp ${amr} \
        -sampe ${sam} \
        -out_fp ${sample_id}.tsv
    """
}

process RunKraken {
    tag { sample_id }

    publishDir "${params.output}/RunKraken", mode: "copy"

    input:
        set sample_id, file(forward), file(reverse) from non_host_fastq_kraken

    output:
        file("${sample_id}.kraken.filtered.report") into kraken_report

    """
    # Two passes: an unfiltered classification for reference, then a
    # high-stringency pass (--confidence 1) whose report is the one collected
    # downstream. Both passes write to the same raw output file, so the
    # second overwrites the first.
    ${KRAKEN2} --db ${kraken_db} --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.report > ${sample_id}.kraken.raw
    ${KRAKEN2} --db ${kraken_db} --confidence 1 --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.filtered.report > ${sample_id}.kraken.raw
    """
}
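/*
    The report written above follows Kraken2's standard six-column layout,
    which is what bin/kraken2_long_to_wide.py consumes downstream. For a
    quick look at one (the file name is illustrative):

        # Kraken2 report columns, tab-separated:
        #   1. percent of reads in the clade   4. rank code (U/R/D/K/P/C/O/F/G/S)
        #   2. reads in the clade              5. NCBI taxonomy ID
        #   3. reads assigned directly         6. indented scientific name
        head -5 sampleA.kraken.filtered.report
*/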
resistome.toSortedList().set { amr_l_to_w }

process AMRLongToWide {
    publishDir "${params.output}/AMRLongToWide", mode: "copy"

    input:
        file(resistomes) from amr_l_to_w

    output:
        file("AMR_analytic_matrix.csv") into amr_master_matrix

    """
    # This version of amr_long_to_wide.py writes the matrix into an output
    # directory, so collect it from there.
    mkdir ret
    ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o ret
    mv ret/AMR_analytic_matrix.csv .
    """
}

kraken_report.toSortedList().set { kraken_l_to_w }

process KrakenLongToWide {
    publishDir "${params.output}/KrakenLongToWide", mode: "copy"

    input:
        file(kraken_reports) from kraken_l_to_w

    output:
        file("kraken_analytic_matrix.csv") into kraken_master_matrix

    """
    mkdir ret
    ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o ret
    mv ret/kraken_analytic_matrix.csv .
    """
}

def nextflow_version_error() {
    println ""
    println "This workflow requires Nextflow version 0.25 or greater -- you are running version $nextflow.version"
    println "Run ./nextflow self-update to update Nextflow to the latest available version."
    println ""
    return 1
}

def adapter_error(def input) {
    println ""
    println "[params.adapters] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def amr_error(def input) {
    println ""
    println "[params.amr] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def annotation_error(def input) {
    println ""
    println "[params.annotation] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def fastq_error(def input) {
    println ""
    println "[params.reads] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def host_error(def input) {
    println ""
    println "[params.host] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def index_error(def input) {
    println ""
    println "[params.host_index] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}
def help() {
    println ""
    println "Program: AmrPlusPlus"
    println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md"
    println "Contact: Christopher Dean"
    println ""
    println "Usage: nextflow run main.nf [options]"
    println ""
    println "Input/output options:"
    println ""
    println "    --reads          STR      path to FASTQ formatted input sequences"
    println "    --adapters       STR      path to FASTA formatted adapter sequences"
    println "    --host           STR      path to FASTA formatted host genome"
    println "    --host_index     STR      path to BWA generated index files"
    println "    --amr            STR      path to AMR resistance database"
    println "    --annotation     STR      path to AMR annotation file"
    println "    --output         STR      directory to write process outputs to"
    println "    --kraken_db      STR      path to Kraken database"
    println ""
    println "Trimming options:"
    println ""
    println "    --leading        INT      cut bases off the start of a read, if below a threshold quality"
    println "    --minlen         INT      drop the read if it is below a specified length"
    println "    --slidingwindow  INT      perform sliding-window trimming, cutting once the average quality within the window falls below a threshold"
    println "    --trailing       INT      cut bases off the end of a read, if below a threshold quality"
    println ""
    println "Algorithm options:"
    println ""
    println "    --threads        INT      number of threads to use for each process"
    println "    --threshold      INT      gene fraction threshold"
    println "    --min            INT      starting sample level"
    println "    --max            INT      ending sample level"
    println "    --samples        INT      number of sampling iterations to perform"
    println "    --skip           INT      number of levels to skip"
    println ""
    println "Help options:"
    println ""
    println "    --help                    display this message"
    println ""
    return 1
}
--------------------------------------------------------------------------------