├── CODE_OF_CONDUCT.md ├── LICENSE ├── README.md ├── bin ├── QC_summary_stats.py ├── RGI_aro_hits.py ├── RGI_long_combine.py ├── amr_long_to_wide.py ├── kraken2_long_to_wide.py ├── kraken_long_to_wide.py ├── rarefaction ├── resistome ├── samtools_idxstats.py └── trimmomatic_stats.py ├── config ├── MEG_AMI.config ├── local.config ├── local_MSI.config ├── local_angus.config ├── singularity.config └── singularity_slurm.config ├── containers ├── Singularity └── Singularity.RGI ├── data ├── HMM.tar.xz ├── adapters │ └── nextera.fa ├── amr │ ├── megares_annotations_v1.01.csv │ ├── megares_annotations_v1.02.csv │ ├── megares_database_v1.01.fasta │ ├── megares_database_v1.02.fasta │ ├── megares_drugs_annotations_v2.00.csv │ ├── megares_drugs_database_v2.00.fasta │ ├── megares_modified_annotations_v2.00.csv │ ├── megares_modified_database_v2.00.fasta │ ├── megares_to_external_header_mappings_v1.01.tsv │ └── snp_location_metadata.csv ├── host │ └── chr21.fasta.gz └── raw │ ├── S1_test_R1.fastq.gz │ ├── S1_test_R2.fastq.gz │ ├── S2_test_R1.fastq.gz │ ├── S2_test_R2.fastq.gz │ ├── S3_test_R1.fastq.gz │ └── S3_test_R2.fastq.gz ├── docs ├── AmrPlusPlus_Pipeline_workflow.pdf ├── CHANGELOG.md ├── FAQs.md ├── accessing_AMR++.md ├── configuration.md ├── contact.md ├── dependencies.md ├── installation.md ├── output.md ├── requirements.md └── usage.md ├── download_minikraken.sh ├── launch_mpi_slurm.sh ├── main_AmrPlusPlus_v2.nf ├── main_AmrPlusPlus_v2_withKraken.nf ├── main_AmrPlusPlus_v2_withRGI.nf ├── main_AmrPlusPlus_v2_withRGI_Kraken.nf ├── nextflow.config └── previous_versions └── main_amr_plus_plus_v1.nf /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 
28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at meg.metagenomics@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Chris Dean 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Overview 2 | -------- 3 | [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) 4 | [![Nextflow](https://img.shields.io/badge/Nextflow-%E2%89%A50.25.1-brightgreen.svg)](https://www.nextflow.io/) 5 | 6 | 7 | ## [AMR++ v3 now available!](https://github.com/Microbial-Ecology-Group/AMRplusplus) 8 | We have migrated GitHub repositories to a [new location](https://github.com/Microbial-Ecology-Group/AMRplusplus) (to make it a group repository), and this repository will be deprecated. We apologize for any inconvenience and hope you find v3 useful for your research needs. Of note, version 3 includes: 9 | * SNP confirmation using a custom database and [SNP verification software](https://github.com/Isabella136/AmrPlusPlus_SNP) 10 | * improved modularity to optimize a personalized workflow 11 | 12 | 13 | ### 2022-08-22 : AMR++ update coming soon 14 | Hello AMR++ users, we would like to sincerely apologize for the delay in addressing your concerns and updating AMR++. As many of you likely experienced, COVID was challenging and we were not able to dedicate the resources to AMR++ that it deserves. We are happy to announce that we have assembled a team for another major update to AMR++ and the MEGARes database in the next few months! 15 | 16 | A few notes: 17 | * We are aware of the issues with integrating RGI results with the AMR++ pipeline. Unfortunately, we are discontinuing our support for integrating AMR++ results with the RGI software. 18 | * We are attempting to remedy the issues that AMR++ users have reported, but we would also like to hear any other suggestions you might have. Please send any suggestions to enriquedoster@gmail.com with the subject line, "AMR++ update". 19 | * A few upcoming updates: easy control over the number of intermediate files that are stored, the option to re-arrange pipeline processes, better sample summary statistics, and improved functionality through nextflow profiles. 20 | 21 | 22 | ### 2020-03-21 : AMR++ v2.0.2 update. 23 | We identified issues in running RGI with the full AMR++ pipeline thanks to GitHub users AroArz and DiegoBrambilla. We are releasing v2.0.1 to continue AMR++ functionality, but we are planning further updates for the next stable release. As of this update, RGI developers are focused on contributing to the COVID-19 response, so we plan to reconvene with them when their schedule opens up.
24 | * Please view the [CHANGELOG](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/CHANGELOG.md) for more details on changes included in AMR++ v2.0.1 25 | * To run the AMR++ pipeline with RGI, you'll have to download the CARD database locally and specify its location using the "--card_db" flag like this: 26 | 27 | ``` 28 | # If you want to include RGI in your analysis, first download CARD with this command: 29 | # We tested AMR++ v2.0.2 with the CARD database v3.0.8, but we recommend using the command below to get the latest CARD db 30 | wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 31 | 32 | # In case the latest CARD database is causing issues, you can download the version we used for testing, v3.0.8: 33 | wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/download/0/broadstreet-v3.0.8.tar.bz2 && tar xfvj card-data.tar.bz2 34 | 35 | 36 | # If you run into an error regarding "Issued certificate has expired.", try this command: 37 | wget --no-check-certificate -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 38 | 39 | 40 | # Run the AMR++ pipeline with the "--card_db" flag 41 | nextflow run main_AmrPlusPlus_v2_withRGI.nf -profile singularity --card_db /path/to/card.json --reads '/path/to/reads/*R{1,2}_001.R1.fastq.gz' --output AMR++_results -w work_dir 42 | ``` 43 | 44 | 45 | # Microbial Ecology Group (MEG) 46 | (https://megares.meglab.org/) 47 | 48 | Our international multidisciplinary group of scientists and educators is addressing the issues of antimicrobial resistance (AMR) and microbial ecology in agriculture through research, outreach, and education. By characterizing risks related to AMR and microbial ecology, our center will identify agricultural production practices that are harmful and can be avoided, while also identifying and promoting production practices and interventions that are beneficial or do no harm to the ecosystem or public health. This will allow society to realize “sustainable intensification” of agriculture. 49 | 50 | # MEGARes and the AMR++ bioinformatic pipeline 51 | (http://megares.meglab.org/amrplusplus/latest/html/v2/) 52 | 53 | The MEGARes database contains sequence data for approximately 8,000 hand-curated antimicrobial resistance genes accompanied by an annotation structure that is optimized for use with high throughput sequencing and metagenomic analysis. The acyclical annotation graph of MEGARes allows for accurate, count-based, hierarchical statistical analysis of resistance at the population level, much like microbiome analysis, and is also designed to be used as a training database for the creation of statistical classifiers. 54 | 55 | The goal of many metagenomics studies is to characterize the content and relative abundance of sequences of interest from the DNA of a given sample or set of samples. You may want to know what is contained within your sample or how abundant a given sequence is relative to another. 56 | 57 | Often, metagenomics is performed when the answer to these questions must be obtained for a large number of targets where techniques like multiplex PCR and other targeted methods would be too cumbersome to perform. AmrPlusPlus can process the raw data from the sequencer, identify the fragments of DNA, and count them. It also provides a count of the polymorphisms that occur in each DNA fragment with respect to the reference database.
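For comparison with the RGI example above, a first pass with the core workflow alone (no Kraken, no RGI) can be as simple as the sketch below. Treat it as a minimal, illustrative invocation: the script name and the singularity profile come from this repository, the flags mirror the RGI command shown earlier, and the reads glob is an assumption you should adapt to your own file naming.

```
# Hypothetical minimal run of the core AMR++ pipeline on the bundled test reads
nextflow run main_AmrPlusPlus_v2.nf -profile singularity \
    --reads 'data/raw/*_R{1,2}.fastq.gz' \
    --output AMR++_results \
    -w work_dir
```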
58 | 59 | Additionally, you may want to know if the depth of your sequencing (how many reads you obtain that are on target) is high enough to identify rare organisms (organisms with low abundance relative to others) in your population. This is referred to as rarefaction and is calculated by randomly subsampling your sequence data at intervals between 0% and 100% in order to determine how many targets are found at each depth. 60 | 61 | With AMR++, you will obtain alignment count files for each sample that are combined into a count matrix that can be analyzed using any statistical and mathematical techniques that can operate on a matrix of observations. 62 | 63 | More Information 64 | ---------------- 65 | 66 | - [Installation](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/installation.md) 67 | - [Usage](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/usage.md) 68 | - [Configuration](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/configuration.md) 69 | - [Accessing AMR++](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/accessing_AMR++.md) 70 | - [Output](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/output.md) 71 | - [Dependencies](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/dependencies.md) 72 | - [Software Requirements](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/requirements.md) 73 | - [FAQs](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/FAQs.md) 74 | - [Details on AMR++ updates](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/update_details.md) 75 | - [Contact](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/contact.md) 76 | -------------------------------------------------------------------------------- /bin/QC_summary_stats.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import gzip 4 | import argparse 5 | import glob 6 | 7 | 8 | import csv 9 | import pandas as pd 10 | import numpy 11 | 12 | 13 | 14 | 15 | def parse_cmdline_params(cmdline_params): 16 | info = "Computes mean Phred score, mean per-base error probability, total reads, and mean read length for FASTQ files" 17 | parser = argparse.ArgumentParser(description=info) 18 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 19 | help='Use globstar to pass a list of sequence files, (Ex: *.fastq.gz)') 20 | return parser.parse_args(cmdline_params) 21 | 22 | 23 | def pull_Phred(fastq_files): 24 | 25 | for f in fastq_files: # iterate through each fastq file 26 | Plist=[] 27 | Qlist=[] 28 | Seqlen_list=[] 29 | num_reads = 0 30 | 31 | fp = gzip.open(f, 'rt') if f.endswith('.gz') else open(f, 'r') # open each fastq file, reading gzipped input transparently 32 | for line in fp: # iterate through lines of fastq file 33 | Ordqual=[] 34 | Q=[] 35 | P=[] 36 | read_id = line 37 | seq = next(fp) 38 | 39 | #seq = seq[10:len(seq)] # Let's not chop off the umi here since we would be checking the quality after UMI removal and not all samples have UMIs 40 | 41 | Seqlen_list.append(len(seq)) 42 | #newseq = seq + spacesep + UMI 43 | plus = next(fp) 44 | qual = next(fp) 45 | 46 | 47 | 48 | for i in range(len(qual)-1): #Exclude the return character 49 | Ordqual.append(ord(qual[i])) 50 | Q.append(Ordqual[i]-33) 51 | P.append(10**(-Q[i]/10)) 52 | 53 | Qlist.append(numpy.mean(Q)) 54 | Plist.append(numpy.mean(P)) 55 | 56 | num_reads += 1 57 | 58 | print(f,"mean_probability_nucleotide_error",numpy.mean(Plist)) 59 |
print(f,"mean_phred_score",numpy.mean(Qlist)) 60 | print(f,"total_reads",num_reads) 61 | print(f,"mean_read_length",numpy.mean(Seqlen_list)) 62 | 63 | fp.close() 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | def print_dict(dict): 73 | # iterate through UMIs and repeat counts and print those 74 | dups= "Repeat UMI Error" 75 | for k, v in dict.items(): 76 | if v != 1: 77 | print k, dups, v 78 | else: 79 | print k, v 80 | 81 | def print_Rep(dict): 82 | # iterate through UMIs and repeat counts and print those 83 | dups= "Repeat UMI Error" 84 | for k, v in dict.items(): 85 | if len(v) != 1: 86 | print k, dups, v 87 | else: 88 | print k, v 89 | 90 | 91 | #Apply the previous functions; Print UMIs and counts 92 | if __name__ == "__main__": 93 | opts = parse_cmdline_params(sys.argv[1:]) 94 | fastq_files = opts.input_files 95 | #read_dic = pull_UMI(fastq_files) 96 | #print_dict(read_dic[0]) 97 | #print_Rep(read_dic[1]) 98 | #print ReWriter(read_dic[1]) 99 | pull_Phred(fastq_files) 100 | -------------------------------------------------------------------------------- /bin/RGI_aro_hits.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import csv 5 | 6 | 7 | def rgi_output(rgi_file): 8 | # Get the desired information from the RGI file 9 | with open(rgi_file, 'r') as rgifile: 10 | 11 | # Get the first line of the file and put it into a list 12 | header = rgifile.readline().split('\t') 13 | 14 | # Get the index of all the elements we want 15 | category = header.index('Cut_Off') 16 | best_hit_aro = header.index('Best_Hit_ARO') 17 | aro = header.index('ARO') 18 | model_type = header.index('Model_type') 19 | 20 | # Define a dictionary where each best_hit_aro is a key and the items are the rest of the desired elements 21 | rgi_dict = {} 22 | 23 | # Go through the file and fill out the dictionary without repeats 24 | reader = csv.reader(rgifile, delimiter="\t") 25 | for row in reader: 26 | aro_name = row[best_hit_aro] 27 | if aro_name not in rgi_dict.keys(): 28 | rgi_dict[aro_name] = [row[aro], row[category], 1, row[model_type]] 29 | else: 30 | rgi_dict[aro_name][2] += 1 31 | 32 | # Close the file 33 | rgifile.close() 34 | 35 | # Get the name of the file to use for the three outputs 36 | sample_name = rgi_file.split(".") 37 | 38 | perf_sample_name = sample_name 39 | perf_sample_name.pop() 40 | perf_sample_name.pop() 41 | perf_sample_name.insert(1, '_rgi_perfect_hits.csv') 42 | perf_file_name = ''.join(perf_sample_name) 43 | 44 | strict_sample_name = sample_name 45 | strict_sample_name.pop() 46 | strict_sample_name.insert(1, '_rgi_strict_hits.csv') 47 | strict_file_name = ''.join(strict_sample_name) 48 | 49 | loose_sample_name = sample_name 50 | loose_sample_name.pop() 51 | loose_sample_name.insert(1, '_rgi_loose_hits.csv') 52 | loose_file_name = ''.join(loose_sample_name) 53 | 54 | # Search the dictionary to see which of the three Cut_Offs exist 55 | perf_in_dict = False 56 | strict_in_dict = False 57 | loose_in_dict = False 58 | 59 | for x in rgi_dict.values(): 60 | if x[1] == "Perfect": 61 | perf_in_dict = True 62 | if x[1] == "Strict": 63 | strict_in_dict = True 64 | if x[1] == "Loose": 65 | loose_in_dict = True 66 | # Stop checking if we already know we need to make all three files 67 | if perf_in_dict and strict_in_dict and loose_in_dict: 68 | break 69 | 70 | # Write the dictionary to each of the files if cut_off values exist 71 | # I.e if the file has no perfects, we don't write a file for it 72 | if perf_in_dict: 73 | 
with open(perf_file_name, 'w', newline='\n') as perf_file: 74 | perf_write = csv.writer(perf_file, delimiter=',') 75 | perf_write.writerow(["Best_Hit_ARO", "ARO", "Cut_Off", "Sum_Hits", "Model_Type"]) 76 | for perf_key, perf_item in rgi_dict.items(): 77 | if perf_item[1] == "Perfect": 78 | temp_perf_write = perf_item.copy() 79 | temp_perf_write.insert(0, perf_key) 80 | perf_write.writerow(temp_perf_write) 81 | 82 | if strict_in_dict: 83 | with open(strict_file_name, 'w', newline='\n') as strict_file: 84 | strict_write = csv.writer(strict_file, delimiter=',') 85 | strict_write.writerow(["Best_Hit_ARO", "ARO", "Cut_Off", "Sum_Hits", "Model_Type"]) 86 | for strict_key, strict_item in rgi_dict.items(): 87 | if strict_item[1] == "Strict": 88 | temp_strict_write = strict_item.copy() 89 | temp_strict_write.insert(0, strict_key) 90 | strict_write.writerow(temp_strict_write) 91 | 92 | if loose_in_dict: 93 | with open(loose_file_name, 'w', newline='\n') as loose_file: 94 | loose_write = csv.writer(loose_file, delimiter=',') 95 | loose_write.writerow(["Best_Hit_ARO", "ARO", "Cut_Off", "Sum_Hits", "Model_Type"]) 96 | for loose_key, loose_item in rgi_dict.items(): 97 | if loose_item[1] == "Loose": 98 | temp_loose_write = loose_item.copy() 99 | temp_loose_write.insert(0, loose_key) 100 | loose_write.writerow(temp_loose_write) 101 | 102 | 103 | if __name__ == '__main__': 104 | rgi_output(sys.argv[1]) 105 | -------------------------------------------------------------------------------- /bin/RGI_long_combine.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import csv 5 | 6 | 7 | def rgi_long_combine(rgi_perf_file, long_file, combined_output): 8 | # Get the desired information from the RGI file 9 | with open(rgi_perf_file, 'r') as rgifile: 10 | 11 | # Define a dictionary where desired formatted rgi entries (gene column format) are keys and the items are the number of hits and default gene fraction percentage 12 | rgi_perf_dict = {} 13 | 14 | # Go through the file and fill out the dictionary using long format 15 | reader = csv.reader(rgifile, delimiter=',') 16 | next(reader) 17 | for row in reader: 18 | aro_name = "RGI|" + row[2] + "|" + row[0] 19 | rgi_perf_dict[aro_name] = [row[3], 80] 20 | 21 | # Close the file 22 | rgifile.close() 23 | 24 | # Get the name of the sample 25 | sample_name = rgi_perf_file.split("_")[0] 26 | 27 | # Get counts from the provided long_file 28 | with open(long_file, 'r') as long_file: 29 | 30 | # Define a dictionary where genes are keys and the items are the sample name, number of hits, and gene fraction percentage 31 | long_dict = {} 32 | 33 | # Go through the file and fill out the dictionary using long format lines 34 | long_reader = csv.reader(long_file, delimiter=',') 35 | header = next(long_reader) 36 | for long_row in long_reader: 37 | split_gene_name = long_row[1].split("|") 38 | if split_gene_name[len(split_gene_name)-1] != "RequiresSNPConfirmation": 39 | long_dict[long_row[1]] = [long_row[0], long_row[2], long_row[3]] 40 | 41 | # Close the file 42 | long_file.close() 43 | 44 | 45 | # Write combined output to the provided output file 46 | with open(combined_output, 'w', newline='\n') as combined_file: 47 | combined_write = csv.writer(combined_file, delimiter=',') 48 | combined_write.writerow(header) 49 | # Write to the combined file using the dictionaries we created previously 50 | for rgi_key, rgi_item in rgi_perf_dict.items(): 51 | temp_rgi_write = rgi_item.copy() 52 | 
temp_rgi_write.insert(0, rgi_key) 53 | temp_rgi_write.insert(0, sample_name) 54 | combined_write.writerow(temp_rgi_write) 55 | 56 | for long_key, long_item in long_dict.items(): 57 | temp_long_write = long_item.copy() 58 | temp_long_write.insert(1, long_key) 59 | combined_write.writerow(temp_long_write) 60 | 61 | 62 | if __name__ == '__main__': 63 | rgi_long_combine(sys.argv[1], sys.argv[2], sys.argv[3]) 64 | -------------------------------------------------------------------------------- /bin/amr_long_to_wide.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | __author__ = "Steven Lakin" 4 | __copyright__ = "" 5 | __credits__ = ["Steven Lakin"] 6 | __version__ = "" 7 | __maintainer__ = "lakinsm" 8 | __email__ = "lakinsm@colostate.edu" 9 | __status__ = "Cows go moo." 10 | 11 | import argparse 12 | import sys 13 | 14 | amr_level_names = {0: 'Class', 1: 'Mechanism', 2: 'Group'} 15 | 16 | def parse_cmdline_params(cmdline_params): 17 | info = "" 18 | parser = argparse.ArgumentParser(description=info) 19 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 20 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 21 | parser.add_argument('-o', '--output_file', required=True, 22 | help='Output file name for writing the AMR_analytic_matrix.csv file') 23 | return parser.parse_args(cmdline_params) 24 | 25 | def amr_load_data(file_name_list): 26 | samples = {} 27 | labels = set() 28 | for file in file_name_list: 29 | with open(file, 'r') as f: 30 | data = f.read().split('\n')[1:] 31 | for entry in data: 32 | if not entry: 33 | continue 34 | entry = entry.split('\t') 35 | sample = entry[0].split('.')[0] 36 | count = float(entry[2]) 37 | gene_name = entry[1] 38 | try: 39 | samples[sample][gene_name] = count 40 | except KeyError: 41 | try: 42 | samples[sample].setdefault(gene_name, count) 43 | except KeyError: 44 | samples.setdefault(sample, {gene_name: count}) 45 | labels.add(gene_name) 46 | return samples, labels 47 | 48 | def output_amr_analytic_data(outfile, S, L): 49 | with open(outfile, 'w') as amr: 50 | local_sample_names = [] 51 | for sample, dat in S.items(): 52 | local_sample_names.append(sample) 53 | amr.write(','.join(local_sample_names) + '\n') 54 | for label in L: 55 | local_counts = [] 56 | amr.write(label + ',') 57 | for local_sample in local_sample_names: 58 | if label in S[local_sample]: 59 | local_counts.append(str(S[local_sample][label])) 60 | else: 61 | local_counts.append(str(0)) 62 | amr.write(','.join(local_counts) + '\n') 63 | 64 | if __name__ == '__main__': 65 | opts = parse_cmdline_params(sys.argv[1:]) 66 | S, L = amr_load_data(opts.input_files) 67 | output_amr_analytic_data(opts.output_file, S, L) 68 | -------------------------------------------------------------------------------- /bin/kraken2_long_to_wide.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import argparse 5 | import numpy as np 6 | 7 | __author__ = 'Steven Lakin' 8 | __maintainer__ = 'lakinsm' 9 | __email__ = 'Steven.Lakin@colostate.edu' 10 | 11 | 12 | taxa_levels = { 13 | 'D': 0, 14 | 'K': 1, 15 | 'P': 2, 16 | 'C': 3, 17 | 'O': 4, 18 | 'F': 5, 19 | 'G': 6, 20 | 'S': 7, 21 | 'U': 8 22 | } 23 | 24 | taxa_level_names = { 25 | 0: 'Domain', 26 | 1: 'Kingdom', 27 | 2: 'Phylum', 28 | 3: 'Class', 29 | 4: 'Order', 30 | 5: 'Family', 31 | 6: 'Genus', 32 | 7: 'Species', 33 | 8: 'Unclassified' 34 | } 35 | 36 | 37 | def 
parse_cmdline_params(cmdline_params): 38 | info = "" 39 | parser = argparse.ArgumentParser(description=info) 40 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 41 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 42 | parser.add_argument('-o', '--output_file', required=True, 43 | help='Output file name for writing the kraken_analytic_matrix.csv file') 44 | return parser.parse_args(cmdline_params) 45 | 46 | 47 | def dict_to_matrix(D): 48 | ncol = len(D.keys()) 49 | unique_nodes = [] 50 | samples = [] 51 | for sample, tdict in D.items(): 52 | for taxon in tdict.keys(): 53 | if taxon not in unique_nodes: 54 | unique_nodes.append(taxon) 55 | nrow = len(unique_nodes) 56 | return_values = np.zeros((nrow, ncol), dtype=np.float) 57 | for j, (sample, tdict) in enumerate(D.items()): 58 | samples.append(sample) 59 | for i, taxon in enumerate(unique_nodes): 60 | if taxon in tdict: 61 | return_values[i, j] = np.float(tdict[taxon]) 62 | return return_values, unique_nodes, samples 63 | 64 | 65 | def kraken2_load_analytic_data(file_name_list): 66 | return_values = {} 67 | unclassifieds = {} # { sample: [unclassified, total, percent] } 68 | for file in file_name_list: 69 | sample_id = file.split('/')[-1].replace('.kraken.report', '') 70 | unclassifieds.setdefault(sample_id, [0, 0, 0]) 71 | with open(file, 'r') as f: 72 | data = f.read().split('\n') 73 | taxon_list = ['NA'] * 8 74 | previous_taxon_level = 0 75 | for line in data: 76 | if not line: 77 | continue 78 | entries = line.split('\t') 79 | node_count = int(entries[2]) 80 | node_level = entries[3] 81 | node_name = entries[5].strip() 82 | if node_level == 'U': 83 | unclassifieds[sample_id][0] = node_count 84 | unclassifieds[sample_id][1] += node_count 85 | unclassifieds[sample_id][2] = float(entries[0]) 86 | continue 87 | elif node_level == 'R': 88 | unclassifieds[sample_id][1] += int(entries[1]) 89 | continue 90 | if len(node_level) > 1: 91 | if node_level[0] in ('U', 'R'): 92 | continue 93 | parent_node_level = node_level[0] 94 | else: 95 | parent_node_level = node_level 96 | this_taxon_level = taxa_levels[parent_node_level] 97 | if len(node_level) == 1: 98 | taxon_list[this_taxon_level] = node_name 99 | if this_taxon_level < previous_taxon_level: 100 | taxon_list[this_taxon_level + 1:] = ['NA'] * (7 - this_taxon_level) 101 | previous_taxon_level = this_taxon_level 102 | if node_count == 0: 103 | continue 104 | this_taxonomy_string = '|'.join(taxon_list[:this_taxon_level + 1]) 105 | try: 106 | return_values[sample_id][this_taxonomy_string] += node_count 107 | except KeyError: 108 | try: 109 | return_values[sample_id].setdefault(this_taxonomy_string, node_count) 110 | except KeyError: 111 | return_values.setdefault(sample_id, {this_taxonomy_string: node_count}) 112 | return dict_to_matrix(return_values), unclassifieds 113 | 114 | 115 | def output_kraken2_analytic_data(outfile, M, m_names, n_names, unclassifieds): 116 | with open(outfile, 'w') as out, \ 117 | open('kraken_unclassifieds.csv', 'w') as u_out: 118 | out.write(','.join(n_names) + '\n') 119 | for i, row in enumerate(M): 120 | out.write('\"{}\",'.format( 121 | m_names[i].replace(',', '') 122 | )) 123 | out.write(','.join([str(x) for x in row]) + '\n') 124 | u_out.write('SampleID,NumberUnclassified,Total,PercentUnclassified\n') 125 | for sample, numbers in unclassifieds.items(): 126 | u_out.write('{},{}\n'.format( 127 | sample, 128 | ','.join([str(x) for x in numbers]) 129 | )) 130 | 131 | 132 | if __name__ == '__main__': 133 | opts = 
parse_cmdline_params(sys.argv[1:]) 134 | 135 | (K, m, n), u = kraken2_load_analytic_data(opts.input_files) 136 | output_kraken2_analytic_data(opts.output_file, K, m, n, u) 137 | -------------------------------------------------------------------------------- /bin/kraken_long_to_wide.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import argparse 4 | import numpy as np 5 | import sys 6 | 7 | __author__ = "Steven Lakin" 8 | __copyright__ = "" 9 | __credits__ = ["Steven Lakin"] 10 | __version__ = "" 11 | __maintainer__ = "lakinsm" 12 | __email__ = "lakinsm@colostate.edu" 13 | __status__ = "Cows go moo." 14 | 15 | taxa_level = {'D': 0, 'P': 1, 'C': 2, 'O': 3, 'F': 4, 'G': 5, 'S': 6} 16 | taxa_level_names = {1: 'Domain', 2: 'Phylum', 3: 'Class', 4: 'Order', 17 | 5: 'Family', 6: 'Genus', 7: 'Species', 8: 'Unclassified'} 18 | 19 | def parse_cmdline_params(cmdline_params): 20 | info = "" 21 | parser = argparse.ArgumentParser(description=info) 22 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 23 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 24 | parser.add_argument('-o', '--output_directory', required=True, 25 | help='Output directory for writing the kraken_analytic_matrix.csv file') 26 | return parser.parse_args(cmdline_params) 27 | 28 | def dict_to_matrix(D): 29 | ncol = len(D.keys()) 30 | unique_nodes = [] 31 | samples = [] 32 | for sample, tdict in D.items(): 33 | for taxon in tdict.keys(): 34 | if taxon not in unique_nodes: 35 | unique_nodes.append(taxon) 36 | nrow = len(unique_nodes) 37 | ret = np.zeros((nrow, ncol), dtype=np.float) 38 | for j, (sample, tdict) in enumerate(D.items()): 39 | samples.append(sample) 40 | for i, taxon in enumerate(unique_nodes): 41 | if taxon in tdict: 42 | ret[i, j] = np.float(tdict[taxon]) 43 | return ret, unique_nodes, samples 44 | 45 | def kraken_load_analytic_data(file_name_list): 46 | ret = {} 47 | for file in file_name_list: 48 | sample_id = file.split('/')[-1].replace('.kraken.report', '') 49 | with open(file, 'r') as f: 50 | data = f.read().split('\n') 51 | assignment_list = [''] * 15 52 | taxon_list = ['NA'] * 7 53 | for entry in data: 54 | if not entry: 55 | continue 56 | temp_name = entry.split('\t')[5] 57 | space_level = int((len(temp_name) - len(temp_name.lstrip(' '))) / 2) - 1 58 | if (space_level <= 0) and (''.join(entry.split()[5:]) not in ('Viruses', 'Bacteria', 'Archaea')): 59 | continue 60 | if space_level < 0: 61 | space_level = 0 62 | entry = entry.split() 63 | if entry[3] == 'U': 64 | continue 65 | node_name = ' '.join(entry[5:]) 66 | assignment_list[space_level] = node_name 67 | assignment_len = len(assignment_list) - assignment_list.count('') 68 | if (space_level + 1) < assignment_len: 69 | assignment_list = assignment_list[:space_level + 1] + [''] * (14 - space_level) 70 | if entry[3] != '-': 71 | node_level = taxa_level[entry[3]] 72 | taxon_list[node_level] = node_name 73 | taxon_name = '|'.join(taxon_list[:node_level+1]) 74 | taxon_len = len(taxon_list) - taxon_list.count('NA') 75 | if (node_level + 1) < taxon_len: 76 | taxon_list = taxon_list[:node_level + 1] + ['NA'] * (6 - node_level) 77 | temp_list = [x for x in taxon_list] 78 | if entry[3] == '-': 79 | temp_list = [x for x in taxon_list] 80 | while temp_list and temp_list[-1] == 'NA': 81 | temp_list.pop() 82 | if (space_level + 1) < assignment_len: 83 | iter_loc = space_level + 1 84 | while True: 85 | if iter_loc == 0: 86 | break
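# Best reading of the backtracking below: walk iter_loc toward the
# root until the indentation-derived ancestor name also appears in the
# ranked lineage temp_list, so that counts on unranked ('-') nodes are
# credited to their nearest named ancestor.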
87 | try: 88 | iter_loc = temp_list.index(assignment_list[iter_loc]) 89 | break 90 | except ValueError: 91 | iter_loc -= 1 92 | temp_list = [x for x in taxon_list[:iter_loc + 1]] 93 | taxon_list = taxon_list[:iter_loc + 1] + ['NA'] * (6 - iter_loc) 94 | taxon_name = '|'.join(temp_list) 95 | if float(entry[2]) == 0.0: 96 | continue 97 | try: 98 | ret[sample_id][taxon_name] += float(entry[2]) 99 | except KeyError: 100 | try: 101 | ret[sample_id].setdefault(taxon_name, float(entry[2])) 102 | except KeyError: 103 | ret.setdefault(sample_id, {taxon_name: float(entry[2])}) 104 | return dict_to_matrix(ret) 105 | 106 | def output_kraken_analytic_data(outdir, M, m_names, n_names): 107 | with open(outdir + '/kraken_analytic_matrix.csv', 'w') as out: 108 | out.write(','.join(n_names) + '\n') 109 | for i, row in enumerate(M): 110 | out.write('\"{}\",'.format(m_names[i].replace(',', ''))) 111 | out.write(','.join([str(x) for x in row]) + '\n') 112 | 113 | if __name__ == '__main__': 114 | opts = parse_cmdline_params(sys.argv[1:]) 115 | K, m, n = kraken_load_analytic_data(opts.input_files) 116 | output_kraken_analytic_data(opts.output_directory, K, m, n) 117 | -------------------------------------------------------------------------------- /bin/rarefaction: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/bin/rarefaction -------------------------------------------------------------------------------- /bin/resistome: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/bin/resistome -------------------------------------------------------------------------------- /bin/samtools_idxstats.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | __author__ = "Chris Dean" 4 | __copyright__ = "" 5 | __credits__ = ["Chris Dean"] 6 | __version__ = "" 7 | __maintainer__ = "cdeanj" 8 | __email__ = "cdean11@colostate.edu" 9 | __status__ = "Cows go moo." 
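# Format note: samtools idxstats emits one tab-separated row per reference
# sequence: reference name, sequence length, number of mapped read segments,
# number of unmapped read segments. mapping_stats() below totals the mapped
# and unmapped columns across all references in each file.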
10 | 11 | import argparse 12 | import glob 13 | import os 14 | import sys 15 | 16 | def parse_cmdline_params(cmdline_params): 17 | info = "Parses a Samtools idxstats file to obtain the total number of mapped, unmapped, and total reads" 18 | parser = argparse.ArgumentParser(description=info) 19 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 20 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 21 | parser.add_argument('-o', '--output_file', required=True, 22 | help='Output file to write mapping results to') 23 | return parser.parse_args(cmdline_params) 24 | 25 | def header(output_file): 26 | with open(output_file, 'a') as o: 27 | o.write('Sample\tNumberOfInputReads\tMapped\tUnmapped\n') 28 | o.close() 29 | 30 | def mapping_stats(input_list, output_file): 31 | for f in input_list: 32 | mapped = 0 33 | unmapped = 0 34 | number_of_reads = 0 35 | with open(f, 'r') as fp: 36 | sample_name = os.path.basename(str(fp.name)).split('.', 1)[0] 37 | for line in fp: 38 | columns = line.strip().split('\t') 39 | mapped += int(columns[2]) 40 | unmapped += int(columns[3]) 41 | number_of_reads += int(columns[2]) + int(columns[3]) # add this reference's own counts; 'mapped'/'unmapped' are running totals, so summing those here would overcount 42 | fp.close() 43 | with open(output_file, 'a') as o: 44 | o.write(sample_name + '\t' + str(number_of_reads) + '\t' + str(mapped) + '\t' + str(unmapped) + '\n') 45 | o.close() 46 | 47 | if __name__ == "__main__": 48 | opts = parse_cmdline_params(sys.argv[1:]) 49 | header(opts.output_file) 50 | mapping_stats(opts.input_files, opts.output_file) 51 | -------------------------------------------------------------------------------- /bin/trimmomatic_stats.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | __author__ = "Chris Dean" 4 | __copyright__ = "" 5 | __credits__ = ["Chris Dean"] 6 | __version__ = "" 7 | __maintainer__ = "cdeanj" 8 | __email__ = "cdean11@colostate.edu" 9 | __status__ = "Cows go moo."
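# Format note: a Trimmomatic paired-end run ends its log with one summary line,
# e.g. "Input Read Pairs: 1000 Both Surviving: 900 (90.00%) Forward Only
# Surviving: 50 (5.00%) Reverse Only Surviving: 30 (3.00%) Dropped: 20 (2.00%)"
# (counts here are illustrative). qc_stats() below pulls the integer fields
# out of that line with re.search.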
10 | 11 | import argparse 12 | import glob 13 | import os 14 | import re 15 | import sys 16 | 17 | def parse_cmdline_params(cmdline_params): 18 | info = "Parses a Trimmomatic log file to obtain the total number of input reads and dropped reads" 19 | parser = argparse.ArgumentParser(description=info) 20 | parser.add_argument('-i', '--input_files', nargs='+', required=True, 21 | help='Use globstar to pass a list of files, (Ex: *.tsv)') 22 | parser.add_argument('-o', '--output_file', required=True, 23 | help='Output file to write mapping results to') 24 | return parser.parse_args(cmdline_params) 25 | 26 | def header(output_file): 27 | with open(output_file, 'a') as o: 28 | o.write('Sample\tNumberOfInputReads\tForwardOnlySurviving\tReverseOnlySurviving\tDropped\n') 29 | o.close() 30 | 31 | def qc_stats(input_list, output_file): 32 | for f in input_list: 33 | total = 0 34 | forward_surviving = 0 35 | reverse_surviving = 0 36 | dropped = 0 37 | with open(f, 'r') as fp: 38 | sample_name = os.path.basename(str(fp.name)).split('.', 1)[0] 39 | for line in fp: 40 | total = re.search(r'Input Read Pairs: (\d+)', line) 41 | forward_surviving = re.search(r'Forward Only Surviving: (\d+)', line) 42 | reverse_surviving = re.search(r'Reverse Only Surviving: (\d+)', line) 43 | dropped = re.search(r'Dropped: (\d+)', line) 44 | if total: 45 | break 46 | fp.close() 47 | with open(output_file, 'a') as ofp: 48 | ofp.write(sample_name + '\t' + total.group(1) + '\t' + forward_surviving.group(1) + '\t' + reverse_surviving.group(1) + '\t' + dropped.group(1) + '\n') 49 | ofp.close() 50 | 51 | if __name__ == "__main__": 52 | opts = parse_cmdline_params(sys.argv[1:]) 53 | header(opts.output_file) 54 | qc_stats(opts.input_files, opts.output_file) 55 | -------------------------------------------------------------------------------- /config/MEG_AMI.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here. 2 | // The examples listed below assume the tools are already in your $PATH; however, 3 | // the absolute path to each tool can be entered individually. 4 | env { 5 | /* These following tools are required to run AmrPlusPlus*/ 6 | JAVA = "java" 7 | TRIMMOMATIC = "~/.conda/envs/AmrPlusPlus/share/trimmomatic/trimmomatic.jar" 8 | PYTHON3 = "python3" 9 | BWA = "bwa" 10 | SAMTOOLS = "samtools" 11 | BEDTOOLS = "bedtools" 12 | RESISTOME = "resistome" 13 | RAREFACTION = "rarefaction" 14 | SNPFINDER = "snpfinder" 15 | FREEBAYES = "freebayes" 16 | /* These next tools are optional depending on which analyses you want to run */ 17 | KRAKEN2 = "kraken2" 18 | RGI = "rgi" 19 | DIAMOND = "diamond" 20 | } 21 | 22 | process { 23 | cpus = 1 // The maximum number of CPUs to use 24 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 25 | //errorStrategy = 'ignore' // Ignore process errors 26 | executor = 'local' // The type of system the processes are being run on (do not modify this) 27 | maxForks = 1 // The maximum number of forks a single process is allowed to spawn 28 | memory = '8 GB' // The maximum amount of memory a single process is allowed to use 29 | } 30 | -------------------------------------------------------------------------------- /config/local.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here.
2 | // The examples listed below assume the tools are already in your $PATH; however, 3 | // the absolute path to each tool can be entered individually. 4 | env { 5 | /* These following tools are required to run AmrPlusPlus*/ 6 | JAVA = "java" 7 | TRIMMOMATIC = "trimmomatic-0.36.jar" 8 | PYTHON3 = "python3" 9 | BWA = "bwa" 10 | SAMTOOLS = "samtools" 11 | BEDTOOLS = "bedtools" 12 | RESISTOME = "resistome" 13 | RAREFACTION = "rarefaction" 14 | SNPFINDER = "snpfinder" 15 | FREEBAYES = "freebayes" 16 | /* These next tools are optional depending on which analyses you want to run */ 17 | KRAKEN2 = "kraken2" 18 | RGI = "rgi" 19 | DIAMOND = "diamond" 20 | } 21 | 22 | process { 23 | cpus = 4 // The maximum number of CPUs to use 24 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 25 | //errorStrategy = 'ignore' // Ignore process errors 26 | executor = 'local' // The type of system the processes are being run on (do not modify this) 27 | maxForks = 1 // The maximum number of forks a single process is allowed to spawn 28 | memory = '8 GB' // The maximum amount of memory a single process is allowed to use 29 | } 30 | -------------------------------------------------------------------------------- /config/local_MSI.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here. 2 | // The paths listed below are just examples; however, I recommend 3 | // following a similar format. 4 | 5 | env { 6 | /* These following tools are required to run AmrPlusPlus*/ 7 | JAVA = "/panfs/roc/msisoft/java/openjdk-8_202/bin/java" 8 | TRIMMOMATIC = "/panfs/roc/msisoft/trimmomatic/0.33/trimmomatic.jar" 9 | PYTHON3 = "/panfs/roc/msisoft/anaconda/anaconda3-2018.12/bin/python" 10 | BWA = "/panfs/roc/msisoft/bwa/0.7.17_gcc-7.2.0_haswell/bwa" 11 | SAMTOOLS = "/panfs/roc/msisoft/samtools/1.9_gcc-7.2.0_haswell/bin/samtools" 12 | BEDTOOLS = "/panfs/roc/msisoft/bedtools/2.27.1/bin/bedtools" 13 | RESISTOME = "/home/noyes046/shared/tools/resistomeanalyzer_v2/resistome" 14 | RAREFACTION = "/home/noyes046/shared/tools/rarefaction" 15 | SNPFINDER = "/home/noyes046/shared/tools/snpfinder" 16 | FREEBAYES = "/soft/freebayes/1.2.0/bin/freebayes" 17 | /* These next tools are optional depending on which analyses you want to run */ 18 | KRAKEN2 = "/panfs/roc/msisoft/kraken/2.0.7beta/kraken2" 19 | RGI = "/panfs/roc/groups/11/noyes046/edoster/.conda/envs/AmrPlusPlus_env/bin/rgi" 20 | } 21 | 22 | process { 23 | maxForks = 3 24 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 25 | /* errorStrategy = 'ignore' // Ignore process errors */ 26 | executor = 'local' // The type of system the processes are being run on (do not modify this) 27 | } 28 | -------------------------------------------------------------------------------- /config/local_angus.config: -------------------------------------------------------------------------------- 1 | // The location of each dependency binary needs to be specified here. 2 | // The examples listed below assume the tools are already in your $PATH; however, 3 | // the absolute path to each tool can be entered individually.
4 | 5 | env { 6 | /* These following tools are required to run AmrPlusPlus*/ 7 | JAVA = "/usr/bin/java" 8 | TRIMMOMATIC = "/s/angus/index/common/tools/Trimmomatic-0.36/trimmomatic-0.36.jar" 9 | PYTHON3 = "/usr/bin/python3" 10 | BWA = "/usr/bin/bwa" 11 | SAMTOOLS = "/usr/local/bin/samtools" 12 | BEDTOOLS = "/usr/bin/bedtools" 13 | RESISTOME = "/s/angus/index/common/tools/resistome" 14 | RAREFACTION = "/s/angus/index/common/tools/rarefaction" 15 | SNPFINDER = "/s/angus/index/common/tools/snpfinder" 16 | FREEBAYES = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/freebayes" 17 | /* These next tools are optional depending on which analyses you want to run */ 18 | KRAKEN2 = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/kraken2" 19 | RGI = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/rgi" 20 | DIAMOND = "/s/angus/index/common/tools/miniconda3/envs/AmrPlusPlus_env/bin/diamond" 21 | } 22 | 23 | 24 | 25 | 26 | process { 27 | maxForks = 3 28 | disk = '125 GB' // The maximum amount of disk space a single process is allowed to use 29 | executor = 'local' // The type of system the processes are being run on (do not modify this) 30 | } 31 | -------------------------------------------------------------------------------- /config/singularity.config: -------------------------------------------------------------------------------- 1 | singularity { 2 | /* Enables Singularity container execution by default */ 3 | enabled = true 4 | cacheDir = "$PWD" 5 | /* Enable auto-mounting of host paths (requires user bind control feature enabled */ 6 | autoMounts = true 7 | } 8 | 9 | env { 10 | /* These following tools are required to run AmrPlusPlus*/ 11 | JAVA = '/usr/local/envs/AmrPlusPlus_env/bin/java' 12 | TRIMMOMATIC = '/usr/local/envs/AmrPlusPlus_env/share/trimmomatic/trimmomatic.jar' 13 | PYTHON3 = "python3" 14 | BWA = "bwa" 15 | SAMTOOLS = "samtools" 16 | BEDTOOLS = "bedtools" 17 | RESISTOME = "resistome" 18 | RAREFACTION = "rarefaction" 19 | SNPFINDER = "snpfinder" 20 | FREEBAYES = "freebayes" 21 | /* These next tools are optional depending on which analyses you want to run */ 22 | KRAKEN2 = "kraken2" 23 | RGI = "rgi" 24 | DIAMOND = "diamond" 25 | } 26 | 27 | 28 | process { 29 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 30 | maxForks = 10 // The maximum number of forks a single process is allowed to spawn 31 | withName:RunRGI { 32 | container = 'shub://meglab-metagenomics/amrplusplus_v2:rgi' 33 | } 34 | withName:RunDedupRGI { 35 | container = 'shub://meglab-metagenomics/amrplusplus_v2:rgi' 36 | } 37 | } 38 | -------------------------------------------------------------------------------- /config/singularity_slurm.config: -------------------------------------------------------------------------------- 1 | singularity { 2 | /* Enables Singularity container execution by default */ 3 | enabled = true 4 | cacheDir = "$PWD" 5 | /* Enable auto-mounting of host paths (requires user bind control feature enabled */ 6 | autoMounts = true 7 | } 8 | 9 | env { 10 | /* These following tools are required to run AmrPlusPlus*/ 11 | JAVA = '/usr/local/envs/AmrPlusPlus_env/bin//java' 12 | TRIMMOMATIC = '/usr/local/envs/AmrPlusPlus_env/share/trimmomatic/trimmomatic.jar' 13 | PYTHON3 = "python3" 14 | BWA = "bwa" 15 | SAMTOOLS = "samtools" 16 | BEDTOOLS = "bedtools" 17 | RESISTOME = "resistome" 18 | RAREFACTION = "rarefaction" 19 | SNPFINDER = "snpfinder" 20 | FREEBAYES = "freebayes" 21 | /* These next tools are optional depending on which analyses you 
want to run */ 22 | KRAKEN2 = "kraken2" 23 | RGI = "/opt/conda/envs/PI_env/bin/rgi" 24 | DIAMOND = "diamond" 25 | } 26 | 27 | 28 | process { 29 | process.executor='slurm' 30 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 31 | maxForks = 10 // The maximum number of forks a single process is allowed to spawn 32 | withName:RunQC { 33 | process.qos='normal' 34 | clusterOptions='--job-name=QC%j --qos=normal --time=23:59:00' 35 | } 36 | withName:QCStats { 37 | process.qos='normal' 38 | clusterOptions='--job-name=QCstats%j --qos=normal --time=05:00:00' 39 | } 40 | withName:BuildHostIndex { 41 | process.qos='normal' 42 | clusterOptions='--job-name=hostindex%j --qos=normal --ntasks-per-node=12 --time=23:59:00' 43 | } 44 | withName:BuildAMRIndex { 45 | process.qos='normal' 46 | clusterOptions='--job-name=AMRindex%j --qos=normal --ntasks-per-node=12 --time=23:59:00' 47 | } 48 | withName:DedupReads { 49 | process.qos='normal' 50 | clusterOptions='--job-name=dedup%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 51 | } 52 | withName:AlignReadsToHost { 53 | process.time = '20:00:00' 54 | process.qos='normal' 55 | clusterOptions='--job-name=AlignHost%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 56 | } 57 | withName:RemoveHostDNA { 58 | process.qos='normal' 59 | clusterOptions='--job-name=RMHost%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 60 | } 61 | withName:HostRemovalStats { 62 | process.qos='normal' 63 | clusterOptions='--job-name=hoststats%j --qos=normal --time=05:00:00' 64 | } 65 | withName:NonHostReads { 66 | process.qos='normal' 67 | clusterOptions='--job-name=BAMFastq%j --qos=normal --time=23:59:00' 68 | } 69 | withName:AlignDedupSNPToAMR { 70 | process.qos='normal' 71 | clusterOptions='--job-name=alignAMR%j --qos=normal --time=23:59:00' 72 | } 73 | withName:AlignToAMR { 74 | process.qos='normal' 75 | clusterOptions='--job-name=alignAMR%j --qos=normal --time=23:59:00' 76 | } 77 | withName:DedupRunResistome { 78 | process.qos='normal' 79 | clusterOptions='--job-name=resistome%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 80 | } 81 | withName:RunResistome { 82 | process.qos='normal' 83 | clusterOptions='--job-name=resistome%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 84 | } 85 | withName:RunFreebayes { 86 | process.qos='normal' 87 | clusterOptions='--job-name=freebayes%j --qos=normal --time=23:59:00' 88 | } 89 | withName:RunRarefaction { 90 | process.qos='normal' 91 | clusterOptions='--job-name=rarefaction%j --qos=normal --time=23:59:00 --ntasks-per-node=12' 92 | } 93 | withName:RunSNPFinder { 94 | process.qos='normal' 95 | clusterOptions='--job-name=SNPfinder%j --qos=normal --partition=shas --ntasks-per-node=12 --time=23:59:00' 96 | } 97 | withName:ResistomeResults { 98 | process.qos='normal' 99 | clusterOptions='--job-name=LtoWide%j --qos=normal --time=05:00:00' 100 | } 101 | withName:SNPAlignToAMR { 102 | process.qos='normal' 103 | clusterOptions='--job-name=SNPAlignToAMR%j --qos=normal --time=23:59:00' 104 | } 105 | withName:SNPRunResistome { 106 | process.qos='normal' 107 | clusterOptions='--job-name=SNPresistome%j --qos=normal --time=23:59:00' 108 | } 109 | withName:SNPRunRarefaction { 110 | process.qos='normal' 111 | clusterOptions='--job-name=SNPrarefaction%j --qos=normal --time=23:59:00' 112 | } 113 | withName:SNPconfirmation { 114 | process.qos='normal' 115 | clusterOptions='--job-name=SNPconfirmation%j --qos=normal --time=23:59:00' 116 | 
module='jdk/1.8.0:singularity/2.5.2' 117 | } 118 | withName:SNPgene_alignment { 119 | process.qos='normal' 120 | clusterOptions='--job-name=SNPalignment%j --qos=normal --time=23:59:00' 121 | } 122 | withName:SNPRunFreebayes { 123 | process.qos='normal' 124 | clusterOptions='--job-name=SNPfreebayes%j --qos=normal --time=23:59:00' 125 | } 126 | withName:SNPRunSNPFinder { 127 | process.qos='normal' 128 | clusterOptions='--job-name=SNPsnpfinder%j --qos=normal --time=23:59:00' 129 | } 130 | withName:SNPResistomeResults { 131 | process.qos='normal' 132 | clusterOptions='--job-name=SNPLongToWide --qos=normal --time=5:00:00' 133 | } 134 | withName:DedupNonSNPResistomeResults { 135 | process.qos='normal' 136 | clusterOptions='--job-name=DedupNonSNPLongToWide --qos=normal --time=5:00:00' 137 | } 138 | withName:HMMResistomeResults { 139 | process.qos='normal' 140 | clusterOptions='--job-name=HMM_LongToWide --qos=normal --time=5:00:00' 141 | } 142 | withName:SamDedupRunResistome { 143 | process.qos='normal' 144 | clusterOptions='--job-name=SamDedupSNPresistome%j --qos=normal --time=23:59:00' 145 | } 146 | withName:SamDedupResistomeResults { 147 | process.qos='normal' 148 | clusterOptions='--job-name=SamDedup_LongToWide --qos=normal --time=5:00:00' 149 | } 150 | withName:Samtools_dedup_HMMcontig_count { 151 | process.qos='normal' 152 | clusterOptions='--job-name=SamDedupSNPresistome%j --qos=normal --time=23:59:00' 153 | } 154 | withName:Samtools_dedup_HMMResistomeResults { 155 | process.qos='normal' 156 | clusterOptions='--job-name=SamDedup_LongToWide --qos=normal --time=5:00:00' 157 | } 158 | withName:ExtractSNP { 159 | process.qos='normal' 160 | clusterOptions='--job-name=ExtractSNP%j --qos=normal --time=23:59:00' 161 | } 162 | withName:RunRGI { 163 | process.qos='normal' 164 | container = 'shub://EnriqueDoster/bioinformatic-nextflow-pipelines:rgi' 165 | clusterOptions='--job-name=RunRGI%j --qos=normal --time=23:59:00' 166 | } 167 | withName:SNPconfirmation { 168 | process.qos='normal' 169 | clusterOptions='--job-name=SNPconfirmation --qos=normal --time=23:59:00' 170 | } 171 | withName:Confirmed_AMR_hits { 172 | process.qos='normal' 173 | clusterOptions='--job-name=Confirmed_AMR_hits --qos=normal --time=23:59:00' 174 | } 175 | withName:Confirmed_ResistomeResults { 176 | process.qos='normal' 177 | clusterOptions='--job-name=Confirmed_ResistomeResults --qos=normal --time=23:59:00' 178 | } 179 | withName:ExtractDedupSNP { 180 | process.qos='normal' 181 | clusterOptions='--job-name=ExtractDedupSNP --qos=normal --time=23:59:00' 182 | } 183 | withName:RunDedupRGI { 184 | process.qos='normal' 185 | container = 'shub://EnriqueDoster/bioinformatic-nextflow-pipelines:rgi' 186 | clusterOptions='--job-name=RunDedupRGI --qos=normal --time=23:59:00' 187 | } 188 | withName:DedupSNPconfirmation { 189 | process.qos='normal' 190 | clusterOptions='--job-name=DedupSNPconfirmation --qos=normal --time=23:59:00' 191 | } 192 | withName:ConfirmDedupAMRHits { 193 | process.qos='normal' 194 | clusterOptions='--job-name=ConfirmDedupAMRHits --qos=normal --time=23:59:00' 195 | } 196 | withName:DedupSNPConfirmed_ResistomeResults { 197 | process.qos='normal' 198 | clusterOptions='--job-name=DedupSNPConfirmed_ResistomeResults --qos=normal --time=23:59:00' 199 | } 200 | } 201 | -------------------------------------------------------------------------------- /containers/Singularity: -------------------------------------------------------------------------------- 1 | Bootstrap: docker 2 | From: debian:jessie-slim 3 | 4 | #Includes 
trimmomatic, samtools, bwa, bedtools, vcftools, htslib, kraken2, SNPfinder, freebayes, bbmap 5 | 6 | %environment 7 | export LC_ALL=C 8 | 9 | %post 10 | apt update \ 11 | && apt install -y --no-install-recommends \ 12 | build-essential ca-certificates sudo tcsh\ 13 | git make automake autoconf openjdk-7-jre wget gzip unzip sed\ 14 | zlib1g-dev curl libbz2-dev locales libncurses5-dev liblzma-dev libcurl4-openssl-dev software-properties-common apt-transport-https\ 15 | python3-pip python3-docopt python3-pytest python-dev python3-dev\ 16 | libcurl4-openssl-dev libssl-dev zlib1g-dev fonts-texgyre \ 17 | gcc g++ gfortran libblas-dev liblapack-dev dos2unix libstdc++6\ 18 | r-base-core r-recommended hmmer\ 19 | && rm -rf /var/lib/apt/lists/* 20 | 21 | 22 | wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh 23 | sh Anaconda3-2020.02-Linux-x86_64.sh -bfp /usr/local 24 | 25 | # add bioconda channels 26 | conda config --add channels defaults 27 | conda config --add channels conda-forge 28 | conda config --add channels bioconda 29 | 30 | # install bulk of bioinformatic tools using conda 31 | conda create -n AmrPlusPlus_env python=3 trimmomatic bwa samtools bedtools freebayes bbmap vcftools htslib kraken2 32 | 33 | . /usr/local/bin/activate AmrPlusPlus_env 34 | 35 | #ln -s /usr/local/envs/AmrPlusPlus_env/bin/* /usr/local/bin/ 36 | 37 | #Still experimenting with how to change $PATH location. 38 | echo 'export PATH=$PATH:/usr/local/envs/AmrPlusPlus_env/bin/' >> $SINGULARITY_ENVIRONMENT 39 | 40 | # SNPfinder 41 | cd /usr/local 42 | git clone https://github.com/cdeanj/snpfinder.git 43 | cd snpfinder 44 | make 45 | cp snpfinder /usr/local/bin 46 | cd / 47 | 48 | # Make sure all the tools have the right permissions to use the tools 49 | chmod -R 777 /usr/local/ 50 | 51 | %test 52 | 53 | -------------------------------------------------------------------------------- /containers/Singularity.RGI: -------------------------------------------------------------------------------- 1 | Bootstrap: docker 2 | From: debian:jessie-slim 3 | 4 | #Includes Resistance Gene Identifier (RGI) 5 | 6 | %environment 7 | export LC_ALL=C 8 | 9 | %post 10 | apt update \ 11 | && apt install -y --no-install-recommends \ 12 | build-essential ca-certificates sudo tcsh\ 13 | git make automake autoconf openjdk-7-jre wget gzip unzip sed\ 14 | zlib1g-dev curl libbz2-dev locales libncurses5-dev liblzma-dev libcurl4-openssl-dev software-properties-common apt-transport-https\ 15 | python3-pip python3-docopt python3-pytest python-dev python3-dev\ 16 | libcurl4-openssl-dev libssl-dev zlib1g-dev fonts-texgyre \ 17 | gcc g++ gfortran libblas-dev liblapack-dev dos2unix libstdc++6\ 18 | && rm -rf /var/lib/apt/lists/* 19 | 20 | 21 | wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh 22 | sh Anaconda3-2020.02-Linux-x86_64.sh -bfp /usr/local 23 | 24 | # add bioconda channels 25 | conda config --add channels defaults 26 | conda config --add channels conda-forge 27 | conda config --add channels bioconda 28 | 29 | # install bulk of bioinformatic tools using conda 30 | conda create -n AmrPlusPlus_env rgi 31 | 32 | . 
/usr/local/bin/activate AmrPlusPlus_env 33 | 34 | #change $PATH 35 | echo 'export PATH=/usr/local/envs/AmrPlusPlus_env/bin/:$PATH' >> $SINGULARITY_ENVIRONMENT 36 | 37 | 38 | # Make sure all the tools have the right permissions to use the tools 39 | chmod -R 777 /usr/local/ 40 | 41 | # This downloads the latest CARD database and attempts to load it for RGI 42 | # Doesn't seem to work due to the github [RGI issue #60](https://github.com/arpcard/rgi/issues/60) 43 | #wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 44 | #/usr/local/envs/AmrPlusPlus_env/bin/rgi load -i card.json 45 | 46 | %test 47 | 48 | 49 | -------------------------------------------------------------------------------- /data/HMM.tar.xz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/HMM.tar.xz -------------------------------------------------------------------------------- /data/adapters/nextera.fa: -------------------------------------------------------------------------------- 1 | >TruSeqR2_nextera 2 | CTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTGAAAA 3 | >nextera_R2_right_side_adapter 4 | CTGTCTCTTATACACATCTGACGCTGCCGACGAGCGATCTAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA 5 | >I5_Nextera_Transposase_1 6 | CTGTCTCTTATACACATCTGACGCTGCCGACGA 7 | >I7_Nextera_Transposase_1 8 | CTGTCTCTTATACACATCTCCGAGCCCACGAGAC 9 | >I5_Nextera_Transposase_2 10 | CTGTCTCTTATACACATCTCTGATGGCGCGAGGGAGGC 11 | >I7_Nextera_Transposase_2 12 | CTGTCTCTTATACACATCTCTGAGCGGGCTGGCAAGGC 13 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]501 14 | GACGCTGCCGACGAGCGATCTAGTGTAGATCTCGGTGGTCGCCGTATCATT 15 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]502 16 | GACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATT 17 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]503 18 | GACGCTGCCGACGAAGAGGATAGTGTAGATCTCGGTGGTCGCCGTATCATT 19 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]504 20 | GACGCTGCCGACGATCTACTCTGTGTAGATCTCGGTGGTCGCCGTATCATT 21 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]505 22 | GACGCTGCCGACGACTCCTTACGTGTAGATCTCGGTGGTCGCCGTATCATT 23 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]506 24 | GACGCTGCCGACGATATGCAGTGTGTAGATCTCGGTGGTCGCCGTATCATT 25 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]507 26 | GACGCTGCCGACGATACTCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT 27 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]508 28 | GACGCTGCCGACGAAGGCTTAGGTGTAGATCTCGGTGGTCGCCGTATCATT 29 | >I5_Primer_Nextera_XT_and_Nextera_Enrichment_[N/S/E]517 30 | GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT 31 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N701 32 | CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG 33 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N702 34 | CCGAGCCCACGAGACCGTACTAGATCTCGTATGCCGTCTTCTGCTTG 35 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N703 36 | CCGAGCCCACGAGACAGGCAGAAATCTCGTATGCCGTCTTCTGCTTG 37 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N704 38 | CCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTG 39 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N705 40 | CCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTG 41 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N706 42 | CCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCTTCTGCTTG 43 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N707 44 | CCGAGCCCACGAGACCTCTCTACATCTCGTATGCCGTCTTCTGCTTG 45 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N708 46 | 
CCGAGCCCACGAGACCAGAGAGGATCTCGTATGCCGTCTTCTGCTTG 47 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N709 48 | CCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTG 49 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N710 50 | CCGAGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTG 51 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N711 52 | CCGAGCCCACGAGACAAGAGGCAATCTCGTATGCCGTCTTCTGCTTG 53 | >I7_Primer_Nextera_XT_and_Nextera_Enrichment_N712 54 | CCGAGCCCACGAGACGTAGAGGAATCTCGTATGCCGTCTTCTGCTTG 55 | >I5_Primer_Nextera_XT_Index_Kit_v2_S502 56 | GACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATT 57 | >I5_Primer_Nextera_XT_Index_Kit_v2_S503 58 | GACGCTGCCGACGAAGAGGATAGTGTAGATCTCGGTGGTCGCCGTATCATT 59 | >I5_Primer_Nextera_XT_Index_Kit_v2_S505 60 | GACGCTGCCGACGACTCCTTACGTGTAGATCTCGGTGGTCGCCGTATCATT 61 | >I5_Primer_Nextera_XT_Index_Kit_v2_S506 62 | GACGCTGCCGACGATATGCAGTGTGTAGATCTCGGTGGTCGCCGTATCATT 63 | >I5_Primer_Nextera_XT_Index_Kit_v2_S507 64 | GACGCTGCCGACGATACTCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT 65 | >I5_Primer_Nextera_XT_Index_Kit_v2_S508 66 | GACGCTGCCGACGAAGGCTTAGGTGTAGATCTCGGTGGTCGCCGTATCATT 67 | >I5_Primer_Nextera_XT_Index_Kit_v2_S510 68 | GACGCTGCCGACGAATTAGACGGTGTAGATCTCGGTGGTCGCCGTATCATT 69 | >I5_Primer_Nextera_XT_Index_Kit_v2_S511 70 | GACGCTGCCGACGACGGAGAGAGTGTAGATCTCGGTGGTCGCCGTATCATT 71 | >I5_Primer_Nextera_XT_Index_Kit_v2_S513 72 | GACGCTGCCGACGACTAGTCGAGTGTAGATCTCGGTGGTCGCCGTATCATT 73 | >I5_Primer_Nextera_XT_Index_Kit_v2_S515 74 | GACGCTGCCGACGAAGCTAGAAGTGTAGATCTCGGTGGTCGCCGTATCATT 75 | >I5_Primer_Nextera_XT_Index_Kit_v2_S516 76 | GACGCTGCCGACGAACTCTAGGGTGTAGATCTCGGTGGTCGCCGTATCATT 77 | >I5_Primer_Nextera_XT_Index_Kit_v2_S517 78 | GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT 79 | >I5_Primer_Nextera_XT_Index_Kit_v2_S518 80 | GACGCTGCCGACGACTTAATAGGTGTAGATCTCGGTGGTCGCCGTATCATT 81 | >I5_Primer_Nextera_XT_Index_Kit_v2_S520 82 | GACGCTGCCGACGAATAGCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT 83 | >I5_Primer_Nextera_XT_Index_Kit_v2_S521 84 | GACGCTGCCGACGATAAGGCTCGTGTAGATCTCGGTGGTCGCCGTATCATT 85 | >I5_Primer_Nextera_XT_Index_Kit_v2_S522 86 | GACGCTGCCGACGATCGCATAAGTGTAGATCTCGGTGGTCGCCGTATCATT 87 | >I7_Primer_Nextera_XT_Index_Kit_v2_N701 88 | CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG 89 | >I7_Primer_Nextera_XT_Index_Kit_v2_N702 90 | CCGAGCCCACGAGACCGTACTAGATCTCGTATGCCGTCTTCTGCTTG 91 | >I7_Primer_Nextera_XT_Index_Kit_v2_N703 92 | CCGAGCCCACGAGACAGGCAGAAATCTCGTATGCCGTCTTCTGCTTG 93 | >I7_Primer_Nextera_XT_Index_Kit_v2_N704 94 | CCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTG 95 | >I7_Primer_Nextera_XT_Index_Kit_v2_N705 96 | CCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTG 97 | >I7_Primer_Nextera_XT_Index_Kit_v2_N706 98 | CCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCTTCTGCTTG 99 | >I7_Primer_Nextera_XT_Index_Kit_v2_N707 100 | CCGAGCCCACGAGACCTCTCTACATCTCGTATGCCGTCTTCTGCTTG 101 | >I7_Primer_Nextera_XT_Index_Kit_v2_N710 102 | CCGAGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTG 103 | >I7_Primer_Nextera_XT_Index_Kit_v2_N711 104 | CCGAGCCCACGAGACAAGAGGCAATCTCGTATGCCGTCTTCTGCTTG 105 | >I7_Primer_Nextera_XT_Index_Kit_v2_N712 106 | CCGAGCCCACGAGACGTAGAGGAATCTCGTATGCCGTCTTCTGCTTG 107 | >I7_Primer_Nextera_XT_Index_Kit_v2_N714 108 | CCGAGCCCACGAGACGCTCATGAATCTCGTATGCCGTCTTCTGCTTG 109 | >I7_Primer_Nextera_XT_Index_Kit_v2_N715 110 | CCGAGCCCACGAGACATCTCAGGATCTCGTATGCCGTCTTCTGCTTG 111 | >I7_Primer_Nextera_XT_Index_Kit_v2_N716 112 | CCGAGCCCACGAGACACTCGCTAATCTCGTATGCCGTCTTCTGCTTG 113 | >I7_Primer_Nextera_XT_Index_Kit_v2_N718 114 | CCGAGCCCACGAGACGGAGCTACATCTCGTATGCCGTCTTCTGCTTG 115 | >I7_Primer_Nextera_XT_Index_Kit_v2_N719 116 | 
CCGAGCCCACGAGACGCGTAGTAATCTCGTATGCCGTCTTCTGCTTG 117 | >I7_Primer_Nextera_XT_Index_Kit_v2_N720 118 | CCGAGCCCACGAGACCGGAGCCTATCTCGTATGCCGTCTTCTGCTTG 119 | >I7_Primer_Nextera_XT_Index_Kit_v2_N721 120 | CCGAGCCCACGAGACTACGCTGCATCTCGTATGCCGTCTTCTGCTTG 121 | >I7_Primer_Nextera_XT_Index_Kit_v2_N722 122 | CCGAGCCCACGAGACATGCGCAGATCTCGTATGCCGTCTTCTGCTTG 123 | >I7_Primer_Nextera_XT_Index_Kit_v2_N723 124 | CCGAGCCCACGAGACTAGCGCTCATCTCGTATGCCGTCTTCTGCTTG 125 | >I7_Primer_Nextera_XT_Index_Kit_v2_N724 126 | CCGAGCCCACGAGACACTGAGCGATCTCGTATGCCGTCTTCTGCTTG 127 | >I7_Primer_Nextera_XT_Index_Kit_v2_N726 128 | CCGAGCCCACGAGACCCTAAGACATCTCGTATGCCGTCTTCTGCTTG 129 | >I7_Primer_Nextera_XT_Index_Kit_v2_N727 130 | CCGAGCCCACGAGACCGATCAGTATCTCGTATGCCGTCTTCTGCTTG 131 | >I7_Primer_Nextera_XT_Index_Kit_v2_N728 132 | CCGAGCCCACGAGACTGCAGCTAATCTCGTATGCCGTCTTCTGCTTG 133 | >I7_Primer_Nextera_XT_Index_Kit_v2_N729 134 | CCGAGCCCACGAGACTCGACGTCATCTCGTATGCCGTCTTCTGCTTG 135 | >I5_Adapter_Nextera 136 | CTGATGGCGCGAGGGAGGCGTGTAGATCTCGGTGGTCGCCGTATCATT 137 | >I7_Adapter_Nextera_No_Barcode 138 | CTGAGCGGGCTGGCAAGGCAGACCGATCTCGTATGCCGTCTTCTGCTTG 139 | >Nextera_LMP_Read1_External_Adapter 140 | GATCGGAAGAGCACACGTCTGAACTCCAGTCAC 141 | >Nextera_LMP_Read2_External_Adapter 142 | GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT 143 | -------------------------------------------------------------------------------- /data/host/chr21.fasta.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/host/chr21.fasta.gz -------------------------------------------------------------------------------- /data/raw/S1_test_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S1_test_R1.fastq.gz -------------------------------------------------------------------------------- /data/raw/S1_test_R2.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S1_test_R2.fastq.gz -------------------------------------------------------------------------------- /data/raw/S2_test_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S2_test_R1.fastq.gz -------------------------------------------------------------------------------- /data/raw/S2_test_R2.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S2_test_R2.fastq.gz -------------------------------------------------------------------------------- /data/raw/S3_test_R1.fastq.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S3_test_R1.fastq.gz -------------------------------------------------------------------------------- /data/raw/S3_test_R2.fastq.gz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/data/raw/S3_test_R2.fastq.gz -------------------------------------------------------------------------------- /docs/AmrPlusPlus_Pipeline_workflow.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meglab-metagenomics/amrplusplus_v2/f8e6f9427e7ed63c0f1ac901ebbe4eddd5053ed2/docs/AmrPlusPlus_Pipeline_workflow.pdf -------------------------------------------------------------------------------- /docs/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | Details on AMR++ updates 2 | ------------ 3 | 4 | ## 2020-05-21 : AMR++ v2.0.2 update 5 | Fixed a mistake with the config/singularity.config file to correctly call the singularity container anytime that RGI is run. 6 | 7 | ## 2020-05-21 : AMR++ v2.0.1 update 8 | We identified issues with running RGI thanks to github users AroArz and DiegoBrambilla. As of this update, RGI developers are focused on contributing to the COVID-19 response, so we plan to reconvene with them when their schedule opens up. In the meantime, we are releasing updates to keep AMR++ functional. 9 | We found that the errors were associated with RGI bugs that were previously reported: 10 | * In [RGI issue #93](https://github.com/arpcard/rgi/issues/93), the github user mahesh-panchal reported that you need to run the "rgi main" command twice on the same dataset for it to successfully complete the analysis. The RGI component of AMR++ has been updated to work for now, but we plan further changes to clean up the code. 11 | * In [RGI issue #60](https://github.com/arpcard/rgi/issues/60), caspargross reported issues with containerizing RGI due to its requirement for a "writable file system". As a temporary fix, we updated the AMR++ code so that the user has to download the CARD database locally and use an additional flag to specify the location of the local database. 12 | * Errors in running RGI will now be "ignored" so that the pipeline continues running but still provides any temporary files created with RGI. This should allow you to troubleshoot on your own or run any additional analysis using the reads aligning to gene accessions that "RequireSNPConfirmation". 13 | 14 | Other updates: 15 | * minor fixes to singularity/slurm configuration files 16 | * updated the "resistome" and "rarefaction" code, which is now included in the "bin" directory 17 | * updated the script for downloading the latest version of mini-kraken 18 | * created a new singularity container just for the RGI software 19 | * output zipped nonhost files directly -------------------------------------------------------------------------------- /docs/FAQs.md: -------------------------------------------------------------------------------- 1 | Troubleshooting and frequently asked questions (FAQs) 2 | ------------ 3 | 4 | Many of the errors you may encounter are ultimately the result of user error. If you encounter an error message at any time while using this pipeline, carefully check the command you used for any spelling errors. Additionally, many of these error messages give some detail as to where the code is wrong. Here are a few common errors and our suggestions for basic troubleshooting. 5 | 6 | * Are you using the correct "profile" to run AmrPlusPlus? 7 | * We provide many examples of profile configurations, and choosing the correct one depends on your computing environment. 
8 | * If you have singularity installed on your server, we recommend using the "singularity" profile to avoid having to install any additional tools. 9 | * If you already have the tools installed on your server, the best option is to configure the local.config file to point to the absolute PATH of each tool. 10 | 11 | * Are the right user permissions granted for the file/directory/server in which you are going to run the pipeline? 12 | * On servers with multiple users, certain directories often give some users more editing privileges than others. Start by navigating to the directory in which you will be working. Next, type `ls -lha` or `ls -l`. This produces a list of all files in that directory along with the permissions each user class has, shown using the "-rwxrwxrwx" scheme (r = read permission, w = write permission, and x = execute permission). 13 | * Permission errors could be due to the directories chosen for the pipeline output, or to individual bioinformatic tools having been installed by other users, for example. 14 | * Review this tutorial for more information regarding file permissions: https://www.guru99.com/file-permissions.html 15 | 16 | 17 | -------------------------------------------------------------------------------- /docs/accessing_AMR++.md: -------------------------------------------------------------------------------- 1 | Accessing AMR++ 2 | ------------ 3 | 4 | This section will help you get access to all the bioinformatic tools required for metagenomic analysis with AMR++. 5 | 6 | Amazon Web Services 7 | ----- 8 | 9 | In order to facilitate evaluation of the MEGARes 2.0 database and the functionality of the AMR++ 2.0 pipeline, we have provided free access to an Amazon Machine Image (AMI) with example files for analysis. AMR++ 2.0 is pre-installed and fully integrated with all necessary bioinformatic tools and dependencies within an AMI named "Microbial_Ecology_Group_AMR_AMI", allowing users to easily employ the AMR++ v2.0 pipeline within the Amazon Web Services (AWS) ecosystem. Please follow the instructions from Amazon Web Services for details on creating your own EC2 instance (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html). With this approach, users pay for the cost of a suitable AWS EC2 instance without the challenge of accessing large computing clusters and individually installing each piece of software necessary to run the pipeline (including all dependencies). Integration within AWS also allows users to scale the computing resources to fit the needs of any project size. 10 | 11 | Singularity container 12 | ----------------- 13 | 14 | Singularity containers allow the packaging of multiple bioinformatic tools. While Singularity is a popular tool and likely to be supported by many computing clusters, please contact your system administrator for help with installing Singularity. Installation on a local computer is also an option and can be performed by following these instructions: https://sylabs.io/guides/3.0/user-guide/installation.html 15 | 16 | We provide AMR++ with a singularity container that is automatically accessed when the AMR++ pipeline is run with the flag "-profile singularity". Additionally, the singularity container is hosted on singularity-hub.org and can be used locally for custom analysis (https://singularity-hub.org/collections/3418).
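For example, once the image has been pulled (as shown in the block below), individual tools inside the container can also be run non-interactively with `singularity exec`. This is a minimal sketch, assuming the pulled image file is named amrplusplus_v2.sif as in the example below:

```bash
# Run a single containerized tool without opening an interactive shell,
# e.g. print the version of the samtools binary packaged in the container
$ singularity exec amrplusplus_v2.sif samtools --version
```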
17 | 18 | ```bash 19 | # Choose your preference to pull the container from Singularity Hub (once) 20 | $ singularity pull shub://meglab-metagenomics/amrplusplus_v2 21 | 22 | # Then interact with it (enter "exit" to leave the singularity container): 23 | $ singularity shell amrplusplus_v2.sif 24 | 25 | ``` 26 | 27 | 28 | -------------------------------------------------------------------------------- /docs/configuration.md: -------------------------------------------------------------------------------- 1 | Configuration 2 | ------------- 3 | 4 | The pipeline source code comes with a configuration file that can be used to set environment variables or default command-line options. Setting these variables beforehand may be useful in situations where you do not want to specify a long list of options from the command line. This configuration file can be found in the root source code directory and is called **nextflow.config**. You can modify this file, save the changes, and run the pipeline directly. 5 | 6 | 7 | Customize Environment Variables using profiles 8 | ---------------------------------------------- 9 | 10 | The **nextflow.config** contains a section that allows the use of environment "profiles" when running AmrPlusPlus. Further information for each profile can be found within the /config directory. In brief, profiles allow control over how the pipeline is run on different computing clusters. We recommend the "singularity" profile, which employs a Singularity container with all the required bioinformatic tools. 11 | 12 | 13 | ```bash 14 | profiles { 15 | local { 16 | includeConfig "config/local.config" 17 | } 18 | local_angus { 19 | includeConfig "config/local_angus.config" 20 | } 21 | local_MSI { 22 | includeConfig "config/local_MSI.config" 23 | } 24 | slurm { 25 | process.executor = 'slurm' 26 | includeConfig "config/slurm.config" 27 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 28 | } 29 | singularity { 30 | includeConfig "config/singularity.config" 31 | process.container = 'shub://meglab-metagenomics/amrplusplus_v2' 32 | } 33 | } 34 | ``` 35 | 36 | Customize Command-line Options 37 | ------------------------------ 38 | 39 | The params section allows you to set the different command-line options that can be used within the pipeline. Here, you can specify input/output options, trimming options, and algorithm options. 40 | 41 | If you intend to run multiple samples in parallel, you must specify a glob pattern for your sequence data as shown for the **reads** parameter. For more information on globs, please see this related [article](https://en.wikipedia.org/wiki/Glob_(programming)). 42 | 43 | 44 | By default, the pipeline uses the minikraken database (~4GB) to classify and assign taxonomic labels to your sequences. As Kraken loads this database into memory, this mini database is particularly useful for people who do not have access to large-memory servers. We provide a script to easily download the minikraken database. 45 | 46 | > sh download_minikraken.sh 47 | 48 | If you would like to use a custom database or the standard Kraken database (~160GB), you will need to build it yourself and modify the **kraken_db** environment variable in the nextflow.config file to point to its location on your machine.
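For example, building the standard Kraken2 database and pointing the pipeline at it might look like the following. This is a minimal sketch rather than the pipeline's documented procedure: it assumes `kraken2-build` is installed and on your PATH, that roughly 160GB of disk space is available, and it uses a placeholder database directory:

```bash
# Build the standard Kraken2 database (downloads RefSeq data; can take hours)
kraken2-build --standard --threads 10 --db /path/to/my_kraken2_db

# Then point the pipeline at the new database, either by editing the
# kraken_db parameter in nextflow.config or by overriding it on the command line:
nextflow run main_AmrPlusPlus_v2_withKraken.nf --kraken_db "/path/to/my_kraken2_db"
```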
49 | 50 | 51 | ```bash 52 | params { 53 | /* Location of forward and reverse read pairs */ 54 | reads = "data/raw/*_{1,2}.fastq.gz" 55 | 56 | /* Location of adapter sequences */ 57 | adapters = "data/adapters/nextera.fa" 58 | 59 | /* Location of tab delimited adapter sequences */ 60 | fqc_adapters = "data/adapters/nextera.tab" 61 | 62 | /* Location of host genome index files */ 63 | host_index = "" 64 | 65 | /* Location of host genome */ 66 | host = "data/host/chr21.fasta.gz" 67 | 68 | /* Kraken database location, default is "none" */ 69 | kraken_db = "minikraken2_v2_8GB_201904_UPDATE" 70 | 71 | /* Location of amr index files */ 72 | amr_index = "" 73 | 74 | /* Location of antimicrobial resistance (MEGARes) database */ 75 | amr = "data/amr/megares_database_v1.02.fasta" 76 | 77 | /* Location of amr annotation file */ 78 | annotation = "data/amr/megares_annotations_v1.02.csv" 79 | 80 | /* Location of SNP metadata */ 81 | snp_annotation = "data/amr/snp_location_metadata.csv" 82 | 83 | /* Location of SNP confirmation script */ 84 | snp_confirmation = "bin/snp_confirmation.py" 85 | 86 | /* Output directory */ 87 | output = "test_results" 88 | 89 | /* Number of threads */ 90 | threads = 10 91 | smem_threads = 12 92 | 93 | /* Trimmomatic trimming parameters */ 94 | leading = 10 95 | trailing = 3 96 | slidingwindow = "4:15" 97 | minlen = 36 98 | 99 | /* Resistome threshold */ 100 | threshold = 80 101 | 102 | /* Starting rarefaction level */ 103 | min = 5 104 | 105 | /* Ending rarefaction level */ 106 | max = 100 107 | 108 | /* Number of levels to skip */ 109 | skip = 5 110 | 111 | /* Number of iterations to sample at */ 112 | samples = 1 113 | 114 | /* Display help message */ 115 | help = false 116 | } 117 | ``` 118 | -------------------------------------------------------------------------------- /docs/contact.md: -------------------------------------------------------------------------------- 1 | Contact 2 | ------- 3 | 4 | Questions, comments, or feature requests can be directed to meglab.metagenomics@gmail.com. 5 | 6 | View our website for further information: 7 | http://megares.meglab.org/ 8 | -------------------------------------------------------------------------------- /docs/dependencies.md: -------------------------------------------------------------------------------- 1 | Dependencies 2 | ------------ 3 | 4 | AmrPlusPlus uses a variety of open-source tools. The tools used, descriptions, and version specifics are provided below. 5 | 6 | ### Bedtools 7 | - Description: Bedtools is a suite of tools that can be used to compute and extract useful information from BAM, BED, and BCF files. 8 | - Version: 2.28.0 9 | - DOI: https://doi.org/10.1093/bioinformatics/btq033 10 | 11 | ### BWA 12 | - Description: BWA is a short and long read sequence aligner for aligning raw sequence data to a reference genome. 13 | - Version: 0.7.17 14 | - DOI: https://doi.org/10.1093/bioinformatics/btp324 15 | 16 | ### Kraken2 17 | - Description: Kraken is a fast taxonomic sequence classifier that assigns taxonomy labels to short-reads. 18 | - Version: 2.0.8 19 | - DOI: https://doi.org/10.1186/gb-2014-15-3-r46 20 | 21 | ### RarefactionAnalyzer 22 | - Description: RarefactionAnalyzer is a tool that can be used for performing rarefaction analysis. 23 | - Version: 0.0.0 24 | - DOI: https://doi.org/10.1093/nar/gkw1009 25 | 26 | ### ResistomeAnalyzer 27 | - Description: ResistomeAnalyzer is a tool for analyzing the resistome of large metagenomic datasets. 
28 | - Version: 0.0.0 29 | - DOI: https://doi.org/10.1093/nar/gkw1009 30 | 31 | ### Samtools 32 | - Description: Samtools is a program for manipulating and extracting useful information from alignment files in SAM or BAM format. 33 | - Version: 1.9 34 | - DOI: https://doi.org/10.1093/bioinformatics/btp352 35 | 36 | ### SNPFinder 37 | - Description: SNPFinder is a haplotype variant caller that can be used for metagenomics datasets. 38 | - Version: 0.0.0 39 | - DOI: https://doi.org/10.1093/nar/gkw1009 40 | 41 | ### Trimmomatic 42 | - Description: Trimmomatic is a tool for removing low quality base pairs (bps) and adapter sequences from raw sequence data. 43 | - Version: 0.39 44 | - DOI: https://doi.org/10.1093/bioinformatics/btu170 45 | 46 | ### Freebayes 47 | - Description: Freebayes is a Bayesian, haplotype-based variant caller for finding SNPs and other small variants in sequence alignments. 48 | - Version: 1.3.1 49 | - https://arxiv.org/abs/1207.3907v2 50 | 51 | ### Resistance Gene Identifier 52 | - Description: The Resistance Gene Identifier (RGI) predicts antimicrobial resistance genes from sequence data using the CARD database. 53 | - Version: 0.39 54 | - https://card.mcmaster.ca/analyze/rgi 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /docs/installation.md: -------------------------------------------------------------------------------- 1 | Installation 2 | ------------ 3 | 4 | This section will help you get started with running the AmrPlusPlus pipeline with Nextflow and Singularity. This tutorial assumes you will be running the pipeline from a POSIX compatible system such as Linux, Solaris, or OS X. 5 | 6 | Setup 7 | ----- 8 | 9 | We will go over a typical pipeline setup scenario in which you connect to a remote server, install Nextflow, and download the pipeline source code. For the easiest use of AmrPlusPlus, make sure that Singularity is installed and in your $PATH variable. 10 | Visit this website for further information: 11 | https://singularity.lbl.gov/docs-installation 12 | 13 | If Singularity cannot be installed, configure the "config/local.config" file to specify the absolute PATH to each required bioinformatic tool. Then, change the flag after "-profile" to "local" when running the pipeline. 14 | 15 | ```bash 16 | # username and host address 17 | $ ssh [USER]@[HOST] 18 | 19 | # Check if you have nextflow installed 20 | $ nextflow -h 21 | 22 | # If not available, install Nextflow 23 | $ curl -s https://get.nextflow.io | bash 24 | # If you do not have curl installed, try wget 25 | # $ wget -qO- https://get.nextflow.io | bash 26 | 27 | # give the user execute permissions on the nextflow binary 28 | $ chmod u+x nextflow 29 | 30 | # move nextflow executable to a folder in your PATH environment variable 31 | $ mv nextflow $HOME/bin 32 | 33 | # create a test directory and change into it 34 | $ mkdir amr_test && cd amr_test 35 | 36 | # clone pipeline source code 37 | $ git clone https://github.com/meglab-metagenomics/amrplusplus_v2.git . 38 | ``` 39 | 40 | Run a Simple Test 41 | ----------------- 42 | 43 | We will run a small sample dataset that comes with the pipeline source code. As such, we will not be specifying any input paths as they have already been included. During the program's execution, the required tool dependencies will be accessed using a Singularity container. As there are many tool dependencies, downloading the container could take some time depending on your connection speed.
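If you would rather not wait during the first run, you can fetch the container ahead of time so that later runs start immediately. A minimal sketch, assuming Singularity is installed and Singularity Hub is reachable:

```bash
# Pre-fetch the AMR++ container image once; subsequent pipeline runs
# will reuse the cached image instead of downloading it again
$ singularity pull shub://meglab-metagenomics/amrplusplus_v2
```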
 44 | 45 | ```bash 46 | # navigate into the directory containing the pipeline source code 47 | $ cd amr_test/ 48 | 49 | # command to run the amrplusplus pipeline 50 | $ nextflow run main_AmrPlusPlus_v2.nf -profile singularity --output test_results 51 | 52 | # change directories to view pipeline outputs 53 | $ cd test_results/ 54 | ``` 55 | 56 | 57 | -------------------------------------------------------------------------------- /docs/output.md: -------------------------------------------------------------------------------- 1 | Output 2 | ------ 3 | 4 | All intermediate outputs produced from each module of this pipeline are provided as flat files that can be viewed in a text editor. These files are copied from the root **work/** directory created by Nextflow, so if disk space is a concern, the **work/** directory should be deleted as it can get quite large. 5 | 6 | Directory Structure 7 | ------------------- 8 | 9 | The output directories created by the pipeline are named after the module that produced them. Each file output is prefixed with the sample name and suffixed with a short product description. 10 | 11 | Files without sample prefixes are a result of aggregation. For example, the files **host.removal.stats** and **trimmomatic.stats** provide count matrices for the number of reads discarded as a result of host-DNA removal and the number of trimmed reads for each sample. 12 | 13 | ```bash 14 | ├── RunQC 15 | │   ├── Paired 16 | │   │   ├── SRR532663.1P.fastq 17 | │   │   └── SRR532663.2P.fastq 18 | │   ├── Stats 19 | │   │   └── trimmomatic.stats 20 | │   └── Unpaired 21 | │   ├── SRR532663.1U.fastq 22 | │   └── SRR532663.2U.fastq 23 | ├── BuildHostIndex 24 | │   ├── chr21.fasta.amb 25 | │   ├── chr21.fasta.ann 26 | │   ├── chr21.fasta.bwt 27 | │   ├── chr21.fasta.pac 28 | │   └── chr21.fasta.sa 29 | ├── AlignReadsToHost 30 | │   └── SRR532663.host.sam 31 | ├── NonHostReads 32 | │   ├── SRR532663.non.host.R1.fastq 33 | │   └── SRR532663.non.host.R2.fastq 34 | ├── RemoveHostDNA 35 | │   ├── HostRemovalStats 36 | │   │   └── host.removal.stats 37 | │   └── NonHostBAM 38 | │   └── SRR532663.host.sorted.removed.bam 39 | ├── AlignToAMR 40 | │   └── SRR532663.amr.alignment.sam 41 | ├── RunResistome 42 | │   └── SRR532663.gene.tsv 43 | ├── ResistomeResults 44 | │   └── AMR_analytic_matrix.csv 45 | ├── RunRarefaction 46 | │   ├── SRR532663.class.tsv 47 | │   ├── SRR532663.gene.tsv 48 | │   ├── SRR532663.group.tsv 49 | │   └── SRR532663.mech.tsv 50 | ├── RunKraken 51 | │   ├── SRR532663.kraken.report 52 | │   └── SRR532663.kraken.filtered.report 53 | ├── KrakenResults 54 | │   └── kraken_analytic_matrix.csv 55 | ├── FilteredKrakenResults 56 | │   └── filtered_kraken_analytic_matrix.csv 57 | 58 | 59 | 60 | ``` 61 | -------------------------------------------------------------------------------- /docs/requirements.md: -------------------------------------------------------------------------------- 1 | Software Requirements 2 | --------------------- 3 | To run AmrPlusPlus, you will need the following libraries and tools installed on your server or local machine. 4 | 5 | - Singularity 6 | - Visit this website for further information: https://singularity.lbl.gov/docs-installation 7 | - Java 7+ (Required) 8 | - Nextflow (Required) 9 | 10 | NOTE: If you choose not to install Singularity, you will need to download each of the required dependencies and add the executable paths to your .bashrc file to run the pipeline. A list of these dependencies can be found in the [Dependencies](https://github.com/meglab-metagenomics/amrplusplus_v2/blob/master/docs/dependencies.md) section of this document.
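For example, appending each tool's directory to your PATH in ~/.bashrc might look like the following. This is a minimal sketch with placeholder install locations; substitute the directories where each dependency is actually installed on your system:

```bash
# Add the directories containing each tool's executable to PATH
# (the paths below are examples only, not locations the pipeline expects)
echo 'export PATH=$PATH:$HOME/tools/bwa:$HOME/tools/samtools/bin' >> ~/.bashrc

# Reload the shell configuration so the change takes effect
source ~/.bashrc
```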
 11 | -------------------------------------------------------------------------------- /docs/usage.md: -------------------------------------------------------------------------------- 1 | Usage 2 | ----- 3 | 4 | ### Display Help Message 5 | 6 | The `help` parameter displays the available options and commands. 7 | ``` 8 | $ nextflow run main_AmrPlusPlus_v2.nf --help 9 | ``` 10 | 11 | ### File Inputs 12 | 13 | #### Set custom sequence data 14 | 15 | The `reads` parameter accepts sequence files in standard fastq and gz format. 16 | ``` 17 | $ nextflow run main_AmrPlusPlus_v2.nf --reads "data/raw/*_R{1,2}.fastq" 18 | ``` 19 | 20 | #### Set host genome 21 | 22 | The `host` parameter accepts a fasta formatted host genome. 23 | ``` 24 | $ nextflow run main_AmrPlusPlus_v2.nf --host "data/host/chr21.fasta.gz" 25 | ``` 26 | 27 | #### Set host index 28 | 29 | The `host_index` parameter allows you to supply pre-built host indexes produced by BWA. 30 | ``` 31 | $ nextflow run main_AmrPlusPlus_v2.nf --host "data/host/chr21.fasta.gz" --host_index "data/index/*" 32 | ``` 33 | 34 | #### Set resistance database 35 | 36 | The `amr` parameter accepts a fasta formatted resistance database. 37 | ``` 38 | $ nextflow run main_AmrPlusPlus_v2.nf --amr "data/amr/megares_database_v1.02.fasta" 39 | ``` 40 | 41 | #### Set annotation database 42 | 43 | The `annotation` parameter accepts a csv formatted annotation database. 44 | ``` 45 | $ nextflow run main_AmrPlusPlus_v2.nf --annotation "data/amr/megares_annotations_v1.02.csv" 46 | ``` 47 | 48 | #### Set adapter file 49 | 50 | The `adapters` parameter accepts a fasta formatted adapter file. 51 | ``` 52 | $ nextflow run main_AmrPlusPlus_v2.nf --adapters "data/adapters/adapters.fa" 53 | ``` 54 | 55 | ### File Outputs 56 | 57 | #### Set output and work directories 58 | 59 | The `output` parameter writes the results to the specified directory. The work directory is set with `-w` (note the single dash, since it is a Nextflow option rather than a pipeline parameter) and determines where the temporary files will be written. Upon completing the run, you can delete the temporary file directory. 60 | ``` 61 | $ nextflow run main_AmrPlusPlus_v2.nf --output "test/" -w "work_dir/" 62 | ``` 63 | 64 | ### Resume a pipeline run 65 | 66 | If the pipeline run is cancelled or stopped for whatever reason, using the same command with the addition of the `-resume` flag will attempt to pick up where the pipeline stopped.
 67 | ``` 68 | $ nextflow run main_AmrPlusPlus_v2.nf --output "test/" -w "work_dir/" -resume 69 | ``` 70 | 71 | ### Trimming Options 72 | 73 | #### Set custom trimming parameters 74 | 75 | ``` 76 | $ nextflow run main_AmrPlusPlus_v2.nf \ 77 | --reads "data/raw/*_R{1,2}.fastq" \ 78 | --leading 3 \ 79 | --trailing 3 \ 80 | --minlen 36 \ 81 | --slidingwindow "4:15" \ 82 | --adapters "data/adapters/nextera.fa" \ 83 | --output "test/" 84 | ``` 85 | 86 | ### Algorithm Options 87 | 88 | #### Set custom algorithm options 89 | 90 | ``` 91 | $ nextflow run main_AmrPlusPlus_v2.nf \ 92 | --reads "data/raw/*_R{1,2}.fastq" \ 93 | --threshold 80 \ 94 | --min 1 \ 95 | --max 100 \ 96 | --samples 5 \ 97 | --skip 5 \ 98 | --output "test/" 99 | ``` 100 | 101 | #### Set number of threads to use for each process 102 | 103 | ``` 104 | $ nextflow run main_AmrPlusPlus_v2.nf --threads 8 105 | ``` 106 | -------------------------------------------------------------------------------- /download_minikraken.sh: -------------------------------------------------------------------------------- 1 | # Install nextflow 2 | # curl -s https://get.nextflow.io | bash 3 | 4 | 5 | # Download minikraken database and unzip 6 | wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken_8GB_202003.tgz 7 | tar -xvzf minikraken_8GB_202003.tgz 8 | -------------------------------------------------------------------------------- /launch_mpi_slurm.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #SBATCH --job-name=AMRPlusPlus 3 | #SBATCH --partition=shas 4 | #SBATCH --ntasks=1 5 | #SBATCH --qos=long 6 | #SBATCH --cpus-per-task=1 7 | #SBATCH --time=100:00:00 8 | #SBATCH --export=ALL 9 | #SBATCH --mail-user=enriquedoster@gmail.com 10 | #SBATCH --mail-type=ALL 11 | 12 | module purge 13 | module load jdk/1.8.0 14 | module load singularity/2.5.2 15 | module spider openmpi/4.0.0 16 | 17 | mpirun --pernode ./nextflow run main_AmrPlusPlus_v2.nf -resume -profile msi_pbs \ 18 | -w /work_dir --threads 15 \ 19 | --output output_results --host /PATH/TO/HOST/GENOME \ 20 | --reads "RAWREADS/*_R{1,2}.fastq.gz" -with-mpi 21 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = Channel.fromPath(params.host_index).toSortedList() 13 | //if( host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | 32 | if(params.kraken_db) { 33 | kraken_db = file(params.kraken_db) 34 | } 35 | 36 | threads = params.threads 37 | 38 | threshold = params.threshold 39 | 40 | min = params.min 41 | max = params.max 42 | skip = params.skip 43 | samples = params.samples 44 | 45 | leading = params.leading 46 | trailing = params.trailing 47 | slidingwindow = params.slidingwindow 48 | minlen = params.minlen 49 | 50
| Channel 51 | .fromFilePairs( params.reads, flat: true ) 52 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 53 | .set { reads } 54 | 55 | process RunQC { 56 | tag { sample_id } 57 | 58 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 59 | saveAs: { filename -> 60 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 61 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 62 | else {} 63 | } 64 | 65 | input: 66 | set sample_id, file(forward), file(reverse) from reads 67 | 68 | output: 69 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 70 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 71 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 72 | 73 | """ 74 | ${JAVA} -jar ${TRIMMOMATIC} \ 75 | PE \ 76 | -threads ${threads} \ 77 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 78 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 79 | LEADING:${leading} \ 80 | TRAILING:${trailing} \ 81 | SLIDINGWINDOW:${slidingwindow} \ 82 | MINLEN:${minlen} \ 83 | 2> ${sample_id}.trimmomatic.stats.log 84 | """ 85 | } 86 | 87 | trimmomatic_stats.toSortedList().set { trim_stats } 88 | 89 | process QCStats { 90 | tag { sample_id } 91 | 92 | publishDir "${params.output}/RunQC", mode: 'copy', 93 | saveAs: { filename -> 94 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 95 | else {} 96 | } 97 | 98 | input: 99 | file(stats) from trim_stats 100 | 101 | output: 102 | file("trimmomatic.stats") 103 | 104 | """ 105 | ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 106 | """ 107 | } 108 | 109 | if( !params.host_index ) { 110 | process BuildHostIndex { 111 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 112 | 113 | tag { host.baseName } 114 | 115 | input: 116 | file(host) 117 | 118 | output: 119 | file '*' into (host_index) 120 | 121 | """ 122 | ${BWA} index ${host} 123 | """ 124 | } 125 | } 126 | 127 | process AlignReadsToHost { 128 | tag { sample_id } 129 | 130 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 131 | 132 | input: 133 | set sample_id, file(forward), file(reverse) from paired_fastq 134 | file index from host_index 135 | file host 136 | 137 | output: 138 | set sample_id, file("${sample_id}.host.sorted.bam") into (host_bam) 139 | 140 | """ 141 | ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > ${sample_id}.host.sam 142 | ${SAMTOOLS} view -bS ${sample_id}.host.sam | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 143 | rm ${sample_id}.host.sam 144 | """ 145 | } 146 | 147 | process RemoveHostDNA { 148 | tag { sample_id } 149 | 150 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 151 | saveAs: { filename -> 152 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 153 | } 154 | 155 | input: 156 | set sample_id, file(bam) from host_bam 157 | 158 | output: 159 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 160 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 161 | 162 | """ 163 | ${SAMTOOLS} index ${bam} && ${SAMTOOLS} idxstats ${bam} > ${sample_id}.samtools.idxstats 164 | ${SAMTOOLS} view -h -f 4 -b ${bam} -o ${sample_id}.host.sorted.removed.bam 165 | """ 166 | } 167 | 168 | idxstats_logs.toSortedList().set { host_removal_stats } 169 | 170 | process HostRemovalStats { 171 | tag 
{ sample_id } 172 | 173 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 174 | saveAs: { filename -> 175 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 176 | } 177 | 178 | input: 179 | file(stats) from host_removal_stats 180 | 181 | output: 182 | file("host.removal.stats") 183 | 184 | """ 185 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 186 | """ 187 | } 188 | 189 | process NonHostReads { 190 | tag { sample_id } 191 | 192 | publishDir "${params.output}/NonHostReads", mode: "copy" 193 | 194 | input: 195 | set sample_id, file(bam) from non_host_bam 196 | 197 | output: 198 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 199 | 200 | """ 201 | ${BEDTOOLS} \ 202 | bamtofastq \ 203 | -i ${bam} \ 204 | -fq ${sample_id}.non.host.R1.fastq.gz \ 205 | -fq2 ${sample_id}.non.host.R2.fastq.gz 206 | """ 207 | } 208 | 209 | /* 210 | - 211 | -- 212 | --- 213 | ---- nonhost reads for megares and kraken2 214 | --- 215 | -- 216 | - 217 | */ 218 | 219 | 220 | 221 | /* 222 | ---- Run alignment to MEGAres 223 | */ 224 | 225 | if( !params.amr_index ) { 226 | process BuildAMRIndex { 227 | tag { amr.baseName } 228 | 229 | input: 230 | file(amr) 231 | 232 | output: 233 | file '*' into (amr_index) 234 | 235 | """ 236 | ${BWA} index ${amr} 237 | """ 238 | } 239 | } 240 | 241 | process AlignToAMR { 242 | tag { sample_id } 243 | 244 | publishDir "${params.output}/AlignToAMR", mode: "copy" 245 | 246 | input: 247 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 248 | file index from amr_index 249 | file amr 250 | 251 | output: 252 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam) 253 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam) 254 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 255 | 256 | 257 | """ 258 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 259 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 260 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 261 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam ${sample_id}.amr.alignment.sorted.fix.bam 262 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 263 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 264 | ${SAMTOOLS} view -h -o ${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 265 | rm ${sample_id}.amr.alignment.bam 266 | rm ${sample_id}.amr.alignment.sorted*.bam 267 | """ 268 | } 269 | 270 | process RunResistome { 271 | tag { sample_id } 272 | 273 | publishDir "${params.output}/RunResistome", mode: "copy" 274 | 275 | input: 276 | set sample_id, file(sam) from megares_resistome_sam 277 | file annotation 278 | file amr 279 | 280 | output: 281 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 282 | file("${sample_id}.group.tsv") into (megares_group_counts) 283 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 284 | file("${sample_id}.class.tsv") into (megares_class_counts) 285 | 
file("${sample_id}.type.tsv") into (megares_type_counts) 286 | 287 | """ 288 | $baseDir/bin/resistome -ref_fp ${amr} \ 289 | -annot_fp ${annotation} \ 290 | -sam_fp ${sam} \ 291 | -gene_fp ${sample_id}.gene.tsv \ 292 | -group_fp ${sample_id}.group.tsv \ 293 | -mech_fp ${sample_id}.mechanism.tsv \ 294 | -class_fp ${sample_id}.class.tsv \ 295 | -type_fp ${sample_id}.type.tsv \ 296 | -t ${threshold} 297 | """ 298 | } 299 | 300 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 301 | 302 | process ResistomeResults { 303 | tag { } 304 | 305 | publishDir "${params.output}/ResistomeResults", mode: "copy" 306 | 307 | input: 308 | file(resistomes) from megares_amr_l_to_w 309 | 310 | output: 311 | file("AMR_analytic_matrix.csv") into amr_master_matrix 312 | 313 | """ 314 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 315 | """ 316 | } 317 | 318 | 319 | /* samtools deduplication of megares alignment */ 320 | process SamDedupRunResistome { 321 | tag { sample_id } 322 | 323 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 324 | 325 | input: 326 | set sample_id, file(sam) from megares_dedup_resistome_sam 327 | file annotation 328 | file amr 329 | 330 | output: 331 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 332 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 333 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 334 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 335 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 336 | 337 | """ 338 | $baseDir/bin/resistome -ref_fp ${amr} \ 339 | -annot_fp ${annotation} \ 340 | -sam_fp ${sam} \ 341 | -gene_fp ${sample_id}.gene.tsv \ 342 | -group_fp ${sample_id}.group.tsv \ 343 | -mech_fp ${sample_id}.mechanism.tsv \ 344 | -class_fp ${sample_id}.class.tsv \ 345 | -type_fp ${sample_id}.type.tsv \ 346 | -t ${threshold} 347 | """ 348 | } 349 | 350 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 351 | 352 | process SamDedupResistomeResults { 353 | tag { } 354 | 355 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 356 | 357 | input: 358 | file(resistomes) from megares_dedup_amr_l_to_w 359 | 360 | output: 361 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 362 | 363 | """ 364 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 365 | """ 366 | } 367 | 368 | process RunRarefaction { 369 | tag { sample_id } 370 | 371 | publishDir "${params.output}/RunRarefaction", mode: "copy" 372 | 373 | input: 374 | set sample_id, file(sam) from megares_rarefaction_sam 375 | file annotation 376 | file amr 377 | 378 | output: 379 | set sample_id, file("*.tsv") into (rarefaction) 380 | 381 | """ 382 | $baseDir/bin/rarefaction \ 383 | -ref_fp ${amr} \ 384 | -sam_fp ${sam} \ 385 | -annot_fp ${annotation} \ 386 | -gene_fp ${sample_id}.gene.tsv \ 387 | -group_fp ${sample_id}.group.tsv \ 388 | -mech_fp ${sample_id}.mech.tsv \ 389 | -class_fp ${sample_id}.class.tsv \ 390 | -type_fp ${sample_id}.type.tsv \ 391 | -min ${min} \ 392 | -max ${max} \ 393 | -skip ${skip} \ 394 | -samples ${samples} \ 395 | -t ${threshold} 396 | """ 397 | } 398 | 399 | 400 | 401 | 402 | 403 | 404 | 405 | def nextflow_version_error() { 406 | println "" 407 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 408 | println "Run ./nextflow self-update to update 
Nextflow to the latest available version." 409 | println "" 410 | return 1 411 | } 412 | 413 | def adapter_error(def input) { 414 | println "" 415 | println "[params.adapters] fail to open: '" + input + "' : No such file or directory" 416 | println "" 417 | return 1 418 | } 419 | 420 | def amr_error(def input) { 421 | println "" 422 | println "[params.amr] fail to open: '" + input + "' : No such file or directory" 423 | println "" 424 | return 1 425 | } 426 | 427 | def annotation_error(def input) { 428 | println "" 429 | println "[params.annotation] fail to open: '" + input + "' : No such file or directory" 430 | println "" 431 | return 1 432 | } 433 | 434 | def fastq_error(def input) { 435 | println "" 436 | println "[params.reads] fail to open: '" + input + "' : No such file or directory" 437 | println "" 438 | return 1 439 | } 440 | 441 | def host_error(def input) { 442 | println "" 443 | println "[params.host] fail to open: '" + input + "' : No such file or directory" 444 | println "" 445 | return 1 446 | } 447 | 448 | def index_error(def input) { 449 | println "" 450 | println "[params.host_index] fail to open: '" + input + "' : No such file or directory" 451 | println "" 452 | return 1 453 | } 454 | 455 | def help() { 456 | println "" 457 | println "Program: AmrPlusPlus" 458 | println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md" 459 | println "Contact: Christopher Dean " 460 | println "" 461 | println "Usage: nextflow run main.nf [options]" 462 | println "" 463 | println "Input/output options:" 464 | println "" 465 | println " --reads STR path to FASTQ formatted input sequences" 466 | println " --adapters STR path to FASTA formatted adapter sequences" 467 | println " --host STR path to FASTA formatted host genome" 468 | println " --host_index STR path to BWA generated index files" 469 | println " --amr STR path to AMR resistance database" 470 | println " --annotation STR path to AMR annotation file" 471 | println " --output STR directory to write process outputs to" 472 | println " --KRAKENDB STR path to kraken database" 473 | println "" 474 | println "Trimming options:" 475 | println "" 476 | println " --leading INT cut bases off the start of a read, if below a threshold quality" 477 | println " --minlen INT drop the read if it is below a specified length" 478 | println " --slidingwindow INT perform sw trimming, cutting once the average quality within the window falls below a threshold" 479 | println " --trailing INT cut bases off the end of a read, if below a threshold quality" 480 | println "" 481 | println "Algorithm options:" 482 | println "" 483 | println " --threads INT number of threads to use for each process" 484 | println " --threshold INT gene fraction threshold" 485 | println " --min INT starting sample level" 486 | println " --max INT ending sample level" 487 | println " --samples INT number of sampling iterations to perform" 488 | println " --skip INT number of levels to skip" 489 | println "" 490 | println "Help options:" 491 | println "" 492 | println " --help display this message" 493 | println "" 494 | return 1 495 | } 496 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2_withKraken.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = 
Channel.fromPath(params.host_index).toSortedList() 13 | //if( host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | if(params.kraken_db) { 32 | kraken_db = file(params.kraken_db) 33 | } 34 | 35 | threads = params.threads 36 | 37 | threshold = params.threshold 38 | 39 | min = params.min 40 | max = params.max 41 | skip = params.skip 42 | samples = params.samples 43 | 44 | leading = params.leading 45 | trailing = params.trailing 46 | slidingwindow = params.slidingwindow 47 | minlen = params.minlen 48 | 49 | Channel 50 | .fromFilePairs( params.reads, flat: true ) 51 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 52 | .set { reads } 53 | 54 | process RunQC { 55 | tag { sample_id } 56 | 57 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 58 | saveAs: { filename -> 59 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 60 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 61 | else {} 62 | } 63 | 64 | input: 65 | set sample_id, file(forward), file(reverse) from reads 66 | 67 | output: 68 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 69 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 70 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 71 | 72 | """ 73 | ${JAVA} -jar ${TRIMMOMATIC} \ 74 | PE \ 75 | -threads ${threads} \ 76 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 77 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 78 | LEADING:${leading} \ 79 | TRAILING:${trailing} \ 80 | SLIDINGWINDOW:${slidingwindow} \ 81 | MINLEN:${minlen} \ 82 | 2> ${sample_id}.trimmomatic.stats.log 83 | """ 84 | } 85 | 86 | trimmomatic_stats.toSortedList().set { trim_stats } 87 | 88 | process QCStats { 89 | tag { sample_id } 90 | 91 | publishDir "${params.output}/RunQC", mode: 'copy', 92 | saveAs: { filename -> 93 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 94 | else {} 95 | } 96 | 97 | input: 98 | file(stats) from trim_stats 99 | 100 | output: 101 | file("trimmomatic.stats") 102 | 103 | """ 104 | ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 105 | """ 106 | } 107 | 108 | if( !params.host_index ) { 109 | process BuildHostIndex { 110 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 111 | 112 | tag { host.baseName } 113 | 114 | input: 115 | file(host) 116 | 117 | output: 118 | file '*' into (host_index) 119 | 120 | """ 121 | ${BWA} index ${host} 122 | """ 123 | } 124 | } 125 | 126 | process AlignReadsToHost { 127 | tag { sample_id } 128 | 129 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 130 | 131 | input: 132 | set sample_id, file(forward), file(reverse) from paired_fastq 133 | file index from host_index 134 | file host 135 | 136 | output: 137 | set sample_id, file("${sample_id}.host.sam") into (host_sam) 138 | 139 | """ 140 | ${BWA} mem ${host} ${forward} 
${reverse} -t ${threads} > ${sample_id}.host.sam 141 | """ 142 | } 143 | 144 | process RemoveHostDNA { 145 | tag { sample_id } 146 | 147 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 148 | saveAs: { filename -> 149 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 150 | } 151 | 152 | input: 153 | set sample_id, file(sam) from host_sam 154 | 155 | output: 156 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 157 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 158 | 159 | """ 160 | ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 161 | ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats 162 | ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam 163 | """ 164 | } 165 | 166 | idxstats_logs.toSortedList().set { host_removal_stats } 167 | 168 | process HostRemovalStats { 169 | tag { sample_id } 170 | 171 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 172 | saveAs: { filename -> 173 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 174 | } 175 | 176 | input: 177 | file(stats) from host_removal_stats 178 | 179 | output: 180 | file("host.removal.stats") 181 | 182 | """ 183 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 184 | """ 185 | } 186 | 187 | process NonHostReads { 188 | tag { sample_id } 189 | 190 | publishDir "${params.output}/NonHostReads", mode: "copy" 191 | 192 | input: 193 | set sample_id, file(bam) from non_host_bam 194 | 195 | output: 196 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 197 | 198 | """ 199 | ${BEDTOOLS} \ 200 | bamtofastq \ 201 | -i ${bam} \ 202 | -fq ${sample_id}.non.host.R1.fastq.gz \ 203 | -fq2 ${sample_id}.non.host.R2.fastq.gz 204 | """ 205 | } 206 | 207 | /* 208 | - 209 | -- 210 | --- 211 | ---- nonhost reads for megares and kraken2 212 | --- 213 | -- 214 | - 215 | */ 216 | 217 | 218 | /* 219 | ---- Run Kraken2 220 | */ 221 | 222 | process RunKraken { 223 | tag { sample_id } 224 | 225 | publishDir "${params.output}/RunKraken", mode: 'copy', 226 | saveAs: { filename -> 227 | if(filename.indexOf(".kraken.raw") > 0) "Standard/$filename" 228 | else if(filename.indexOf(".kraken.report") > 0) "Standard_report/$filename" 229 | else if(filename.indexOf(".kraken.filtered.report") > 0) "Filtered_report/$filename" 230 | else if(filename.indexOf(".kraken.filtered.raw") > 0) "Filtered/$filename" 231 | else {} 232 | } 233 | 234 | input: 235 | set sample_id, file(forward), file(reverse) from non_host_fastq_kraken 236 | 237 | 238 | output: 239 | file("${sample_id}.kraken.report") into (kraken_report,kraken_extract_taxa) 240 | set sample_id, file("${sample_id}.kraken.raw") into kraken_raw 241 | file("${sample_id}.kraken.filtered.report") into kraken_filter_report 242 | file("${sample_id}.kraken.filtered.raw") into kraken_filter_raw 243 | 244 | 245 | """ 246 | ${KRAKEN2} --db ${kraken_db} --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.report > ${sample_id}.kraken.raw 247 | ${KRAKEN2} --db ${kraken_db} --confidence 1 --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.filtered.report > ${sample_id}.kraken.filtered.raw 248 | """ 249 | } 250 | 251 | 252 | 
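/*
  Note: the RunKraken process above classifies each sample twice, once with
  default Kraken2 settings and once with "--confidence 1" (the most stringent
  confidence-score threshold), producing an unfiltered and a filtered report
  per sample. The toSortedList() calls below collect the per-sample reports
  into single sorted lists so that KrakenResults and FilteredKrakenResults
  each run once, combining the long-format reports into one wide
  taxa-by-sample count matrix.
*/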
kraken_report.toSortedList().set { kraken_l_to_w } 253 | kraken_filter_report.toSortedList().set { kraken_filter_l_to_w } 254 | 255 | process KrakenResults { 256 | tag { } 257 | 258 | publishDir "${params.output}/KrakenResults", mode: "copy" 259 | 260 | input: 261 | file(kraken_reports) from kraken_l_to_w 262 | 263 | output: 264 | file("kraken_analytic_matrix.csv") into kraken_master_matrix 265 | 266 | """ 267 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o kraken_analytic_matrix.csv 268 | """ 269 | } 270 | 271 | process FilteredKrakenResults { 272 | tag { sample_id } 273 | 274 | publishDir "${params.output}/FilteredKrakenResults", mode: "copy" 275 | 276 | input: 277 | file(kraken_reports) from kraken_filter_l_to_w 278 | 279 | output: 280 | file("filtered_kraken_analytic_matrix.csv") into filter_kraken_master_matrix 281 | 282 | """ 283 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o filtered_kraken_analytic_matrix.csv 284 | """ 285 | } 286 | 287 | /* 288 | ---- Run alignment to MEGAres 289 | */ 290 | 291 | if( !params.amr_index ) { 292 | process BuildAMRIndex { 293 | tag { amr.baseName } 294 | 295 | input: 296 | file(amr) 297 | 298 | output: 299 | file '*' into (amr_index) 300 | 301 | """ 302 | ${BWA} index ${amr} 303 | """ 304 | } 305 | } 306 | 307 | process AlignToAMR { 308 | tag { sample_id } 309 | 310 | publishDir "${params.output}/AlignToAMR", mode: "copy" 311 | 312 | input: 313 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 314 | file index from amr_index 315 | file amr 316 | 317 | output: 318 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam) 319 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam) 320 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 321 | 322 | 323 | """ 324 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 325 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 326 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 327 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam ${sample_id}.amr.alignment.sorted.fix.bam 328 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 329 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 330 | ${SAMTOOLS} view -h -o ${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 331 | rm ${sample_id}.amr.alignment.bam 332 | rm ${sample_id}.amr.alignment.sorted*.bam 333 | """ 334 | } 335 | 336 | process RunResistome { 337 | tag { sample_id } 338 | 339 | publishDir "${params.output}/RunResistome", mode: "copy" 340 | 341 | input: 342 | set sample_id, file(sam) from megares_resistome_sam 343 | file annotation 344 | file amr 345 | 346 | output: 347 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 348 | file("${sample_id}.group.tsv") into (megares_group_counts) 349 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 350 | file("${sample_id}.class.tsv") into (megares_class_counts) 351 | file("${sample_id}.type.tsv") into (megares_type_counts) 352 | 353 | """ 354 | $baseDir/bin/resistome -ref_fp ${amr} \ 355 | -annot_fp ${annotation} 
\ 356 | -sam_fp ${sam} \ 357 | -gene_fp ${sample_id}.gene.tsv \ 358 | -group_fp ${sample_id}.group.tsv \ 359 | -mech_fp ${sample_id}.mechanism.tsv \ 360 | -class_fp ${sample_id}.class.tsv \ 361 | -type_fp ${sample_id}.type.tsv \ 362 | -t ${threshold} 363 | """ 364 | } 365 | 366 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 367 | 368 | process ResistomeResults { 369 | tag { } 370 | 371 | publishDir "${params.output}/ResistomeResults", mode: "copy" 372 | 373 | input: 374 | file(resistomes) from megares_amr_l_to_w 375 | 376 | output: 377 | file("AMR_analytic_matrix.csv") into amr_master_matrix 378 | 379 | """ 380 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 381 | """ 382 | } 383 | 384 | 385 | /* samtools deduplication of megares alignment */ 386 | process SamDedupRunResistome { 387 | tag { sample_id } 388 | 389 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 390 | 391 | input: 392 | set sample_id, file(sam) from megares_dedup_resistome_sam 393 | file annotation 394 | file amr 395 | 396 | output: 397 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 398 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 399 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 400 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 401 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 402 | 403 | """ 404 | $baseDir/bin/resistome -ref_fp ${amr} \ 405 | -annot_fp ${annotation} \ 406 | -sam_fp ${sam} \ 407 | -gene_fp ${sample_id}.gene.tsv \ 408 | -group_fp ${sample_id}.group.tsv \ 409 | -mech_fp ${sample_id}.mechanism.tsv \ 410 | -class_fp ${sample_id}.class.tsv \ 411 | -type_fp ${sample_id}.type.tsv \ 412 | -t ${threshold} 413 | """ 414 | } 415 | 416 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 417 | 418 | process SamDedupResistomeResults { 419 | tag { } 420 | 421 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 422 | 423 | input: 424 | file(resistomes) from megares_dedup_amr_l_to_w 425 | 426 | output: 427 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 428 | 429 | """ 430 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 431 | """ 432 | } 433 | 434 | process RunRarefaction { 435 | tag { sample_id } 436 | 437 | publishDir "${params.output}/RunRarefaction", mode: "copy" 438 | 439 | input: 440 | set sample_id, file(sam) from megares_rarefaction_sam 441 | file annotation 442 | file amr 443 | 444 | output: 445 | set sample_id, file("*.tsv") into (rarefaction) 446 | 447 | """ 448 | $baseDir/bin/rarefaction \ 449 | -ref_fp ${amr} \ 450 | -sam_fp ${sam} \ 451 | -annot_fp ${annotation} \ 452 | -gene_fp ${sample_id}.gene.tsv \ 453 | -group_fp ${sample_id}.group.tsv \ 454 | -mech_fp ${sample_id}.mech.tsv \ 455 | -class_fp ${sample_id}.class.tsv \ 456 | -type_fp ${sample_id}.type.tsv \ 457 | -min ${min} \ 458 | -max ${max} \ 459 | -skip ${skip} \ 460 | -samples ${samples} \ 461 | -t ${threshold} 462 | """ 463 | } 464 | 465 | 466 | 467 | 468 | def nextflow_version_error() { 469 | println "" 470 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 471 | println "Run ./nextflow self-update to update Nextflow to the latest available version." 
472 | println "" 473 | return 1 474 | } 475 | 476 | def adapter_error(def input) { 477 | println "" 478 | println "[params.adapters] failed to open: '" + input + "' : No such file or directory" 479 | println "" 480 | return 1 481 | } 482 | 483 | def amr_error(def input) { 484 | println "" 485 | println "[params.amr] failed to open: '" + input + "' : No such file or directory" 486 | println "" 487 | return 1 488 | } 489 | 490 | def annotation_error(def input) { 491 | println "" 492 | println "[params.annotation] failed to open: '" + input + "' : No such file or directory" 493 | println "" 494 | return 1 495 | } 496 | 497 | def fastq_error(def input) { 498 | println "" 499 | println "[params.reads] failed to open: '" + input + "' : No such file or directory" 500 | println "" 501 | return 1 502 | } 503 | 504 | def host_error(def input) { 505 | println "" 506 | println "[params.host] failed to open: '" + input + "' : No such file or directory" 507 | println "" 508 | return 1 509 | } 510 | 511 | def index_error(def input) { 512 | println "" 513 | println "[params.host_index] failed to open: '" + input + "' : No such file or directory" 514 | println "" 515 | return 1 516 | } 517 | 518 | def help() { 519 | println "" 520 | println "Program: AmrPlusPlus" 521 | println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md" 522 | println "Contact: Christopher Dean " 523 | println "" 524 | println "Usage: nextflow run main.nf [options]" 525 | println "" 526 | println "Input/output options:" 527 | println "" 528 | println " --reads STR path to FASTQ formatted input sequences" 529 | println " --adapters STR path to FASTA formatted adapter sequences" 530 | println " --host STR path to FASTA formatted host genome" 531 | println " --host_index STR path to BWA generated index files" 532 | println " --amr STR path to AMR resistance database" 533 | println " --annotation STR path to AMR annotation file" 534 | println " --output STR directory to write process outputs to" 535 | println " --kraken_db STR path to kraken database" 536 | println "" 537 | println "Trimming options:" 538 | println "" 539 | println " --leading INT cut bases off the start of a read, if below a threshold quality" 540 | println " --minlen INT drop the read if it is below a specified length" 541 | println " --slidingwindow INT perform sliding-window trimming, cutting once the average quality within the window falls below a threshold" 542 | println " --trailing INT cut bases off the end of a read, if below a threshold quality" 543 | println "" 544 | println "Algorithm options:" 545 | println "" 546 | println " --threads INT number of threads to use for each process" 547 | println " --threshold INT gene fraction threshold" 548 | println " --min INT starting sample level" 549 | println " --max INT ending sample level" 550 | println " --samples INT number of sampling iterations to perform" 551 | println " --skip INT number of levels to skip" 552 | println "" 553 | println "Help options:" 554 | println "" 555 | println " --help display this message" 556 | println "" 557 | return 1 558 | } 559 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2_withRGI.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = Channel.fromPath(params.host_index).toSortedList() 13 | //if( 
host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | if(params.kraken_db) { 32 | kraken_db = file(params.kraken_db) 33 | } 34 | 35 | card_db = file(params.card_db) 36 | 37 | threads = params.threads 38 | 39 | threshold = params.threshold 40 | 41 | min = params.min 42 | max = params.max 43 | skip = params.skip 44 | samples = params.samples 45 | 46 | leading = params.leading 47 | trailing = params.trailing 48 | slidingwindow = params.slidingwindow 49 | minlen = params.minlen 50 | 51 | Channel 52 | .fromFilePairs( params.reads, flat: true ) 53 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 54 | .set { reads } 55 | 56 | process RunQC { 57 | tag { sample_id } 58 | 59 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 60 | saveAs: { filename -> 61 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 62 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 63 | else {} 64 | } 65 | 66 | input: 67 | set sample_id, file(forward), file(reverse) from reads 68 | 69 | output: 70 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 71 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 72 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 73 | 74 | """ 75 | ${JAVA} -jar ${TRIMMOMATIC} \ 76 | PE \ 77 | -threads ${threads} \ 78 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 79 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 80 | LEADING:${leading} \ 81 | TRAILING:${trailing} \ 82 | SLIDINGWINDOW:${slidingwindow} \ 83 | MINLEN:${minlen} \ 84 | 2> ${sample_id}.trimmomatic.stats.log 85 | """ 86 | } 87 | 88 | trimmomatic_stats.toSortedList().set { trim_stats } 89 | 90 | process QCStats { 91 | tag { sample_id } 92 | 93 | publishDir "${params.output}/RunQC", mode: 'copy', 94 | saveAs: { filename -> 95 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 96 | else {} 97 | } 98 | 99 | input: 100 | file(stats) from trim_stats 101 | 102 | output: 103 | file("trimmomatic.stats") 104 | 105 | """ 106 | ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 107 | """ 108 | } 109 | 110 | if( !params.host_index ) { 111 | process BuildHostIndex { 112 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 113 | 114 | tag { host.baseName } 115 | 116 | input: 117 | file(host) 118 | 119 | output: 120 | file '*' into (host_index) 121 | 122 | """ 123 | ${BWA} index ${host} 124 | """ 125 | } 126 | } 127 | 128 | process AlignReadsToHost { 129 | tag { sample_id } 130 | 131 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 132 | 133 | input: 134 | set sample_id, file(forward), file(reverse) from paired_fastq 135 | file index from host_index 136 | file host 137 | 138 | output: 139 | set sample_id, file("${sample_id}.host.sam") into (host_sam) 140 | 141 | """ 142 | ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > 
${sample_id}.host.sam 143 | """ 144 | } 145 | 146 | process RemoveHostDNA { 147 | tag { sample_id } 148 | 149 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 150 | saveAs: { filename -> 151 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 152 | } 153 | 154 | input: 155 | set sample_id, file(sam) from host_sam 156 | 157 | output: 158 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 159 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 160 | 161 | """ 162 | ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 163 | ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats 164 | ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam 165 | """ 166 | } 167 | 168 | idxstats_logs.toSortedList().set { host_removal_stats } 169 | 170 | process HostRemovalStats { 171 | tag { sample_id } 172 | 173 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 174 | saveAs: { filename -> 175 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 176 | } 177 | 178 | input: 179 | file(stats) from host_removal_stats 180 | 181 | output: 182 | file("host.removal.stats") 183 | 184 | """ 185 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 186 | """ 187 | } 188 | 189 | process NonHostReads { 190 | tag { sample_id } 191 | 192 | publishDir "${params.output}/NonHostReads", mode: "copy" 193 | 194 | input: 195 | set sample_id, file(bam) from non_host_bam 196 | 197 | output: 198 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 199 | 200 | """ 201 | ${BEDTOOLS} \ 202 | bamtofastq \ 203 | -i ${bam} \ 204 | -fq ${sample_id}.non.host.R1.fastq.gz \ 205 | -fq2 ${sample_id}.non.host.R2.fastq.gz 206 | """ 207 | } 208 | 209 | /* 210 | - 211 | -- 212 | --- 213 | ---- nonhost reads for megares 214 | --- 215 | -- 216 | - 217 | */ 218 | 219 | 220 | /* 221 | ---- Run alignment to MEGAres 222 | */ 223 | 224 | if( !params.amr_index ) { 225 | process BuildAMRIndex { 226 | tag { amr.baseName } 227 | 228 | input: 229 | file(amr) 230 | 231 | output: 232 | file '*' into (amr_index) 233 | 234 | """ 235 | ${BWA} index ${amr} 236 | """ 237 | } 238 | } 239 | 240 | process AlignToAMR { 241 | tag { sample_id } 242 | 243 | publishDir "${params.output}/AlignToAMR", mode: "copy" 244 | 245 | input: 246 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 247 | file index from amr_index 248 | file amr 249 | 250 | output: 251 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam, megares_RGI_sam) 252 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam,megares_dedup_RGI_sam) 253 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 254 | 255 | 256 | """ 257 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 258 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 259 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 260 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam 
${sample_id}.amr.alignment.sorted.fix.bam 261 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 262 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 263 | ${SAMTOOLS} view -h -o ${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 264 | rm ${sample_id}.amr.alignment.bam 265 | rm ${sample_id}.amr.alignment.sorted*.bam 266 | """ 267 | } 268 | 269 | process RunResistome { 270 | tag { sample_id } 271 | 272 | publishDir "${params.output}/RunResistome", mode: "copy" 273 | 274 | input: 275 | set sample_id, file(sam) from megares_resistome_sam 276 | file annotation 277 | file amr 278 | 279 | output: 280 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 281 | file("${sample_id}.group.tsv") into (megares_group_counts) 282 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 283 | file("${sample_id}.class.tsv") into (megares_class_counts) 284 | file("${sample_id}.type.tsv") into (megares_type_counts) 285 | 286 | """ 287 | $baseDir/bin/resistome -ref_fp ${amr} \ 288 | -annot_fp ${annotation} \ 289 | -sam_fp ${sam} \ 290 | -gene_fp ${sample_id}.gene.tsv \ 291 | -group_fp ${sample_id}.group.tsv \ 292 | -mech_fp ${sample_id}.mechanism.tsv \ 293 | -class_fp ${sample_id}.class.tsv \ 294 | -type_fp ${sample_id}.type.tsv \ 295 | -t ${threshold} 296 | """ 297 | } 298 | 299 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 300 | 301 | process ResistomeResults { 302 | tag { } 303 | 304 | publishDir "${params.output}/ResistomeResults", mode: "copy" 305 | 306 | input: 307 | file(resistomes) from megares_amr_l_to_w 308 | 309 | output: 310 | file("AMR_analytic_matrix.csv") into amr_master_matrix 311 | 312 | """ 313 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 314 | """ 315 | } 316 | 317 | 318 | /* samtools deduplication of megares alignment */ 319 | process SamDedupRunResistome { 320 | tag { sample_id } 321 | 322 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 323 | 324 | input: 325 | set sample_id, file(sam) from megares_dedup_resistome_sam 326 | file annotation 327 | file amr 328 | 329 | output: 330 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 331 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 332 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 333 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 334 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 335 | 336 | """ 337 | $baseDir/bin/resistome -ref_fp ${amr} \ 338 | -annot_fp ${annotation} \ 339 | -sam_fp ${sam} \ 340 | -gene_fp ${sample_id}.gene.tsv \ 341 | -group_fp ${sample_id}.group.tsv \ 342 | -mech_fp ${sample_id}.mechanism.tsv \ 343 | -class_fp ${sample_id}.class.tsv \ 344 | -type_fp ${sample_id}.type.tsv \ 345 | -t ${threshold} 346 | """ 347 | } 348 | 349 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 350 | 351 | process SamDedupResistomeResults { 352 | tag { } 353 | 354 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 355 | 356 | input: 357 | file(resistomes) from megares_dedup_amr_l_to_w 358 | 359 | output: 360 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 361 | 362 | """ 363 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 364 | """ 365 | } 366 | 367 
| process RunRarefaction { 368 | tag { sample_id } 369 | 370 | publishDir "${params.output}/RunRarefaction", mode: "copy" 371 | 372 | input: 373 | set sample_id, file(sam) from megares_rarefaction_sam 374 | file annotation 375 | file amr 376 | 377 | output: 378 | set sample_id, file("*.tsv") into (rarefaction) 379 | 380 | """ 381 | $baseDir/bin/rarefaction \ 382 | -ref_fp ${amr} \ 383 | -sam_fp ${sam} \ 384 | -annot_fp ${annotation} \ 385 | -gene_fp ${sample_id}.gene.tsv \ 386 | -group_fp ${sample_id}.group.tsv \ 387 | -mech_fp ${sample_id}.mech.tsv \ 388 | -class_fp ${sample_id}.class.tsv \ 389 | -type_fp ${sample_id}.type.tsv \ 390 | -min ${min} \ 391 | -max ${max} \ 392 | -skip ${skip} \ 393 | -samples ${samples} \ 394 | -t ${threshold} 395 | """ 396 | } 397 | 398 | 399 | 400 | /* 401 | ---- Confirmation of alignments to genes that require SNP confirmation with RGI 402 | */ 403 | 404 | process ExtractSNP { 405 | tag { sample_id } 406 | 407 | publishDir "${params.output}/ExtractMegaresSNPs", mode: "copy", 408 | saveAs: { filename -> 409 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 410 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 411 | else {} 412 | } 413 | 414 | input: 415 | set sample_id, file(sam) from megares_RGI_sam 416 | file annotation 417 | file amr 418 | 419 | output: 420 | set sample_id, file("*.snp.fasta") into megares_snp_fasta 421 | set sample_id, file("${sample_id}*.gene.tsv") into (resistome_hits) 422 | 423 | """ 424 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 425 | $baseDir/bin/resistome -ref_fp ${amr} \ 426 | -annot_fp ${annotation} \ 427 | -sam_fp ${sam} \ 428 | -gene_fp ${sample_id}.gene.tsv \ 429 | -group_fp ${sample_id}.group.tsv \ 430 | -mech_fp ${sample_id}.mechanism.tsv \ 431 | -class_fp ${sample_id}.class.tsv \ 432 | -type_fp ${sample_id}.type.tsv \ 433 | -t ${threshold} 434 | """ 435 | } 436 | 437 | 438 | /* This doesn't work with the singularity container, so I'll just leave the CARD download as a manual option. 
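   As a manual alternative (using the same URL as the commented-out process below), download and extract the CARD data before launching the pipeline, then point --card_db at the extracted card.json:

       wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data
       tar xfvj card-data.tar.bz2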
439 | process DL_CARD_db { 440 | 441 | publishDir "${params.output}/CARD_db", mode: "symlink" 442 | 443 | output: 444 | file("card.json") into card_db 445 | 446 | """ 447 | 448 | chmod -R 777 /usr/local/ 449 | wget -q -O card-data.tar.bz2 https://card.mcmaster.ca/latest/data && tar xfvj card-data.tar.bz2 450 | """ 451 | } 452 | 453 | */ 454 | 455 | 456 | process RunRGI { 457 | tag { sample_id } 458 | errorStrategy 'ignore' 459 | 460 | 461 | publishDir "${params.output}/RunRGI", mode: "symlink" 462 | 463 | input: 464 | set sample_id, file(fasta) from megares_snp_fasta 465 | file card_db 466 | 467 | output: 468 | set sample_id, file("${sample_id}*rgi_output.txt") into rgi_results 469 | 470 | """ 471 | ${RGI} load --local -i ${card_db} --debug 472 | 473 | # We are using the code provided in the following RGI github issue https://github.com/arpcard/rgi/issues/93 474 | set +e 475 | echo "Run RGI the first time" 476 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 477 | set -e 478 | echo "Run RGI again" 479 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 480 | 481 | 482 | """ 483 | } 484 | 485 | 486 | 487 | process SNPconfirmation { 488 | tag { sample_id } 489 | errorStrategy 'ignore' 490 | 491 | publishDir "${params.output}/SNPConfirmation", mode: "copy", 492 | saveAs: { filename -> 493 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 494 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 495 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 496 | else {} 497 | } 498 | 499 | input: 500 | set sample_id, file(rgi) from rgi_results 501 | 502 | output: 503 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into perfect_snp_long_hits 504 | """ 505 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 506 | """ 507 | } 508 | 509 | process Confirmed_AMR_hits { 510 | tag { sample_id } 511 | errorStrategy 'ignore' 512 | 513 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 514 | 515 | input: 516 | set sample_id, file(megares_counts) from resistome_hits 517 | set sample_id, file(perfect_rgi_counts) from perfect_snp_long_hits 518 | 519 | output: 520 | file("${sample_id}*perfect_SNP_confirmed_counts") into perfect_confirmed_counts 521 | 522 | """ 523 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 524 | """ 525 | } 526 | 527 | 528 | perfect_confirmed_counts.toSortedList().set { perfect_confirmed_amr_l_to_w } 529 | 530 | process Confirmed_ResistomeResults { 531 | tag {} 532 | errorStrategy 'ignore' 533 | 534 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 535 | 536 | input: 537 | file(perfect_confirmed_resistomes) from perfect_confirmed_amr_l_to_w 538 | 539 | output: 540 | file("perfect_SNP_confirmed_AMR_analytic_matrix.csv") into perfect_confirmed_matrix 541 | 542 | """ 543 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_AMR_analytic_matrix.csv 544 | """ 545 | } 546 | 547 | /* 548 | ---- Confirmation of deduped alignments to genes that require SNP confirmation with RGI. 
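---- These steps mirror ExtractSNP, RunRGI, and SNPconfirmation above, but run on the samtools rmdup alignments so duplicate-removed counts can be compared with the raw counts.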
549 | */ 550 | 551 | 552 | process ExtractDedupSNP { 553 | tag { sample_id } 554 | 555 | errorStrategy 'ignore' 556 | publishDir "${params.output}/ExtractDedupMegaresSNPs", mode: "copy", 557 | saveAs: { filename -> 558 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 559 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 560 | else {} 561 | } 562 | 563 | input: 564 | set sample_id, file(sam) from megares_dedup_RGI_sam 565 | file annotation 566 | file amr 567 | 568 | output: 569 | set sample_id, file("*.snp.fasta") into dedup_megares_snp_fasta 570 | set sample_id, file("${sample_id}*.gene.tsv") into (dedup_resistome_hits) 571 | 572 | """ 573 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 574 | $baseDir/bin/resistome -ref_fp ${amr} \ 575 | -annot_fp ${annotation} \ 576 | -sam_fp ${sam} \ 577 | -gene_fp ${sample_id}.gene.tsv \ 578 | -group_fp ${sample_id}.group.tsv \ 579 | -mech_fp ${sample_id}.mechanism.tsv \ 580 | -class_fp ${sample_id}.class.tsv \ 581 | -type_fp ${sample_id}.type.tsv \ 582 | -t ${threshold} 583 | """ 584 | } 585 | 586 | process RunDedupRGI { 587 | tag { sample_id } 588 | errorStrategy 'ignore' 589 | 590 | publishDir "${params.output}/RunDedupRGI", mode: "copy" 591 | 592 | input: 593 | set sample_id, file(fasta) from dedup_megares_snp_fasta 594 | file card_db 595 | 596 | output: 597 | set sample_id, file("${sample_id}_rgi_output.txt") into dedup_rgi_results 598 | 599 | """ 600 | ${RGI} load --local -i ${card_db} --debug 601 | 602 | # We are using the code provided in the following RGI GitHub issue: https://github.com/arpcard/rgi/issues/93 603 | set +e 604 | echo "Run RGI the first time" 605 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 606 | set -e 607 | echo "Run RGI again" 608 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 609 | 610 | """ 611 | } 612 | 613 | 614 | process DedupSNPconfirmation { 615 | tag { sample_id } 616 | errorStrategy 'ignore' 617 | publishDir "${params.output}/DedupSNPConfirmation", mode: "copy", 618 | saveAs: { filename -> 619 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 620 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 621 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 622 | else {} 623 | } 624 | 625 | input: 626 | set sample_id, file(rgi) from dedup_rgi_results 627 | 628 | output: 629 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into dedup_perfect_snp_long_hits 630 | """ 631 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 632 | """ 633 | } 634 | 635 | process ConfirmDedupAMRHits { 636 | tag { sample_id } 637 | 638 | errorStrategy 'ignore' 639 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 640 | 641 | input: 642 | set sample_id, file(megares_counts) from dedup_resistome_hits 643 | set sample_id, file(perfect_rgi_counts) from dedup_perfect_snp_long_hits 644 | 645 | output: 646 | file("${sample_id}*perfect_SNP_confirmed_counts") into dedup_perfect_confirmed_counts 647 | 648 | """ 649 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 650 | """ 651 | } 652 | 653 | 654 | dedup_perfect_confirmed_counts.toSortedList().set { dedup_perfect_confirmed_amr_l_to_w } 655 | 656 | 
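/* Merge the per-sample deduplicated, SNP-confirmed counts into a single wide-format matrix. */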
process DedupSNPConfirmed_ResistomeResults { 657 | tag {} 658 | errorStrategy 'ignore' 659 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 660 | 661 | input: 662 | file(perfect_confirmed_resistomes) from dedup_perfect_confirmed_amr_l_to_w 663 | 664 | output: 665 | file("perfect_SNP_confirmed_dedup_AMR_analytic_matrix.csv") into dedup_perfect_confirmed_matrix 666 | 667 | """ 668 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_dedup_AMR_analytic_matrix.csv 669 | """ 670 | } 671 | 672 | 673 | 674 | 675 | 676 | def nextflow_version_error() { 677 | println "" 678 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 679 | println "Run ./nextflow self-update to update Nextflow to the latest available version." 680 | println "" 681 | return 1 682 | } 683 | 684 | def adapter_error(def input) { 685 | println "" 686 | println "[params.adapters] failed to open: '" + input + "' : No such file or directory" 687 | println "" 688 | return 1 689 | } 690 | 691 | def amr_error(def input) { 692 | println "" 693 | println "[params.amr] failed to open: '" + input + "' : No such file or directory" 694 | println "" 695 | return 1 696 | } 697 | 698 | def annotation_error(def input) { 699 | println "" 700 | println "[params.annotation] failed to open: '" + input + "' : No such file or directory" 701 | println "" 702 | return 1 703 | } 704 | 705 | def fastq_error(def input) { 706 | println "" 707 | println "[params.reads] failed to open: '" + input + "' : No such file or directory" 708 | println "" 709 | return 1 710 | } 711 | 712 | def host_error(def input) { 713 | println "" 714 | println "[params.host] failed to open: '" + input + "' : No such file or directory" 715 | println "" 716 | return 1 717 | } 718 | 719 | def index_error(def input) { 720 | println "" 721 | println "[params.host_index] failed to open: '" + input + "' : No such file or directory" 722 | println "" 723 | return 1 724 | } 725 | 726 | def help() { 727 | println "" 728 | println "Program: AmrPlusPlus" 729 | println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md" 730 | println "Contact: Christopher Dean " 731 | println "" 732 | println "Usage: nextflow run main.nf [options]" 733 | println "" 734 | println "Input/output options:" 735 | println "" 736 | println " --reads STR path to FASTQ formatted input sequences" 737 | println " --adapters STR path to FASTA formatted adapter sequences" 738 | println " --host STR path to FASTA formatted host genome" 739 | println " --host_index STR path to BWA generated index files" 740 | println " --amr STR path to AMR resistance database" 741 | println " --annotation STR path to AMR annotation file" 742 | println " --output STR directory to write process outputs to" 743 | println " --kraken_db STR path to kraken database" 744 | println "" 745 | println "Trimming options:" 746 | println "" 747 | println " --leading INT cut bases off the start of a read, if below a threshold quality" 748 | println " --minlen INT drop the read if it is below a specified length" 749 | println " --slidingwindow INT perform sliding-window trimming, cutting once the average quality within the window falls below a threshold" 750 | println " --trailing INT cut bases off the end of a read, if below a threshold quality" 751 | println "" 752 | println "Algorithm options:" 753 | println "" 754 | println " --threads INT number of threads to use for each process" 755 | println " --threshold INT gene 
fraction threshold" 756 | println " --min INT starting sample level" 757 | println " --max INT ending sample level" 758 | println " --samples INT number of sampling iterations to perform" 759 | println " --skip INT number of levels to skip" 760 | println "" 761 | println "Help options:" 762 | println "" 763 | println " --help display this message" 764 | println "" 765 | return 1 766 | } 767 | -------------------------------------------------------------------------------- /main_AmrPlusPlus_v2_withRGI_Kraken.nf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nextflow 2 | 3 | /* 4 | vim: syntax=groovy 5 | -*- mode: groovy;-*- 6 | */ 7 | 8 | if (params.help ) { 9 | return help() 10 | } 11 | if( params.host_index ) { 12 | host_index = Channel.fromPath(params.host_index).toSortedList() 13 | //if( host_index.isEmpty() ) return index_error(host_index) 14 | } 15 | if( params.host ) { 16 | host = file(params.host) 17 | if( !host.exists() ) return host_error(host) 18 | } 19 | if( params.amr ) { 20 | amr = file(params.amr) 21 | if( !amr.exists() ) return amr_error(amr) 22 | } 23 | if( params.adapters ) { 24 | adapters = file(params.adapters) 25 | if( !adapters.exists() ) return adapter_error(adapters) 26 | } 27 | if( params.annotation ) { 28 | annotation = file(params.annotation) 29 | if( !annotation.exists() ) return annotation_error(annotation) 30 | } 31 | if(params.kraken_db) { 32 | kraken_db = file(params.kraken_db) 33 | } 34 | 35 | card_db = file(params.card_db) 36 | 37 | threads = params.threads 38 | 39 | threshold = params.threshold 40 | 41 | min = params.min 42 | max = params.max 43 | skip = params.skip 44 | samples = params.samples 45 | 46 | leading = params.leading 47 | trailing = params.trailing 48 | slidingwindow = params.slidingwindow 49 | minlen = params.minlen 50 | 51 | Channel 52 | .fromFilePairs( params.reads, flat: true ) 53 | .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" } 54 | .set { reads } 55 | 56 | process RunQC { 57 | tag { sample_id } 58 | 59 | publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz', 60 | saveAs: { filename -> 61 | if(filename.indexOf("P.fastq.gz") > 0) "Paired/$filename" 62 | else if(filename.indexOf("U.fastq.gz") > 0) "Unpaired/$filename" 63 | else {} 64 | } 65 | 66 | input: 67 | set sample_id, file(forward), file(reverse) from reads 68 | 69 | output: 70 | set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq) 71 | set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq) 72 | file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats) 73 | 74 | """ 75 | ${JAVA} -jar ${TRIMMOMATIC} \ 76 | PE \ 77 | -threads ${threads} \ 78 | $forward $reverse ${sample_id}.1P.fastq.gz ${sample_id}.1U.fastq.gz ${sample_id}.2P.fastq.gz ${sample_id}.2U.fastq.gz \ 79 | ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \ 80 | LEADING:${leading} \ 81 | TRAILING:${trailing} \ 82 | SLIDINGWINDOW:${slidingwindow} \ 83 | MINLEN:${minlen} \ 84 | 2> ${sample_id}.trimmomatic.stats.log 85 | """ 86 | } 87 | 88 | trimmomatic_stats.toSortedList().set { trim_stats } 89 | 90 | process QCStats { 91 | tag { sample_id } 92 | 93 | publishDir "${params.output}/RunQC", mode: 'copy', 94 | saveAs: { filename -> 95 | if(filename.indexOf(".stats") > 0) "Stats/$filename" 96 | else {} 97 | } 98 | 99 | input: 100 | file(stats) from trim_stats 101 | 102 | output: 103 | file("trimmomatic.stats") 104 | 105 | """ 106 | 
${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats 107 | """ 108 | } 109 | 110 | if( !params.host_index ) { 111 | process BuildHostIndex { 112 | publishDir "${params.output}/BuildHostIndex", mode: "copy" 113 | 114 | tag { host.baseName } 115 | 116 | input: 117 | file(host) 118 | 119 | output: 120 | file '*' into (host_index) 121 | 122 | """ 123 | ${BWA} index ${host} 124 | """ 125 | } 126 | } 127 | 128 | process AlignReadsToHost { 129 | tag { sample_id } 130 | 131 | publishDir "${params.output}/AlignReadsToHost", mode: "copy" 132 | 133 | input: 134 | set sample_id, file(forward), file(reverse) from paired_fastq 135 | file index from host_index 136 | file host 137 | 138 | output: 139 | set sample_id, file("${sample_id}.host.sam") into (host_sam) 140 | 141 | """ 142 | ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > ${sample_id}.host.sam 143 | """ 144 | } 145 | 146 | process RemoveHostDNA { 147 | tag { sample_id } 148 | 149 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam', 150 | saveAs: { filename -> 151 | if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename" 152 | } 153 | 154 | input: 155 | set sample_id, file(sam) from host_sam 156 | 157 | output: 158 | set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam) 159 | file("${sample_id}.samtools.idxstats") into (idxstats_logs) 160 | 161 | """ 162 | ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam 163 | ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats 164 | ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam 165 | """ 166 | } 167 | 168 | idxstats_logs.toSortedList().set { host_removal_stats } 169 | 170 | process HostRemovalStats { 171 | tag { sample_id } 172 | 173 | publishDir "${params.output}/RemoveHostDNA", mode: "copy", 174 | saveAs: { filename -> 175 | if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename" 176 | } 177 | 178 | input: 179 | file(stats) from host_removal_stats 180 | 181 | output: 182 | file("host.removal.stats") 183 | 184 | """ 185 | ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats 186 | """ 187 | } 188 | 189 | process NonHostReads { 190 | tag { sample_id } 191 | 192 | publishDir "${params.output}/NonHostReads", mode: "copy" 193 | 194 | input: 195 | set sample_id, file(bam) from non_host_bam 196 | 197 | output: 198 | set sample_id, file("${sample_id}.non.host.R1.fastq.gz"), file("${sample_id}.non.host.R2.fastq.gz") into (non_host_fastq_megares, non_host_fastq_dedup,non_host_fastq_kraken) 199 | 200 | """ 201 | ${BEDTOOLS} \ 202 | bamtofastq \ 203 | -i ${bam} \ 204 | -fq ${sample_id}.non.host.R1.fastq.gz \ 205 | -fq2 ${sample_id}.non.host.R2.fastq.gz 206 | """ 207 | } 208 | 209 | 210 | /* 211 | - 212 | -- 213 | --- 214 | ---- nonhost reads for megares and kraken2 215 | --- 216 | -- 217 | - 218 | */ 219 | 220 | 221 | /* 222 | ---- Run Kraken2 223 | */ 224 | 225 | 226 | 227 | process RunKraken { 228 | tag { sample_id } 229 | 230 | publishDir "${params.output}/RunKraken", mode: 'copy', 231 | saveAs: { filename -> 232 | if(filename.indexOf(".kraken.raw") > 0) "Standard/$filename" 233 | else if(filename.indexOf(".kraken.report") > 0) "Standard_report/$filename" 234 | else if(filename.indexOf(".kraken.filtered.report") > 0) "Filtered_report/$filename" 235 | else if(filename.indexOf(".kraken.filtered.raw") > 0) "Filtered/$filename" 236 | 
else {} 237 | } 238 | 239 | input: 240 | set sample_id, file(forward), file(reverse) from non_host_fastq_kraken 241 | 242 | output: 243 | file("${sample_id}.kraken.report") into (kraken_report,kraken_extract_taxa) 244 | set sample_id, file("${sample_id}.kraken.raw") into kraken_raw 245 | file("${sample_id}.kraken.filtered.report") into kraken_filter_report 246 | file("${sample_id}.kraken.filtered.raw") into kraken_filter_raw 247 | 248 | """ 249 | ${KRAKEN2} --db ${kraken_db} --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.report > ${sample_id}.kraken.raw 250 | ${KRAKEN2} --db ${kraken_db} --confidence 1 --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.filtered.report > ${sample_id}.kraken.filtered.raw 251 | """ 252 | } 253 | 254 | kraken_report.toSortedList().set { kraken_l_to_w } 255 | kraken_filter_report.toSortedList().set { kraken_filter_l_to_w } 256 | 257 | process KrakenResults { 258 | tag { } 259 | 260 | publishDir "${params.output}/KrakenResults", mode: "copy" 261 | 262 | input: 263 | file(kraken_reports) from kraken_l_to_w 264 | 265 | output: 266 | file("kraken_analytic_matrix.csv") into kraken_master_matrix 267 | 268 | """ 269 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o kraken_analytic_matrix.csv 270 | """ 271 | } 272 | 273 | process FilteredKrakenResults { 274 | tag { sample_id } 275 | 276 | publishDir "${params.output}/FilteredKrakenResults", mode: "copy" 277 | 278 | input: 279 | file(kraken_reports) from kraken_filter_l_to_w 280 | 281 | output: 282 | file("filtered_kraken_analytic_matrix.csv") into filter_kraken_master_matrix 283 | 284 | """ 285 | ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o filtered_kraken_analytic_matrix.csv 286 | """ 287 | } 288 | 289 | 290 | 291 | /* 292 | ---- Run alignment to MEGAres 293 | */ 294 | 295 | if( !params.amr_index ) { 296 | process BuildAMRIndex { 297 | tag { amr.baseName } 298 | 299 | input: 300 | file(amr) 301 | 302 | output: 303 | file '*' into (amr_index) 304 | 305 | """ 306 | ${BWA} index ${amr} 307 | """ 308 | } 309 | } 310 | 311 | process AlignToAMR { 312 | tag { sample_id } 313 | 314 | 315 | 316 | 317 | 318 | publishDir "${params.output}/AlignToAMR", mode: "copy" 319 | 320 | input: 321 | set sample_id, file(forward), file(reverse) from non_host_fastq_megares 322 | file index from amr_index 323 | file amr 324 | 325 | output: 326 | set sample_id, file("${sample_id}.amr.alignment.sam") into (megares_resistome_sam, megares_rarefaction_sam, megares_snp_sam , megares_snpfinder_sam, megares_RGI_sam) 327 | set sample_id, file("${sample_id}.amr.alignment.dedup.sam") into (megares_dedup_resistome_sam,megares_dedup_RGI_sam) 328 | set sample_id, file("${sample_id}.amr.alignment.dedup.bam") into (megares_dedup_resistome_bam) 329 | 330 | 331 | """ 332 | ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} -R '@RG\\tID:${sample_id}\\tSM:${sample_id}' > ${sample_id}.amr.alignment.sam 333 | ${SAMTOOLS} view -S -b ${sample_id}.amr.alignment.sam > ${sample_id}.amr.alignment.bam 334 | ${SAMTOOLS} sort -n ${sample_id}.amr.alignment.bam -o ${sample_id}.amr.alignment.sorted.bam 335 | ${SAMTOOLS} fixmate ${sample_id}.amr.alignment.sorted.bam ${sample_id}.amr.alignment.sorted.fix.bam 336 | ${SAMTOOLS} sort ${sample_id}.amr.alignment.sorted.fix.bam -o ${sample_id}.amr.alignment.sorted.fix.sorted.bam 337 | ${SAMTOOLS} rmdup -S ${sample_id}.amr.alignment.sorted.fix.sorted.bam ${sample_id}.amr.alignment.dedup.bam 338 | ${SAMTOOLS} view -h -o 
${sample_id}.amr.alignment.dedup.sam ${sample_id}.amr.alignment.dedup.bam 339 | rm ${sample_id}.amr.alignment.bam 340 | rm ${sample_id}.amr.alignment.sorted*.bam 341 | """ 342 | } 343 | 344 | process RunResistome { 345 | tag { sample_id } 346 | 347 | publishDir "${params.output}/RunResistome", mode: "copy" 348 | 349 | input: 350 | set sample_id, file(sam) from megares_resistome_sam 351 | file annotation 352 | file amr 353 | 354 | output: 355 | file("${sample_id}.gene.tsv") into (megares_resistome_counts, SNP_confirm_long) 356 | file("${sample_id}.group.tsv") into (megares_group_counts) 357 | file("${sample_id}.mechanism.tsv") into (megares_mech_counts) 358 | file("${sample_id}.class.tsv") into (megares_class_counts) 359 | file("${sample_id}.type.tsv") into (megares_type_counts) 360 | 361 | """ 362 | $baseDir/bin/resistome -ref_fp ${amr} \ 363 | -annot_fp ${annotation} \ 364 | -sam_fp ${sam} \ 365 | -gene_fp ${sample_id}.gene.tsv \ 366 | -group_fp ${sample_id}.group.tsv \ 367 | -mech_fp ${sample_id}.mechanism.tsv \ 368 | -class_fp ${sample_id}.class.tsv \ 369 | -type_fp ${sample_id}.type.tsv \ 370 | -t ${threshold} 371 | """ 372 | } 373 | 374 | megares_resistome_counts.toSortedList().set { megares_amr_l_to_w } 375 | 376 | process ResistomeResults { 377 | tag { } 378 | 379 | publishDir "${params.output}/ResistomeResults", mode: "copy" 380 | 381 | input: 382 | file(resistomes) from megares_amr_l_to_w 383 | 384 | output: 385 | file("AMR_analytic_matrix.csv") into amr_master_matrix 386 | 387 | """ 388 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o AMR_analytic_matrix.csv 389 | """ 390 | } 391 | 392 | 393 | /* samtools deduplication of megares alignment */ 394 | process SamDedupRunResistome { 395 | tag { sample_id } 396 | 397 | publishDir "${params.output}/SamDedupRunResistome", mode: "copy" 398 | 399 | input: 400 | set sample_id, file(sam) from megares_dedup_resistome_sam 401 | file annotation 402 | file amr 403 | 404 | output: 405 | file("${sample_id}.gene.tsv") into (megares_dedup_resistome_counts) 406 | file("${sample_id}.group.tsv") into (megares_dedup_group_counts) 407 | file("${sample_id}.mechanism.tsv") into (megares_dedup_mech_counts) 408 | file("${sample_id}.class.tsv") into (megares_dedup_class_counts) 409 | file("${sample_id}.type.tsv") into (megares_dedup_type_counts) 410 | 411 | """ 412 | $baseDir/bin/resistome -ref_fp ${amr} \ 413 | -annot_fp ${annotation} \ 414 | -sam_fp ${sam} \ 415 | -gene_fp ${sample_id}.gene.tsv \ 416 | -group_fp ${sample_id}.group.tsv \ 417 | -mech_fp ${sample_id}.mechanism.tsv \ 418 | -class_fp ${sample_id}.class.tsv \ 419 | -type_fp ${sample_id}.type.tsv \ 420 | -t ${threshold} 421 | """ 422 | } 423 | 424 | megares_dedup_resistome_counts.toSortedList().set { megares_dedup_amr_l_to_w } 425 | 426 | process SamDedupResistomeResults { 427 | tag { } 428 | 429 | publishDir "${params.output}/SamDedup_ResistomeResults", mode: "copy" 430 | 431 | input: 432 | file(resistomes) from megares_dedup_amr_l_to_w 433 | 434 | output: 435 | file("SamDedup_AMR_analytic_matrix.csv") into megares_dedup_amr_master_matrix 436 | 437 | """ 438 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o SamDedup_AMR_analytic_matrix.csv 439 | """ 440 | } 441 | 442 | process RunRarefaction { 443 | tag { sample_id } 444 | 445 | publishDir "${params.output}/RunRarefaction", mode: "copy" 446 | 447 | input: 448 | set sample_id, file(sam) from megares_rarefaction_sam 449 | file annotation 450 | file amr 451 | 452 | output: 453 | set sample_id, file("*.tsv") into 
(rarefaction) 454 | 455 | """ 456 | $baseDir/bin/rarefaction \ 457 | -ref_fp ${amr} \ 458 | -sam_fp ${sam} \ 459 | -annot_fp ${annotation} \ 460 | -gene_fp ${sample_id}.gene.tsv \ 461 | -group_fp ${sample_id}.group.tsv \ 462 | -mech_fp ${sample_id}.mech.tsv \ 463 | -class_fp ${sample_id}.class.tsv \ 464 | -type_fp ${sample_id}.type.tsv \ 465 | -min ${min} \ 466 | -max ${max} \ 467 | -skip ${skip} \ 468 | -samples ${samples} \ 469 | -t ${threshold} 470 | """ 471 | } 472 | 473 | 474 | 475 | /* 476 | ---- Confirmation of alignments to genes that require SNP confirmation with RGI 477 | */ 478 | 479 | process ExtractSNP { 480 | tag { sample_id } 481 | 482 | errorStrategy 'ignore' 483 | 484 | publishDir "${params.output}/ExtractMegaresSNPs", mode: "copy", 485 | saveAs: { filename -> 486 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 487 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 488 | else {} 489 | } 490 | 491 | input: 492 | set sample_id, file(sam) from megares_RGI_sam 493 | file annotation 494 | file amr 495 | 496 | output: 497 | set sample_id, file("*.snp.fasta") into megares_snp_fasta 498 | set sample_id, file("${sample_id}*.gene.tsv") into (resistome_hits) 499 | 500 | """ 501 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 502 | $baseDir/bin/resistome -ref_fp ${amr} \ 503 | -annot_fp ${annotation} \ 504 | -sam_fp ${sam} \ 505 | -gene_fp ${sample_id}.gene.tsv \ 506 | -group_fp ${sample_id}.group.tsv \ 507 | -mech_fp ${sample_id}.mechanism.tsv \ 508 | -class_fp ${sample_id}.class.tsv \ 509 | -type_fp ${sample_id}.type.tsv \ 510 | -t ${threshold} 511 | """ 512 | } 513 | 514 | 515 | 516 | 517 | process RunRGI { 518 | tag { sample_id } 519 | errorStrategy 'ignore' 520 | 521 | 522 | publishDir "${params.output}/RunRGI", mode: "symlink" 523 | 524 | input: 525 | set sample_id, file(fasta) from megares_snp_fasta 526 | file card_db 527 | 528 | output: 529 | set sample_id, file("${sample_id}*rgi_output.txt") into rgi_results 530 | 531 | """ 532 | ${RGI} load --local -i ${card_db} --debug 533 | 534 | # We are using the code provided in the following RGI GitHub issue: https://github.com/arpcard/rgi/issues/93 535 | set +e 536 | echo "Run RGI the first time" 537 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 538 | set -e 539 | echo "Run RGI again" 540 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 541 | 542 | 543 | """ 544 | } 545 | 546 | 547 | process SNPconfirmation { 548 | tag { sample_id } 549 | errorStrategy 'ignore' 550 | 551 | publishDir "${params.output}/SNPConfirmation", mode: "copy", 552 | saveAs: { filename -> 553 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 554 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 555 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 556 | else {} 557 | } 558 | 559 | input: 560 | set sample_id, file(rgi) from rgi_results 561 | 562 | output: 563 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into perfect_snp_long_hits 564 | """ 565 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 566 | """ 567 | } 568 | 569 | process Confirmed_AMR_hits { 570 | tag { sample_id } 571 | 572 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 573 | 574 | input: 575 | set sample_id, 
file(megares_counts) from resistome_hits 576 | set sample_id, file(perfect_rgi_counts) from perfect_snp_long_hits 577 | 578 | output: 579 | file("${sample_id}*perfect_SNP_confirmed_counts") into perfect_confirmed_counts 580 | 581 | """ 582 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 583 | """ 584 | } 585 | 586 | 587 | perfect_confirmed_counts.toSortedList().set { perfect_confirmed_amr_l_to_w } 588 | 589 | process Confirmed_ResistomeResults { 590 | tag {} 591 | 592 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 593 | 594 | input: 595 | file(perfect_confirmed_resistomes) from perfect_confirmed_amr_l_to_w 596 | 597 | output: 598 | file("perfect_SNP_confirmed_AMR_analytic_matrix.csv") into perfect_confirmed_matrix 599 | 600 | """ 601 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_AMR_analytic_matrix.csv 602 | """ 603 | } 604 | 605 | /* 606 | ---- Confirmation of deduped alignments to genes that require SNP confirmation with RGI. 607 | */ 608 | 609 | 610 | process ExtractDedupSNP { 611 | tag { sample_id } 612 | 613 | publishDir "${params.output}/ExtractDedupMegaresSNPs", mode: "copy", 614 | saveAs: { filename -> 615 | if(filename.indexOf(".snp.fasta") > 0) "SNP_fasta/$filename" 616 | else if(filename.indexOf("gene.tsv") > 0) "Gene_hits/$filename" 617 | else {} 618 | } 619 | 620 | input: 621 | set sample_id, file(sam) from megares_dedup_RGI_sam 622 | file annotation 623 | file amr 624 | 625 | output: 626 | set sample_id, file("*.snp.fasta") into dedup_megares_snp_fasta 627 | set sample_id, file("${sample_id}*.gene.tsv") into (dedup_resistome_hits) 628 | 629 | """ 630 | awk -F "\\t" '{if (\$1!="@SQ" && \$1!="@RG" && \$1!="@PG" && \$1!="@HD" && \$3 ~ "RequiresSNPConfirmation" ) {print ">"\$1"\\n"\$10}}' ${sam} | tr -d '"' > ${sample_id}.snp.fasta 631 | $baseDir/bin/resistome -ref_fp ${amr} \ 632 | -annot_fp ${annotation} \ 633 | -sam_fp ${sam} \ 634 | -gene_fp ${sample_id}.gene.tsv \ 635 | -group_fp ${sample_id}.group.tsv \ 636 | -mech_fp ${sample_id}.mechanism.tsv \ 637 | -class_fp ${sample_id}.class.tsv \ 638 | -type_fp ${sample_id}.type.tsv \ 639 | -t ${threshold} 640 | """ 641 | } 642 | 643 | process RunDedupRGI { 644 | tag { sample_id } 645 | errorStrategy 'ignore' 646 | publishDir "${params.output}/RunDedupRGI", mode: "copy" 647 | 648 | input: 649 | set sample_id, file(fasta) from dedup_megares_snp_fasta 650 | file card_db 651 | 652 | output: 653 | set sample_id, file("${sample_id}_rgi_output.txt") into dedup_rgi_results 654 | 655 | """ 656 | ${RGI} load --local -i ${card_db} --debug 657 | 658 | # We are using the code provided in the following RGI GitHub issue: https://github.com/arpcard/rgi/issues/93 659 | set +e 660 | echo "Run RGI the first time" 661 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 662 | set -e 663 | echo "Run RGI again" 664 | ${RGI} main --input_sequence ${fasta} --output_file ${sample_id}_rgi_output -a diamond --local 665 | 666 | """ 667 | } 668 | 669 | 670 | process DedupSNPconfirmation { 671 | tag { sample_id } 672 | errorStrategy 'ignore' 673 | 674 | publishDir "${params.output}/DedupSNPConfirmation", mode: "copy", 675 | saveAs: { filename -> 676 | if(filename.indexOf("_rgi_perfect_hits.csv") > 0) "Perfect_RGI/$filename" 677 | else if(filename.indexOf("_rgi_strict_hits.csv") > 0) "Strict_RGI/$filename" 678 | else if(filename.indexOf("_rgi_loose_hits.csv") > 0) "Loose_RGI/$filename" 679 
| else {} 680 | } 681 | 682 | input: 683 | set sample_id, file(rgi) from dedup_rgi_results 684 | 685 | output: 686 | set sample_id, file("${sample_id}_rgi_perfect_hits.csv") into dedup_perfect_snp_long_hits 687 | """ 688 | ${PYTHON3} $baseDir/bin/RGI_aro_hits.py ${rgi} ${sample_id} 689 | """ 690 | } 691 | 692 | process ConfirmDedupAMRHits { 693 | tag { sample_id } 694 | 695 | publishDir "${params.output}/SNP_confirmed_counts", mode: "copy" 696 | 697 | input: 698 | set sample_id, file(megares_counts) from dedup_resistome_hits 699 | set sample_id, file(perfect_rgi_counts) from dedup_perfect_snp_long_hits 700 | 701 | output: 702 | file("${sample_id}*perfect_SNP_confirmed_counts") into dedup_perfect_confirmed_counts 703 | 704 | """ 705 | ${PYTHON3} $baseDir/bin/RGI_long_combine.py ${perfect_rgi_counts} ${megares_counts} ${sample_id}.perfect_SNP_confirmed_counts ${sample_id} 706 | """ 707 | } 708 | 709 | 710 | dedup_perfect_confirmed_counts.toSortedList().set { dedup_perfect_confirmed_amr_l_to_w } 711 | 712 | process DedupSNPConfirmed_ResistomeResults { 713 | tag {} 714 | 715 | publishDir "${params.output}/Confirmed_ResistomeResults", mode: "copy" 716 | 717 | input: 718 | file(perfect_confirmed_resistomes) from dedup_perfect_confirmed_amr_l_to_w 719 | 720 | output: 721 | file("perfect_SNP_confirmed_AMR_analytic_matrix.csv") into dedup_perfect_confirmed_matrix 722 | 723 | """ 724 | ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${perfect_confirmed_resistomes} -o perfect_SNP_confirmed_dedup_AMR_analytic_matrix.csv 725 | """ 726 | } 727 | 728 | 729 | 730 | 731 | 732 | def nextflow_version_error() { 733 | println "" 734 | println "This workflow requires Nextflow version 0.25 or greater -- You are running version $nextflow.version" 735 | println "Run ./nextflow self-update to update Nextflow to the latest available version." 
def nextflow_version_error() {
    println ""
    println "This workflow requires Nextflow version 0.25 or greater -- you are running version $nextflow.version"
    println "Run ./nextflow self-update to update Nextflow to the latest available version."
    println ""
    return 1
}

def adapter_error(def input) {
    println ""
    println "[params.adapters] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def amr_error(def input) {
    println ""
    println "[params.amr] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def annotation_error(def input) {
    println ""
    println "[params.annotation] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def fastq_error(def input) {
    println ""
    println "[params.reads] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def host_error(def input) {
    println ""
    println "[params.host] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def index_error(def input) {
    println ""
    println "[params.host_index] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def help() {
    println ""
    println "Program: AmrPlusPlus"
    println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md"
    println "Contact: Christopher Dean"
    println ""
    println "Usage: nextflow run main.nf [options]"
    println ""
    println "Input/output options:"
    println ""
    println "    --reads          STR      path to FASTQ formatted input sequences"
    println "    --adapters       STR      path to FASTA formatted adapter sequences"
    println "    --host           STR      path to FASTA formatted host genome"
    println "    --host_index     STR      path to BWA generated index files"
    println "    --amr            STR      path to AMR resistance database"
    println "    --annotation     STR      path to AMR annotation file"
    println "    --output         STR      directory to write process outputs to"
    println "    --kraken_db      STR      path to Kraken database"
    println ""
    println "Trimming options:"
    println ""
    println "    --leading        INT      cut bases off the start of a read, if below a threshold quality"
    println "    --minlen         INT      drop the read if it is below a specified length"
    println "    --slidingwindow  INT      perform sliding-window trimming, cutting once the average quality within the window falls below a threshold"
    println "    --trailing       INT      cut bases off the end of a read, if below a threshold quality"
    println ""
    println "Algorithm options:"
    println ""
    println "    --threads        INT      number of threads to use for each process"
    println "    --threshold      INT      gene fraction threshold"
    println "    --min            INT      starting sample level"
    println "    --max            INT      ending sample level"
    println "    --samples        INT      number of sampling iterations to perform"
    println "    --skip           INT      number of levels to skip"
    println ""
    println "Help options:"
    println ""
    println "    --help                    display this message"
    println ""
    return 1
}
--------------------------------------------------------------------------------
/nextflow.config:
--------------------------------------------------------------------------------
manifest {
    /* Homepage of project */
    homePage = 'https://github.com/meglab-metagenomics/amrplusplus_v2'

    /* Description of project */
    description = 'AmrPlusPlus: A bioinformatic pipeline for characterizing the resistome with the MEGARes database and the microbiome using Kraken.'

    /* Main pipeline script */
    mainScript = 'main_AmrPlusPlus_v2.nf'

    /* Default repository branch */
    defaultBranch = 'master'
}
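/*
    To inspect how this manifest, the params below, and a chosen profile
    resolve together before launching anything, Nextflow's built-in config
    inspector can be used (the profile name is one of those defined at the
    bottom of this file):

        nextflow config -profile local
*/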
params {
    /* Location of forward and reverse read pairs */
    reads = "data/raw/*_R{1,2}.fastq.gz"

    /* Location of adapter sequences */
    adapters = "data/adapters/nextera.fa"

    /* Location of host genome index files */
    host_index = ""

    /* Location of host genome */
    host = "data/host/chr21.fasta.gz"

    /* Location of the Kraken database */
    kraken_db = "minikraken2_v2_8GB_201904_UPDATE"

    /* Location of amr index files */
    amr_index = ""

    /* Location of antimicrobial resistance (MEGARes) database */
    amr = "data/amr/megares_modified_database_v2.00.fasta"

    /* Location of amr annotation file */
    annotation = "data/amr/megares_modified_annotations_v2.00.csv"

    /* Location of SNP confirmation script */
    snp_confirmation = "bin/snp_confirmation.py"

    /* Output directory */
    output = "test_results"

    /* Number of threads */
    threads = 10
    smem_threads = 12

    /* Trimmomatic trimming parameters */
    leading = 3
    trailing = 3
    slidingwindow = "4:15"
    minlen = 36

    /* Resistome threshold */
    threshold = 80

    /* Starting rarefaction level */
    min = 5

    /* Ending rarefaction level */
    max = 100

    /* Number of levels to skip */
    skip = 5

    /* Number of iterations to sample at */
    samples = 1

    /* Display help message */
    help = false
}
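/*
    Any value in the params block above can be overridden at launch time with
    a double-dash flag, while single-dash options (such as -profile) belong
    to Nextflow itself. A typical invocation against the bundled test data;
    the profile is one of those defined below and the other values are
    illustrative:

        nextflow run main_AmrPlusPlus_v2.nf -profile local \
            --reads "data/raw/*_R{1,2}.fastq.gz" \
            --output test_results \
            --threads 4 \
            --threshold 80
*/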
/*
    These profiles point to configuration files that can be edited to best
    suit your computing environment.
*/
profiles {
    local {
        includeConfig "config/local.config"
    }
    MEG_AMI {
        includeConfig "config/MEG_AMI.config"
    }
    local_angus {
        includeConfig "config/local_angus.config"
    }
    local_MSI {
        includeConfig "config/local_MSI.config"
    }
    singularity_slurm {
        process.executor = 'slurm'
        includeConfig "config/singularity_slurm.config"
        process.container = 'shub://meglab-metagenomics/amrplusplus_v2'
    }
    singularity {
        includeConfig "config/singularity.config"
        process.container = 'shub://meglab-metagenomics/amrplusplus_v2'
    }
}
--------------------------------------------------------------------------------
/previous_versions/main_amr_plus_plus_v1.nf:
--------------------------------------------------------------------------------
#!/usr/bin/env nextflow

/*
vim: syntax=groovy
-*- mode: groovy;-*-
*/

if( params.help ) {
    return help()
}
if( params.host_index ) {
    host_index = Channel.fromPath(params.host_index).toSortedList()
    //if( host_index.isEmpty() ) return index_error(host_index)
}
if( params.host ) {
    host = file(params.host)
    if( !host.exists() ) return host_error(host)
}
if( params.amr ) {
    amr = file(params.amr)
    if( !amr.exists() ) return amr_error(amr)
}
if( params.adapters ) {
    adapters = file(params.adapters)
    if( !adapters.exists() ) return adapter_error(adapters)
}
if( params.annotation ) {
    annotation = file(params.annotation)
    if( !annotation.exists() ) return annotation_error(annotation)
}

kraken_db = params.kraken_db
threads = params.threads

threshold = params.threshold

min = params.min
max = params.max
skip = params.skip
samples = params.samples

leading = params.leading
trailing = params.trailing
slidingwindow = params.slidingwindow
minlen = params.minlen

Channel
    .fromFilePairs( params.reads, flat: true )
    .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" }
    .set { reads }

process RunQC {
    tag { sample_id }

    publishDir "${params.output}/RunQC", mode: 'copy', pattern: '*.fastq.gz',
        saveAs: { filename ->
            if(filename.indexOf("P.fastq") > 0) "Paired/$filename"
            else if(filename.indexOf("U.fastq") > 0) "Unpaired/$filename"
            else {}
        }

    input:
        set sample_id, file(forward), file(reverse) from reads

    output:
        set sample_id, file("${sample_id}.1P.fastq.gz"), file("${sample_id}.2P.fastq.gz") into (paired_fastq)
        set sample_id, file("${sample_id}.1U.fastq.gz"), file("${sample_id}.2U.fastq.gz") into (unpaired_fastq)
        file("${sample_id}.trimmomatic.stats.log") into (trimmomatic_stats)

    """
    ${JAVA} -jar ${TRIMMOMATIC}/trimmomatic.jar \
        PE \
        -threads ${threads} \
        $forward $reverse ${sample_id}.1P.fastq ${sample_id}.1U.fastq ${sample_id}.2P.fastq ${sample_id}.2U.fastq \
        ILLUMINACLIP:${adapters}:2:30:10:3:TRUE \
        LEADING:${leading} \
        TRAILING:${trailing} \
        SLIDINGWINDOW:${slidingwindow} \
        MINLEN:${minlen} \
        2> ${sample_id}.trimmomatic.stats.log

    # Trimmomatic writes plain FASTQ; compress so the files match the
    # declared .fastq.gz outputs.
    gzip *fastq
    """
}

trimmomatic_stats.toSortedList().set { trim_stats }
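/*
    fromFilePairs groups files that differ only in the R1/R2 token and uses
    the shared prefix as sample_id; with flat: true each channel item is a
    flat [sample_id, forward, reverse] tuple. As a quick illustration with
    hypothetical file names:

        # Given these files in data/raw/ ...
        #   sampleA_R1.fastq.gz  sampleA_R2.fastq.gz
        #   sampleB_R1.fastq.gz  sampleB_R2.fastq.gz
        ls data/raw/*_R{1,2}.fastq.gz

        # ... the glob "data/raw/*_R{1,2}.fastq.gz" yields:
        #   [sampleA, sampleA_R1.fastq.gz, sampleA_R2.fastq.gz]
        #   [sampleB, sampleB_R1.fastq.gz, sampleB_R2.fastq.gz]
*/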
process QCStats {
    publishDir "${params.output}/RunQC", mode: 'copy',
        saveAs: { filename ->
            if(filename.indexOf(".stats") > 0) "Stats/$filename"
            else {}
        }

    input:
        file(stats) from trim_stats

    output:
        file("trimmomatic.stats")

    """
    ${PYTHON3} $baseDir/bin/trimmomatic_stats.py -i ${stats} -o trimmomatic.stats
    """
}

if( !params.host_index ) {
    process BuildHostIndex {
        publishDir "${params.output}/BuildHostIndex", mode: "copy"

        tag { host.baseName }

        input:
            file(host)

        output:
            file '*' into (host_index)

        """
        ${BWA} index ${host}
        """
    }
}

process AlignReadsToHost {
    tag { sample_id }

    publishDir "${params.output}/AlignReadsToHost", mode: "copy"

    input:
        set sample_id, file(forward), file(reverse) from paired_fastq
        file index from host_index
        file host

    output:
        set sample_id, file("${sample_id}.host.sam") into (host_sam)

    """
    ${BWA} mem ${host} ${forward} ${reverse} -t ${threads} > ${sample_id}.host.sam
    """
}

process RemoveHostDNA {
    tag { sample_id }

    publishDir "${params.output}/RemoveHostDNA", mode: "copy", pattern: '*.bam',
        saveAs: { filename ->
            if(filename.indexOf(".bam") > 0) "NonHostBAM/$filename"
        }

    input:
        set sample_id, file(sam) from host_sam

    output:
        set sample_id, file("${sample_id}.host.sorted.removed.bam") into (non_host_bam)
        file("${sample_id}.samtools.idxstats") into (idxstats_logs)

    """
    # Sort the host alignment and record per-reference mapping counts.
    ${SAMTOOLS} view -bS ${sam} | ${SAMTOOLS} sort -@ ${threads} -o ${sample_id}.host.sorted.bam
    ${SAMTOOLS} index ${sample_id}.host.sorted.bam && ${SAMTOOLS} idxstats ${sample_id}.host.sorted.bam > ${sample_id}.samtools.idxstats
    # Keep only reads that did not map to the host (-f 4 selects unmapped reads).
    ${SAMTOOLS} view -h -f 4 -b ${sample_id}.host.sorted.bam -o ${sample_id}.host.sorted.removed.bam
    """
}

idxstats_logs.toSortedList().set { host_removal_stats }

process HostRemovalStats {
    publishDir "${params.output}/RemoveHostDNA", mode: "copy",
        saveAs: { filename ->
            if(filename.indexOf(".stats") > 0) "HostRemovalStats/$filename"
        }

    input:
        file(stats) from host_removal_stats

    output:
        file("host.removal.stats")

    """
    ${PYTHON3} $baseDir/bin/samtools_idxstats.py -i ${stats} -o host.removal.stats
    """
}

process BAMToFASTQ {
    tag { sample_id }

    publishDir "${params.output}/BAMToFASTQ", mode: "copy"

    input:
        set sample_id, file(bam) from non_host_bam

    output:
        set sample_id, file("${sample_id}.non.host.R1.fastq"), file("${sample_id}.non.host.R2.fastq") into (non_host_fastq, non_host_fastq_kraken)

    """
    ${BEDTOOLS} \
        bamtofastq \
        -i ${bam} \
        -fq ${sample_id}.non.host.R1.fastq \
        -fq2 ${sample_id}.non.host.R2.fastq
    """
}

if( !params.amr_index ) {
    process BuildAMRIndex {
        tag { amr.baseName }

        input:
            file(amr)

        output:
            file '*' into (amr_index)

        """
        ${BWA} index ${amr}
        """
    }
}
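/*
    Both index-building processes rely on bwa index writing its companion
    files next to the input FASTA, which is why the bare glob file '*' is
    enough to capture the whole index. A quick illustration using the
    repository's default host reference:

        bwa index chr21.fasta.gz
        ls chr21.fasta.gz.*
        # chr21.fasta.gz.amb  chr21.fasta.gz.ann  chr21.fasta.gz.bwt
        # chr21.fasta.gz.pac  chr21.fasta.gz.sa
*/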
process AlignToAMR {
    tag { sample_id }

    publishDir "${params.output}/AlignToAMR", mode: "copy"

    input:
        set sample_id, file(forward), file(reverse) from non_host_fastq
        file index from amr_index
        file amr

    output:
        set sample_id, file("${sample_id}.amr.alignment.sam") into (resistome_sam, rarefaction_sam, snp_sam)

    """
    ${BWA} mem ${amr} ${forward} ${reverse} -t ${threads} > ${sample_id}.amr.alignment.sam
    """
}

process RunResistome {
    tag { sample_id }

    publishDir "${params.output}/RunResistome", mode: "copy"

    input:
        set sample_id, file(sam) from resistome_sam
        file annotation
        file amr

    output:
        file("${sample_id}.gene.tsv") into (resistome)

    """
    # Count alignments against MEGARes at the gene, group, class, and
    # mechanism levels; -t is the gene fraction threshold (the percentage of
    # a gene's bases that must be covered before it is reported).
    ${RESISTOME} -ref_fp ${amr} \
        -annot_fp ${annotation} \
        -sam_fp ${sam} \
        -gene_fp ${sample_id}.gene.tsv \
        -group_fp ${sample_id}.group.tsv \
        -class_fp ${sample_id}.class.tsv \
        -mech_fp ${sample_id}.mechanism.tsv \
        -t ${threshold}
    """
}

process RunRarefaction {
    tag { sample_id }

    publishDir "${params.output}/RunRarefaction", mode: "copy"

    input:
        set sample_id, file(sam) from rarefaction_sam
        file annotation
        file amr

    output:
        set sample_id, file("*.tsv") into (rarefaction)

    """
    ${RAREFACTION} \
        -ref_fp ${amr} \
        -sam_fp ${sam} \
        -annot_fp ${annotation} \
        -gene_fp ${sample_id}.gene.tsv \
        -group_fp ${sample_id}.group.tsv \
        -class_fp ${sample_id}.class.tsv \
        -mech_fp ${sample_id}.mech.tsv \
        -min ${min} \
        -max ${max} \
        -skip ${skip} \
        -samples ${samples} \
        -t ${threshold}
    """
}

process RunSNPFinder {
    tag { sample_id }

    publishDir "${params.output}/RunSNPFinder", mode: "copy"

    input:
        set sample_id, file(sam) from snp_sam
        file amr

    output:
        set sample_id, file("*.tsv") into (snp)

    """
    ${SNPFINDER} \
        -amr_fp ${amr} \
        -sampe ${sam} \
        -out_fp ${sample_id}.tsv
    """
}

process RunKraken {
    tag { sample_id }

    publishDir "${params.output}/RunKraken", mode: "copy"

    input:
        set sample_id, file(forward), file(reverse) from non_host_fastq_kraken

    output:
        file("${sample_id}.kraken.filtered.report") into kraken_report

    """
    # Two passes: an unfiltered classification for reference, then a
    # high-stringency pass (--confidence 1) whose report is the one collected
    # downstream. Both passes write to the same raw output file, so the
    # second overwrites the first.
    ${KRAKEN2} --db ${kraken_db} --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.report > ${sample_id}.kraken.raw
    ${KRAKEN2} --db ${kraken_db} --confidence 1 --paired ${forward} ${reverse} --threads ${threads} --report ${sample_id}.kraken.filtered.report > ${sample_id}.kraken.raw
    """
}
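/*
    The report written above follows Kraken2's standard six-column layout,
    which is what bin/kraken2_long_to_wide.py consumes downstream. For a
    quick look at one (the file name is illustrative):

        # Kraken2 report columns, tab-separated:
        #   1. percent of reads in the clade   4. rank code (U/R/D/K/P/C/O/F/G/S)
        #   2. reads in the clade              5. NCBI taxonomy ID
        #   3. reads assigned directly         6. indented scientific name
        head -5 sampleA.kraken.filtered.report
*/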
resistome.toSortedList().set { amr_l_to_w }

process AMRLongToWide {
    publishDir "${params.output}/AMRLongToWide", mode: "copy"

    input:
        file(resistomes) from amr_l_to_w

    output:
        file("AMR_analytic_matrix.csv") into amr_master_matrix

    """
    # This version of amr_long_to_wide.py writes the matrix into an output
    # directory, so collect it from there.
    mkdir ret
    ${PYTHON3} $baseDir/bin/amr_long_to_wide.py -i ${resistomes} -o ret
    mv ret/AMR_analytic_matrix.csv .
    """
}

kraken_report.toSortedList().set { kraken_l_to_w }

process KrakenLongToWide {
    publishDir "${params.output}/KrakenLongToWide", mode: "copy"

    input:
        file(kraken_reports) from kraken_l_to_w

    output:
        file("kraken_analytic_matrix.csv") into kraken_master_matrix

    """
    mkdir ret
    ${PYTHON3} $baseDir/bin/kraken2_long_to_wide.py -i ${kraken_reports} -o ret
    mv ret/kraken_analytic_matrix.csv .
    """
}

def nextflow_version_error() {
    println ""
    println "This workflow requires Nextflow version 0.25 or greater -- you are running version $nextflow.version"
    println "Run ./nextflow self-update to update Nextflow to the latest available version."
    println ""
    return 1
}

def adapter_error(def input) {
    println ""
    println "[params.adapters] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def amr_error(def input) {
    println ""
    println "[params.amr] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def annotation_error(def input) {
    println ""
    println "[params.annotation] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def fastq_error(def input) {
    println ""
    println "[params.reads] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def host_error(def input) {
    println ""
    println "[params.host] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}

def index_error(def input) {
    println ""
    println "[params.host_index] failed to open: '" + input + "' : No such file or directory"
    println ""
    return 1
}
def help() {
    println ""
    println "Program: AmrPlusPlus"
    println "Documentation: https://github.com/colostatemeg/amrplusplus/blob/master/README.md"
    println "Contact: Christopher Dean"
    println ""
    println "Usage: nextflow run main.nf [options]"
    println ""
    println "Input/output options:"
    println ""
    println "    --reads          STR      path to FASTQ formatted input sequences"
    println "    --adapters       STR      path to FASTA formatted adapter sequences"
    println "    --host           STR      path to FASTA formatted host genome"
    println "    --host_index     STR      path to BWA generated index files"
    println "    --amr            STR      path to AMR resistance database"
    println "    --annotation     STR      path to AMR annotation file"
    println "    --output         STR      directory to write process outputs to"
    println "    --kraken_db      STR      path to Kraken database"
    println ""
    println "Trimming options:"
    println ""
    println "    --leading        INT      cut bases off the start of a read, if below a threshold quality"
    println "    --minlen         INT      drop the read if it is below a specified length"
    println "    --slidingwindow  INT      perform sliding-window trimming, cutting once the average quality within the window falls below a threshold"
    println "    --trailing       INT      cut bases off the end of a read, if below a threshold quality"
    println ""
    println "Algorithm options:"
    println ""
    println "    --threads        INT      number of threads to use for each process"
    println "    --threshold      INT      gene fraction threshold"
    println "    --min            INT      starting sample level"
    println "    --max            INT      ending sample level"
    println "    --samples        INT      number of sampling iterations to perform"
    println "    --skip           INT      number of levels to skip"
    println ""
    println "Help options:"
    println ""
    println "    --help                    display this message"
    println ""
    return 1
}
--------------------------------------------------------------------------------