├── LICENSE ├── README.md ├── amr_finder ├── .gitignore ├── README.md ├── amr_report.cwl ├── amrfinder ├── amrfinder-prot.sh ├── blastp.cwl ├── blastp_params.yaml ├── blastx.cwl ├── fasta_check.cwl ├── gff_check.cwl ├── hmmsearch.cwl ├── hmmsearch_params.yaml ├── impl_amrfinder.py ├── test_dna.fa ├── test_dna_fail.fa ├── test_prot.fa ├── test_prot.gff ├── test_prot_fail.fa ├── update_docker_version.sh ├── wf_amr_dna.cwl ├── wf_amr_dna_params.yaml ├── wf_amr_prot.cwl ├── wf_amr_prot_params.yaml └── wf_makeblastdb.cwl └── contam_filter ├── blastn.cwl ├── blastn_params.yaml ├── common_fastas.yaml ├── create_blast_list.cwl ├── create_db_list.cwl ├── decompress.cwl ├── decompress_params.yaml ├── default_fastas.yaml ├── euk_fasta.yaml ├── euk_params.yaml ├── flatten.cwl ├── flatten_params.yaml ├── makeblastdb.cwl ├── makeblastdb_params.yaml ├── pa_test.fa.gz ├── params.yaml ├── prok_fasta.yaml ├── prok_params.yaml ├── typedefs.yaml ├── vecscreen.cwl ├── wf_blastit.cwl ├── wf_blastit_params.yaml ├── wf_contam_detect.cwl ├── wf_contam_detect_params.yaml ├── wf_vecscreen.cwl └── wf_vecscreen_params.cwl /LICENSE: -------------------------------------------------------------------------------- 1 | PUBLIC DOMAIN NOTICE 2 | National Center for Biotechnology Information 3 | 4 | This software/database is a "United States Government Work" under the 5 | terms of the United States Copyright Act. It was written as part of 6 | the author's official duties as a United States Government employee and 7 | thus cannot be copyrighted. This software/database is freely available 8 | to the public for use. The National Library of Medicine and the U.S. 9 | Government have not placed any restriction on its use or reproduction. 10 | 11 | Although all reasonable efforts have been taken to ensure the accuracy 12 | and reliability of the software and data, the NLM and the U.S. 
13 | Government do not and cannot warrant the performance or results that 14 | may be obtained by using this software or data. The NLM and the U.S. 15 | Government disclaim all warranties, express or implied, including 16 | warranties of performance, merchantability or fitness for any particular 17 | purpose. 18 | 19 | Please cite the author in any work or product based on this material. 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ncbi-pipelines 2 | A useful set of bioinformatic pipelines written in CWL. 3 | -------------------------------------------------------------------------------- /amr_finder/.gitignore: -------------------------------------------------------------------------------- 1 | data/ 2 | *.pyc 3 | -------------------------------------------------------------------------------- /amr_finder/README.md: -------------------------------------------------------------------------------- 1 | # ~~NCBI Antimicrobial Resistance Gene Finder (alpha)~~ DEPRECATED 2 | # Please see https://github.com/ncbi/amr/wiki for the new, easier-to-install version 3 | 4 | ## Overview 5 | 6 | This software and the accompanying database are designed to find acquired 7 | antimicrobial resistance genes in protein or nucleotide sequences. 8 | 9 | ## Mechanism 10 | 11 | AMRFinder can be run in two modes: with protein sequence as input or with DNA 12 | sequence as input. When run with protein sequence it uses both BLASTP and HMMER 13 | to search protein sequences for AMR genes along with a hierarchical tree of 14 | gene families to classify and name novel sequences. With nucleotide sequences 15 | it uses BLASTX translated searches and the hierarchical tree of gene families. 16 | 17 | ## Installation 18 | 19 | To run AMRFinder you will need Linux, Docker, and CWL (Common Workflow 20 | Language). 
We provide instructions here for two installation modes. One 21 | using Docker and the other using the docker emulator uDocker. We recommend the 22 | Docker installation, but installing Docker requires root, so we have made 23 | AMRFinder also compatible with uDocker and included instructions for uDocker 24 | installation as well. AMRFinder runs considerably slower under uDocker than 25 | Docker. 26 | 27 | ### Quick start 28 | 29 | These instructions assume that you have python, pip, and docker installed. We provide more details about how to install these prerequisites below. 30 | 31 | ```shell 32 | ~$ virtualenv --python=python2 ~/cwl 33 | ~$ source ~/cwl/bin/activate 34 | (cwl) ~$ pip install -U wheel setuptools 35 | (cwl) ~$ pip install -U cwltool[deps] PyYAML cwlref-runner 36 | (cwl) ~$ svn co https://github.com/ncbi/pipelines/trunk/amr_finder 37 | (cwl) ~$ cd amr_finder 38 | (cwl) ~/amr_finder$ ./amrfinder -p test_prot.fa 39 | ``` 40 | 41 | ### Installation summary 42 | 43 | The AMR Finder only runs on Linux and depends upon two main pieces of 44 | software, Docker and CWL (Common Workflow Language). 45 | 46 | We briefly show two possible routes to installation, one using docker and one 47 | using uDocker. Docker (http://docker.com) requires root to install, so we have 48 | also made AMRFinder compatible with uDocker 49 | (https://github.com/indigo-dc/udocker). 50 | 51 | Note that uDocker requires python2. AMRFinder itself is compatible with 52 | either Python 2 or 3. 53 | 54 | There are two parts to installing AMRFinder. Installing the code itself and 55 | installing the prerequisites. 56 | 57 | 58 | Prerequisites: 59 | - python2 or python3 (uDocker requires python2) 60 | - docker or uDocker 61 | - subversion 62 | - python packages 63 | - wheel 64 | - setuptools 65 | - PyYAML 66 | - cwlref-runner 67 | - cwltool 68 | 69 | ### Prerequisites 70 | 71 | You will need to install the prerequisites if they're not already installed on 72 | your system. 
73 | 74 | e.g., 75 | 76 | The instructions that follow use subversion (svn), pip, docker, and virtualenv. Check you have these installed with: 77 | 78 | ```shell 79 | ~$ svn --version 80 | ~$ pip --version 81 | ~$ virtualenv --version 82 | ~$ docker run hello-world 83 | ``` 84 | If pip is not installed see https://pip.pypa.io/en/stable/installing/ for installation instructions. 85 | 86 | Virtualenv can be easily installed with pip: 87 | 88 | ```shell 89 | ~$ pip install virtualenv 90 | ``` 91 | 92 | To create a virtualenv for your installation of CWL and AMRFinder: 93 | 94 | ```shell 95 | ~$ virtualenv --python=python2 cwl 96 | ``` 97 | (Note that if you're running python2 by default you will skip the '--python=python2') 98 | 99 | ### Installing subversion 100 | 101 | Your sysadmin can help you install subversion using whatever method is preferred on your system. For example with an Ubuntu distribution you could use: 102 | 103 | ```shell 104 | ~$ sudo apt install subversion 105 | ``` 106 | 107 | ### Installing CWL 108 | 109 | ```shell 110 | ~$ source cwl/bin/activate 111 | (cwl) ~$ pip install -U wheel setuptools 112 | (cwl) ~$ pip install -U cwltool[deps] PyYAML cwlref-runner 113 | ``` 114 | 115 | ### Installing Docker 116 | 117 | It is recommended that you install Docker instead of the more limited uDocker. 118 | Detailed instructions may be found on the docker website. [Docker 119 | Install](https://docs.docker.com/install/). Please install the latest version 120 | of docker, it is usually newer than the one that comes with your distribution. 121 | Note that it requires root access to install, and the user who will be running 122 | the software will need to be in the docker group. The required docker 123 | containers images will download automatically the first time the pipeline runs. 124 | Afterwards, they will be cached and subsequent runs will execute much faster. 125 | 126 | As an example of how to install Docker under Ubuntu. 
This needs to be executed by a user with root access (e.g., your sysadmin). 127 | 128 | ```shell 129 | ~$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - 130 | ~$ sudo apt-key fingerprint 0EBFCD88 131 | ~$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu \ 132 | > $(lsb_release -cs) \ 133 | > stable" 134 | ~$ sudo apt-get install docker-ce 135 | ~$ sudo groupadd docker 136 | ~$ sudo usermod -aG docker $USER 137 | ~$ exit 138 | # need to exit and log in again to pick up new group 139 | ``` 140 | 141 | To test your Docker installation try: 142 | 143 | ```{shell} 144 | ~$ docker run hello-world 145 | ``` 146 | You should see a message that starts with: 147 | 148 | > Hello from Docker! 149 | > This message shows that your installation appears to be working correctly. 150 | 151 | ### Installing uDocker 152 | 153 | UDocker compatibility is provided only because some servers may have security 154 | policies that make it difficult to install Docker. We recommend the docker 155 | installation if possible. 156 | 157 | A simple example of how you might install uDocker given the virtualenv we 158 | created above: 159 | 160 | ```shell 161 | ~$ source cwl/bin/activate # to enter the virtualenv 162 | (cwl) ~$ curl https://raw.githubusercontent.com/indigo-dc/udocker/master/udocker.py > cwl/bin/udocker 163 | (cwl) ~$ chmod u+rx cwl/bin/udocker 164 | (cwl) ~$ udocker install 165 | ``` 166 | Your mileage may vary, as the required set of package dependencies will 167 | be different from system to system, depending upon what is already 168 | installed. Note that we support both Python 2 & 3, however, uDocker 169 | only works with Python 2. 170 | 171 | To test your uDocker installation: 172 | 173 | ```{shell} 174 | ~$ docker run hello-world 175 | ``` 176 | You should see a message that starts with: 177 | 178 | > Hello from Docker! 179 | > This message shows that your installation appears to be working correctly. 
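Once the installation steps above are done, the command-line prerequisites named in this README (svn, pip, virtualenv, docker) can be checked in one pass. The following is only a minimal sketch and not part of AMRFinder: the helper name `missing_tools` is hypothetical, and it assumes Python 3, because `shutil.which` is not available in Python 2. If you took the uDocker route, pass `udocker` in place of `docker`.

```python
import shutil

# Tool names taken from the prerequisites listed in this README.
REQUIRED_TOOLS = ("svn", "pip", "virtualenv", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that `which` cannot find on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

This only confirms that the executables are on PATH; it does not replace `docker run hello-world`, which also verifies that the Docker daemon is reachable and that your user is in the docker group.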
180 | 181 | ### Retrieving the AMR software 182 | 183 | The AMRFinder software is available on GitHub at https://github.com/ncbi/pipelines/tree/master/amr_finder, 184 | and can be retrieved with svn like: 185 | 186 | ```shell 187 | ~$ svn co https://github.com/ncbi/pipelines/trunk/amr_finder 188 | ``` 189 | 190 | 191 | ### Initial test run 192 | 193 | ```shell 194 | ~$ source ~/cwl/bin/activate # activate the virtualenv 195 | (cwl) ~$ cd amr_finder 196 | (cwl) ~/amr_finder$ ./amrfinder -p test_prot.fa 197 | ``` 198 | 199 | ### Shell script to run virtualenv before running amrfinder. 200 | 201 | Because we used a virtualenv we might want to use a tiny shell script to 202 | invoke amrfinder: 203 | 204 | ```shell 205 | #!/bin/bash 206 | 207 | source "$HOME/cwl/bin/activate" # activate virtualenv 208 | "$HOME/amr_finder/amrfinder" "$@" 209 | ``` 210 | 211 | ### Testing AMRFinder 212 | 213 | A small set of test data is included with AMRFinder just to make sure things 214 | are working. 215 | 216 | ```shell 217 | ~$ source cwl/bin/activate # to enter the virtualenv 218 | (cwl) ~$ cd amr_finder 219 | (cwl) ~/amr_finder$ ./amrfinder -p test_prot.fa -g test_prot.gff 220 | ``` 221 | You should see something like: 222 | 223 | Target identifier Contig id Start Stop Strand Gene symbol Protein name Method Target length Reference protein length % Coverage of reference protein % Identity to reference protein Alignment length Accession of closest protein Name of closest protein HMM id HMM description 224 | blaOXA-436_partial contig1 4001 4699 + blaOXA OXA-48 family class D beta-lactamase PARTIAL 233 265 87.92 100.00 233 WP_058842180.1 OXA-48 family carbapenem-hydrolyzing class D beta-lactamase OXA-436 NF000387.2 OXA-48 family class D beta-lactamase 225 | blaPDC-114_blast contig1 2001 3191 + blaPDC PDC family class C beta-lactamase BLAST 397 397 100.00 99.75 397 WP_061189306.1 class C beta-lactamase PDC-114 NF000422.2 PDC family class C beta-lactamase 226 | blaTEM-156 contig1 1 858 + blaTEM-156 class A 
beta-lactamase TEM-156 ALLELE 286 286 100.00 100.00 286 WP_061158039.1 class A beta-lactamase TEM-156 NF000531.2 TEM family class A beta-lactamase 227 | vanG contig1 5001 6047 + vanG D-alanine--D-serine ligase VanG EXACT 349 349 100.00 100.00 349 WP_063856695.1 D-alanine--D-serine ligase VanG NF000091.3 D-alanine--D-serine ligase VanG 228 | 229 | 230 | ```shell 231 | ~$ ./amrfinder -n test_dna.fa 232 | ``` 233 | 234 | You should see something like: 235 | 236 | Target identifier Contig id Start Stop Strand Gene symbol Protein name Method Target lengthReference protein length % Coverage of reference protein % Identity to reference protein Alignment length Accession of closest protein Name of closest protein HMM id HMM description 237 | blaOXA-436_partial_cds blaOXA-436_partial_cds 101 802 + blaOXA OXA-48 family class D beta-lactamasePARTIAL 234 265 88.30 100.00 234 WP_058842180.1 OXA-48 family carbapenem-hydrolyzing class D beta-lactamase OXA-436 NF000387.2 OXA-48 family class D beta-lactamase 238 | blaPDC-114_blast blaPDC-114_blast 1 1191 + blaPDC PDC family class C beta-lactamase BLAST 397 397 100.00 99.75 397 WP_061189306.1 class C beta-lactamase PDC-114 NF000422.2 PDC family class C beta-lactamase 239 | blaTEM-156 blaTEM-156 101 958 + blaTEM-156 class A beta-lactamase TEM-156 ALLELE 286 286 100.00 100.00 286 WP_061158039.1 class A beta-lactamase TEM-156 NF000531.2 TEM family class A beta-lactamase 240 | vanG vanG 101 1147 + vanG D-alanine--D-serine ligase VanG EXACT 349 349 100.00 100.00 349 WP_063856695.1 D-alanine--D-serine ligase VanG NF000091.3 D-alanine--D-serine ligase VanG 241 | 242 | ## Running AMRFinder 243 | 244 | ### Typical options 245 | 246 | The only required arguments are either 247 | `-p ` for proteins or `-n ` for nucleotides. 248 | We also provide an automatic update mechanism to update the code and database 249 | by using `-u`. This will update to the latest AMR database, as well as any code 250 | changes in AMRFinder. 
Use '--help' to see the complete set of options and 251 | flags. 252 | 253 | ### Input file formats 254 | 255 | `-p ` and `-n `: 256 | FASTA files are in standard format. The identifiers reported in the output are 257 | the first non-whitespace characters on the defline. 258 | 259 | `-g ` 260 | GFF files are used to get sequence coordinates for AMRFinder hits from protein 261 | sequence. The identifier from the FASTA file is matched up 262 | with the 'Name=' attribute from field 9 in the GFF file. See test_prot.gff for 263 | a simple example. (e.g., `amrfinder -p test_prot.fa -g test_prot.gff` should 264 | result in the sample output shown below) 265 | 266 | ### Output format 267 | 268 | AMRFinder output is in tab-delimited format (.tsv). The output format depends 269 | on the options `-p`, `-n`, and `-g`. Protein searches with GFF files (`-p 270 | -g `) and translated DNA searches (`-n `) will also 271 | include contig, start, and stop columns. 272 | 273 | A sample AMRFinder report: 274 | 275 | Target identifier Contig id Start Stop Strand Gene symbol Protein name Method Target length Reference protein length % Coverage of reference protein % Identity to reference protein Alignment length Accession of closest protein Name of closest protein HMM id HMM description 276 | blaOXA-436_partial contig1 4001 4699 + blaOXA OXA-48 family class D beta-lactamase PARTIAL 233 265 87.92 100.00 233 WP_058842180.1 OXA-48 family carbapenem-hydrolyzing class D beta-lactamase OXA-436 NF000387.2 OXA-48 family class D beta-lactamase 277 | blaPDC-114_blast contig1 2001 3191 + blaPDC PDC family class C beta-lactamase BLAST 397 397 100.00 99.75 397 WP_061189306.1 class C beta-lactamase PDC-114 NF000422.2 PDC family class C beta-lactamase 278 | blaTEM-156 contig1 1 858 + blaTEM-156 class A beta-lactamase TEM-156 ALLELE 286 286 100.00 100.00 286 WP_061158039.1 class A beta-lactamase TEM-156 NF000531.2 TEM family class A beta-lactamase 279 | nimIJ_hmm contig1 1001 1495 + 
nimIJ NimIJ family nitroimidazole resistance protein HMM 165 NA NA NA NA NA NA NF000262.1 NimIJ family nitroimidazole resistance protein 280 | vanG contig1 5001 6047 + vanG D-alanine--D-serine ligase VanG EXACT 349 349 100.00 100.00 349 WP_063856695.1 D-alanine--D-serine ligase VanG NF000091.3 D-alanine--D-serine ligase VanG 281 | 282 | 283 | Fields: 284 | 285 | - Target Identifier - This is from the FASTA defline for the protein or DNA sequence 286 | - Contig id - (optional) Contig name 287 | - Start - (optional) 1-based coordinate of first nucleotide coding for protein in DNA sequence on contig 288 | - Stop - (optional) 1-based coordinate of last nucleotide coding for protein in DNA sequence on contig 289 | - Gene symbol - Gene or gene-family symbol for protein hit 290 | - Protein name - Full-text name for the protein 291 | - Method - Type of hit found by AMRFinder, one of five options 292 | - ALLELE - 100% sequence match over 100% of length to a protein named at the allele level in the AMRFinder database 293 | - EXACT - 100% sequence match over 100% of length to a protein in the database that is not a named allele 294 | - BLAST - BLAST alignment is > 90% of length and > 90% identity to a protein in the AMRFinder database 295 | - PARTIAL - BLAST alignment is > 50% of length, but < 90% of length and > 90% identity 296 | - HMM - HMM was hit above the cutoff, but there was not a BLAST hit that met standards for BLAST or PARTIAL 297 | - Target length - The length of the query protein. 
The length of the blast hit for translated-DNA searches 298 | - Reference protein length - The length of the AMR Protein in the database (NA if HMM-only hit) 299 | - % Coverage of reference protein - % covered by blast hit (NA if HMM-only hit) 300 | - % Identity to reference protein - % amino-acid identity to reference protein (NA if HMM-only hit) 301 | - Alignment length - Length of BLAST alignment in amino-acids (NA if HMM-only hit) 302 | - Accession of closest protein - RefSeq accession for protein hit by BLAST (NA if HMM-only hit) 303 | - Name of closest protein - Full name assigned to the AMRFinder database protein (NA if HMM-only hit) 304 | - HMM id - Accession for the HMM 305 | - HMM description - The family name associated with the HMM 306 | 307 | ## Known Issues 308 | 309 | Handling of fusion genes is still under active development. Currently they are 310 | reported as two lines, one for each portion of the fusion. Gene symbol, Protein 311 | name, Name of closest protein, HMM id, and HMM description are with respect to 312 | the individual elements of the fusion. This behavior is subject to change. 313 | 314 | File format checking of input files is almost nonexistent. Software behavior 315 | with incorrect input files is not defined. 316 | 317 | If you find bugs not listed here or have other questions/comments please email 318 | us at pd-help@ncbi.nlm.nih.gov. 319 | 320 | ## Methods 321 | 322 | ### Protein searches (-p) 323 | 324 | AMRFinder-prot uses the database of AMR gene sequences, hidden Markov models 325 | (HMMs), the hierarchical tree of AMR protein designations, and a custom 326 | rule-set to generate names and coordinates for AMR genes, along with 327 | descriptions of the evidence used to identify the sequence. Genes are reported 328 | with the following procedure after both HMMER and BLASTP searches are run. 
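As orientation before the detailed rules, the five Method labels defined in the Fields list above can be sketched as a small decision function. This is an illustration only, not the logic of `amr_report`: the function name `classify_hit` and its parameters are hypothetical, "100% of length" is read here as 100% coverage of the reference protein, and the README does not say how a hit at exactly 90% coverage is labelled, so the sketch assumes PARTIAL.

```python
def classify_hit(coverage=None, identity=None, named_allele=False, hmm_hit=False):
    """Return the Method label for one hit, or None if nothing qualifies.

    coverage and identity are percentages relative to the reference
    protein; both are None when there is no BLAST hit at all.
    """
    if coverage is not None and identity is not None:
        if coverage == 100.0 and identity == 100.0:
            # Full-length identical match: ALLELE if the reference is a
            # named allele, otherwise EXACT.
            return "ALLELE" if named_allele else "EXACT"
        if identity > 90.0:
            if coverage > 90.0:
                return "BLAST"
            if coverage > 50.0:
                # Assumption: exactly 90% coverage also lands here.
                return "PARTIAL"
    # No qualifying BLAST hit: report HMM only if an HMM passed its cutoff.
    return "HMM" if hmm_hit else None
```

Applied to the rows of the sample report, `classify_hit(87.92, 100.00)` gives PARTIAL (blaOXA-436_partial) and `classify_hit(100.00, 99.75)` gives BLAST (blaPDC-114_blast), matching the Method column shown there.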
329 | 330 | #### BLASTP matches 331 | 332 | BLASTP is run with the -task blastp-fast -word_size 6 -threshold 21 -evalue 333 | 1e-20 -comp_based_stats 0 options against the AMR gene database described 334 | above. Exact BLAST matches over the full length of the reference protein are 335 | reported. If there is no exact match, then the following rules are applied: 336 | Matches with < 90% identity or with < 50% coverage of the protein are dropped. 337 | If the hit is to a fusion protein then at least 90% of the protein must be 338 | covered. A BLAST match to a reference protein is removed if it is covered by 339 | another BLAST match which has more identical residues or the same number of 340 | identical residues, but to a longer reference protein. A single match is chosen 341 | as the best of what remains, sorting by the following criteria in order: (1) if 342 | it is exact; (2) has more identical residues; (3) hits a shorter protein; or 343 | (4) the gene symbol comes first in alphabetical order. 344 | 345 | #### HMM matches 346 | 347 | HMMER version 3.1b2 (http://hmmer.org/) is run using the --cut_tc -Z 10000 348 | options with the HMM database described above. HMM matches with full_score 349 | < TC1 or domain_score < TC2 are dropped. All HMM matches to HMMs for parent 350 | nodes of other HMM matches in the hierarchy are removed. The match(es) with the 351 | highest full score are kept. If there is an exact BLAST match or the family of 352 | the BLAST match reference protein is a descendant of the family of the HMM then 353 | the information for the nearest HMM node to the BLAST match is returned. 354 | 355 | ### Translated DNA searches (-n) 356 | 357 | Translated alignments using BLASTX of the assembly against the AMR protein 358 | database can be used to help identify partial, split, or unannotated AMR 359 | proteins using the -task tblastn-fast -word_size 3 -evalue 1e-20 -seg no 360 | -comp_based_stats 0 options. 
The algorithm for selecting hits is as described 361 | above for proteins, but note that HMM searches are not performed against the 362 | unannotated assembly. 363 | 364 | ## Help 365 | 366 | If you have questions about AMRFinder that aren't answered in this document you 367 | can email us at pd-help@ncbi.nlm.nih.gov. 368 | 369 | ## License 370 | 371 | ### HMMER 372 | 373 | This distribution includes HMMER (c) Sean Eddy and the Howard Hughes Medical 374 | Institute and licensed under the GNU General Public License version 3 (GPLv3) 375 | (https://www.gnu.org/licenses/) 376 | 377 | See http://hmmer.org for details. 378 | 379 | ### PUBLIC DOMAIN NOTICE 380 | 381 | This software/database is "United States Government Work" under the terms of 382 | the United States Copyright Act. It was written as part of the authors' 383 | official duties for the United States Government and thus cannot be 384 | copyrighted. This software/database is freely available to the public for use 385 | without a copyright notice. Restrictions cannot be placed on its present or 386 | future use. 387 | 388 | Although all reasonable efforts have been taken to ensure the accuracy and 389 | reliability of the software and data, the National Center for Biotechnology 390 | Information (NCBI) and the U.S. Government do not and cannot warrant the 391 | performance or results that may be obtained by using this software or data. 392 | NCBI, NLM, and the U.S. Government disclaim all warranties as to performance, 393 | merchantability or fitness for any particular purpose. 394 | 395 | In any work or product derived from this material, proper attribution of the 396 | authors as the source of the software or data should be made, using: 397 | https://ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/ as the 398 | citation. 
399 | 400 | 401 | 402 | 403 | -------------------------------------------------------------------------------- /amr_finder/amr_report.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | cwlVersion: v1.0 3 | class: CommandLineTool 4 | hints: 5 | DockerRequirement: 6 | dockerPull: ncbi/amr:18.06 7 | 8 | baseCommand: amr_report 9 | stdout: output.txt 10 | inputs: 11 | fam: 12 | type: File 13 | inputBinding: 14 | prefix: -fam 15 | blastp: 16 | type: File? 17 | inputBinding: 18 | prefix: -blastp 19 | blastx: 20 | type: File? 21 | inputBinding: 22 | prefix: -blastx 23 | hmmdom: 24 | type: File? 25 | inputBinding: 26 | prefix: -hmmdom 27 | hmmsearch: 28 | type: File? 29 | inputBinding: 30 | prefix: -hmmsearch 31 | gff: 32 | type: File? 33 | inputBinding: 34 | prefix: -gff 35 | outfile: 36 | type: string? 37 | #default: "results.sseqid" 38 | inputBinding: 39 | prefix: -out 40 | ident_min: 41 | type: float? 42 | inputBinding: 43 | prefix: -ident_min 44 | complete_cover_min: 45 | type: float? 46 | inputBinding: 47 | prefix: -complete_cover_min 48 | partial_cover_min: 49 | type: float? 50 | inputBinding: 51 | prefix: -partial_cover_min 52 | pseudo: 53 | type: boolean? 54 | default: true 55 | inputBinding: 56 | prefix: -pseudo 57 | qc: 58 | type: boolean? 59 | default: false 60 | inputBinding: 61 | prefix: -qc 62 | verbose: 63 | type: int? 64 | default: 0 65 | inputBinding: 66 | prefix: -verbose 67 | 68 | outputs: 69 | output: 70 | type: stdout 71 | -------------------------------------------------------------------------------- /amr_finder/amrfinder: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from __future__ import print_function 3 | import argparse 4 | import importlib 5 | import os 6 | import subprocess 7 | import sys 8 | 9 | import impl_amrfinder 10 | 11 | # N.B. 
This code represents the minimum functionality required to 12 | # download and update the latest version of the code. As such 13 | # resist updating this code as much as possible. All changes 14 | # should be made in impl_amrfinder.py 15 | 16 | def main(): 17 | parse = argparse.ArgumentParser(add_help=False) 18 | #parse.add_argument('-x', '--check-update', help='Check for any updates to this pipeline (default: %(default)s)', action='store_true') 19 | parse.add_argument('-v', '--version', action='store_true', help='Print current version information, and check for latest version.') 20 | parse.add_argument('-u', '--update', action='store_true', help='Update this code and supplemental data, then quit. (default: %(default)s)') 21 | parse.add_argument('-U', '--update-data', action='store_true', help='Update auxiliary data from the ftp site, then quit. (default: %(default)s)') 22 | 23 | args, remaining_argv = parse.parse_known_args() 24 | 25 | script_path = os.path.dirname(os.path.realpath(__file__)) 26 | if args.version: 27 | impl_amrfinder.print_versions(script_path) 28 | sys.exit(0) 29 | elif args.update: 30 | print("Checking for update...", file=sys.stderr, end='') 31 | try: 32 | out = open(os.devnull, "wb") 33 | svn = subprocess.check_call(["svn", "update", script_path], stdout=out, stderr=out) 34 | print("success!", file=sys.stderr) 35 | except subprocess.CalledProcessError: 36 | print("failure.", file=sys.stderr) 37 | print("Updating supplementary data") 38 | impl_amrfinder.update_data() 39 | #importlib.reload(impl_amrfinder) 40 | elif args.update_data: 41 | print("Updating supplementary data") 42 | impl_amrfinder.update_data() 43 | else: 44 | impl_amrfinder.run(parse) 45 | 46 | sys.exit(0) #success 47 | 48 | if __name__ == "__main__": 49 | main() 50 | -------------------------------------------------------------------------------- /amr_finder/amrfinder-prot.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 
#DEBUG=1 3 | DEBUG='' 4 | 5 | # paths 6 | BLASTP=/usr/bin/blastp 7 | HMMER=/opt/hmmer/3.1b2/bin 8 | AMRDir=/panfs/pan1.be-md.ncbi.nlm.nih.gov/bacterial_pathogens/backup/data/amrtests/locked_db/170130 9 | HMMLIB=$AMRDir/AMR.LIB 10 | #FAMREPORT=~brovervv/code/prod/famReport 11 | FAMREPORT=/panfs/pan1.be-md.ncbi.nlm.nih.gov/bacterial_pathogens/backup/packages/amrfinder/famReport 12 | #EXTRACT_FASTA=~brovervv/code/genetics/extractFastaProt 13 | EXTRACT_FASTA=/panfs/pan1.be-md.ncbi.nlm.nih.gov/bacterial_pathogens/backup/packages/amrfinder/extractFastaProt 14 | 15 | function quit { 16 | if [ -e "$TmpFNam" ] 17 | then 18 | [ -n "$DEBUG" ] || rm -f $TmpFNam.* 19 | fi 20 | [ -n "$DEBUG" ] && echo "TmpFNam = $TmpFNam" 21 | exit $exitcode 22 | } 23 | function usage { 24 | echo " 25 | Print an AMR report from protein sequence 26 | Usage: amrfinder-prot.sh [-options] 27 | Options: 28 | -g - for contig/position 29 | -B - no blasts in report (only used for testing) 30 | -E - no exact/allele matches (only used for testing) 31 | -f - FASTA output file of reported AMR proteins 32 | -p - use -parse_deflines for blast (sometimes fixes problems caused 33 | - by defline formats in the input file) 34 | -d - Use alternate AMR database directory 35 | - Default is $AMRDir 36 | " 37 | exitcode=1 38 | quit 39 | } 40 | 41 | BLASTS='' 42 | EXACT='' 43 | PARSE_DEFLINES='' 44 | GFF='' 45 | 46 | if [ $# -eq 0 ] 47 | then 48 | usage 49 | fi 50 | 51 | command_line="$0 $@" 52 | 53 | while getopts ":g:BEf:phd:" opt; do 54 | case $opt in 55 | g) 56 | echo "option -g $OPTARG" >&2 57 | GFF="$OPTARG" 58 | ;; 59 | B) 60 | echo "option -B" >&2 61 | BLASTS='-noblast' 62 | ;; 63 | E) 64 | echo "option -E" >&2 65 | EXACT='-nosame' 66 | ;; 67 | f) 68 | echo "option -f $OPTARG" >&2 69 | output_file="$OPTARG" 70 | ;; 71 | p) 72 | echo "option -p" >&2 73 | PARSE_DEFLINES='-parse_deflines' 74 | ;; 75 | d) 76 | echo "option -d $OPTARG" >&2 77 | AMRDir="$OPTARG" 78 | ;; 79 | h) 80 | usage 81 | ;; 82 | \?) 
83 | echo "Invalid option: -$OPTARG" >&2 84 | usage 85 | ;; 86 | :) 87 | echo "option -$OPTARG requires a filename." >&2 88 | usage 89 | ;; 90 | esac 91 | done 92 | 93 | shift $((OPTIND-1)) 94 | 95 | PROTEIN_FILE=$1 96 | shift 1 97 | 98 | if [ "X$1" != "X" ] 99 | then 100 | echo "Unknown options: $@" >&2 101 | usage 102 | fi 103 | 104 | 105 | # now run amrfinder 106 | TmpFNam=`mktemp` 107 | exitcode=1 108 | 109 | 110 | ### BLAST 111 | $BLASTP -task blastp-fast -db $AMRDir/AMRProt -query $PROTEIN_FILE -show_gis -word_size 6 -threshold 21 -evalue 1e-20 -comp_based_stats 0 $PARSE_DEFLINES -outfmt '6 qseqid sseqid length nident qstart qend qlen sstart send slen qseq' > $TmpFNam.blastp 2> $TmpFNam.err \ 112 | || { 113 | echo "Error running BLAST: " >&2 114 | cat $TmpFNam.err >&2 115 | quit 116 | } & 117 | 118 | 119 | ### HMMER 120 | $HMMER/hmmsearch --tblout $TmpFNam.hmmsearch --noali --domtblout $TmpFNam.dom --cut_tc -Z 10000 $HMMLIB $PROTEIN_FILE 2>&1 > $TmpFNam.out \ 121 | || { 122 | echo "" 123 | cat $TmpFNam.hmmsearch 124 | echo "" 125 | cat $TmpFNam.dom 126 | quit 127 | } 128 | 129 | wait 130 | 131 | ### famReport 132 | 133 | $FAMREPORT -fam $AMRDir/fam.tab -aa -in $TmpFNam.blastp -gff "$GFF" -hmmsearch $TmpFNam.hmmsearch -hmmdom $TmpFNam.dom -out $TmpFNam.sseqid -verbose 0 $BLASTS $EXACT 134 | 135 | if [ ! "$?" -eq 0 ] 136 | then 137 | quit 138 | fi 139 | 140 | ### extractFastaProt 141 | if [ "X$output_file" != "X" ] 142 | then 143 | $EXTRACT_FASTA -in $PROTEIN_FILE -target $TmpFNam.sseqid > $output_file 144 | if [ $? -ne 0 ] 
145 | then 146 | quit 147 | fi 148 | fi 149 | 150 | exitcode=0 151 | 152 | quit 153 | -------------------------------------------------------------------------------- /amr_finder/blastp.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | cwlVersion: v1.0 3 | class: CommandLineTool 4 | hints: 5 | DockerRequirement: 6 | dockerPull: ncbi/amr:18.06 7 | 8 | baseCommand: blastp 9 | stdout: blastp.out 10 | inputs: 11 | query: 12 | type: File 13 | inputBinding: 14 | prefix: -query 15 | db: 16 | type: Directory 17 | inputBinding: 18 | prefix: -db 19 | valueFrom: $(self.path)/$(self.basename) 20 | task: 21 | type: string? 22 | default: blastp-fast 23 | inputBinding: 24 | prefix: -task 25 | outfmt: 26 | type: string? 27 | default: "6 qseqid sseqid length nident qstart qend qlen sstart send slen qseq" 28 | inputBinding: 29 | prefix: -outfmt 30 | show_gis: 31 | type: boolean 32 | default: true 33 | inputBinding: 34 | prefix: -show_gis 35 | word_size: 36 | type: int? 37 | default: 6 38 | inputBinding: 39 | prefix: -word_size 40 | threshold: 41 | type: int? 42 | default: 21 43 | inputBinding: 44 | prefix: -threshold 45 | evalue: 46 | type: double? 47 | default: 1e-20 48 | inputBinding: 49 | prefix: -evalue 50 | comp_based_stats: 51 | type: int? 52 | default: 0 53 | inputBinding: 54 | prefix: -comp_based_stats 55 | num_threads: 56 | type: int? 57 | inputBinding: 58 | prefix: -num_threads 59 | parse_deflines: 60 | type: boolean? 
61 | default: true 62 | inputBinding: 63 | prefix: -parse_deflines 64 | 65 | outputs: 66 | - id: output 67 | type: File 68 | outputBinding: 69 | glob: "blastp.out" 70 | -------------------------------------------------------------------------------- /amr_finder/blastp_params.yaml: -------------------------------------------------------------------------------- 1 | query: 2 | class: File 3 | location: test_prot.fa 4 | db: 5 | class: Directory 6 | location: AMRProt 7 | parse_deflines: true 8 | -------------------------------------------------------------------------------- /amr_finder/blastx.cwl: -------------------------------------------------------------------------------- 1 | cwlVersion: v1.0 2 | class: CommandLineTool 3 | hints: 4 | DockerRequirement: 5 | dockerPull: ncbi/amr:18.06 6 | 7 | baseCommand: blastx 8 | #stdout: $(inputs.db).out 9 | stdout: blastx.out 10 | inputs: 11 | query: 12 | type: File 13 | inputBinding: 14 | prefix: -query 15 | db: 16 | type: Directory 17 | inputBinding: 18 | prefix: -db 19 | valueFrom: $(self.path)/$(self.basename) 20 | query_gencode: 21 | type: int? 22 | default: 11 23 | inputBinding: 24 | prefix: -query_gencode 25 | outfmt: 26 | type: string? 27 | default: "6 qseqid sseqid length nident qstart qend qlen sstart send slen qseq" 28 | inputBinding: 29 | prefix: -outfmt 30 | show_gis: 31 | type: boolean 32 | default: true 33 | inputBinding: 34 | prefix: -show_gis 35 | word_size: 36 | type: int? 37 | default: 3 38 | inputBinding: 39 | prefix: -word_size 40 | evalue: 41 | type: double? 42 | default: 1e-20 43 | inputBinding: 44 | prefix: -evalue 45 | comp_based_stats: 46 | type: int? 47 | default: 0 48 | inputBinding: 49 | prefix: -comp_based_stats 50 | max_target_seqs: 51 | type: int? 52 | default: 10000 53 | inputBinding: 54 | prefix: -max_target_seqs 55 | num_threads: 56 | type: int? 57 | inputBinding: 58 | prefix: -num_threads 59 | parse_deflines: 60 | type: boolean? 
61 | default: true 62 | inputBinding: 63 | prefix: -parse_deflines 64 | seg: 65 | type: string? 66 | default: no 67 | inputBinding: 68 | prefix: -seg 69 | 70 | outputs: 71 | - id: output 72 | type: File 73 | outputBinding: 74 | glob: "blastx.out" 75 | -------------------------------------------------------------------------------- /amr_finder/fasta_check.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | cwlVersion: v1.0 3 | class: CommandLineTool 4 | hints: 5 | DockerRequirement: 6 | dockerPull: ncbi/amr:18.06 7 | 8 | # Check the correctness of a FASTA file. Exit with an error if it is incorrect. 9 | # Usage: fasta_check [-qc] [-verbose 0] [-noprogress] [-profile] [-json ""] [-log ""] [-aa] [-hyphen] 10 | # Help: fasta_check -help|-h 11 | # Parameters: 12 | # : FASTA file 13 | # [-qc]: Integrity checks (quality control) 14 | # [-verbose 0]: Level of verbosity 15 | # [-noprogress]: Turn off progress printout 16 | # [-profile]: Use chronometers to profile 17 | # [-json ""]: Output file in JSON format 18 | # [-log ""]: Error log file, appended 19 | # [-aa]: Amino acid sequences, otherwise nucleotide 20 | # [-hyphen]: Hyphens are allowed 21 | 22 | baseCommand: fasta_check 23 | stdout: fasta_check.out 24 | inputs: 25 | fasta: 26 | type: File 27 | inputBinding: 28 | position: 1 29 | qc: 30 | type: string? 31 | inputBinding: 32 | prefix: -qc 33 | verbose: 34 | type: int? 35 | inputBinding: 36 | prefix: -verbose 37 | noprogress: 38 | type: boolean? 39 | inputBinding: 40 | prefix: -noprogress 41 | profile: 42 | type: boolean? 43 | inputBinding: 44 | prefix: -profile 45 | json: 46 | type: string? 47 | inputBinding: 48 | prefix: -json 49 | log: 50 | type: string? 51 | inputBinding: 52 | prefix: -log 53 | aa: 54 | type: boolean? 55 | inputBinding: 56 | prefix: -aa 57 | hyphen: 58 | type: boolean? 
59 | inputBinding: 60 | prefix: -hyphen 61 | 62 | outputs: 63 | - id: output 64 | type: File 65 | outputBinding: 66 | glob: "fasta_check.out" 67 | -------------------------------------------------------------------------------- /amr_finder/gff_check.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | cwlVersion: v1.0 3 | class: CommandLineTool 4 | hints: 5 | DockerRequirement: 6 | dockerPull: ncbi/amr:18.06 7 | 8 | # Check the correctness of a .gff-file. Exit with an error if it is incorrect. 9 | # Usage: gff_check [-qc] [-verbose 0] [-noprogress] [-profile] [-threads 1] [-json ""] [-log ""] [-fasta ""] [-locus_tag ""] 10 | # Help: gff_check -help|-h 11 | # Parameters: 12 | # : .gff-file, if an empty string then exit 0 13 | # [-qc]: Integrity checks (quality control) 14 | # [-verbose 0]: Level of verbosity 15 | # [-noprogress]: Turn off progress printout 16 | # [-profile]: Use chronometers to profile 17 | # [-threads 1]: Max. number of threads 18 | # [-json ""]: Output file in Json format 19 | # [-log ""]: Error log file, appended 20 | # [-fasta ""]: Protein FASTA file 21 | # [-locus_tag ""]: File with matches: " ", where is from "[locus_tag=]" in the FASTA comment and from the .gff-file 22 | 23 | baseCommand: gff_check 24 | stdout: gff_check.out 25 | inputs: 26 | gff: 27 | type: File? 28 | default: 29 | class: File 30 | basename: "emptystring" 31 | contents: "" 32 | inputBinding: 33 | position: 1 34 | gff_file: 35 | type: File? 36 | qc: 37 | type: string? 38 | inputBinding: 39 | prefix: -qc 40 | position: 2 41 | verbose: 42 | type: int? 43 | inputBinding: 44 | prefix: -verbose 45 | position: 3 46 | noprogress: 47 | type: boolean? 48 | inputBinding: 49 | prefix: -noprogress 50 | position: 4 51 | profile: 52 | type: boolean? 53 | inputBinding: 54 | prefix: -profile 55 | position: 5 56 | threads: 57 | type: int? 58 | inputBinding: 59 | prefix: -threads 60 | position: 6 61 | json: 62 | type: string? 
63 | inputBinding: 64 | prefix: -json 65 | position: 7 66 | log: 67 | type: string? 68 | inputBinding: 69 | prefix: -log 70 | position: 8 71 | fasta: 72 | type: File? 73 | default: "" 74 | inputBinding: 75 | prefix: -fasta 76 | position: 9 77 | locus_tag: 78 | type: File? 79 | inputBinding: 80 | prefix: -locus_tag 81 | position: 10 82 | 83 | outputs: 84 | - id: output 85 | type: File 86 | outputBinding: 87 | glob: "gff_check.out" 88 | -------------------------------------------------------------------------------- /amr_finder/hmmsearch.cwl: -------------------------------------------------------------------------------- 1 | cwlVersion: v1.0 2 | class: CommandLineTool 3 | hints: 4 | DockerRequirement: 5 | dockerPull: ncbi/amr:18.06 6 | 7 | baseCommand: hmmsearch 8 | inputs: 9 | tblout: 10 | type: string? 11 | default: hmmsearch.out 12 | inputBinding: 13 | position: 1 14 | prefix: --tblout 15 | noali: 16 | type: boolean? 17 | default: true 18 | inputBinding: 19 | position: 2 20 | prefix: --noali 21 | domtblout: 22 | type: string? 23 | default: domtbl.out 24 | inputBinding: 25 | position: 3 26 | prefix: --domtblout 27 | cut_tc: 28 | type: boolean? 29 | default: true 30 | inputBinding: 31 | position: 4 32 | prefix: --cut_tc 33 | cpu: 34 | type: int? 35 | inputBinding: 36 | position: 5 37 | prefix: --cpu 38 | Z: 39 | type: int? 40 | default: 10000 41 | inputBinding: 42 | position: 6 43 | prefix: -Z 44 | db: 45 | type: File 46 | inputBinding: 47 | position: 7 48 | query: 49 | type: File 50 | inputBinding: 51 | position: 8 52 | fasta_check_dummy: 53 | type: File? 54 | gff_check_dummy: 55 | type: File? 
56 | 57 | 58 | outputs: 59 | - id: hmmsearch_out 60 | type: File 61 | outputBinding: 62 | glob: "hmmsearch.out" 63 | - id: hmmdom_out 64 | type: File 65 | outputBinding: 66 | glob: "domtbl.out" 67 | -------------------------------------------------------------------------------- /amr_finder/hmmsearch_params.yaml: -------------------------------------------------------------------------------- 1 | query: 2 | class: File 3 | location: test.fa 4 | 5 | -------------------------------------------------------------------------------- /amr_finder/impl_amrfinder.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import argparse 3 | import errno 4 | import os 5 | #import re 6 | import subprocess 7 | import sys 8 | import tempfile 9 | import yaml 10 | from distutils import spawn 11 | from ftplib import FTP 12 | 13 | def print_versions(spath): 14 | os.chdir(os.path.dirname(os.path.realpath(__file__))) 15 | revision = "Not available" 16 | latest = "Not available" 17 | container = "Not available" 18 | supp_data = "Not available" 19 | try: 20 | err = open(os.devnull, "wb") 21 | r = subprocess.check_output("set -o pipefail; svn info | grep ^Revision | cut -d' ' -f2", shell=True, stderr=err) 22 | revision = r.decode('UTF-8').strip() 23 | r = subprocess.check_output("set -o pipefail; svn info | grep ^URL | cut -d' ' -f2", shell=True, stderr=err) 24 | url = r.decode('UTF-8').strip() 25 | r = subprocess.check_output("set -o pipefail; svn info {} | grep ^Revision | cut -d' ' -f2".format(url), shell=True, stderr=err) 26 | latest = r.decode('UTF-8').strip() 27 | except subprocess.CalledProcessError: 28 | revision = "Not available" 29 | latest = "Not available" 30 | 31 | try: 32 | r = subprocess.check_output("grep -hPo '(?<=dockerPull: )(.*)(?=$)' *.cwl | sort -u | awk '{printf(\" %s\\n\", $1)}'", shell=True) 33 | container = r.decode('UTF-8').strip() 34 | except subprocess.CalledProcessError: 35 | container 
= "Not available" 36 | if check_data(): 37 | pre = os.path.dirname(os.path.realpath(__file__)) + "/data/latest/" 38 | target = os.path.realpath(pre) 39 | base = os.path.basename(os.path.normpath(target)) 40 | supp_data = base 41 | 42 | print("Current CWL revision:", revision) 43 | print(" Latest CWL revision:", latest) 44 | print(" Docker container:", container) 45 | print(" AMRFinder Database:", supp_data) 46 | 47 | 48 | def mkdir_p(path): 49 | try: 50 | os.makedirs(path) 51 | except OSError as exc: # Python >2.5 52 | if exc.errno == errno.EEXIST and os.path.isdir(path): 53 | pass 54 | else: 55 | raise 56 | 57 | def available_cpu_count(): 58 | try: 59 | import multiprocessing 60 | return multiprocessing.cpu_count() 61 | except (ImportError, NotImplementedError): 62 | pass 63 | 64 | return 1 65 | 66 | def check_data(files = [ 'AMR.LIB', 'AMRProt', 'fam.tab' ]): 67 | pre = os.path.dirname(os.path.realpath(__file__)) + "/data/latest/" 68 | if not os.path.isdir(pre): 69 | return False 70 | for base in files: 71 | f = pre + base 72 | if not os.path.isfile(f): 73 | return False 74 | return True 75 | 76 | def check_fasta(f, aa=False): 77 | try: 78 | err = open(os.devnull, "wb") 79 | r = subprocess.check_output("set -o pipefail; svn info | grep ^Revision | cut -d' ' -f2", shell=True, stderr=err) 80 | revision = r.strip() 81 | r = subprocess.check_output("set -o pipefail; svn info | grep ^URL | cut -d' ' -f2", shell=True, stderr=err) 82 | url = r.strip() 83 | r = subprocess.check_output("set -o pipefail; svn info {} | grep ^Revision | cut -d' ' -f2".format(url), shell=True, stderr=err) 84 | latest = r.strip() 85 | except subprocess.CalledProcessError: 86 | pass 87 | 88 | def get_latest(ls): 89 | # Example results to be parsed 90 | #'modify=20180330183106;perm=fle;size=4096;type=dir;unique=28UFC8F9;UNIX.group=562;UNIX.mode=0444;UNIX.owner=14; 2018-03-30.1' 91 | 
#'modify=20180410155844;perm=adfr;size=12;type=OS.unix=symlink;unique=28UFC8FE;UNIX.group=562;UNIX.mode=0444;UNIX.owner=14; latest' 92 | #'modify=20180409170726;perm=fle;size=4096;type=dir;unique=28UFC8FE;UNIX.group=562;UNIX.mode=0444;UNIX.owner=14; 2018-04-09.1' 93 | dirs = {} 94 | latest = '' 95 | for f in ls: 96 | fields = f.split(';') 97 | typ = fields[3].split('=', 1)[1] 98 | if typ == 'dir': 99 | u = fields[4].split('=', 1)[1] 100 | n = fields[8].strip() 101 | dirs[u] = n 102 | elif typ == 'OS.unix=symlink': 103 | n = fields[8].strip() 104 | if n == 'latest': 105 | latest_uniq = fields[4].split('=', 1)[1] 106 | 107 | return dirs[latest_uniq] 108 | 109 | def update_data(): 110 | prevdir = os.getcwd() 111 | pre = os.path.dirname(os.path.realpath(__file__)) + "/data/" 112 | mkdir_p(pre) 113 | os.chdir(pre) 114 | ftp = FTP('ftp.ncbi.nlm.nih.gov') # connect to host, default port 115 | ftp.login() # user anonymous, passwd anonymous@ 116 | ftp.cwd('/pathogen/Antimicrobial_resistance/AMRFinder/data') # change into the AMRFinder data directory 117 | ls = [] 118 | ftp.retrlines('MLSD', ls.append) 119 | latest = get_latest(ls) 120 | #print("Latest = {}".format(latest)) 121 | mkdir_p(latest) 122 | os.chdir(latest) 123 | ftp.cwd(latest) 124 | files = [] 125 | ftp.retrlines('NLST', files.append) 126 | for f in files: 127 | print(" Fetching {}...".format(f), end='') 128 | ftp.retrbinary('RETR {}'.format(f), open(f, 'wb').write) 129 | print("success!") 130 | 131 | os.chdir("..") 132 | if os.path.islink("latest"): 133 | os.unlink("latest") 134 | os.symlink(latest, "latest") 135 | 136 | os.chdir(prevdir) 137 | 138 | class cwlgen: 139 | def __init__(self, args): 140 | self.args = args 141 | self.parse_deflines = True 142 | self.do_protein = True if self.args.protein else False 143 | pre = os.path.dirname(os.path.realpath(__file__)) + "/data/latest/" 144 | if args.custom_database is not None: 145 | pre = args.custom_database 146 | if pre[-1] != '/': 147 | pre += '/' 148 | self.fastadb = 
pre + 'AMRProt' 149 | self.hmmdb = pre + 'AMR.LIB' 150 | self.fam = pre + 'fam.tab' 151 | 152 | def prot_params(self): 153 | p = { 154 | 'query': { 155 | 'class': 'File', 156 | 'location': os.path.realpath(self.args.fasta) 157 | }, 158 | 'fasta': { 159 | 'class': 'File', 160 | 'location': os.path.realpath(self.fastadb) 161 | }, 162 | 'hmmdb': { 163 | 'class': 'File', 164 | 'location': os.path.realpath(self.hmmdb) 165 | }, 166 | 'fam': { 167 | 'class': 'File', 168 | 'location': os.path.realpath(self.fam) 169 | }, 170 | 'parse_deflines': self.parse_deflines 171 | } 172 | if self.args.gff: 173 | p['gff'] = { 174 | 'class': 'File', 175 | 'location': os.path.realpath(self.args.gff) 176 | } 177 | if self.args.num_threads: 178 | p['num_threads'] = self.args.num_threads 179 | p['cpu'] = self.args.num_threads 180 | return p 181 | 182 | def dna_params(self): 183 | p = { 184 | 'query': { 185 | 'class': 'File', 186 | 'location': os.path.realpath(self.args.fasta) 187 | }, 188 | 'fasta': { 189 | 'class': 'File', 190 | 'location': os.path.realpath(self.fastadb) 191 | }, 192 | 'fam': { 193 | 'class': 'File', 194 | 'location': os.path.realpath(self.fam) 195 | }, 196 | 'parse_deflines': self.parse_deflines, 197 | 'ident_min': self.args.ident_min, 198 | 'complete_cover_min': self.args.coverage_min, 199 | 'query_gencode': self.args.translation_table 200 | } 201 | if self.args.num_threads: 202 | p['num_threads'] = self.args.num_threads 203 | return p 204 | 205 | def params(self): 206 | params = self.prot_params() if self.do_protein else self.dna_params() 207 | 208 | (fdstream, self.param_file) = tempfile.mkstemp(suffix=".yaml", prefix="amr_params_") 209 | stream = os.fdopen(fdstream, 'w') 210 | yaml.dump(params, stream) 211 | #print(self.param_file) 212 | #print(yaml.dump(params)) 213 | 214 | def run(self): 215 | cwlcmd = [] 216 | if spawn.find_executable("cwl-runner") != None: 217 | cwlcmd = ['cwl-runner'] 218 | elif spawn.find_executable("cwltool") != None: 219 | cwlcmd = ['cwltool'] 
220 | else: 221 | print("No CWL platform found.", file=sys.stderr) 222 | sys.exit(1) 223 | docker_avail = spawn.find_executable("docker") 224 | if docker_avail == None: 225 | cwlcmd.extend(['--user-space-docker-cmd', 'udocker']) 226 | if self.args.parallel: 227 | cwlcmd.extend(['--parallel']) 228 | script_path = os.path.dirname(os.path.realpath(__file__)) 229 | script_name = "/wf_amr_prot.cwl" if self.do_protein else "/wf_amr_dna.cwl" 230 | cwlscript = script_path + script_name 231 | cwlcmd.extend([cwlscript, self.param_file]) 232 | 233 | try: 234 | out = None 235 | if not self.args.show_output: 236 | out = open(os.devnull, "wb") 237 | subprocess.check_call(cwlcmd, stdout=out, stderr=out) 238 | 239 | for line in open('output.txt','r'): 240 | print(line, end='') 241 | except subprocess.CalledProcessError as eCPE: 242 | if not self.isFormatError(): 243 | print(eCPE.cmd) 244 | print("Return code:", eCPE.returncode) 245 | print(eCPE.output) 246 | except OSError as eOS: # e.g. output.txt missing or the CWL runner could not be started 247 | print(eOS, file=sys.stderr) 248 | finally: 249 | self.cleanup() 250 | 251 | def isFormatError(self): 252 | files = [ "fasta_check.out", "gff_check.out" ] 253 | for f in files: 254 | if os.path.exists(f): 255 | if os.path.getsize(f) > 0: 256 | for line in open(f,'r'): 257 | print(line, end='') 258 | return True 259 | return False 260 | 261 | def cleanup(self): 262 | def safe_remove(f): 263 | if os.path.exists(f): 264 | os.remove(f) 265 | 266 | files = ["output.txt", "fasta_check.out", "gff_check.out", self.param_file] 267 | if self.args.retain_files: 268 | print("\nFiles retained:", files) 269 | else: 270 | for f in files: 271 | safe_remove(f) 272 | 273 | # Cleanup after cwltool's use of py2py3 274 | safe_remove('/tmp/futurized_code.py') 275 | safe_remove('/tmp/original_code.py') 276 | safe_remove('/tmp/py2_detection_code.py') 277 | 278 | class FastaAction(argparse.Action): 279 | def __call__(self, parser, namespace, values, option_string=None): 280 | if option_string == '-p' or option_string == '--protein': 281 | 
setattr(namespace, 'protein', True) 282 | if option_string == '-n' or option_string == '--nucleotide': 283 | setattr(namespace, 'protein', False) 284 | #print('%r %r %r' % (namespace, values, option_string)) 285 | setattr(namespace, self.dest, values) 286 | 287 | def run(updater_parser): 288 | parser = argparse.ArgumentParser( 289 | parents=[updater_parser], 290 | description='Run (or optionally update) the amr_finder pipeline.') 291 | group = parser.add_mutually_exclusive_group(required=True) 292 | group.add_argument('-p', '--protein', dest='fasta', action=FastaAction, 293 | help='Amino-acid sequences to search using BLASTP and HMMER') 294 | group.add_argument('-n', '--nucleotide', dest='fasta', action=FastaAction, 295 | help='Genomic sequence to search using BLASTX') 296 | 297 | parser.add_argument('-o', '--output', dest='outfile', 298 | help='tabfile output to this file instead of STDOUT') 299 | 300 | # Options relating to protein input (-p): 301 | #parser.add_argument('-f FASTA file containing proteins identified as candidate AMR genes 302 | parser.add_argument('-g', '--gff', help='GFF file indicating genomic location for proteins') 303 | # Options relating to nucleotide sequence input (-n) 304 | parser.add_argument('-i', '--ident_min', type=float, 305 | help='Minimum proportion identical translated AA residues (default: %(default)s).') 306 | parser.add_argument('-c', '--coverage_min', type=float, 307 | help='Minimum coverage of reference protein sequence (default: %(default)s).') 308 | parser.add_argument('-t', '--translation_table', type=int, 309 | help='Translation table for blastx (default: %(default)s). 
More info may be found at https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c') 310 | parser.add_argument('--custom_database', type=str, 311 | help='Directory containing custom databases to be searched.') 312 | 313 | parser.add_argument('-s', '--show_output', action='store_true', 314 | help='Show the stdout and stderr output from the pipeline execution (verbose mode, useful for debugging).') 315 | parser.add_argument('-r', '--retain-files', action='store_true', 316 | help='Keep the YAML parameter and final output files.') 317 | 318 | parser.add_argument('-P', '--parallel', action='store_true', 319 | help='[experimental] Run jobs in parallel. Does not currently keep track of ResourceRequirements like the number of cores or memory and can overload this system.') 320 | parser.add_argument('-N', '--num_threads', type=int, 321 | help='Number of threads to use for blast/hmmr (default: %(default)s).') 322 | max_cpus = min(8, available_cpu_count()) 323 | parser.set_defaults(ident_min=0.9, 324 | coverage_min=0.9, 325 | translation_table=11, 326 | num_threads=max_cpus) 327 | 328 | args = parser.parse_args() 329 | 330 | has_data = check_data() 331 | if not has_data and args.custom_database is None: 332 | print("Required supplementary data not present, downloading via ftp.") 333 | update_data() 334 | 335 | g = cwlgen(args) 336 | g.params() 337 | g.run() 338 | 339 | 340 | 341 | -------------------------------------------------------------------------------- /amr_finder/test_dna.fa: -------------------------------------------------------------------------------- 1 | >blaTEM-156 2 | AACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGATACAATAACCCTGATAA 3 | ATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTT 4 | TTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCA 5 | GTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCC 6 | GAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACG 7 | 
CCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCAC 8 | AGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAAC 9 | ACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATAG 10 | GGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGA 11 | CACCACGATGCCTGCAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCT 12 | TCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTC 13 | CGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACT 14 | GGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAA 15 | CGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACT 16 | CATA 17 | 18 | >blaPDC-114_blast 19 | ATGCGCGATACCAGATTCCCCTGCCTGTGCGGCATCGCCGCTTCCACACTGCTGTTCGCCACCACCCCGG 20 | CCATTGCCGGCGAGGCCCCGGCGGATCGCCTGAAGGCACTGGTCGACGCCGCCGTACAACCGGTGATGAA 21 | GGCCAATGACATTCCGGGCCTGGCCGTAGCCATCAGCCTGAAAGGAGAACCGCATTACTTCAGCTATGGG 22 | CTGGCCTCGAAAGAGGACGGCCGCCGGGTGACGCCGGAGACCCTGTTCGAGATCGGCTCGGTGAGCAAGA 23 | CCTTCACCGCCACCCTCGCCGGCTATGCCCTGACCCAGGACAAGATGCGCCTCGACGACCGCGCCAGCCA 24 | GCACTGGCCGGCACTGCAGGGCAGCCGCTTCGACGGCATCAGCCTGCTCGACCTCGCGACCTATACCGCC 25 | GGCGGCTTGCCGCTGCAGaTCCCCGACTCGGTGCAGAAGGACCAGGCACAGATCCGCGACTACTACCGCC 26 | AGTGGCAGCCGACCTACGCGCCGGGCAGCCAGCGCCTCTATTCCAACCCGAGCATCGGCCTGTTCGGCTA 27 | TCTCGCCGCGCGCAGCCTGGGCCAGCCGTTCGAACGGCTCATGGAGCAGCAAGTGTTCCCGGCACTGGGC 28 | CTCGAACAGACCCACCTCGACGTGCCCGAGGCGGCGTTGGCGCAGTACGCCCAGGGCTACGGCAAGGACG 29 | ACCGCCCGCTACGGGCCGGTCCCGGCCCGCTGGATGCCGAAGGCTACGGGGTGAAGACCAGCGCGGCCGA 30 | CCTGCTGCGCTTCGTCGATGCCAACCTGCATCCGGAGCGCCTGGACAGGCCCTGGGCGCAGGCGCTCGAT 31 | GCCACCCATCGCGGTTACTACAAGGTCGGCGACATGACCCAGGGCCTGGGCTGGGAAGCCTACGACTGGC 32 | CGATCTCCCTGAAGCGCCTGCAGGCCGGCAACTCGACGCCGATGGCGCTGCAACCGCACAGGATCGCCAG 33 | GCTGCCCGCGCCACAGGCGCTGGAGGGCCAGCGCCTGCTGAACAAGACCGGTTCCACCAACGGCTTCGGC 34 | GCCTACGTGGCGTTCGTCCCGGGCCGCGACCTGGGACTGGTGATCCTGGCCAACCGCAACTATCCCAATG 35 | 
CCGAGCGGGTGAAGATCGCCTACGCCATCCTCAGCGGCCTGGAGCAGCAGGGCAAGGTGCCGCTGAAGCG 36 | CTGA 37 | 38 | >blaOXA-436_partial_cds 39 | GCCACAATTGGTCAGCAGTTTGCTATTGAAAGCCCGCCAAGACGGTGCTACTCTTAGCGCCTCATTTTTATGGATTTATTGCATTAGGCAAGGGGACATTATGCGTGCGTTAGCCTTATCGGCTGTGTTGATGGTGACAACGATGATTGGCATGCCTGCGGTGGCAAAGGAGTGGCAAGAGAACAAGAGTTGGAATGCTCACTTTAGCGAACATAAAACCCAAGGCGTGGTTGTGCTCTGGAACGAGAATACACAGCAGGGTTTTACCAACGATCTTAAACGGGCAAACCAAGCATTTTTACCTGCATCGACCTTTAAGATCCCAAACAGTTTAATTGCCTTGGACTTAGGTGTGGTTAAGGATGAGCATCAAGTCTTTAAATGGGATGGACAGACGCGAGATATCGCCGCGTGGAATCGCGACCATGACTTAATCACCGCGATGAAGTATTCGGTTGTGCCTGTTTATCAAGAATTTGCCCGCCAAATTGGCGAGGCCCGTATGAGTAAAATGTTGCACGCCTTCGATTATGGTAATGAGGATATCTCGGGCAATTTGGACAGTTTTTGGCTCGATGGTGGTATTCGCATTTCGGCTACCCAGCAAATCGCTTTTTTACGCAAGCTGTACCACAACAAGTTGCACGTTTCTGAGCGTAGTCAGCGCATCGTTAAACAAGCCATGCTGACCGAGGCAAATGCCGACTATATCATCCGGGCGAAAACTGGCTATTCGGTCAGAATTGAACCGAAAATCGGTTGGTGGGTTGGCTGGATCGAACTGGATGACAATGTGTGGTTCAAGTGGTTAGCGGCGCATTTGTGTAAAATAGCCGTCATATAAGCTGTAAAGTTATATGGACAAAATACTTATAGTCGATGCGCTTATCGCTAATGGCTCA 40 | 41 | 42 | >vanG exact 43 | ATGACTGGTAGTCAGACGGAAAAAGAATTGTATGTCAACCAATGTAAAATAGCCTATAAGCTACCCGATG 44 | GTGTAAAAATTGAAGAAAGAGGTGTGTAAAATGCAAAATAAAAAAATAGCAGTTATTTTTGGAGGCAATT 45 | CAACAGAGTACGAGGTGTCATTGCAATCGGCATCCGCTGTTTTTGAAAATATCAATACCAATAAATTTGA 46 | CATAATTCCAATAGGAATTACAAGAAGTGGTGAATGGTATCACTATACGGGAGAAAAGGAGAAAATCCTA 47 | AACAATACTTGGTTTGAAGATAGCAAAAATCTATGCCCTGTTGTCGTTTCCCAAAATCGTTCCGTTAAAG 48 | GCTTTTTAGAAATTGCTTCAGACAAATACCGTATTATAAAAGTTGATTTGGTATTCCCCGTATTGCATGG 49 | CAAAAACGGCGAAAATGGGACTTTGCAGGGCATATTTGAATTGGCAGGAATACCTGTTGTTGGCTGCGAT 50 | ACACTCTCATCAGCTCTTTGTATGGATAAGGACAGGGCACATAAACTCGTTAGCCTTGCGGGTATATCTG 51 | TTCCTAAATCGGTAACATTCAAACGCTTTAACGAAGAAGCAGCGATGAAAGAGATTGAAGCGAATTTAAC 52 | TTATCCGCTGTTTATTAAACCTGTTCGTGCAGGCTCTTCCTTTGGAATAACAAAAGTAATTGAAAAGCAA 53 | GAGCTTGATGCTGCCATAGAGTTGGCATTTGAACACGATACAGAAGTCATCGTTGAAGAAACAATAAACG 54 | GCTTTGAAGTCGGTTGTGCCGTACTTGGCATAGATGAGCTCATTGTTGGCAGAGTTGATGAAATCGAACT 55 | 
GTCAAGCGGCTTTTTTGATTATACAGAGAAATATACGCTTAAATCTTCAAAGATATATATGCCTGCAAGG 56 | ATTGATGCCGAAGCAGAAAAACGGATACAAGAAGCGGCTGTAACCATATATAAAGCTCTGGGCTGTTCGG 57 | GTTTTTCCAGAGTGGATATGTTTTATACACCGTCTGGCGAAATTGTATTTAATGAGGTAAACACAATACC 58 | AGGCTTTACCTCGCACAGTCGCTATCCAAATATGATGAAAGGCATTGGTCTATCGTTCTCCCAAATGTTG 59 | GATAAGCTGATAGGTCTGTATGTGGAATGATGAAAACGATTGAGCTTGAAAAGGAAGAAATTTATTGTGG 60 | AAATTTGCTGCTCGTCAACAAAAATTATCCGCTACGAGATAACAATGTAAAGGGTTTAGT 61 | 62 | -------------------------------------------------------------------------------- /amr_finder/test_dna_fail.fa: -------------------------------------------------------------------------------- 1 | >blaTEM-156 2 | AACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGATACAATAACCCTGATAP 3 | ATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTT 4 | TTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCA 5 | GTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCC 6 | GAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACG 7 | CCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCAC 8 | AGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAAC 9 | ACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATAG 10 | GGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGA 11 | CACCACGATGCCTGCAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCT 12 | TCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTC 13 | CGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACT 14 | GGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAA 15 | CGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACT 16 | CATA 17 | 18 | >blaPDC-114_blast 19 | ATGCGCGATACCAGATTCCCCTGCCTGTGCGGCATCGCCGCTTCCACACTGCTGTTCGCCACCACCCCGG 20 | CCATTGCCGGCGAGGCCCCGGCGGATCGCCTGAAGGCACTGGTCGACGCCGCCGTACAACCGGTGATGAA 21 | GGCCAATGACATTCCGGGCCTGGCCGTAGCCATCAGCCTGAAAGGAGAACCGCATTACTTCAGCTATGGG 22 | 
CTGGCCTCGAAAGAGGACGGCCGCCGGGTGACGCCGGAGACCCTGTTCGAGATCGGCTCGGTGAGCAAGA 23 | CCTTCACCGCCACCCTCGCCGGCTATGCCCTGACCCAGGACAAGATGCGCCTCGACGACCGCGCCAGCCA 24 | GCACTGGCCGGCACTGCAGGGCAGCCGCTTCGACGGCATCAGCCTGCTCGACCTCGCGACCTATACCGCC 25 | GGCGGCTTGCCGCTGCAGaTCCCCGACTCGGTGCAGAAGGACCAGGCACAGATCCGCGACTACTACCGCC 26 | AGTGGCAGCCGACCTACGCGCCGGGCAGCCAGCGCCTCTATTCCAACCCGAGCATCGGCCTGTTCGGCTA 27 | TCTCGCCGCGCGCAGCCTGGGCCAGCCGTTCGAACGGCTCATGGAGCAGCAAGTGTTCCCGGCACTGGGC 28 | CTCGAACAGACCCACCTCGACGTGCCCGAGGCGGCGTTGGCGCAGTACGCCCAGGGCTACGGCAAGGACG 29 | ACCGCCCGCTACGGGCCGGTCCCGGCCCGCTGGATGCCGAAGGCTACGGGGTGAAGACCAGCGCGGCCGA 30 | CCTGCTGCGCTTCGTCGATGCCAACCTGCATCCGGAGCGCCTGGACAGGCCCTGGGCGCAGGCGCTCGAT 31 | GCCACCCATCGCGGTTACTACAAGGTCGGCGACATGACCCAGGGCCTGGGCTGGGAAGCCTACGACTGGC 32 | CGATCTCCCTGAAGCGCCTGCAGGCCGGCAACTCGACGCCGATGGCGCTGCAACCGCACAGGATCGCCAG 33 | GCTGCCCGCGCCACAGGCGCTGGAGGGCCAGCGCCTGCTGAACAAGACCGGTTCCACCAACGGCTTCGGC 34 | GCCTACGTGGCGTTCGTCCCGGGCCGCGACCTGGGACTGGTGATCCTGGCCAACCGCAACTATCCCAATG 35 | CCGAGCGGGTGAAGATCGCCTACGCCATCCTCAGCGGCCTGGAGCAGCAGGGCAAGGTGCCGCTGAAGCG 36 | CTGA 37 | 38 | >blaOXA-436_partial_cds 39 | 
GCCACAATTGGTCAGCAGTTTGCTATTGAAAGCCCGCCAAGACGGTGCTACTCTTAGCGCCTCATTTTTATGGATTTATTGCATTAGGCAAGGGGACATTATGCGTGCGTTAGCCTTATCGGCTGTGTTGATGGTGACAACGATGATTGGCATGCCTGCGGTGGCAAAGGAGTGGCAAGAGAACAAGAGTTGGAATGCTCACTTTAGCGAACATAAAACCCAAGGCGTGGTTGTGCTCTGGAACGAGAATACACAGCAGGGTTTTACCAACGATCTTAAACGGGCAAACCAAGCATTTTTACCTGCATCGACCTTTAAGATCCCAAACAGTTTAATTGCCTTGGACTTAGGTGTGGTTAAGGATGAGCATCAAGTCTTTAAATGGGATGGACAGACGCGAGATATCGCCGCGTGGAATCGCGACCATGACTTAATCACCGCGATGAAGTATTCGGTTGTGCCTGTTTATCAAGAATTTGCCCGCCAAATTGGCGAGGCCCGTATGAGTAAAATGTTGCACGCCTTCGATTATGGTAATGAGGATATCTCGGGCAATTTGGACAGTTTTTGGCTCGATGGTGGTATTCGCATTTCGGCTACCCAGCAAATCGCTTTTTTACGCAAGCTGTACCACAACAAGTTGCACGTTTCTGAGCGTAGTCAGCGCATCGTTAAACAAGCCATGCTGACCGAGGCAAATGCCGACTATATCATCCGGGCGAAAACTGGCTATTCGGTCAGAATTGAACCGAAAATCGGTTGGTGGGTTGGCTGGATCGAACTGGATGACAATGTGTGGTTCAAGTGGTTAGCGGCGCATTTGTGTAAAATAGCCGTCATATAAGCTGTAAAGTTATATGGACAAAATACTTATAGTCGATGCGCTTATCGCTAATGGCTCA 40 | 41 | 42 | >vanG exact 43 | ATGACTGGTAGTCAGACGGAAAAAGAATTGTATGTCAACCAATGTAAAATAGCCTATAAGCTACCCGATG 44 | GTGTAAAAATTGAAGAAAGAGGTGTGTAAAATGCAAAATAAAAAAATAGCAGTTATTTTTGGAGGCAATT 45 | CAACAGAGTACGAGGTGTCATTGCAATCGGCATCCGCTGTTTTTGAAAATATCAATACCAATAAATTTGA 46 | CATAATTCCAATAGGAATTACAAGAAGTGGTGAATGGTATCACTATACGGGAGAAAAGGAGAAAATCCTA 47 | AACAATACTTGGTTTGAAGATAGCAAAAATCTATGCCCTGTTGTCGTTTCCCAAAATCGTTCCGTTAAAG 48 | GCTTTTTAGAAATTGCTTCAGACAAATACCGTATTATAAAAGTTGATTTGGTATTCCCCGTATTGCATGG 49 | CAAAAACGGCGAAAATGGGACTTTGCAGGGCATATTTGAATTGGCAGGAATACCTGTTGTTGGCTGCGAT 50 | ACACTCTCATCAGCTCTTTGTATGGATAAGGACAGGGCACATAAACTCGTTAGCCTTGCGGGTATATCTG 51 | TTCCTAAATCGGTAACATTCAAACGCTTTAACGAAGAAGCAGCGATGAAAGAGATTGAAGCGAATTTAAC 52 | TTATCCGCTGTTTATTAAACCTGTTCGTGCAGGCTCTTCCTTTGGAATAACAAAAGTAATTGAAAAGCAA 53 | GAGCTTGATGCTGCCATAGAGTTGGCATTTGAACACGATACAGAAGTCATCGTTGAAGAAACAATAAACG 54 | GCTTTGAAGTCGGTTGTGCCGTACTTGGCATAGATGAGCTCATTGTTGGCAGAGTTGATGAAATCGAACT 55 | GTCAAGCGGCTTTTTTGATTATACAGAGAAATATACGCTTAAATCTTCAAAGATATATATGCCTGCAAGG 56 | ATTGATGCCGAAGCAGAAAAACGGATACAAGAAGCGGCTGTAACCATATATAAAGCTCTGGGCTGTTCGG 57 | 
GTTTTTCCAGAGTGGATATGTTTTATACACCGTCTGGCGAAATTGTATTTAATGAGGTAAACACAATACC 58 | AGGCTTTACCTCGCACAGTCGCTATCCAAATATGATGAAAGGCATTGGTCTATCGTTCTCCCAAATGTTG 59 | GATAAGCTGATAGGTCTGTATGTGGAATGATGAAAACGATTGAGCTTGAAAAGGAAGAAATTTATTGTGG 60 | AAATTTGCTGCTCGTCAACAAAAATTATCCGCTACGAGATAACAATGTAAAGGGTTTAGT 61 | 62 | -------------------------------------------------------------------------------- /amr_finder/test_prot.fa: -------------------------------------------------------------------------------- 1 | >blaTEM-156 2 | MSIQHFRVALIPFFAAFCLPVFAHPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLS 3 | RVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNIGDHVTRL 4 | DRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGS 5 | RGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW 6 | >nimIJ_hmm family protein (HMM-only) manually edited 7 | MFREMNRKNQQLSDAECVGILENASTGTLALQGDGGYPYAVPITYVHADGKLYFHSALKGHKVDAVKGCDKASFCVIEQDE 8 | IHGEEYTTYFRSVVAFGRVRILEDEAEKMAAARLLGDRYHPHHEEALGRELAKSFGHMLVICLDIEHMTGKE 9 | AIELCRMRRPKA 10 | >blaPDC-114_blast BLAST (100% length, but 1 mismatch) 11 | MRDTRFPCLCGIAASTLLFATTPAIAGEAPADRLKALVDAAVQPVMKANDIPGLAVAISLKGEPHYFSYGLASKEDGRRV 12 | TPETLFEIGSVSKTFTATLAGYALTQDKMRLDDRASQHWPALVGSRFDGISLLDLATYTAGGLPLQFPDSVQKDQAQIRD 13 | YYRQWQPTYAPGSQRLYSNPSIGLFGYLAARSLGQPFERLMEQQVFPALGLEQTHLDVPEAALAQYAQGYGKDDRPLRAG 14 | PGPLDAEGYGVKTSAADLLRFVDANLHPERLDRPWAQALDATHRGYYKVGDMTQGLGWEAYDWPISLKRLQAGNSTPMAL 15 | QPHRIARLPAPQALEGQRLLNKTGSTNGFGAYVAFVPGRDLGLVILANRNYPNAERVKIAYAILSGLEQQGKVPLKR 16 | >blaOXA-436_partial (Should be partial OXA-48 family 17 | MRALALSAVLMVTTMIGMPAVAKEWQENKSWNAHFSEHKTQGVVVLWNENTQQGFTNDLKRANQAFLPASTFKIPNSLIA 18 | LDLGVVKDEHQVFKWDGQTRDIAAWNRDHDLITAMKYSVVPVYQEFARQIGEARMSKMLHAFDYGNEDISGNLDSFWLDG 19 | GIRISATQQIAFLRKLYHNKLHVSERSQRIVKQAMLTEANADYIIRAKTGYSVRIEPKIGWWVGWIELDDNVW 20 | >vanG 21 | MQNKKIAVIFGGNSTEYEVSLQSASAVFENINTNKFDIIPIGITRSGEWYHYTGEKEKILNNTWFEDSKN 22 | 
LCPVVVSQNRSVKGFLEIASDKYRIIKVDLVFPVLHGKNGENGTLQGIFELAGIPVVGCDTLSSALCMDK 23 | DRAHKLVSLAGISVPKSVTFKRFNEEAAMKEIEANLTYPLFIKPVRAGSSFGITKVIEKQELDAAIELAF 24 | EHDTEVIVEETINGFEVGCAVLGIDELIVGRVDEIELSSGFFDYTEKYTLKSSKIYMPARIDAEAEKRIQ 25 | EAAVTIYKALGCSGFSRVDMFYTPSGEIVFNEVNTIPGFTSHSRYPNMMKGIGLSFSQMLDKLIGLYVE 26 | -------------------------------------------------------------------------------- /amr_finder/test_prot.gff: -------------------------------------------------------------------------------- 1 | ##gff-version 3 2 | ##sequence-region contig1 1-50000 3 | contig1 . gene 1 858 . + . ID=gene1;Name=blaTEM-156 4 | contig1 . gene 1001 1495 . + . ID=gene2;Name=nimIJ_hmm 5 | contig1 . gene 2001 3191 . + . ID=gene3;Name=blaPDC-114_blast 6 | contig1 . gene 4001 4699 . + . ID=gene4;Name=blaOXA-436_partial 7 | contig1 . gene 5001 6047 . + . ID=gene5;Name=vanG 8 | -------------------------------------------------------------------------------- /amr_finder/test_prot_fail.fa: -------------------------------------------------------------------------------- 1 | >blaTEM-156 2 | MSIQHFRVALIPFFAAFCLPVFAHPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVL2 3 | RVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNIGDHVTRL 4 | DRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGS 5 | RGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW 6 | >nimIJ_hmm family protein (HMM-only) manually edited 7 | MFREMNRKNQQLSDAECVGILENASTGTLALQGDGGYPYAVPITYVHADGKLYFHSALKGHKVDAVKGCDKASFCVIEQDE 8 | IHGEEYTTYFRSVVAFGRVRILEDEAEKMAAARLLGDRYHPHHEEALGRELAKSFGHMLVICLDIEHMTGKE 9 | AIELCRMRRPKA 10 | >blaPDC-114_blast BLAST (100% length, but 1 mismatch) 11 | MRDTRFPCLCGIAASTLLFATTPAIAGEAPADRLKALVDAAVQPVMKANDIPGLAVAISLKGEPHYFSYGLASKEDGRRV 12 | TPETLFEIGSVSKTFTATLAGYALTQDKMRLDDRASQHWPALVGSRFDGISLLDLATYTAGGLPLQFPDSVQKDQAQIRD 13 | YYRQWQPTYAPGSQRLYSNPSIGLFGYLAARSLGQPFERLMEQQVFPALGLEQTHLDVPEAALAQYAQGYGKDDRPLRAG 14 | 
PGPLDAEGYGVKTSAADLLRFVDANLHPERLDRPWAQALDATHRGYYKVGDMTQGLGWEAYDWPISLKRLQAGNSTPMAL 15 | QPHRIARLPAPQALEGQRLLNKTGSTNGFGAYVAFVPGRDLGLVILANRNYPNAERVKIAYAILSGLEQQGKVPLKR 16 | >blaOXA-436_partial (Should be partial OXA-48 family 17 | MRALALSAVLMVTTMIGMPAVAKEWQENKSWNAHFSEHKTQGVVVLWNENTQQGFTNDLKRANQAFLPASTFKIPNSLIA 18 | LDLGVVKDEHQVFKWDGQTRDIAAWNRDHDLITAMKYSVVPVYQEFARQIGEARMSKMLHAFDYGNEDISGNLDSFWLDG 19 | GIRISATQQIAFLRKLYHNKLHVSERSQRIVKQAMLTEANADYIIRAKTGYSVRIEPKIGWWVGWIELDDNVW 20 | >vanG 21 | MQNKKIAVIFGGNSTEYEVSLQSASAVFENINTNKFDIIPIGITRSGEWYHYTGEKEKILNNTWFEDSKN 22 | LCPVVVSQNRSVKGFLEIASDKYRIIKVDLVFPVLHGKNGENGTLQGIFELAGIPVVGCDTLSSALCMDK 23 | DRAHKLVSLAGISVPKSVTFKRFNEEAAMKEIEANLTYPLFIKPVRAGSSFGITKVIEKQELDAAIELAF 24 | EHDTEVIVEETINGFEVGCAVLGIDELIVGRVDEIELSSGFFDYTEKYTLKSSKIYMPARIDAEAEKRIQ 25 | EAAVTIYKALGCSGFSRVDMFYTPSGEIVFNEVNTIPGFTSHSRYPNMMKGIGLSFSQMLDKLIGLYVE 26 | -------------------------------------------------------------------------------- /amr_finder/update_docker_version.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [[ $1 ]]; then 4 | sed -i "s/amr:[0-9\.]\+/amr:$1/g" *.cwl 5 | else 6 | echo "Usage: $0 " 7 | fi 8 | 9 | 10 | -------------------------------------------------------------------------------- /amr_finder/wf_amr_dna.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | class: Workflow 5 | 6 | requirements: 7 | - class: SubworkflowFeatureRequirement 8 | - class: DockerRequirement 9 | dockerPull: ncbi/amr:18.06 10 | 11 | inputs: 12 | query: File 13 | fasta: File 14 | fam: File 15 | parse_deflines: boolean 16 | ident_min: float 17 | complete_cover_min: float 18 | query_gencode: int 19 | num_threads: int? 
20 | 21 | outputs: 22 | result: 23 | type: File 24 | outputSource: amr_report/output 25 | fasta_check_out: 26 | type: File 27 | outputSource: fasta_check/output 28 | 29 | steps: 30 | fasta_check: 31 | run: fasta_check.cwl 32 | in: 33 | fasta: query 34 | out: 35 | [output] 36 | 37 | makeblastdb: 38 | run: wf_makeblastdb.cwl 39 | in: 40 | fasta_check_dummy: fasta_check/output 41 | fasta: fasta 42 | out: 43 | [blastdb] 44 | 45 | blastx: 46 | run: blastx.cwl 47 | in: 48 | query: query 49 | db: makeblastdb/blastdb 50 | parse_deflines: parse_deflines 51 | query_gencode: query_gencode 52 | num_threads: num_threads 53 | out: 54 | [output] 55 | 56 | amr_report: 57 | run: amr_report.cwl 58 | in: 59 | fam: fam 60 | blastx: blastx/output 61 | ident_min: ident_min 62 | complete_cover_min: complete_cover_min 63 | out: 64 | [output] 65 | 66 | -------------------------------------------------------------------------------- /amr_finder/wf_amr_dna_params.yaml: -------------------------------------------------------------------------------- 1 | query: 2 | class: File 3 | location: test_dna.fa 4 | fasta: 5 | class: File 6 | location: data/latest/AMRProt 7 | fam: 8 | class: File 9 | location: data/latest/fam.tab 10 | parse_deflines: false 11 | ident_min: 0.9 12 | complete_cover_min: 0.5 13 | query_gencode: 11 14 | -------------------------------------------------------------------------------- /amr_finder/wf_amr_prot.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | class: Workflow 5 | 6 | requirements: 7 | - class: SubworkflowFeatureRequirement 8 | - class: DockerRequirement 9 | dockerPull: ncbi/amr:18.06 10 | 11 | 12 | inputs: 13 | query: File 14 | fasta: File 15 | hmmdb: File 16 | fam: File 17 | gff: File? 18 | parse_deflines: boolean 19 | num_threads: int? 20 | cpu: int? 
21 | 22 | outputs: 23 | result: 24 | type: File 25 | outputSource: amr_report/output 26 | fasta_check_out: 27 | type: File 28 | outputSource: fasta_check/output 29 | gff_check_out: 30 | type: File 31 | outputSource: gff_check/output 32 | 33 | steps: 34 | fasta_check: 35 | run: fasta_check.cwl 36 | in: 37 | fasta: query 38 | aa: 39 | default: True 40 | out: 41 | [output] 42 | 43 | gff_check: 44 | run: gff_check.cwl 45 | in: 46 | gff: gff 47 | fasta: query 48 | #locus_tag: 49 | # default: locus.tags 50 | out: 51 | [output] 52 | 53 | makeblastdb: 54 | run: wf_makeblastdb.cwl 55 | in: 56 | fasta_check_dummy: fasta_check/output 57 | gff_check_dummy: gff_check/output 58 | fasta: fasta 59 | out: 60 | [blastdb] 61 | 62 | blastp: 63 | run: blastp.cwl 64 | in: 65 | query: query 66 | db: makeblastdb/blastdb 67 | parse_deflines: parse_deflines 68 | num_threads: num_threads 69 | out: 70 | [output] 71 | 72 | hmmsearch: 73 | run: hmmsearch.cwl 74 | in: 75 | query: query 76 | db: hmmdb 77 | cpu: cpu 78 | fasta_check_dummy: fasta_check/output 79 | gff_check_dummy: gff_check/output 80 | out: 81 | [hmmsearch_out,hmmdom_out] 82 | 83 | amr_report: 84 | run: amr_report.cwl 85 | in: 86 | fam: fam 87 | blastp: blastp/output 88 | hmmdom: hmmsearch/hmmdom_out 89 | hmmsearch: hmmsearch/hmmsearch_out 90 | gff: gff 91 | out: 92 | [output] 93 | 94 | -------------------------------------------------------------------------------- /amr_finder/wf_amr_prot_params.yaml: -------------------------------------------------------------------------------- 1 | query: 2 | class: File 3 | location: test_prot.fa 4 | gff: 5 | class: File 6 | location: test_prot.gff 7 | fasta: 8 | class: File 9 | location: data/latest/AMRProt 10 | hmmdb: 11 | class: File 12 | location: data/latest/AMR.LIB 13 | fam: 14 | class: File 15 | location: data/latest/fam.tab 16 | parse_deflines: false 17 | -------------------------------------------------------------------------------- /amr_finder/wf_makeblastdb.cwl: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | cwlVersion: v1.0 3 | class: Workflow 4 | 5 | requirements: 6 | - class: StepInputExpressionRequirement 7 | 8 | inputs: 9 | fasta_check_dummy: File? 10 | gff_check_dummy: File? 11 | fasta: File 12 | 13 | outputs: 14 | blastdb: 15 | type: Directory 16 | outputSource: mkdir/blastdb 17 | 18 | steps: 19 | makeblastdb: 20 | run: 21 | class: CommandLineTool 22 | hints: 23 | DockerRequirement: 24 | dockerPull: ncbi/amr:18.06 25 | requirements: 26 | - class: InitialWorkDirRequirement 27 | listing: 28 | - entry: $(inputs.fasta) 29 | writable: False 30 | #makeblastdb -in AMRProt -dbtype prot 31 | baseCommand: makeblastdb 32 | inputs: 33 | fasta: 34 | type: File 35 | inputBinding: 36 | prefix: -in 37 | dbtype: 38 | type: string? 39 | default: prot 40 | inputBinding: 41 | prefix: -dbtype 42 | 43 | outputs: 44 | blastfiles: 45 | type: File[] 46 | outputBinding: 47 | glob: "*" 48 | in: 49 | fasta: fasta 50 | out: 51 | [blastfiles] 52 | 53 | mkdir: 54 | run: 55 | class: CommandLineTool 56 | requirements: 57 | - class: ShellCommandRequirement 58 | arguments: 59 | - shellQuote: false 60 | valueFrom: >- 61 | mkdir $(inputs.blastdir) && cp 62 | inputs: 63 | blastfiles: 64 | type: File[] 65 | inputBinding: 66 | position: 1 67 | blastdir: 68 | type: string 69 | inputBinding: 70 | position: 2 71 | 72 | outputs: 73 | blastdb: 74 | type: Directory 75 | outputBinding: 76 | glob: $(inputs.blastdir) 77 | in: 78 | blastfiles: makeblastdb/blastfiles 79 | blastdir: 80 | source: fasta 81 | valueFrom: $(self.basename) 82 | out: 83 | [blastdb] 84 | 85 | 86 | -------------------------------------------------------------------------------- /contam_filter/blastn.cwl: -------------------------------------------------------------------------------- 1 | cwlVersion: v1.0 2 | class: CommandLineTool 3 | hints: 4 | DockerRequirement: 5 | dockerPull: ncbi/blast_contamfilter 6 | 7 | baseCommand: 
blastn 8 | #stdout: $(inputs.db.basename).output 9 | stdout: $(inputs.db).output 10 | inputs: 11 | query: 12 | type: File 13 | inputBinding: 14 | prefix: -query 15 | db: 16 | type: string 17 | inputBinding: 18 | prefix: -db 19 | outfmt: 20 | type: string? 21 | default: "6" 22 | inputBinding: 23 | prefix: -outfmt 24 | best_hit_overhang: 25 | type: double? 26 | inputBinding: 27 | prefix: -best_hit_overhang 28 | best_hit_score_edge: 29 | type: double? 30 | inputBinding: 31 | prefix: -best_hit_score_edge 32 | dust: 33 | type: string? 34 | inputBinding: 35 | prefix: -dust 36 | evalue: 37 | type: double? 38 | inputBinding: 39 | prefix: -evalue 40 | gapextend: 41 | type: int? 42 | inputBinding: 43 | prefix: -gapextend 44 | gapopen: 45 | type: int? 46 | inputBinding: 47 | prefix: -gapopen 48 | penalty: 49 | type: int? 50 | inputBinding: 51 | prefix: -penalty 52 | perc_identity: 53 | type: double? 54 | inputBinding: 55 | prefix: -perc_identity 56 | reward: 57 | type: int? 58 | inputBinding: 59 | prefix: -reward 60 | soft_masking: 61 | type: string? 62 | inputBinding: 63 | prefix: -soft_masking 64 | task: 65 | type: string? 66 | inputBinding: 67 | prefix: -task 68 | template_length: 69 | type: int? 70 | inputBinding: 71 | prefix: -template_length 72 | template_type: 73 | type: string? 74 | inputBinding: 75 | prefix: -template_type 76 | window_size: 77 | type: int? 78 | inputBinding: 79 | prefix: -window_size 80 | word_size: 81 | type: int? 82 | inputBinding: 83 | prefix: -word_size 84 | xdrop_gap: 85 | type: int? 86 | inputBinding: 87 | prefix: -xdrop_gap 88 | no_greedy: 89 | type: boolean? 
90 | inputBinding: 91 | prefix: -no_greedy 92 | 93 | outputs: 94 | - id: output 95 | type: File 96 | outputBinding: 97 | glob: "*.output" 98 | -------------------------------------------------------------------------------- /contam_filter/blastn_params.yaml: -------------------------------------------------------------------------------- 1 | query: 2 | class: File 3 | location: pa_test.fa 4 | db: 5 | class: File 6 | location: contam_in_euks.fa 7 | db_aux: 8 | - class: File 9 | location: contam_in_euks.fa.nhr 10 | - class: File 11 | location: contam_in_euks.fa.nin 12 | - class: File 13 | location: contam_in_euks.fa.nsq 14 | outfmt: "6" 15 | best_hit_overhang: 0.1 16 | best_hit_score_edge: 0.1 17 | dust: "yes" 18 | evalue: 1E9 19 | gapextend: 2 20 | gapopen: 4 21 | penalty: -4 22 | perc_identity: 95 23 | reward: 3 24 | soft_masking: "true" 25 | task: "megablast" 26 | template_length: 18 27 | template_type: "coding" 28 | window_size: 120 29 | word_size: 12 30 | xdrop_gap: 20 31 | no_greedy: true 32 | -------------------------------------------------------------------------------- /contam_filter/common_fastas.yaml: -------------------------------------------------------------------------------- 1 | - db: 2 | class: File 3 | location: db/rrna.gz 4 | dbtype: "nucl" 5 | outfmt: "6" 6 | best_hit_overhang: 0.1 7 | best_hit_score_edge: 0.1 8 | dust: "yes" 9 | evalue: 1E9 10 | gapextend: 2 11 | gapopen: 4 12 | penalty: -4 13 | perc_identity: 95 14 | reward: 3 15 | soft_masking: "true" 16 | task: "megablast" 17 | template_length: 18 18 | template_type: "coding" 19 | window_size: 120 20 | word_size: 12 21 | xdrop_gap: 20 22 | no_greedy: true 23 | - db: 24 | class: File 25 | location: mito.nt 26 | dbtype: "nucl" 27 | outfmt: "6" 28 | best_hit_overhang: null 29 | dust: "yes" 30 | soft_masking: "true" 31 | perc_identity: 96.8 32 | 33 | -------------------------------------------------------------------------------- /contam_filter/create_blast_list.cwl: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | cwlVersion: v1.0 3 | 4 | requirements: 5 | - $import: typedefs.yaml 6 | - class: InlineJavascriptRequirement 7 | 8 | class: ExpressionTool 9 | 10 | inputs: 11 | common: 12 | type: 13 | type: array 14 | items: "typedefs.yaml#BlastnType" 15 | karyot: "typedefs.yaml#KaryotType" 16 | euk: "typedefs.yaml#BlastnType" 17 | prok: "typedefs.yaml#BlastnType" 18 | 19 | outputs: 20 | output: 21 | type: 22 | type: array 23 | items: "typedefs.yaml#BlastnType" 24 | 25 | 26 | expression: | 27 | ${ 28 | var newArray = inputs.common; 29 | if (inputs.karyot == "pro") 30 | newArray.push(inputs.prok) 31 | else 32 | newArray.push(inputs.euk) 33 | return {'output': newArray} 34 | } 35 | -------------------------------------------------------------------------------- /contam_filter/create_db_list.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | 5 | requirements: 6 | - $import: typedefs.yaml 7 | - class: InlineJavascriptRequirement 8 | 9 | class: ExpressionTool 10 | 11 | inputs: 12 | common: 13 | type: 14 | type: array 15 | items: "typedefs.yaml#BlastnType" 16 | euks: 17 | type: 18 | type: array 19 | items: "typedefs.yaml#BlastnType" 20 | proks: 21 | type: 22 | type: array 23 | items: "typedefs.yaml#BlastnType" 24 | 25 | input: 26 | type: 27 | type: array 28 | items: 29 | type: array 30 | items: "typedefs.yaml#BlastnType" 31 | 32 | outputs: 33 | output: 34 | type: 35 | type: array 36 | items: "typedefs.yaml#BlastnType" 37 | 38 | 39 | expression: | 40 | ${ 41 | var myFlatArray = [].concat.apply([], inputs.input); 42 | return {'output': myFlatArray} 43 | } 44 | -------------------------------------------------------------------------------- /contam_filter/decompress.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 
3 | cwlVersion: v1.0 4 | class: CommandLineTool 5 | baseCommand: gzip 6 | 7 | requirements: 8 | - class: InlineJavascriptRequirement 9 | 10 | inputs: 11 | input_file: 12 | type: File 13 | inputBinding: 14 | position: 1 15 | prefix: -dcf 16 | 17 | outputs: 18 | output: 19 | type: stdout 20 | 21 | stdout: $((inputs.input_file.basename).replace('.gz','')) 22 | -------------------------------------------------------------------------------- /contam_filter/decompress_params.yaml: -------------------------------------------------------------------------------- 1 | #input_file: 2 | # class: File 3 | # path: db/contam_in_euks.fa.gz 4 | input_file: 5 | class: File 6 | path: db/contam_in_prok.fa 7 | -------------------------------------------------------------------------------- /contam_filter/default_fastas.yaml: -------------------------------------------------------------------------------- 1 | common: 2 | - db: rrna 3 | dbtype: "nucl" 4 | outfmt: "6" 5 | best_hit_overhang: 0.1 6 | best_hit_score_edge: 0.1 7 | dust: "yes" 8 | evalue: 1E9 9 | gapextend: 2 10 | gapopen: 4 11 | penalty: -4 12 | perc_identity: 95 13 | reward: 3 14 | soft_masking: "true" 15 | task: "megablast" 16 | template_length: 18 17 | template_type: "coding" 18 | window_size: 120 19 | word_size: 12 20 | xdrop_gap: 20 21 | no_greedy: true 22 | - db: mito.nt 23 | dbtype: "nucl" 24 | outfmt: "6" 25 | best_hit_overhang: null 26 | dust: "yes" 27 | soft_masking: "true" 28 | perc_identity: 96.8 29 | prok: 30 | db: contam_in_prok.fa 31 | dbtype: "nucl" 32 | outfmt: "6" 33 | best_hit_overhang: 0.1 34 | best_hit_score_edge: 0.1 35 | dust: "yes" 36 | evalue: 1E9 37 | gapextend: 2 38 | gapopen: 4 39 | penalty: -4 40 | perc_identity: 95 41 | reward: 3 42 | soft_masking: "true" 43 | task: "megablast" 44 | template_length: 18 45 | template_type: "coding" 46 | window_size: 120 47 | word_size: 12 48 | xdrop_gap: 20 49 | no_greedy: true 50 | euk: 51 | db: contam_in_euks.fa 52 | dbtype: "nucl" 53 | outfmt: "6" 54 | 
best_hit_overhang: 0.1 55 | best_hit_score_edge: 0.1 56 | dust: "yes" 57 | evalue: 1E9 58 | gapextend: 2 59 | gapopen: 4 60 | penalty: -4 61 | perc_identity: 95 62 | reward: 3 63 | soft_masking: "true" 64 | task: "megablast" 65 | template_length: 18 66 | template_type: "coding" 67 | window_size: 120 68 | word_size: 12 69 | xdrop_gap: 20 70 | no_greedy: true 71 | prok_adapt: 72 | db: adaptors_for_screening_proks.fa 73 | dbtype: "nucl" 74 | euk_adapt: 75 | db: adaptors_for_screening_euks.fa 76 | dbtype: "nucl" 77 | -------------------------------------------------------------------------------- /contam_filter/euk_fasta.yaml: -------------------------------------------------------------------------------- 1 | db: 2 | class: File 3 | location: db/contam_in_euks.fa.gz 4 | dbtype: "nucl" 5 | outfmt: "6" 6 | best_hit_overhang: 0.1 7 | best_hit_score_edge: 0.1 8 | dust: "yes" 9 | evalue: 1E9 10 | gapextend: 2 11 | gapopen: 4 12 | penalty: -4 13 | perc_identity: 95 14 | reward: 3 15 | soft_masking: "true" 16 | task: "megablast" 17 | template_length: 18 18 | template_type: "coding" 19 | window_size: 120 20 | word_size: 12 21 | xdrop_gap: 20 22 | no_greedy: true 23 | -------------------------------------------------------------------------------- /contam_filter/euk_params.yaml: -------------------------------------------------------------------------------- 1 | cell_type_fasta: 2 | - db: 3 | class: File 4 | location: db/contam_in_euks.fa.gz 5 | dbtype: "nucl" 6 | outfmt: "6" 7 | best_hit_overhang: 0.1 8 | best_hit_score_edge: 0.1 9 | dust: "yes" 10 | evalue: 1E9 11 | gapextend: 2 12 | gapopen: 4 13 | penalty: -4 14 | perc_identity: 95 15 | reward: 3 16 | soft_masking: "true" 17 | task: "megablast" 18 | template_length: 18 19 | template_type: "coding" 20 | window_size: 120 21 | word_size: 12 22 | xdrop_gap: 20 23 | no_greedy: true 24 | -------------------------------------------------------------------------------- /contam_filter/flatten.cwl: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | 5 | requirements: 6 | - $import: typedefs.yaml 7 | - class: InlineJavascriptRequirement 8 | 9 | class: ExpressionTool 10 | 11 | inputs: 12 | input: 13 | type: 14 | type: array 15 | items: 16 | type: array 17 | items: "typedefs.yaml#BlastnType" 18 | 19 | outputs: 20 | output: 21 | type: 22 | type: array 23 | items: "typedefs.yaml#BlastnType" 24 | 25 | 26 | expression: | 27 | ${ 28 | var myFlatArray = [].concat.apply([], inputs.input); 29 | return {'output': myFlatArray} 30 | } 31 | -------------------------------------------------------------------------------- /contam_filter/flatten_params.yaml: -------------------------------------------------------------------------------- 1 | list_one: 2 | &id001 3 | - a 4 | - b 5 | - c 6 | 7 | list_two: &id002 8 | - e 9 | - f 10 | - g 11 | 12 | list_three: &id003 13 | - h 14 | - i 15 | - j 16 | 17 | input: 18 | - *id001 19 | - *id002 20 | - *id003 21 | -------------------------------------------------------------------------------- /contam_filter/makeblastdb.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | class: CommandLineTool 5 | baseCommand: makeblastdb 6 | # makeblastdb [-h] [-help] [-in input_file] [-input_type type] 7 | # -dbtype molecule_type [-title database_title] [-parse_seqids] 8 | # [-hash_index] [-mask_data mask_data_files] [-mask_id mask_algo_ids] 9 | # [-mask_desc mask_algo_descriptions] [-gi_mask] 10 | # [-gi_mask_name gi_based_mask_names] [-out database_name] 11 | # [-max_file_sz number_of_bytes] [-logfile File_Name] [-taxid TaxID] 12 | # [-taxid_map TaxIDMapFile] [-version] 13 | 14 | requirements: 15 | - class: InitialWorkDirRequirement 16 | listing: 17 | - $(inputs.input_file) 18 | 19 | hints: 20 | DockerRequirement: 21 | dockerPull: ncbi/blast 22 | 23 | inputs: 24 | 
input_file: 25 | type: File 26 | inputBinding: 27 | position: 1 28 | prefix: -in 29 | input_type: 30 | type: string? 31 | inputBinding: 32 | position: 2 33 | prefix: -input_type 34 | dbtype: 35 | type: string 36 | inputBinding: 37 | position: 3 38 | prefix: -dbtype 39 | title: 40 | type: string? 41 | inputBinding: 42 | position: 4 43 | prefix: -title 44 | outdb: 45 | type: string? 46 | #default: "blastdb" 47 | inputBinding: 48 | position: 5 49 | prefix: -out 50 | 51 | outputs: 52 | output: 53 | type: 54 | type: array 55 | items: File 56 | outputBinding: 57 | glob: "$(inputs.input_file.basename).*" 58 | 59 | 60 | 61 | -------------------------------------------------------------------------------- /contam_filter/makeblastdb_params.yaml: -------------------------------------------------------------------------------- 1 | input_file: 2 | class: File 3 | location: contam_in_euks.fa 4 | #title: "This is the title." 5 | dbtype: nucl 6 | #outdb: george 7 | -------------------------------------------------------------------------------- /contam_filter/pa_test.fa.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ncbi/pipelines/7a5fae087e42ec7d2bfdf3f88ba2ea1e8fdc9ddf/contam_filter/pa_test.fa.gz -------------------------------------------------------------------------------- /contam_filter/params.yaml: -------------------------------------------------------------------------------- 1 | common_fastas: 2 | - db: 3 | class: File 4 | location: db/rrna.gz 5 | dbtype: "nucl" 6 | outfmt: "6" 7 | best_hit_overhang: 0.1 8 | best_hit_score_edge: 0.1 9 | dust: "yes" 10 | evalue: 1E9 11 | gapextend: 2 12 | gapopen: 4 13 | penalty: -4 14 | perc_identity: 95 15 | reward: 3 16 | soft_masking: "true" 17 | task: "megablast" 18 | template_length: 18 19 | template_type: "coding" 20 | window_size: 120 21 | word_size: 12 22 | xdrop_gap: 20 23 | no_greedy: true 24 | - db: 25 | class: File 26 | location: mito.nt 27 | dbtype: 
"nucl" 28 | outfmt: "6" 29 | best_hit_overhang: null 30 | dust: "yes" 31 | soft_masking: "true" 32 | perc_identity: 96.8 33 | query: 34 | class: File 35 | location: pa_test.fa 36 | -------------------------------------------------------------------------------- /contam_filter/prok_fasta.yaml: -------------------------------------------------------------------------------- 1 | db: 2 | class: File 3 | location: db/contam_in_prok.fa 4 | dbtype: "nucl" 5 | outfmt: "6" 6 | best_hit_overhang: 0.1 7 | best_hit_score_edge: 0.1 8 | dust: "yes" 9 | evalue: 1E9 10 | gapextend: 2 11 | gapopen: 4 12 | penalty: -4 13 | perc_identity: 95 14 | reward: 3 15 | soft_masking: "true" 16 | task: "megablast" 17 | template_length: 18 18 | template_type: "coding" 19 | window_size: 120 20 | word_size: 12 21 | xdrop_gap: 20 22 | no_greedy: true 23 | -------------------------------------------------------------------------------- /contam_filter/prok_params.yaml: -------------------------------------------------------------------------------- 1 | cell_type_fasta: 2 | - db: 3 | class: File 4 | location: db/contam_in_prok.fa 5 | dbtype: "nucl" 6 | outfmt: "6" 7 | best_hit_overhang: 0.1 8 | best_hit_score_edge: 0.1 9 | dust: "yes" 10 | evalue: 1E9 11 | gapextend: 2 12 | gapopen: 4 13 | penalty: -4 14 | perc_identity: 95 15 | reward: 3 16 | soft_masking: "true" 17 | task: "megablast" 18 | template_length: 18 19 | template_type: "coding" 20 | window_size: 120 21 | word_size: 12 22 | xdrop_gap: 20 23 | no_greedy: true 24 | -------------------------------------------------------------------------------- /contam_filter/typedefs.yaml: -------------------------------------------------------------------------------- 1 | class: SchemaDefRequirement 2 | types: 3 | - name: BlastnType 4 | type: record 5 | fields: 6 | - name: db 7 | type: string 8 | - name: dbtype 9 | type: string 10 | - name: outfmt 11 | type: string? 12 | - name: best_hit_overhang 13 | type: double? 
14 | - name: best_hit_score_edge 15 | type: double? 16 | - name: dust 17 | type: string? 18 | - name: evalue 19 | type: double? 20 | - name: gapextend 21 | type: int? 22 | - name: gapopen 23 | type: int? 24 | - name: penalty 25 | type: int? 26 | - name: perc_identity 27 | type: double? 28 | - name: reward 29 | type: int? 30 | - name: soft_masking 31 | type: string? 32 | - name: task 33 | type: string? 34 | - name: template_length 35 | type: int? 36 | - name: template_type 37 | type: string? 38 | - name: window_size 39 | type: int? 40 | - name: word_size 41 | type: int? 42 | - name: xdrop_gap 43 | type: int? 44 | - name: no_greedy 45 | type: boolean? 46 | - name: KaryotType 47 | type: enum 48 | symbols: [pro, eu] 49 | - name: SettingsType 50 | type: record 51 | fields: 52 | - name: common 53 | type: 54 | type: array 55 | items: BlastnType 56 | - name: prok 57 | type: BlastnType 58 | - name: euk 59 | type: BlastnType 60 | - name: prok_adapt 61 | type: BlastnType 62 | - name: euk_adapt 63 | type: BlastnType 64 | -------------------------------------------------------------------------------- /contam_filter/vecscreen.cwl: -------------------------------------------------------------------------------- 1 | cwlVersion: v1.0 2 | class: CommandLineTool 3 | hints: 4 | DockerRequirement: 5 | dockerPull: ncbi/blast_contamfilter 6 | 7 | baseCommand: vecscreen 8 | 9 | inputs: 10 | query: 11 | type: File 12 | inputBinding: 13 | prefix: -i 14 | db: 15 | type: string 16 | inputBinding: 17 | prefix: -d 18 | outfmt: 19 | type: string? 20 | default: "3" 21 | inputBinding: 22 | prefix: -f 23 | outfile: 24 | type: string? 
25 | default: "vecscreen.out" 26 | inputBinding: 27 | prefix: -o 28 | 29 | outputs: 30 | - id: output 31 | type: File 32 | outputBinding: 33 | glob: $(inputs.outfile) 34 | -------------------------------------------------------------------------------- /contam_filter/wf_blastit.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | class: Workflow 5 | 6 | requirements: 7 | - $import: typedefs.yaml 8 | - class: StepInputExpressionRequirement 9 | 10 | inputs: 11 | dbfasta: "typedefs.yaml#BlastnType" 12 | query: File 13 | 14 | outputs: 15 | output: 16 | type: File 17 | outputSource: blastn/output 18 | 19 | steps: 20 | # decompress: 21 | # run: decompress.cwl 22 | # in: 23 | # input_file: 24 | # source: dbfasta 25 | # valueFrom: $(self.db) 26 | # out: [output] 27 | 28 | # makeblastdb: 29 | # run: makeblastdb.cwl 30 | # in: 31 | # input_file: decompress/output 32 | # dbtype: 33 | # source: dbfasta 34 | # valueFrom: $(self.dbtype) 35 | # out: [output] 36 | 37 | blastn: 38 | run: blastn.cwl 39 | in: 40 | query: query 41 | db: 42 | source: dbfasta 43 | valueFrom: $(self.db) 44 | # db: decompress/output 45 | # db_aux: makeblastdb/output 46 | outfmt: 47 | source: dbfasta 48 | valueFrom: $(self.outfmt) 49 | best_hit_overhang: 50 | source: dbfasta 51 | valueFrom: $(self.best_hit_overhang) 52 | best_hit_score_edge: 53 | source: dbfasta 54 | valueFrom: $(self.best_hit_score_edge) 55 | dust: 56 | source: dbfasta 57 | valueFrom: $(self.dust) 58 | evalue: 59 | source: dbfasta 60 | valueFrom: $(self.evalue) 61 | gapextend: 62 | source: dbfasta 63 | valueFrom: $(self.gapextend) 64 | gapopen: 65 | source: dbfasta 66 | valueFrom: $(self.gapopen) 67 | penalty: 68 | source: dbfasta 69 | valueFrom: $(self.penalty) 70 | perc_identity: 71 | source: dbfasta 72 | valueFrom: $(self.perc_identity) 73 | reward: 74 | source: dbfasta 75 | valueFrom: $(self.reward) 76 | soft_masking: 77 | source: dbfasta 78 | 
valueFrom: $(self.soft_masking) 79 | task: 80 | source: dbfasta 81 | valueFrom: $(self.task) 82 | template_length: 83 | source: dbfasta 84 | valueFrom: $(self.template_length) 85 | template_type: 86 | source: dbfasta 87 | valueFrom: $(self.template_type) 88 | window_size: 89 | source: dbfasta 90 | valueFrom: $(self.window_size) 91 | word_size: 92 | source: dbfasta 93 | valueFrom: $(self.word_size) 94 | xdrop_gap: 95 | source: dbfasta 96 | valueFrom: $(self.xdrop_gap) 97 | no_greedy: 98 | source: dbfasta 99 | valueFrom: $(self.no_greedy) 100 | out: [output] 101 | 102 | 103 | -------------------------------------------------------------------------------- /contam_filter/wf_blastit_params.yaml: -------------------------------------------------------------------------------- 1 | #arguments: ["-db", "/home/ubuntu/contam-dbs/contam_in_euks/contam_in_euks.fa", "-outfmt", "6", "-best_hit_overhang", "0.1", "-best_hit_score_edge", "0.1", "-dust", "yes", "-evalue", "1E-9", "-gapextend", "2", "-gapopen", "4", "-penalty", "-4", "-perc_identity", "95", "-reward", "3", "-soft_masking", "true", "-task", "megablast", "-template_length", "18", "-template_type", "coding", "-window_size", "120", "-word_size", "12", "-xdrop_gap", "20", "-no_greedy"] 2 | #arguments: ["-db", "/home/ubuntu/contam-dbs/mito/mito.nt", "-dust", "yes", "-soft_masking", "true", "-perc_identity", "98.6", "-outfmt", "6"] 3 | dbfasta: 4 | db: 5 | class: File 6 | location: db/contam_in_euks.fa.gz 7 | dbtype: "nucl" 8 | outfmt: "6" 9 | best_hit_overhang: 0.1 10 | best_hit_score_edge: 0.1 11 | dust: "yes" 12 | evalue: 1E-9 13 | gapextend: 2 14 | gapopen: 4 15 | penalty: -4 16 | perc_identity: 95 17 | reward: 3 18 | soft_masking: "true" 19 | task: "megablast" 20 | template_length: 18 21 | template_type: "coding" 22 | window_size: 120 23 | word_size: 12 24 | xdrop_gap: 20 25 | no_greedy: true 26 | query: 27 | class: File 28 | location: pa_test.fa 29 | 
-------------------------------------------------------------------------------- /contam_filter/wf_contam_detect.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | class: Workflow 5 | 6 | requirements: 7 | - $import: typedefs.yaml 8 | - class: SubworkflowFeatureRequirement 9 | - class: ScatterFeatureRequirement 10 | - class: MultipleInputFeatureRequirement 11 | - class: StepInputExpressionRequirement 12 | 13 | inputs: 14 | query: File 15 | karyot_type: "typedefs.yaml#KaryotType" 16 | default_fastas: "typedefs.yaml#SettingsType" 17 | 18 | 19 | outputs: 20 | result: 21 | type: File 22 | outputSource: tar_output/output 23 | 24 | steps: 25 | make_list: 26 | run: create_blast_list.cwl 27 | in: 28 | common: 29 | source: default_fastas 30 | valueFrom: $(self.common) 31 | karyot: karyot_type 32 | euk: 33 | source: default_fastas 34 | valueFrom: $(self.euk) 35 | prok: 36 | source: default_fastas 37 | valueFrom: $(self.prok) 38 | out: 39 | [output] 40 | 41 | blastit: 42 | run: wf_blastit.cwl 43 | scatter: dbfasta 44 | in: 45 | dbfasta: make_list/output 46 | query: query 47 | out: [output] 48 | 49 | vecscreen: 50 | run: wf_vecscreen.cwl 51 | in: 52 | karyot_type: karyot_type 53 | euk_adapt: 54 | source: default_fastas 55 | valueFrom: $(self.euk_adapt) 56 | prok_adapt: 57 | source: default_fastas 58 | valueFrom: $(self.prok_adapt) 59 | query: query 60 | out: [output] 61 | 62 | tar_output: 63 | in: 64 | blastin: blastit/output 65 | vecin: vecscreen/output 66 | out: [output] 67 | run: 68 | class: CommandLineTool 69 | baseCommand: [tar, chf, outputs.tar, '--transform=s/.*\///g'] 70 | inputs: 71 | blastin: 72 | type: 73 | type: array 74 | items: File 75 | inputBinding: 76 | position: 1 77 | vecin: 78 | type: File 79 | inputBinding: 80 | position: 2 81 | outputs: 82 | output: 83 | type: File 84 | outputBinding: 85 | glob: outputs.tar 86 | 87 | 
-------------------------------------------------------------------------------- /contam_filter/wf_contam_detect_params.yaml: -------------------------------------------------------------------------------- 1 | default_fastas: 2 | $import: "default_fastas.yaml" 3 | query: 4 | class: File 5 | location: pa_test.fa 6 | karyot_type: eu 7 | -------------------------------------------------------------------------------- /contam_filter/wf_vecscreen.cwl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env cwl-runner 2 | 3 | cwlVersion: v1.0 4 | class: Workflow 5 | 6 | requirements: 7 | - $import: typedefs.yaml 8 | - class: StepInputExpressionRequirement 9 | - class: InlineJavascriptRequirement 10 | 11 | inputs: 12 | karyot_type: "typedefs.yaml#KaryotType" 13 | query: File 14 | euk_adapt: "typedefs.yaml#BlastnType" 15 | prok_adapt: "typedefs.yaml#BlastnType" 16 | 17 | outputs: 18 | output: 19 | type: File 20 | outputSource: filter/output 21 | 22 | steps: 23 | choose_adapt: 24 | in: 25 | karyot: karyot_type 26 | euk: euk_adapt 27 | prok: prok_adapt 28 | out: [output] 29 | run: 30 | class: ExpressionTool 31 | inputs: 32 | karyot: "typedefs.yaml#KaryotType" 33 | euk: "typedefs.yaml#BlastnType" 34 | prok: "typedefs.yaml#BlastnType" 35 | outputs: 36 | output: "typedefs.yaml#BlastnType" 37 | expression: | 38 | ${ 39 | if (inputs.karyot == "pro") return {'output': inputs.prok} 40 | return {'output': inputs.euk} 41 | } 42 | 43 | vecscreen: 44 | in: 45 | query: query 46 | db: 47 | source: choose_adapt/output 48 | valueFrom: $(self.db) 49 | #db: 50 | # source: dbfasta 51 | # valueFrom: $(self.db) 52 | #db: decompress/output 53 | #db_aux: makeblastdb/output 54 | out: [output] 55 | run: vecscreen.cwl 56 | 57 | filter: 58 | in: 59 | infile: vecscreen/output 60 | out: [output] 61 | run: 62 | class: CommandLineTool 63 | hints: 64 | DockerRequirement: 65 | dockerPull: ncbi/blast_contamfilter 66 | baseCommand: VSlistTo1HitPerLine.awk 67 | 
stdout: vecscreen.output 68 | inputs: 69 | infile: 70 | type: File 71 | inputBinding: 72 | position: 1 73 | outputs: 74 | output: 75 | type: stdout 76 | 77 | 78 | -------------------------------------------------------------------------------- /contam_filter/wf_vecscreen_params.cwl: -------------------------------------------------------------------------------- 1 | dbfasta: 2 | db: adaptors_for_screening_euks.fa 3 | dbtype: "nucl" 4 | query: 5 | class: File 6 | location: pa_test.fa 7 | --------------------------------------------------------------------------------
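Note on the karyotype switch: the `choose_adapt` step in wf_vecscreen.cwl and the expression in create_blast_list.cwl both pick the prokaryote or eukaryote settings from `karyot`. The following is a minimal stand-alone sketch of that selection logic in plain JavaScript, so it can be tried outside a CWL runner. The function names `chooseAdapt` and `createBlastList` and the `example` object are hypothetical and not part of the repository. The list builder here uses `concat` instead of `push`, so that `inputs.common` is not modified, which is a deliberate difference from the expression in create_blast_list.cwl.

```javascript
// Sketch only: function names are illustrative, not part of the repository.

// Mirrors the choose_adapt expression: "pro" selects the prokaryote record,
// any other value falls through to the eukaryote record.
function chooseAdapt(inputs) {
  if (inputs.karyot === "pro") {
    return { output: inputs.prok };
  }
  return { output: inputs.euk };
}

// Same selection as create_blast_list.cwl, but builds a new array with
// concat so the caller's inputs.common is left untouched.
function createBlastList(inputs) {
  var chosen = inputs.karyot === "pro" ? inputs.prok : inputs.euk;
  return { output: inputs.common.concat([chosen]) };
}

// Example records using database names from default_fastas.yaml.
var example = {
  karyot: "eu",
  common: [
    { db: "rrna", dbtype: "nucl" },
    { db: "mito.nt", dbtype: "nucl" }
  ],
  prok: { db: "contam_in_prok.fa", dbtype: "nucl" },
  euk: { db: "contam_in_euks.fa", dbtype: "nucl" }
};

// Print the database names that would be scattered over by the blastit step.
console.log(
  createBlastList(example).output.map(function (entry) {
    return entry.db;
  }).join(",")
);
```

With `karyot: eu`, as in wf_contam_detect_params.yaml, the eukaryote record is appended after the two common databases; switching to `pro` appends the prokaryote record instead.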