├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── README.md
├── benchmarking
├── benchmark-Bmallei-fasta-all.sh
├── benchmark-Bmallei-fastq-snp.sh
├── benchmark-Bmallei-fastq-sv.sh
├── benchmark-bwa-vs-mm-all.sh
├── benchmark-cov-all.sh
├── benchmark-cov-snp.sh
├── benchmark-cov-sv.sh
├── benchmark-numvariants-snp.sh
├── benchmark-numvariants-sv.sh
├── benchmark-real_data-Bush-fasta-snp.sh
├── benchmark-real_data-Bush-fastq-snp.sh
├── benchmark-real_data-HG002-sv.sh
├── benchmark-sim_data-Bush-Ecoli-divergent-snp.sh
├── benchmark-sim_data-Bush-Ecoli-same-snp.sh
├── benchmark-sim_data-PR-Bmallei-all.sh
├── benchmark-sim_data-PR-Ecoli-all.sh
├── benchmark-sim_data-PR-Lacidophilus-all.sh
├── benchmark-threads-all.sh
├── benchmark-threads-snp.sh
├── benchmark-threads-sv.sh
├── parameter_snp.txt
├── parameter_sv.txt
├── parameter_sv_bmallei.txt
├── parameter_sv_ecoli.txt
└── parameter_sv_lacidophilus.txt
├── build.sh
├── res
└── logo.png
├── setup.py
├── spec-file.txt
├── testdata
├── testdata_mut.fasta
├── testdata_ref.fasta
├── testdata_snp.vcf
└── testdata_sv.vcf
├── variantdetective.py
└── variantdetective
├── __init__.py
├── combine_variants.py
├── fragment_lengths.py
├── main.py
├── simulate.py
├── simulate_tools.py
├── snp_indel.py
├── structural_variant.py
├── tools.py
├── validate_inputs.py
└── version.py
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | recipe/
3 | variantdetective.egg-info/
4 |
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # Changelog
2 |
3 | All notable changes to this project will be documented in this file.
4 |
5 | ## [1.0.1] - 2024-03-27
6 |
7 | ### Added
8 |
9 | - Created a changelog file to document project changes.
10 |
11 | ### Fixed
12 | - Fixed issue #9 to allow running the pipeline in a folder that already exists and overwrite the results, enhancing usability and automation capability.
13 |
14 | ## [1.0.0] - 2024-01-16
15 |
16 | ### Added
17 |
18 | The first release of VariantDetective. This version of the tool is represented in the manuscript found here: https://academic.oup.com/bioinformatics/article/40/2/btae066/7609103
19 |
20 |
21 | [1.0.1]: https://github.com/OLF-Bioinformatics/VariantDetective/compare/v1.0.0...v1.0.1
22 | [1.0.0]: https://github.com/OLF-Bioinformatics/VariantDetective/releases/tag/v1.0.0
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Government of Canada
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |

2 |
3 | This program is designed to identify short variants and structural variants. Variants can be identified from genomic sequences (FASTA) or from combinations of short and/or long reads (FASTQ). If genomic sequences are provided as input, long reads will be simulated to detect variants.
4 |
5 | This tool makes use of other open-source variant callers and creates consensus sets in order to validate a variant. Summary files for short variants and structural variants are generated outlining the different types of variants found in the sample.
6 |
7 | ## Author
8 |
9 | Phil Charron \<\>
10 |
11 | ## Table of Contents
12 | - [Installation](#installation)
13 | - [Installation from Source](#installation-from-source)
14 | - [Conda Installation](#conda-installation)
15 | - [Quick Usage](#quick-usage)
16 | - [List of Commands](#list-of-commands)
17 | - [Variant Callers](#variant-callers)
18 | - [Short Variant Callers](#short-variant-callers)
19 | - [Structural Variant Callers](#structural-variant-callers)
20 | - [Long Read Simulator](#long-read-simulator)
21 | - [Parameters](#parameters)
22 | - [Outputs](#outputs)
23 | - [Output files - snp_indel directory](#output-files---snp_indel-directory)
24 | - [Output files - structural_variant directory](#output-files---structural_variant-directory)
25 | - [Reporting Issues](#reporting-issues)
26 | - [Citing VariantDetective](#citing-variantdetective)
27 |
28 |
29 | ## Installation
30 |
31 | All software and tools used by VariantDetective can be found in the [spec-file.txt](spec-file.txt). VariantDetective can be installed via pip after creating the conda environment to support it or via conda.
32 |
33 | ### Installation from Source
34 | VariantDetective can be installed from source using the following method.
35 | ```
36 | # Download VariantDetective repository
37 | git clone https://github.com/OLF-Bioinformatics/VariantDetective.git
38 | cd VariantDetective
39 | # Create conda variant for tools
40 | conda create -n variantdetective -y && conda activate variantdetective
41 | # Install specific versions of tools
42 | conda install -n variantdetective --file spec-file.txt
43 | # Install VariantDetective
44 | pip install -e .
45 | ```
46 |
47 | ### Conda Installation
48 | ```
49 | conda create -n vd -y
50 | conda activate vd
51 | conda install -c bioconda -c conda-forge -c charronp variantdetective
52 | ```
53 |
54 | ## Test Data
55 | After successfully installing VariantDetective, you can verify its functionality
56 | running test data. This step ensures that the isntallation has been completed
57 | correctly and the project is functioning as expected.
58 |
59 | From within the VariantDetective directory, the test can be run using the following command:
60 |
61 | ```
62 | variantdetective all_variants -r testdata/testdata_ref.fasta -g testdata/testdata_mut.fasta -o testdata/test
63 | ```
64 |
65 | Once the tool is done, check the output. Compare the `testdata/test/snp_indel/snp_final.vcf` file with `testdata/testdata_snp.vcf`
66 | and the `testdata/test/structural_variant/combined_sv.vcf` file with `testdata/testdata_sv.vcf`.
67 |
68 | ## Quick Usage
69 |
70 | **Find snps/indels and structural variants from an assembled genome (FASTA)**
71 |
72 | ```
73 | variantdetective all_variants -r REFERENCE.fasta -g SAMPLE.fasta
74 | ```
75 |
76 | **Find snps/indels and structural variants from raw reads (FASTQ)**
77 |
78 | ```
79 | variantdetective all_variants -r REFERENCE.fasta -1 SHORT_READ_1.fastq -2 SHORT_READ_2.fastq -l LONG_READ.fastq
80 | ```
81 |
82 | **Find snps/indels from an assembled genome (FASTA)**
83 |
84 | ```
85 | variantdetective snp_indel -r REFERENCE.fasta -g SAMPLE.fasta
86 | ```
87 |
88 | **Find snps/indels from raw reads (FASTQ)**
89 |
90 | ```
91 | variantdetective snp_indel -r REFERENCE.fasta -1 SHORT_READ_1.fastq -2 SHORT_READ_2.fastq
92 | ```
93 |
94 | **Find structural variants from an assembled genome (FASTA)**
95 |
96 | ```
97 | variantdetective structural_variant -r REFERENCE.fasta -g SAMPLE.fasta
98 | ```
99 |
100 | **Find structural variants from raw reads (FASTQ)**
101 |
102 | ```
103 | variantdetective structural_variant -r REFERENCE.fasta -l LONG_READ.fastq
104 | ```
105 |
106 | **Combine SNP VCF files predicted from other tools and get consensus set of minimum 2 callers**
107 |
108 | ```
109 | variantdetective combine_variants --snp_vcf TOOL1.vcf TOOL2.vcf TOOL3.VCF --snp_consensus 2
110 | ```
111 |
112 | **Combine SV VCF files predicted from other tools and get consensus set of minimum 2 callers**
113 |
114 | ```
115 | variantdetective combine_variants --sv_vcf TOOL1.vcf TOOL2.vcf TOOL3.VCF --sv_consensus 2
116 | ```
117 |
118 | ## List of Commands
119 | | Command | Description |
120 | | --- | --- |
121 | | `variantdetective all_variants` | Identify structural variants (SV) from long reads (FASTQ) and SNPs/indels from short reads (FASTQ), or both types of variants from genome sequence (FASTA). If genome sequence (FASTA) is provided, reads will be simulated to predict SV, SNPs and indels. |
122 | | `variantdetective structural_variant` | Identify structural variants (SV) from long reads (FASTQ) or genome sequence (FASTA). If genome sequence (FASTA) is provided, reads will be simulated to predict SVs. |
123 | | `variantdetective snp_indel` | Identify SNPs/indels from short reads (FASTQ) or genome sequence (FASTA). If genome sequence (FASTA) is provided instead, reads will be simulated to predict SNPs and indels. |
124 | | `variantdetective combine_variants` | Combine SNPs/indels VCF files or SV VCF files predicted from other tools. |
125 |
126 | ## Variant Callers
127 | VariantDetective makes use of published open-source variant callers and creates consensus sets in order to validate a variant.
128 |
129 | ### Short Variant Callers
130 | - [Clair3](https://github.com/HKU-BAL/Clair3)
131 | - [Freebayes](https://github.com/freebayes/freebayes)
132 | - [GATK4 HaplotypeCaller](https://github.com/broadinstitute/gatk)
133 |
134 | Intersections of VCF files are created using the [VCFtools](https://github.com/vcftools) `vcf-isec` tool. The final VCF output consensus file containing variants found in at least 2 variant callers (default) is created using the [BCFtools](https://github.com/samtools/bcftools) `concat` tool.
135 |
136 | ### Structural Variant Callers
137 | - [cuteSV](https://github.com/tjiangHIT/cuteSV)
138 | - [NanoSV](https://github.com/mroosmalen/nanosv)
139 | - [NanoVar](https://github.com/cytham/nanovar)
140 | - [SVIM](https://github.com/eldariont/svim)
141 |
142 | The consensus VCF file is created using the [SURVIVOR](https://github.com/fritzsedlazeck/SURVIVOR) `merge` tool. Parameters for merging structural variants are a maximum allowed distance of 1 kbp between breakpoints and calls supported by at least 3 variant callers (default) where they agree on both type and strand.
143 |
144 | ## Long Read Simulator
145 | When a genomic FASTA file is provided as query input, long reads are simulated in order to detect variants. The long read simulation tool is adapted from [Badread](https://github.com/rrwick/Badread), a tool that creates simulated reads. It has been modified to create perfectly matching reads to the reference file and to allow multi-threading to speed up the process.
146 |
147 | ## Parameters
148 | All input files can be uncompressed (.fasta/.fastq) or gzipped (.fastq.gz/.fastq.gz)
149 |
150 | | Options | Available Command | Description | Default |
151 | | --- | :---: | --- | :---: |
152 | | `-r FASTA` | `all_variants`
`structural_variant`
`snp_indel` | Path to reference genome in FASTA. Required | - |
153 | | `-g FASTA` | `all_variants`
`structural_variant`
`snp_indel` | Path to query genomic FASTA file. Can't be combined with `-1`, `-2` or `-l`| - |
154 | | `-1 FASTQ`
`--short1 FASTQ` | `all_variants`
`snp_indel` | Path to pair 1 of short reads FASTQ file. Must always be combined with `-2`. If running `all_variants`, must be combined with `-l`| - |
155 | | `-2 FASTQ`
`--short2 FASTQ` | `all_variants`
`snp_indel` | Path to pair 2 of short reads FASTQ file. Must always be combined with `-1`. If running `all_variants`, must be combined with `-l`| - |
156 | | `-l FASTQ`
`--long FASTQ` | `all_variants`
`structural_variant` | Path to long reads FASTQ file. If running `all_variants`, must be combined with `-1` and `-2`| - |
157 | | `--readcov READCOV` | `all_variants`
`structural_variant`
`snp_indel` | If using `-g` as input, define the absolute amount of simulated reads (e.g. 250M) or relative simulated read depth (e.g. 50x) | `50x` |
158 | | `--readlen MEAN,STDEV` | `all_variants`
`structural_variant`
`snp_indel` | If using `-g` as input, define the mean length and standard deviation of simulated reads | `15000,13000` |
159 | | `--mincov_snp MINCOV_SNP` | `all_variants`
`snp_indel` | Minimum number of reads required to call SNP/Indel | `2` |
160 | | `--minqual_snp MINQUAL_SNP` | `all_variants`
`snp_indel` | Minimum quality of SNP/Indel to be filtered out | `20` |
161 | | `--assembler {bwa,minimap2}` | `all_variants`
`snp_indel` | Choose which assembler (bwa or minimap2) to use when using paired-end short reads | `bwa` |
162 | | `--snp_consensus SNP_CONSENSUS` | `all_variants`
`snp_indel` | Specifies the minimum number of tools required to detect an SNP or Indel to include it in the consensus list | `2` |
163 | | `--mincov_sv MINCOV_SV` | `all_variants`
`structural_variant` | Minimum number of reads required to call SV | `2` |
164 | | `--minlen_sv MINLEN_SV` | `all_variants`
`structural_variant` | Minimum length of SV to be detected | `25` |
165 | | `--minqual_sv MINQUAL_SV` | `all_variants`
`structural_variant` | Minimum quality of SV to be filtered out from SVIM | `15` |
166 | | `--sv_consensus SV_CONSENSUS` | `all_variants`
`structural_variant` | Specifies the minimum number of tools required to detect an SV to include it in the consensus list | `3` |
167 | | `-o OUT`
`--out OUT` | `all_variants`
`structural_variant`
`snp_indel` | Output directory. Will be created if it does not exist | `./` |
168 | | `-t THREADS`
`--threads THREADS` | `all_variants`
`structural_variant`
`snp_indel` | Number of threads used for job | `1` |
169 | | `-h`
`--help` | `all_variants`
`structural_variant`
`snp_indel` | Show help message and exit | - |
170 | | `-v`
`--version` | `all_variants`
`structural_variant`
`snp_indel`| Show program version number and exit | - |
171 |
172 | ## Outputs
173 |
174 | All input files will be copied to the output folder. Within the output folder, directories containing the `structural_variant` and `snp_indel` results will be created.
175 |
176 | ### Output files - `snp_indel` directory
177 |
178 | | Output | Description |
179 | |---:|---|
180 | | `snp_final.vcf` | Variants that were found in at least 2 variant callers in VCF format |
181 | | `snp_final.csv` | Variants that were found in at least 2 variant callers in CSV format |
182 | | `snp_final.tab` | Variants that were found in at least 2 variant callers in TSV format |
183 | | `snp_final_summary.txt` | Summary of different short variant types found in snp_final files |
184 | | `freebayes.haplotypecaller.clair3.vcf.gz` | Variants in common between all variants callers |
185 | | `freebayes.clair3.vcf.gz` | Variants in common between Freebayes and Clair3 |
186 | | `freebayes.haplotypecaller.vcf.gz` | Variants in common between Freebayes and HaplotypeCaller |
187 | | `haplotypecaller.unique.vcf.gz` | Variants in common between HaplotypeCaller and Clair3 |
188 | | `clair3.unique.vcf.gz` | Variants only found by Clair3 |
189 | | `freebayes.unique.vcf.gz` | Variants only found by Freebayes |
190 | | `haplotypecaller.unique.vcf.gz` | Variants only found by HaplotypeCaller |
191 | | `alignment.mm.rg.sorted.bam` | Alignment in BAM format |
192 | | `alignment.mm.rg.sorted.bam.bai` | Index file of alignments |
193 | | `clair3/` | Directory containing files related to Clair3 variant calling |
194 | | `freebayes/` | Directory containing files related to Freebayes variant calling |
195 | | `haplotypecaller/` | Directory containing files related to HaplotypeCaller variant calling |
196 |
197 | ### Output files - `structural_variant` directory
198 |
199 | | Output | Description |
200 | |---:|---|
201 | | `combined_sv.vcf` | Variants that were found in at least 2 variant callers in VCF format |
202 | | `combined_sv.csv` | Variants that were found in at least 2 variant callers in CSV format |
203 | | `combined_sv.tab` | Variants that were found in at least 2 variant callers in TSV format |
204 | | `combined_sv_summary.txt` | Summary of different structural variant types found in combined_sv files |
205 | | `alignment.mm.sorted.bam` | Alignment in BAM format |
206 | | `alignment.mm.sorted.bam.bai` | Index file of alignments |
207 | | `cutesv/` | Directory containing files related to cuteSV variant calling |
208 | | `nanosv/` | Directory containing files related to NanoSV variant calling |
209 | | `nanovar/` | Directory containing files related to NanoVar variant calling |
210 | | `svim/` | Directory containing files related to SVIM variant calling |
211 |
212 |
213 | ## Reporting Issues
214 |
215 | If you have any issues installing or running VariantDetective, or would like a new feature added to the tool, please open an issue here on GitHub.
216 |
217 | ## Citing VariantDetective
218 |
219 | The manuscript describing this tool is available [here](https://academic.oup.com/bioinformatics/article/40/2/btae066/7609103).
220 |
221 | The tool should be cited as follows:
222 |
223 | > Philippe Charron, Mingsong Kang, "VariantDetective: An Accurate All-in-One Pipeline for Detecting Consensus Bacterial SNPs and SVs," Bioinformatics, Vol. 40, No. 2, February 2024, btae066, https://doi.org/10.1093/bioinformatics/btae066.
224 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-Bmallei-fasta-all.sh:
--------------------------------------------------------------------------------
1 | for file in benchmarking/benchmark-Bmallei-fasta-all/*.fasta
2 | do
3 | name=${file::-6}
4 | echo ${file}
5 | ./variantdetective.py all_variants \
6 | -g ${file} \
7 | -r benchmarking/benchmark-Bmallei-fasta-all/GCA_000011705.1.fa \
8 | -t 24 \
9 | --readcov 50X \
10 | -o $name
11 | rm $name/GCA_000011705.1*
12 | rm $name/*fast*
13 | rm $name/*/*bam*
14 | done
15 | cat
--------------------------------------------------------------------------------
/benchmarking/benchmark-Bmallei-fastq-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-Bmallei-fastq-snp'
2 | mkdir -p $dir
3 |
4 | for sra in {SRR1618492,SRR1618671,SRR1618688,SRR1618499,SRR1618349,SRR1616952,SRR2146904,SRR2146899,SRR2146902,SRR2147667,SRR8283094,SRR8072932,SRR8072935,SRR8072938,ERR9616711};
5 | do
6 | echo $sra;
7 | ./variantdetective.py snp_indel \
8 | -1 ${sra}_1.fastq \
9 | -2 ${sra}_2.fastq \
10 | -r GCA_000011705.1.fa \
11 | -t 24 \
12 | -o $dir/$sra
13 | rm $dir/$sra/GCA_000011705.1*
14 | rm $dir/$sra/*fast*
15 | rm $dir/$sra/*/*bam*
16 | done
17 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-Bmallei-fastq-sv.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-Bmallei-fastq-sv'
2 | mkdir -p $dir
3 |
4 | for sra in {SRR1618494,SRR1618669,SRR1618686,SRR1618500,SRR1618350,SRR1617359,SRR2146906,SRR2146901,SRR2146903,SRR2147669,SRR8283092,SRR8072934,SRR8072936,SRR8072939,ERR9616715};
5 | do
6 | echo $sra;
7 | ./variantdetective.py structural_variant \
8 | -l ${sra}.fastq \
9 | -r GCA_000011705.1.fa \
10 | -t 24 \
11 | -o $dir/$sra
12 | rm $dir/$sra/GCA_000011705.1*
13 | rm $dir/$sra/*fast*
14 | rm $dir/$sra/*/*bam*
15 | done
16 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-bwa-vs-mm-all.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-bwa-vs-mm-all'
2 | mkdir -p $dir
3 | cp parameter_sv.txt $dir/parameter_sv.txt
4 | cp parameter_snp.txt $dir/parameter_snp.txt
5 | j=50X
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa parameter_snp.txt 0.00017 0 $dir/sim_snp_${i}
9 | SURVIVOR simSV $dir/sim_snp_${i}.fasta parameter_sv.txt 0 0 $dir/sim_snp_sv_${i}
10 | # Run using BWA
11 | ./variantdetective.py all_variants \
12 | -g $dir/sim_snp_sv_${i}.fasta \
13 | -r GCA_000011705.1.fa \
14 | -t 24 \
15 | --readcov $j \
16 | --assembler bwa \
17 | -o $dir/sim_${i}_${j}_bwa
18 | rm $dir/sim_${i}_${j}_bwa/GCA_000011705.1*
19 | rm $dir/sim_${i}_${j}_bwa/*fast*
20 | rm $dir/sim_${i}_${j}_bwa/*/*bam*
21 | # Run using minimap2
22 | ./variantdetective.py all_variants \
23 | -g $dir/sim_snp_sv_${i}.fasta \
24 | -r GCA_000011705.1.fa \
25 | -t 24 \
26 | --readcov $j \
27 | --assembler minimap2 \
28 | -o $dir/sim_${i}_${j}_mm
29 | rm $dir/sim_${i}_${j}_mm/GCA_000011705.1*
30 | rm $dir/sim_${i}_${j}_mm/*fast*
31 | rm $dir/sim_${i}_${j}_mm/*/*bam*
32 | done
--------------------------------------------------------------------------------
/benchmarking/benchmark-cov-all.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-cov-all'
2 | mkdir -p $dir
3 | cp parameter_sv.txt $dir/parameter_sv.txt
4 | for j in {25X,50X,100X,200X}
5 | do
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_sv.txt 0.00017 0 $dir/sim_$i
9 | start=`date +%s%N`
10 | ./variantdetective.py all_variants \
11 | -g $dir/sim_$i.fasta \
12 | -r GCA_000011705.1.fa \
13 | -t 24 \
14 | --readcov $j \
15 | -o $dir/sim_$i
16 | end=`date +%s%N`
17 | rm -r $dir/sim_$i*
18 |
19 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
20 | done
21 | done
22 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-cov-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-cov-snp'
2 | mkdir -p $dir
3 | cp parameter_snp.txt $dir/parameter_snp.txt
4 | for j in {25X,50X,100X,200X,500X}
5 | do
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_snp.txt 0.00017 0 $dir/sim_$i
9 | start=`date +%s%N`
10 | ./variantdetective.py snp_indel \
11 | -g $dir/sim_$i.fasta \
12 | -r GCA_000011705.1.fa \
13 | -t 24 \
14 | --readcov $j \
15 | -o $dir/sim_$i
16 | end=`date +%s%N`
17 | rm -r $dir/sim_$i*
18 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
19 | done
20 | done
21 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-cov-sv.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-cov-sv'
2 | mkdir -p $dir
3 | cp parameter_sv.txt $dir/parameter_sv.txt
4 | for j in {25X,50X,100X,200X}
5 | do
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_sv.txt 0 0 $dir/sim_$i
9 | start=`date +%s%N`
10 | ./variantdetective.py structural_variant \
11 | -g $dir/sim_$i.fasta \
12 | -r GCA_000011705.1.fa \
13 | -t 24 \
14 | --readcov $j \
15 | -o $dir/sim_$i
16 | end=`date +%s%N`
17 | rm -r $dir/sim_$i*
18 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
19 | done
20 | done
21 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-numvariants-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-numvariants-snp'
2 | mkdir -p $dir
3 | cp parameter_snp.txt $dir/parameter_snp.txt
4 | for j in {0.000017,0.000034,0.00017,0.00034,0.0017,0.0034,0.017}
5 | do
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_snp.txt $j 0 $dir/sim_$i
9 | start=`date +%s%N`
10 | ./variantdetective.py snp_indel \
11 | -g $dir/sim_$i.fasta \
12 | -r GCA_000011705.1.fa \
13 | -t 24 \
14 | -o $dir/sim_$i
15 | end=`date +%s%N`
16 | rm -r $dir/sim_$i*
17 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
18 | done
19 | done
20 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-numvariants-sv.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-numvariants-sv'
2 | mkdir -p $dir
3 | for j in {21,46,66,128,210}
4 | do
5 | for i in {1..5}
6 | do
7 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_sv_$j.txt 0 0 $dir/sim_$i
8 | start=`date +%s%N`
9 | ./variantdetective.py structural_variant \
10 | -g $dir/sim_$i.fasta \
11 | -r GCA_000011705.1.fa \
12 | -t 24 \
13 | --readcov 50X \
14 | -o $dir/sim_$i
15 | end=`date +%s%N`
16 | rm -r $dir/sim_$i*
17 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
18 | done
19 | done
20 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-real_data-Bush-fasta-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-real_data-Bush-fasta-snp'
2 | for i in {cft073,mgh78578,rbhstw00029,rbhstw00053,rbhstw00059,rbhstw00122,rbhstw00127,rbhstw00128,rbhstw00131,rbhstw00167,rbhstw00189,rbhstw00277,rbhstw00309,rbhstw00340,rbhstw00350,rhb10c07,rhb11c04,rhb14c01}
3 | do
4 | ./variantdetective.py snp_indel \
5 | -g $dir/${i}/${i}.fasta \
6 | -r $dir/${i}/*.fa \
7 | -t 40 \
8 | -o $dir/${i}/ \
9 | --readcov 100x \
10 | --readlen 100000,2000 \
11 | rm $dir/${i}/*/*bam*
12 | done
13 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-real_data-Bush-fastq-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-real_data-Bush-fastq-snp'
2 | for i in {cft073,mgh78578,rbhstw00029,rbhstw00053,rbhstw00059,rbhstw00122,rbhstw00127,rbhstw00128,rbhstw00131,rbhstw00167,rbhstw00189,rbhstw00277,rbhstw00309,rbhstw00340,rbhstw00350,rhb10c07,rhb11c04,rhb14c01}
3 | do
4 | ./variantdetective.py snp_indel \
5 | -1 $dir/${i}/${i}.1.fq.gz \
6 | -2 $dir/${i}/${i}.2.fq.gz \
7 | -r $dir/${i}/*.fa \
8 | -t 40 \
9 | -o $dir/${i}/
10 | rm $dir/${i}/*/*bam*
11 | done
12 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-real_data-HG002-sv.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-real_data_HG002-sv'
2 | mkdir -p $dir
3 | ./variantdetective.py structural_variant \
4 | -l $dir/NA12878-12.5mil.fq.gz \
5 | -r $dir/hg19_ucsc_main.fa \
6 | -t 24 \
7 | -o $dir/NA12878_results
--------------------------------------------------------------------------------
/benchmarking/benchmark-sim_data-Bush-Ecoli-divergent-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-sim_data-Bush-Ecoli-divergent-snp'
2 | for i in {09-00049,1223,14EC017,14EC020,2012C-4227,2013C-4465,2016C-3878,210221272,2149,28RC1,317,350,51008369SK1,5CRE51,746,789,9000,94-3024,95JB1,ABWA45,AR_0006,AR_0017,AR_0055,AR_0069,AR_0151,AR_0369,AR436,ATCC25922,BH100Lsubstr.MG2017}
3 | do
4 | ./variantdetective.py snp_indel \
5 | -g $dir/${i}/${i}_mutated_simulation.fasta \
6 | -r $dir/NC_000913.3.fasta \
7 | -t 40 \
8 | -o $dir/${i}/ \
9 | --readcov 100x \
10 | --readlen 100000,2000 \
11 | rm $dir/${i}/*/*bam*
12 | done
13 |
14 |
15 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-sim_data-Bush-Ecoli-same-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-sim_data-Bush-Ecoli-same-snp'
2 | for i in {09-00049,1223,14EC017,14EC020,2012C-4227,2013C-4465,2016C-3878,210221272,2149,28RC1,317,350,51008369SK1,5CRE51,746,789,9000,94-3024,95JB1,ABWA45,AR_0006,AR_0017,AR_0055,AR_0069,AR_0151,AR_0369,AR436,ATCC25922,BH100Lsubstr.MG2017}
3 | do
4 | ./variantdetective.py snp_indel \
5 | -g $dir/${i}/${i}_mutated_simulation.fasta \
6 | -r $dir/${i}/${i}_simulated.fasta \
7 | -t 40 \
8 | -o $dir/${i}/ \
9 | --readcov 100x \
10 | --readlen 100000,2000 \
11 | rm $dir/${i}/*/*bam*
12 | done
13 |
14 |
15 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-sim_data-PR-Bmallei-all.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-PR-Bmallei-all'
2 | mkdir -p $dir
3 | cp parameter_sv_bmallei.txt $dir/parameter_sv_bmallei.txt
4 | cp parameter_snp.txt $dir/parameter_snp.txt
5 | for i in {1..5}
6 | do
7 | SURVIVOR simSV GCA_000011705.1.fa parameter_snp.txt 0.00017 0 $dir/sim_snp_${i}
8 | SURVIVOR simSV $dir/sim_snp_${i}.fasta parameter_sv_bmallei.txt 0 0 $dir/sim_snp_sv_${i}
9 | variantdetective all_variants \
10 | -g $dir/sim_snp_sv_${i}.fasta \
11 | -r GCA_000011705.1.fa \
12 | -t 24 \
13 | --readcov 50X \
14 | -o $dir/sim_${i}_${j}
15 | rm $dir/sim_${i}_${j}/GCA_000011705*
16 | rm $dir/sim_${i}_${j}/*fast*
17 | rm $dir/sim_${i}_${j}/*/*bam*
18 | rm $dir/sim_${i}_${j}/structural_variant/nanovar/*bam*
19 | done
20 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-sim_data-PR-Ecoli-all.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-PR-Ecoli-all'
2 | mkdir -p $dir
3 | cp parameter_sv_ecoli.txt $dir/parameter_sv_ecoli.txt
4 | cp parameter_snp.txt $dir/parameter_snp.txt
5 | for i in {1..5}
6 | do
7 | SURVIVOR simSV NC_000913.3.fasta parameter_snp.txt 0.00017 0 $dir/sim_snp_${i}
8 | SURVIVOR simSV $dir/sim_snp_${i}.fasta parameter_sv_ecoli.txt 0 0 $dir/sim_snp_sv_${i}
9 | variantdetective all_variants \
10 | -g $dir/sim_snp_sv_${i}.fasta \
11 | -r NC_000913.3.fasta \
12 | -t 24 \
13 | --readcov 50X \
14 | -o $dir/sim_${i}_${j}
15 | rm $dir/sim_${i}_${j}/NC_000913*
16 | rm $dir/sim_${i}_${j}/*fast*
17 | rm $dir/sim_${i}_${j}/*/*bam*
18 | rm $dir/sim_${i}_${j}/structural_variant/nanovar/*bam*
19 | done
20 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-sim_data-PR-Lacidophilus-all.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-PR-Lacidophilus-all'
2 | mkdir -p $dir
3 | cp parameter_sv_lacidophilus.txt $dir/parameter_sv_lacidophilus.txt
4 | cp parameter_snp.txt $dir/parameter_snp.txt
5 | j=50X
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV NC_000913.3.fasta parameter_snp.txt 0.00017 0 $dir/sim_snp_${i}
9 | SURVIVOR simSV $dir/sim_snp_${i}.fasta parameter_sv_lacidophilus.txt 0 0 $dir/sim_snp_sv_${i}
10 | variantdetective all_variants \
11 | -g $dir/sim_snp_sv_${i}.fasta \
12 | -r NC_000913.3.fasta \
13 | -t 24 \
14 | --readcov $j \
15 | -o $dir/sim_${i}_${j}
16 | rm $dir/sim_${i}_${j}/NC_000913*
17 | rm $dir/sim_${i}_${j}/*fast*
18 | rm $dir/sim_${i}_${j}/*/*bam*
19 | rm $dir/sim_${i}_${j}/structural_variant/nanovar/*bam*
20 | done
21 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-threads-all.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-threads-all'
2 | mkdir -p $dir
3 | cp parameter_sv.txt $dir/parameter_sv.txt
4 | for j in {1,2,4,8,12,24,48}
5 | do
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_sv.txt 0.00017 0 $dir/sim_$i
9 | start=`date +%s%N`
10 | ./variantdetective.py all_variants \
11 | -g $dir/sim_$i.fasta \
12 | -r GCA_000011705.1.fa \
13 | -t $j \
14 | -o $dir/sim_$i
15 | end=`date +%s%N`
16 | rm -r $dir/sim_$i*
17 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
18 | done
19 | done
20 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-threads-snp.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-threads-snp'
2 | mkdir -p $dir
3 | cp parameter_snp.txt $dir/parameter_snp.txt
4 | for j in {1,2,4,8,12,24,48}
5 | do
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_snp.txt 0.00017 0 $dir/sim_$i
9 | start=`date +%s%N`
10 | ./variantdetective.py snp_indel \
11 | -g $dir/sim_$i.fasta \
12 | -r GCA_000011705.1.fa \
13 | -t $j \
14 | -o $dir/sim_$i
15 | end=`date +%s%N`
16 | rm -r $dir/sim_$i*
17 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
18 | done
19 | done
20 |
--------------------------------------------------------------------------------
/benchmarking/benchmark-threads-sv.sh:
--------------------------------------------------------------------------------
1 | dir='benchmark-threads-sv'
2 | mkdir -p $dir
3 | cp parameter_sv.txt $dir/parameter_sv.txt
4 | for j in {1,2,4,8,12,24,48}
5 | do
6 | for i in {1..5}
7 | do
8 | SURVIVOR simSV GCA_000011705.1.fa $dir/parameter_sv.txt 0 0 $dir/sim_$i
9 | start=`date +%s%N`
10 | ./variantdetective.py structural_variant \
11 | -g $dir/sim_$i.fasta \
12 | -r GCA_000011705.1.fa \
13 | -t $j \
14 | -o $dir/sim_$i
15 | end=`date +%s%N`
16 | rm -r $dir/sim_$i*
17 | echo `expr $end - $start` >> $dir/results-$dir-$j.txt
18 | done
19 | done
20 |
--------------------------------------------------------------------------------
/benchmarking/parameter_snp.txt:
--------------------------------------------------------------------------------
1 | PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
2 | DUPLICATION_minimum_length: 1000
3 | DUPLICATION_maximum_length: 20000
4 | DUPLICATION_number: 0
5 | INDEL_minimum_length: 1000
6 | INDEL_maximum_length: 30000
7 | INDEL_number: 0
8 | TRANSLOCATION_minimum_length: 10000
9 | TRANSLOCATION_maximum_length: 30000
10 | TRANSLOCATION_number: 0
11 | INVERSION_minimum_length: 500
12 | INVERSION_maximum_length: 10000
13 | INVERSION_number: 0
14 | INV_del_minimum_length: 600
15 | INV_del_maximum_length: 800
16 | INV_del_number: 0
17 | INV_dup_minimum_length: 600
18 | INV_dup_maximum_length: 800
19 | INV_dup_number: 0
20 |
--------------------------------------------------------------------------------
/benchmarking/parameter_sv.txt:
--------------------------------------------------------------------------------
1 | PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
2 | DUPLICATION_minimum_length: 1000
3 | DUPLICATION_maximum_length: 20000
4 | DUPLICATION_number: 3
5 | INDEL_minimum_length: 1000
6 | INDEL_maximum_length: 30000
7 | INDEL_number: 85
8 | TRANSLOCATION_minimum_length: 10000
9 | TRANSLOCATION_maximum_length: 30000
10 | TRANSLOCATION_number: 1
11 | INVERSION_minimum_length: 500
12 | INVERSION_maximum_length: 10000
13 | INVERSION_number: 10
14 | INV_del_minimum_length: 600
15 | INV_del_maximum_length: 800
16 | INV_del_number: 0
17 | INV_dup_minimum_length: 600
18 | INV_dup_maximum_length: 800
19 | INV_dup_number: 0
20 |
--------------------------------------------------------------------------------
/benchmarking/parameter_sv_bmallei.txt:
--------------------------------------------------------------------------------
1 | PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
2 | DUPLICATION_minimum_length: 1000
3 | DUPLICATION_maximum_length: 20000
4 | DUPLICATION_number: 3
5 | INDEL_minimum_length: 1000
6 | INDEL_maximum_length: 30000
7 | INDEL_number: 85
8 | TRANSLOCATION_minimum_length: 10000
9 | TRANSLOCATION_maximum_length: 30000
10 | TRANSLOCATION_number: 1
11 | INVERSION_minimum_length: 500
12 | INVERSION_maximum_length: 10000
13 | INVERSION_number: 10
14 | INV_del_minimum_length: 600
15 | INV_del_maximum_length: 800
16 | INV_del_number: 0
17 | INV_dup_minimum_length: 600
18 | INV_dup_maximum_length: 800
19 | INV_dup_number: 0
20 |
--------------------------------------------------------------------------------
/benchmarking/parameter_sv_ecoli.txt:
--------------------------------------------------------------------------------
1 | PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
2 | DUPLICATION_minimum_length: 1000
3 | DUPLICATION_maximum_length: 20000
4 | DUPLICATION_number: 3
5 | INDEL_minimum_length: 1000
6 | INDEL_maximum_length: 30000
7 | INDEL_number: 87
8 | TRANSLOCATION_minimum_length: 10000
9 | TRANSLOCATION_maximum_length: 30000
10 | TRANSLOCATION_number: 0
11 | INVERSION_minimum_length: 500
12 | INVERSION_maximum_length: 10000
13 | INVERSION_number: 10
14 | INV_del_minimum_length: 600
15 | INV_del_maximum_length: 800
16 | INV_del_number: 0
17 | INV_dup_minimum_length: 600
18 | INV_dup_maximum_length: 800
19 | INV_dup_number: 0
20 |
--------------------------------------------------------------------------------
/benchmarking/parameter_sv_lacidophilus.txt:
--------------------------------------------------------------------------------
1 | PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
2 | DUPLICATION_minimum_length: 333
3 | DUPLICATION_maximum_length: 6667
4 | DUPLICATION_number: 3
5 | INDEL_minimum_length: 333
6 | INDEL_maximum_length: 10000
7 | INDEL_number: 87
8 | TRANSLOCATION_minimum_length: 3333
9 | TRANSLOCATION_maximum_length: 10000
10 | TRANSLOCATION_number: 0
11 | INVERSION_minimum_length: 167
12 | INVERSION_maximum_length: 3333
13 | INVERSION_number: 10
14 | INV_del_minimum_length: 600
15 | INV_del_maximum_length: 800
16 | INV_del_number: 0
17 | INV_dup_minimum_length: 600
18 | INV_dup_maximum_length: 800
19 | INV_dup_number: 0
20 |
--------------------------------------------------------------------------------
/build.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | $PYTHON -m pip install --no-deps --ignore-installed . # Python command to install the script.
4 |
--------------------------------------------------------------------------------
/res/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OLF-Bioinformatics/VariantDetective/5ca633e4879865db12112440f1f0c691f6767cd2/res/logo.png
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup, find_packages
2 |
3 | setup(name='variantdetective',
4 | version='1.0.1',
5 | description='VariantDetective: an accurate all-in-one pipeline for detecting consensus bacterial SNPs and SVs',
6 | url='https://github.com/OLF-Bioinformatics/VariantDetective',
7 | author='Phil Charron',
8 | author_email='phil.charron@inspection.gc.ca',
9 | license='MIT',
10 | #install_requires=['pandas>=2.1.3', 'numpy>=1.26', 'tensorflow>=2.8.0'],
11 | packages=find_packages(),
12 | include_package_data=True,
13 | package_data={
14 | 'variantdetective': ['clair3_models/*/*'],
15 | },
16 | entry_points={
17 | 'console_scripts': [
18 | 'variantdetective=variantdetective.main:main',
19 | ],
20 | }
21 | )
22 |
--------------------------------------------------------------------------------
/spec-file.txt:
--------------------------------------------------------------------------------
1 | # This file may be used to create an environment using:
2 | # $ conda create --name --file
3 | # platform: linux-64
4 | @EXPLICIT
5 | https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2
6 | https://conda.anaconda.org/conda-forge/noarch/_r-mutex-1.0.1-anacondar_1.tar.bz2
7 | https://repo.anaconda.com/pkgs/main/linux-64/_tflow_select-2.3.0-mkl.conda
8 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/ca-certificates-2023.11.17-hbcca054_0.conda
9 | https://conda.anaconda.org/conda-forge/noarch/kernel-headers_linux-64-2.6.32-he073ed8_16.conda
10 | https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.36.1-hea4e1c9_2.tar.bz2
11 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libgcc-devel_linux-64-7.5.0-hda03d7c_20.tar.bz2
12 | https://conda.anaconda.org/conda-forge/linux-64/libgfortran4-7.5.0-h14aa051_20.tar.bz2
13 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libstdcxx-devel_linux-64-7.5.0-hb016644_20.tar.bz2
14 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libstdcxx-ng-13.2.0-h7e041cc_3.conda
15 | https://conda.anaconda.org/conda-forge/linux-64/libgfortran-ng-7.5.0-h14aa051_20.tar.bz2
16 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libgomp-13.2.0-h807b86a_3.conda
17 | https://conda.anaconda.org/conda-forge/noarch/sysroot_linux-64-2.12-he073ed8_16.conda
18 | https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2
19 | https://conda.anaconda.org/conda-forge/linux-64/binutils_impl_linux-64-2.36.1-h193b22a_2.tar.bz2
20 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/binutils_linux-64-2.36-hf3e587d_33.tar.bz2
21 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libgcc-ng-13.2.0-h807b86a_3.conda
22 | https://conda.anaconda.org/conda-forge/linux-64/alsa-lib-1.2.3.2-h166bdaf_0.tar.bz2
23 | https://conda.anaconda.org/conda-forge/linux-64/bc-1.07.1-h7f98852_0.tar.bz2
24 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/bzip2-1.0.8-hd590300_5.conda
25 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/c-ares-1.22.1-hd590300_0.conda
26 | https://conda.anaconda.org/conda-forge/linux-64/fribidi-1.0.10-h36c2ea0_0.tar.bz2
27 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/gcc_impl_linux-64-7.5.0-habd7529_20.tar.bz2
28 | https://conda.anaconda.org/conda-forge/linux-64/gettext-0.21.1-h27087fc_0.tar.bz2
29 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/giflib-5.2.1-h0b41bf4_3.conda
30 | https://conda.anaconda.org/conda-forge/linux-64/graphite2-1.3.13-h58526e2_1001.tar.bz2
31 | https://conda.anaconda.org/conda-forge/linux-64/icu-58.2-hf484d3e_1000.tar.bz2
32 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/isa-l-2.30.0-hd590300_6.conda
33 | https://conda.anaconda.org/conda-forge/linux-64/jpeg-9e-h0b41bf4_3.conda
34 | https://conda.anaconda.org/conda-forge/linux-64/keyutils-1.6.1-h166bdaf_0.tar.bz2
35 | https://conda.anaconda.org/conda-forge/linux-64/libdeflate-1.13-h166bdaf_0.tar.bz2
36 | https://conda.anaconda.org/conda-forge/linux-64/libev-4.33-h516909a_1.tar.bz2
37 | https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.5.0-hcb278e6_1.conda
38 | https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-he1b5a44_1007.tar.bz2
39 | https://conda.anaconda.org/conda-forge/linux-64/libiconv-1.17-h166bdaf_0.tar.bz2
40 | https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.1-hd590300_0.conda
41 | https://repo.anaconda.com/pkgs/main/linux-64/libopenblas-0.3.18-hf726d26_0.conda
42 | https://conda.anaconda.org/conda-forge/linux-64/libunistring-0.9.10-h7f98852_0.tar.bz2
43 | https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.38.1-h0b41bf4_0.conda
44 | https://conda.anaconda.org/conda-forge/linux-64/libwebp-base-1.1.0-h36c2ea0_3.tar.bz2
45 | https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.2.13-hd590300_5.conda
46 | https://conda.anaconda.org/conda-forge/linux-64/lz4-c-1.9.2-he1b5a44_3.tar.bz2
47 | https://conda.anaconda.org/conda-forge/linux-64/lzo-2.10-h516909a_1000.tar.bz2
48 | https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.2-h58526e2_4.tar.bz2
49 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/openssl-1.1.1w-hd590300_0.conda
50 | https://conda.anaconda.org/conda-forge/linux-64/ossuuid-1.6.2-hf484d3e_1000.tar.bz2
51 | https://conda.anaconda.org/conda-forge/linux-64/pcre-8.45-h9c3ff4c_0.tar.bz2
52 | https://conda.anaconda.org/conda-forge/linux-64/pixman-0.38.0-h516909a_1003.tar.bz2
53 | https://conda.anaconda.org/conda-forge/linux-64/pthread-stubs-0.4-h36c2ea0_1001.tar.bz2
54 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/snappy-1.1.10-h9fff704_0.conda
55 | https://conda.anaconda.org/conda-forge/linux-64/xorg-inputproto-2.3.2-h7f98852_1002.tar.bz2
56 | https://conda.anaconda.org/conda-forge/linux-64/xorg-kbproto-1.0.7-h7f98852_1002.tar.bz2
57 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/xorg-libice-1.1.1-hd590300_0.conda
58 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/xorg-libxau-1.0.11-hd590300_0.conda
59 | https://conda.anaconda.org/conda-forge/linux-64/xorg-libxdmcp-1.1.3-h7f98852_0.tar.bz2
60 | https://conda.anaconda.org/conda-forge/linux-64/xorg-recordproto-1.14.2-h7f98852_1002.tar.bz2
61 | https://conda.anaconda.org/conda-forge/linux-64/xorg-renderproto-0.11.1-h7f98852_1002.tar.bz2
62 | https://conda.anaconda.org/conda-forge/linux-64/xorg-xextproto-7.3.0-h0b41bf4_1003.conda
63 | https://conda.anaconda.org/conda-forge/linux-64/xorg-xproto-7.0.31-h7f98852_1007.tar.bz2
64 | https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.6-h166bdaf_0.tar.bz2
65 | https://conda.anaconda.org/conda-forge/linux-64/blosc-1.21.1-hd32f23e_0.tar.bz2
66 | https://conda.anaconda.org/conda-forge/linux-64/expat-2.5.0-hcb278e6_1.conda
67 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/gcc_linux-64-7.5.0-h47867f9_33.tar.bz2
68 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/gfortran_impl_linux-64-7.5.0-h56cb351_20.tar.bz2
69 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/gxx_impl_linux-64-7.5.0-hd0bb8aa_20.tar.bz2
70 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libblas-3.9.0-13_linux64_openblas.tar.bz2
71 | https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20191231-he28a2e2_2.tar.bz2
72 | https://conda.anaconda.org/conda-forge/linux-64/libidn2-2.3.4-h166bdaf_0.tar.bz2
73 | https://conda.anaconda.org/conda-forge/linux-64/libnghttp2-1.51.0-hdcd2b5c_0.conda
74 | https://conda.anaconda.org/conda-forge/linux-64/libpng-1.6.39-h753d276_0.conda
75 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libsqlite-3.44.2-h2797004_0.conda
76 | https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.10.0-haa6b8db_3.tar.bz2
77 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libxcb-1.15-h0b41bf4_0.conda
78 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/pbzip2-1.1.13-h1fcc475_2.conda
79 | https://conda.anaconda.org/conda-forge/linux-64/perl-5.32.1-4_hd590300_perl5.conda
80 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/conda-forge/linux-64/readline-7.0-hf8c457e_1001.tar.bz2
81 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/tk-8.6.13-noxft_h4845f30_101.conda
82 | https://conda.anaconda.org/conda-forge/linux-64/xorg-fixesproto-5.0-h7f98852_1002.tar.bz2
83 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/xorg-libsm-1.2.4-h7391055_0.conda
84 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/zlib-1.2.13-hd590300_5.conda
85 | https://conda.anaconda.org/bioconda/linux-64/bedtools-2.30.0-h468198e_3.tar.bz2
86 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/linux-64/bwa-0.7.17-he4a0461_11.tar.bz2
87 | https://conda.anaconda.org/conda-forge/linux-64/bwidget-1.9.14-ha770c72_1.tar.bz2
88 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/freetype-2.12.1-h267a509_2.conda
89 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/conda-forge/linux-64/gdbm-1.18-h941a26a_0.tar.bz2
90 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/gfortran_linux-64-7.5.0-h78c8a43_33.tar.bz2
91 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/gxx_linux-64-7.5.0-h555fc39_33.tar.bz2
92 | https://conda.anaconda.org/bioconda/linux-64/k8-0.2.5-hdcf5f25_4.tar.bz2
93 | https://conda.anaconda.org/conda-forge/linux-64/krb5-1.20.1-hf9c8cef_0.conda
94 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/libcblas-3.9.0-13_linux64_openblas.tar.bz2
95 | https://conda.anaconda.org/conda-forge/linux-64/libglib-2.66.3-hbe7bbb4_0.tar.bz2
96 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/liblapack-3.9.0-13_linux64_openblas.tar.bz2
97 | https://conda.anaconda.org/conda-forge/linux-64/libprotobuf-3.18.0-h780b84a_1.tar.bz2
98 | https://repo.anaconda.com/pkgs/main/linux-64/libxml2-2.9.14-h74e7548_0.conda
99 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/linux-64/ngmlr-0.2.7-hdcf5f25_6.tar.bz2
100 | https://conda.anaconda.org/conda-forge/linux-64/parallel-20191122-0.tar.bz2
101 | https://conda.anaconda.org/conda-forge/linux-64/perl-capture-tiny-0.48-pl5321ha770c72_1.tar.bz2
102 | https://conda.anaconda.org/conda-forge/noarch/perl-common-sense-3.75-pl5321hd8ed1ab_0.tar.bz2
103 | https://conda.anaconda.org/conda-forge/linux-64/perl-compress-raw-bzip2-2.201-pl5321h166bdaf_0.tar.bz2
104 | https://conda.anaconda.org/conda-forge/linux-64/perl-compress-raw-zlib-2.202-pl5321h166bdaf_0.tar.bz2
105 | https://conda.anaconda.org/conda-forge/noarch/perl-constant-1.33-pl5321hd8ed1ab_0.tar.bz2
106 | https://conda.anaconda.org/conda-forge/noarch/perl-exporter-5.74-pl5321hd8ed1ab_0.tar.bz2
107 | https://conda.anaconda.org/conda-forge/noarch/perl-exporter-tiny-1.002002-pl5321hd8ed1ab_0.tar.bz2
108 | https://conda.anaconda.org/conda-forge/noarch/perl-extutils-makemaker-7.70-pl5321hd8ed1ab_0.conda
109 | https://conda.anaconda.org/bioconda/noarch/perl-ffi-checklib-0.28-pl5321hdfd78af_0.tar.bz2
110 | https://conda.anaconda.org/conda-forge/noarch/perl-file-which-1.24-pl5321hd8ed1ab_0.tar.bz2
111 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-importer-0.026-pl5321hd8ed1ab_0.tar.bz2
112 | https://conda.anaconda.org/bioconda/noarch/perl-io-zlib-1.14-pl5321hdfd78af_0.tar.bz2
113 | https://conda.anaconda.org/bioconda/linux-64/perl-list-moreutils-xs-0.430-pl5321h031d066_2.tar.bz2
114 | https://conda.anaconda.org/conda-forge/noarch/perl-parent-0.241-pl5321hd8ed1ab_0.conda
115 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-path-tiny-0.124-pl5321hd8ed1ab_0.tar.bz2
116 | https://conda.anaconda.org/conda-forge/linux-64/perl-scalar-list-utils-1.63-pl5321h166bdaf_0.tar.bz2
117 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-scope-guard-0.21-pl5321hd8ed1ab_0.tar.bz2
118 | https://conda.anaconda.org/conda-forge/linux-64/perl-storable-3.15-pl5321h166bdaf_0.tar.bz2
119 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/perl-test-warnings-0.031-pl5321ha770c72_0.conda
120 | https://conda.anaconda.org/conda-forge/linux-64/perl-try-tiny-0.31-pl5321ha770c72_0.tar.bz2
121 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-xml-sax-base-1.09-pl5321hd8ed1ab_0.tar.bz2
122 | https://repo.anaconda.com/pkgs/main/linux-64/pigz-2.4-h84994c4_0.conda
123 | https://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.33.0-h62c20be_0.conda
124 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/linux-64/survivor-1.0.7-hdcf5f25_4.tar.bz2
125 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/tktable-2.10-h0c5db8f_5.conda
126 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/linux-64/vcftools-0.1.16-pl5321hdcf5f25_9.tar.bz2
127 | https://conda.anaconda.org/conda-forge/linux-64/wget-1.20.3-ha56f1ee_1.tar.bz2
128 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/xorg-libx11-1.8.7-h8ee46fc_0.conda
129 | https://conda.anaconda.org/conda-forge/linux-64/zstd-1.4.4-h6597ccf_3.tar.bz2
130 | https://conda.anaconda.org/bioconda/linux-64/entrez-direct-16.2-he881be0_1.tar.bz2
131 | https://conda.anaconda.org/conda-forge/linux-64/fontconfig-2.14.2-h14ed4e7_0.conda
132 | https://conda.anaconda.org/conda-forge/linux-64/gsl-2.5-h294904e_1.tar.bz2
133 | https://conda.anaconda.org/conda-forge/linux-64/libcurl-7.87.0-h6312ad2_0.conda
134 | https://conda.anaconda.org/conda-forge/linux-64/libtiff-4.1.0-hc7e4089_6.tar.bz2
135 | https://conda.anaconda.org/bioconda/linux-64/minimap2-2.24-h7132678_1.tar.bz2
136 | https://conda.anaconda.org/conda-forge/noarch/perl-carp-1.50-pl5321hd8ed1ab_0.tar.bz2
137 | https://conda.anaconda.org/conda-forge/linux-64/perl-encode-3.19-pl5321h166bdaf_0.tar.bz2
138 | https://conda.anaconda.org/conda-forge/noarch/perl-file-path-2.18-pl5321hd8ed1ab_0.tar.bz2
139 | https://conda.anaconda.org/bioconda/noarch/perl-list-moreutils-0.430-pl5321hdfd78af_0.tar.bz2
140 | https://conda.anaconda.org/conda-forge/linux-64/perl-test-fatal-0.016-pl5321ha770c72_0.tar.bz2
141 | https://conda.anaconda.org/bioconda/noarch/perl-types-serialiser-1.01-pl5321hdfd78af_0.tar.bz2
142 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-xml-namespacesupport-1.12-pl5321hd8ed1ab_0.tar.bz2
143 | https://conda.anaconda.org/conda-forge/linux-64/pypy3.6-7.3.2-h45e8706_2.tar.bz2
144 | https://repo.anaconda.com/pkgs/main/linux-64/python-3.6.10-h191fe78_1.conda
145 | https://conda.anaconda.org/conda-forge/linux-64/xorg-libxext-1.3.4-h0b41bf4_2.conda
146 | https://conda.anaconda.org/conda-forge/linux-64/xorg-libxfixes-5.0.3-h7f98852_1004.tar.bz2
147 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/xorg-libxrender-0.9.11-hd590300_0.conda
148 | https://conda.anaconda.org/conda-forge/noarch/astor-0.8.1-pyh9f0ad1d_0.tar.bz2
149 | https://conda.anaconda.org/conda-forge/noarch/async-timeout-3.0.1-py_1000.tar.bz2
150 | https://conda.anaconda.org/conda-forge/noarch/attrs-22.2.0-pyh71513ae_0.conda
151 | https://conda.anaconda.org/conda-forge/noarch/blinker-1.5-pyhd8ed1ab_0.tar.bz2
152 | https://conda.anaconda.org/conda-forge/noarch/cachetools-5.0.0-pyhd8ed1ab_0.tar.bz2
153 | https://conda.anaconda.org/conda-forge/noarch/charset-normalizer-2.1.1-pyhd8ed1ab_0.tar.bz2
154 | https://conda.anaconda.org/bioconda/noarch/cigar-0.1.3-pyh864c0ab_1.tar.bz2
155 | https://conda.anaconda.org/conda-forge/linux-64/curl-7.87.0-h6312ad2_0.conda
156 | https://conda.anaconda.org/conda-forge/noarch/cycler-0.11.0-pyhd8ed1ab_0.tar.bz2
157 | https://conda.anaconda.org/conda-forge/noarch/dataclasses-0.8-pyh787bdff_2.tar.bz2
158 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/gast-0.3.3-py_0.tar.bz2
159 | https://conda.anaconda.org/conda-forge/linux-64/glib-2.66.3-h58526e2_0.tar.bz2
160 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/conda-forge/linux-64/hdf5-1.10.6-nompi_h7c3c948_1111.tar.bz2
161 | https://conda.anaconda.org/bioconda/linux-64/htslib-1.10.2-hd3b49d5_1.tar.bz2
162 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/idna-3.6-pyhd8ed1ab_0.conda
163 | https://conda.anaconda.org/conda-forge/linux-64/lcms2-2.11-hcbb858e_1.tar.bz2
164 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/mock-5.1.0-pyhd8ed1ab_0.conda
165 | https://conda.anaconda.org/conda-forge/noarch/natsort-8.2.0-pyhd8ed1ab_0.tar.bz2
166 | https://conda.anaconda.org/conda-forge/noarch/olefile-0.46-pyh9f0ad1d_1.tar.bz2
167 | https://conda.anaconda.org/conda-forge/noarch/perl-business-isbn-data-20210112.006-pl5321hd8ed1ab_0.tar.bz2
168 | https://conda.anaconda.org/conda-forge/noarch/perl-file-temp-0.2304-pl5321hd8ed1ab_0.tar.bz2
169 | https://conda.anaconda.org/bioconda/linux-64/perl-io-compress-2.201-pl5321hdbdd923_2.tar.bz2
170 | https://conda.anaconda.org/bioconda/linux-64/perl-json-xs-2.34-pl5321h4ac6f70_6.tar.bz2
171 | https://conda.anaconda.org/conda-forge/linux-64/perl-pathtools-3.75-pl5321h166bdaf_0.tar.bz2
172 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-sub-info-0.002-pl5321hd8ed1ab_0.tar.bz2
173 | https://conda.anaconda.org/bioconda/noarch/perl-term-table-0.016-pl5321hdfd78af_0.tar.bz2
174 | https://conda.anaconda.org/conda-forge/noarch/progress-1.6-pyhd8ed1ab_0.tar.bz2
175 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/conda-forge/noarch/pyasn1-0.5.1-pyhd8ed1ab_0.conda
176 | https://conda.anaconda.org/conda-forge/noarch/pycparser-2.21-pyhd8ed1ab_0.tar.bz2
177 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/pyjwt-2.8.0-pyhd8ed1ab_0.conda
178 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/pyparsing-3.1.1-pyhd8ed1ab_0.conda
179 | https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.6-2_cp36m.tar.bz2
180 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/pytz-2023.3.post1-pyhd8ed1ab_0.conda
181 | https://conda.anaconda.org/bioconda/noarch/pyvcf3-1.0.3-pyhdfd78af_0.tar.bz2
182 | https://conda.anaconda.org/conda-forge/noarch/six-1.16.0-pyh6c4a22f_0.tar.bz2
183 | https://conda.anaconda.org/conda-forge/noarch/tensorboard-plugin-wit-1.8.1-pyhd8ed1ab_0.tar.bz2
184 | https://conda.anaconda.org/conda-forge/noarch/termcolor-1.1.0-pyhd8ed1ab_3.tar.bz2
185 | https://conda.anaconda.org/conda-forge/noarch/typing_extensions-4.1.1-pyha770c72_0.tar.bz2
186 | https://conda.anaconda.org/conda-forge/noarch/wheel-0.37.1-pyhd8ed1ab_0.tar.bz2
187 | https://conda.anaconda.org/conda-forge/linux-64/xorg-libxi-1.7.10-h7f98852_0.tar.bz2
188 | https://conda.anaconda.org/conda-forge/noarch/zipp-3.6.0-pyhd8ed1ab_0.tar.bz2
189 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/conda-forge/noarch/absl-py-0.15.0-pyhd8ed1ab_0.tar.bz2
190 | https://conda.anaconda.org/conda-forge/noarch/astunparse-1.6.3-pyhd8ed1ab_0.tar.bz2
191 | https://conda.anaconda.org/bioconda/linux-64/bcftools-1.10-h5d15f04_0.tar.bz2
192 | https://conda.anaconda.org/conda-forge/linux-64/cairo-1.16.0-h18b612c_1001.tar.bz2
193 | https://conda.anaconda.org/conda-forge/linux-64/certifi-2021.5.30-py36h5fab9bb_0.tar.bz2
194 | https://conda.anaconda.org/conda-forge/linux-64/cffi-1.14.4-py36h211aa47_0.tar.bz2
195 | https://conda.anaconda.org/conda-forge/linux-64/chardet-4.0.0-py36h5fab9bb_1.tar.bz2
196 | https://conda.anaconda.org/conda-forge/noarch/google-pasta-0.2.0-pyh8c360ce_0.tar.bz2
197 | https://conda.anaconda.org/conda-forge/noarch/idna_ssl-1.1.0-pyhd8ed1ab_1002.tar.bz2
198 | https://conda.anaconda.org/conda-forge/linux-64/importlib-metadata-4.8.1-py36h5fab9bb_0.tar.bz2
199 | https://conda.anaconda.org/conda-forge/linux-64/kiwisolver-1.3.1-py36h605e78d_1.tar.bz2
200 | https://conda.anaconda.org/conda-forge/linux-64/multidict-5.2.0-py36h8f6f2f9_0.tar.bz2
201 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/numpy-1.19.2-py36h68c22af_1.tar.bz2
202 | https://conda.anaconda.org/conda-forge/noarch/packaging-21.3-pyhd8ed1ab_0.tar.bz2
203 | https://conda.anaconda.org/bioconda/noarch/perl-archive-tar-2.40-pl5321hdfd78af_0.tar.bz2
204 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-business-isbn-3.007-pl5321hd8ed1ab_0.tar.bz2
205 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-file-chdir-0.1011-pl5321hd8ed1ab_0.tar.bz2
206 | https://conda.anaconda.org/bioconda/noarch/perl-json-4.10-pl5321hdfd78af_0.tar.bz2
207 | https://conda.anaconda.org/bioconda/noarch/perl-test2-suite-0.000145-pl5321hdfd78af_0.tar.bz2
208 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/perl-xml-sax-1.02-pl5321hd8ed1ab_0.tar.bz2
209 | https://conda.anaconda.org/conda-forge/linux-64/pillow-8.1.0-py36h4f9996e_1.tar.bz2
210 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/psutil-5.8.0-py36h8f6f2f9_1.tar.bz2
211 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/pyasn1-modules-0.3.0-pyhd8ed1ab_0.conda
212 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/bioconda/linux-64/pysam-0.21.0-py36h50b03f4_0.tar.bz2
213 | https://conda.anaconda.org/conda-forge/linux-64/pysocks-1.7.1-py36h5fab9bb_3.tar.bz2
214 | https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.8.2-pyhd8ed1ab_0.tar.bz2
215 | https://conda.anaconda.org/conda-forge/linux-64/python-isal-0.11.1-py36h8f6f2f9_0.tar.bz2
216 | https://conda.anaconda.org/conda-forge/noarch/pyu2f-0.1.5-pyhd8ed1ab_0.tar.bz2
217 | https://conda.anaconda.org/conda-forge/linux-64/pyvcf-0.6.8-py36h9f0ad1d_1002.tar.bz2
218 | https://conda.anaconda.org/conda-forge/noarch/rsa-4.9-pyhd8ed1ab_0.tar.bz2
219 | https://conda.anaconda.org/bioconda/linux-64/samtools-1.10-h2e538c0_3.tar.bz2
220 | https://conda.anaconda.org/conda-forge/linux-64/setuptools-58.0.4-py36h5fab9bb_2.tar.bz2
221 | https://conda.anaconda.org/bioconda/noarch/tabix-1.11-hdfd78af_0.tar.bz2
222 | https://conda.anaconda.org/conda-forge/linux-64/tensorboard-data-server-0.6.0-py36hc39840e_0.tar.bz2
223 | https://conda.anaconda.org/conda-forge/linux-64/tornado-6.1-py36h8f6f2f9_1.tar.bz2
224 | https://conda.anaconda.org/conda-forge/noarch/typing-extensions-4.1.1-hd8ed1ab_0.tar.bz2
225 | https://conda.anaconda.org/conda-forge/noarch/werkzeug-2.0.2-pyhd8ed1ab_0.tar.bz2
226 | https://conda.anaconda.org/conda-forge/linux-64/wrapt-1.13.1-py36h8f6f2f9_0.tar.bz2
227 | https://conda.anaconda.org/conda-forge/linux-64/xorg-libxtst-1.2.3-h7f98852_1002.tar.bz2
228 | https://conda.anaconda.org/conda-forge/linux-64/biopython-1.79-py36h8f6f2f9_0.tar.bz2
229 | https://conda.anaconda.org/conda-forge/linux-64/brotlipy-0.7.0-py36h8f6f2f9_1001.tar.bz2
230 | https://conda.anaconda.org/conda-forge/linux-64/click-8.0.1-py36h5fab9bb_0.tar.bz2
231 | https://conda.anaconda.org/conda-forge/linux-64/cryptography-35.0.0-py36hb60f036_0.tar.bz2
232 | https://conda.anaconda.org/conda-forge/linux-64/grpcio-1.38.1-py36h8e87921_0.tar.bz2
233 | https://conda.anaconda.org/conda-forge/linux-64/h5py-2.10.0-nompi_py36h4510012_106.tar.bz2
234 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/conda-forge/linux-64/harfbuzz-2.4.0-h37c48d4_1.tar.bz2
235 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/markdown-3.5.1-pyhd8ed1ab_0.conda
236 | https://conda.anaconda.org/conda-forge/linux-64/matplotlib-base-3.3.4-py36hd391965_0.tar.bz2
237 | https://conda.anaconda.org/bioconda/noarch/nanosv-1.2.4-py_0.tar.bz2
238 | https://conda.anaconda.org/conda-forge/linux-64/numexpr-2.7.3-py36h0cdc3f0_0.tar.bz2
239 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/opt_einsum-3.3.0-pyhc1e730c_2.conda
240 | https://conda.anaconda.org/conda-forge/linux-64/pandas-1.1.5-py36h284efc9_0.tar.bz2
241 | https://conda.anaconda.org/bioconda/linux-64/perl-alien-build-2.48-pl5321hec16e2b_0.tar.bz2
242 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/perl-uri-5.17-pl5321ha770c72_0.conda
243 | https://conda.anaconda.org/conda-forge/noarch/pip-21.3.1-pyhd8ed1ab_0.tar.bz2
244 | https://conda.anaconda.org/conda-forge/linux-64/protobuf-3.18.0-py36hc4f0c31_0.tar.bz2
245 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/scipy-1.5.3-py36h976291a_0.tar.bz2
246 | https://conda.anaconda.org/bioconda/linux-64/tabixpp-1.1.0-hd2e4403_5.tar.bz2
247 | https://conda.anaconda.org/conda-forge/linux-64/xopen-1.2.0-py36h5fab9bb_0.tar.bz2
248 | https://conda.anaconda.org/conda-forge/linux-64/yarl-1.6.3-py36h8f6f2f9_2.tar.bz2
249 | https://conda.anaconda.org/conda-forge/linux-64/aiohttp-3.7.4.post0-py36h8f6f2f9_0.tar.bz2
250 | https://conda.anaconda.org/bioconda/noarch/cutesv-1.0.13-pyhdfd78af_0.tar.bz2
251 | https://conda.anaconda.org/conda-forge/noarch/keras-preprocessing-1.1.2-pyhd8ed1ab_0.tar.bz2
252 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/networkx-2.7-pyhd8ed1ab_0.tar.bz2
253 | https://conda.anaconda.org/conda-forge/noarch/oauthlib-3.2.2-pyhd8ed1ab_0.tar.bz2
254 | https://conda.anaconda.org/conda-forge/linux-64/openjdk-11.0.8-hacce0ff_0.tar.bz2
255 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/linux-64/pango-1.42.4-h7062337_4.tar.bz2
256 | https://conda.anaconda.org/bioconda/linux-64/perl-alien-libxml2-0.17-pl5321hec16e2b_0.tar.bz2
257 | https://conda.anaconda.org/bioconda/linux-64/pybedtools-0.9.0-py36h7281c5b_1.tar.bz2
258 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/noarch/pyfaidx-0.7.2.1-pyh7cba7a3_1.tar.bz2
259 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/pyopenssl-22.0.0-pyhd8ed1ab_1.tar.bz2
260 | https://conda.anaconda.org/conda-forge/linux-64/pytables-3.6.1-py36hb7ec5aa_3.tar.bz2
261 | https://conda.anaconda.org/conda-forge/linux-64/tensorflow-estimator-2.6.0-py36hc4f0c31_0.tar.bz2
262 | https://conda.anaconda.org/bioconda/linux-64/vcflib-1.0.2-hfbaaabd_3.tar.bz2
263 | https://conda.anaconda.org/bioconda/linux-64/freebayes-1.3.5-py36h11ea90d_2.tar.bz2
264 | https://conda.anaconda.org/bioconda/noarch/gatk4-4.2.6.1-py36hdfd78af_1.tar.bz2
265 | https://conda.anaconda.org/bioconda/linux-64/perl-xml-libxml-2.0207-pl5321h661654b_0.tar.bz2
266 | https://repo.anaconda.com/pkgs/r/linux-64/r-base-3.4.3-h9bb98a2_5.conda
267 | https://conda.anaconda.org/bioconda/noarch/svim-1.4.2-py_0.tar.bz2
268 | https://repo.anaconda.com/pkgs/main/linux-64/tensorflow-base-2.2.0-mkl_py36hd506778_0.conda
269 | https://conda.anaconda.org/conda-forge/noarch/urllib3-1.26.15-pyhd8ed1ab_0.conda
270 | https://conda.anaconda.org/bioconda/linux-64/whatshap-1.0-py36hf1ae8f4_1.tar.bz2
271 | https://conda.anaconda.org/t/ch-9f6a872e-99a2-45f8-b0d4-b6aae824fb17/bioconda/linux-64/ncbi-vdb-3.0.9-hdbdd923_0.tar.bz2
272 | https://conda.anaconda.org/bioconda/noarch/picard-2.27.3-hdfd78af_0.tar.bz2
273 | https://conda.anaconda.org/conda-forge/noarch/requests-2.28.1-pyhd8ed1ab_0.tar.bz2
274 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/linux-64/blast-2.15.0-pl5321h6f7f691_1.tar.bz2
275 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/conda-forge/noarch/google-auth-2.23.0-pyh1a96a4e_0.conda
276 | https://conda.anaconda.org/conda-forge/noarch/requests-oauthlib-1.3.1-pyhd8ed1ab_0.tar.bz2
277 | https://conda.anaconda.org/conda-forge/noarch/google-auth-oauthlib-0.4.6-pyhd8ed1ab_0.tar.bz2
278 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/linux-64/hs-blastn-0.0.5-h4ac6f70_5.tar.bz2
279 | https://conda.anaconda.org/conda-forge/noarch/tensorboard-2.11.2-pyhd8ed1ab_0.conda
280 | https://repo.anaconda.com/pkgs/main/linux-64/tensorflow-2.2.0-mkl_py36h5a57954_0.conda
281 | https://conda.anaconda.org/t/ch-d7e07fb7-b42b-4fb7-a192-f5b023e0d555/bioconda/linux-64/clair3-0.1.11-py36hb9dc472_6.tar.bz2
282 | https://repo.anaconda.com/pkgs/main/linux-64/tensorflow-mkl-2.2.0-h4fcabd2_0.conda
283 | https://conda.anaconda.org/bioconda/linux-64/nanovar-1.3.9-py36hc5360cc_1.tar.bz2
284 |
--------------------------------------------------------------------------------
/testdata/testdata_snp.vcf:
--------------------------------------------------------------------------------
1 | ##fileformat=VCFv4.2
2 | ##source=Sniffles
3 | ##fileDate=20240116
4 | ##contig=
5 | ##ALT=
6 | ##ALT=
7 | ##ALT=
8 | ##ALT=
9 | ##ALT=
10 | ##ALT=
11 | ##INFO=
12 | ##INFO=
13 | ##INFO=
14 | ##INFO=
15 | ##INFO=
16 | ##INFO=
17 | ##FORMAT=
18 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
19 | testdata_ref 5779 SNP7SURVIVOR C G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
20 | testdata_ref 6259 SNP8SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
21 | testdata_ref 6321 SNP9SURVIVOR G T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
22 | testdata_ref 7667 SNP10SURVIVOR G C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
23 | testdata_ref 7686 SNP11SURVIVOR A C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
24 | testdata_ref 8152 SNP12SURVIVOR C T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
25 | testdata_ref 8832 SNP13SURVIVOR C G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
26 | testdata_ref 9785 SNP14SURVIVOR C G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
27 | testdata_ref 10580 SNP15SURVIVOR T A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
28 | testdata_ref 10657 SNP16SURVIVOR C T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
29 | testdata_ref 11402 SNP17SURVIVOR G T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
30 | testdata_ref 12419 SNP18SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
31 | testdata_ref 13067 SNP19SURVIVOR G A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
32 | testdata_ref 16352 SNP20SURVIVOR C T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
33 | testdata_ref 16934 SNP21SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
34 | testdata_ref 17920 SNP22SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
35 | testdata_ref 20963 SNP23SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
36 | testdata_ref 21772 SNP24SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
37 | testdata_ref 21830 SNP25SURVIVOR T A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
38 | testdata_ref 22589 SNP26SURVIVOR T A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
39 | testdata_ref 24676 SNP27SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
40 | testdata_ref 26487 SNP28SURVIVOR G T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
41 | testdata_ref 30044 SNP29SURVIVOR A C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
42 | testdata_ref 30181 SNP30SURVIVOR A G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
43 | testdata_ref 30955 SNP31SURVIVOR A G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
44 | testdata_ref 32641 SNP32SURVIVOR A C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
45 | testdata_ref 32869 SNP33SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
46 | testdata_ref 36042 SNP34SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
47 | testdata_ref 44779 SNP39SURVIVOR G T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
48 | testdata_ref 45035 SNP40SURVIVOR G C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
49 | testdata_ref 45062 SNP41SURVIVOR G C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
50 | testdata_ref 45661 SNP43SURVIVOR G A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
51 | testdata_ref 46262 SNP44SURVIVOR G T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
52 | testdata_ref 47054 SNP45SURVIVOR G A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
53 | testdata_ref 49239 SNP46SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
54 | testdata_ref 49552 SNP47SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
55 | testdata_ref 51973 SNP48SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
56 | testdata_ref 54182 SNP49SURVIVOR G A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
57 | testdata_ref 54955 SNP50SURVIVOR G A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
58 | testdata_ref 55590 SNP51SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
59 | testdata_ref 56291 SNP52SURVIVOR G A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
60 | testdata_ref 56331 SNP53SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
61 | testdata_ref 56441 SNP54SURVIVOR A T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
62 | testdata_ref 57509 SNP55SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
63 | testdata_ref 57751 SNP56SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
64 | testdata_ref 58388 SNP57SURVIVOR G C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
65 | testdata_ref 58772 SNP58SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
66 | testdata_ref 60984 SNP59SURVIVOR T A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
67 | testdata_ref 62354 SNP60SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
68 | testdata_ref 63089 SNP61SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
69 | testdata_ref 63729 SNP62SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
70 | testdata_ref 63743 SNP63SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
71 | testdata_ref 63760 SNP64SURVIVOR A T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
72 | testdata_ref 65044 SNP65SURVIVOR G T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
73 | testdata_ref 66500 SNP66SURVIVOR G C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
74 | testdata_ref 67874 SNP67SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
75 | testdata_ref 67940 SNP68SURVIVOR G T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
76 | testdata_ref 68668 SNP69SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
77 | testdata_ref 68848 SNP70SURVIVOR A C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
78 | testdata_ref 69389 SNP71SURVIVOR C T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
79 | testdata_ref 69600 SNP72SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
80 | testdata_ref 70764 SNP73SURVIVOR A C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
81 | testdata_ref 70883 SNP74SURVIVOR C G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
82 | testdata_ref 70975 SNP75SURVIVOR A G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
83 | testdata_ref 71216 SNP76SURVIVOR G C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
84 | testdata_ref 72316 SNP77SURVIVOR C G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
85 | testdata_ref 73897 SNP78SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
86 | testdata_ref 74704 SNP79SURVIVOR C A PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
87 | testdata_ref 77564 SNP80SURVIVOR C T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
88 | testdata_ref 79018 SNP81SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
89 | testdata_ref 85378 SNP84SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
90 | testdata_ref 85921 SNP85SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
91 | testdata_ref 85975 SNP86SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
92 | testdata_ref 86100 SNP87SURVIVOR C T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
93 | testdata_ref 88025 SNP88SURVIVOR A C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
94 | testdata_ref 88437 SNP89SURVIVOR G C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
95 | testdata_ref 88767 SNP90SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
96 | testdata_ref 89951 SNP91SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
97 | testdata_ref 90021 SNP92SURVIVOR A G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
98 | testdata_ref 90347 SNP93SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
99 | testdata_ref 91548 SNP94SURVIVOR T C PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
100 | testdata_ref 91571 SNP95SURVIVOR A G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
101 | testdata_ref 93429 SNP96SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
102 | testdata_ref 93866 SNP97SURVIVOR A T PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
103 | testdata_ref 94637 SNP98SURVIVOR T G PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
104 |
--------------------------------------------------------------------------------
/testdata/testdata_sv.vcf:
--------------------------------------------------------------------------------
1 | ##fileformat=VCFv4.2
2 | ##source=Sniffles
3 | ##fileDate=20240116
4 | ##contig=
5 | ##ALT=
6 | ##ALT=
7 | ##ALT=
8 | ##ALT=
9 | ##ALT=
10 | ##ALT=
11 | ##INFO=
12 | ##INFO=
13 | ##INFO=
14 | ##INFO=
15 | ##INFO=
16 | ##INFO=
17 | ##FORMAT=
18 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
19 | testdata_ref 82364 DEL0SURVIVOR N . LowQual PRECISE;SVTYPE=DEL;SVMETHOD=SURVIVOR_sim;CHR2=testdata_ref;END=85015;SVLEN=2651 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
20 | testdata_ref 4723 INS1SURVIVOR N . LowQual PRECISE;SVTYPE=INS;SVMETHOD=SURVIVOR_sim;CHR2=testdata_ref;END=7492;SVLEN=2769 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
21 | testdata_ref 9901 INS2SURVIVOR N . LowQual PRECISE;SVTYPE=INS;SVMETHOD=SURVIVOR_sim;CHR2=testdata_ref;END=12754;SVLEN=2853 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
22 | testdata_ref 39769 DEL3SURVIVOR N . LowQual PRECISE;SVTYPE=DEL;SVMETHOD=SURVIVOR_sim;CHR2=testdata_ref;END=44753;SVLEN=4984 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
23 | testdata_ref 81152 INV4SURVIVOR N . LowQual PRECISE;SVTYPE=INV;SVMETHOD=SURVIVOR_sim;CHR2=testdata_ref;END=81781;SVLEN=629 GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1
24 |
--------------------------------------------------------------------------------
/variantdetective.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | This module contains miscellaneous functions that are used in various
4 | components of VariantDetective.
5 |
6 | Copyright (C) 2022 Phil Charron (phil.charron@inspection.gc.ca)
7 | https://github.com/philcharron-cfia/VariantDetective
8 | """
9 |
10 | from variantdetective.main import main
11 |
12 | if __name__ == '__main__':
13 | main()
--------------------------------------------------------------------------------
/variantdetective/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OLF-Bioinformatics/VariantDetective/5ca633e4879865db12112440f1f0c691f6767cd2/variantdetective/__init__.py
--------------------------------------------------------------------------------
/variantdetective/combine_variants.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains code for the combine_variants subcommand.
3 |
4 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
5 | https://github.com/OLF-Bioinformatics/VariantDetective
6 | """
7 |
8 | import os
9 | import sys
10 | import datetime
11 | from .tools import run_process, read_vcf, generate_tab_csv_snp_summary, generate_tab_csv_sv_summary
12 |
13 |
14 | def combine_variants(args, vcf_lists, output=sys.stderr):
15 | if len(vcf_lists[0]) != 0:
16 | snp_indel_outdir = os.path.join(args.out, 'snp_indel')
17 | if not os.path.isdir(snp_indel_outdir):
18 | os.makedirs(snp_indel_outdir)
19 |
20 | out_vcf_list = []
21 | for vcf_file in vcf_lists[0]:
22 | basename = os.path.basename(vcf_file)
23 | out_vcf_file = snp_indel_outdir + '/' + basename + '.gz'
24 | out_vcf_list.append(out_vcf_file)
25 | command = 'bgzip -c -@ ' + str(args.threads) + ' ' + \
26 | vcf_file + ' > ' + out_vcf_file
27 | run_process(command)
28 |
29 | command = 'tabix -p vcf ' + out_vcf_file
30 | run_process(command)
31 | out_vcf_string = ' '.join(out_vcf_list)
32 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tCombining SNP VCF files...', file=output)
33 | command = 'vcf-isec -p ' + snp_indel_outdir + '/snp_ ' + out_vcf_string
34 | run_process(command)
35 |
36 | snp_output_files = []
37 | for path, currentDirectory, files in os.walk(snp_indel_outdir):
38 | for file in files:
39 | if file.startswith("snp_") and file.endswith(".vcf.gz"):
40 | snp_output_files.append(file)
41 |
42 |
43 | for snp_file in snp_output_files:
44 | command = 'tabix -f -p vcf ' + snp_indel_outdir + '/' + snp_file
45 | run_process(command)
46 |
47 | filtered_list = [filename for filename in snp_output_files if filename.count('_') >= args.snp_consensus]
48 | updated_list = [snp_indel_outdir + "/" + filename for filename in filtered_list]
49 |
50 | filtered_string = ' '.join(updated_list)
51 | if len(updated_list) > 0:
52 | command = 'bcftools concat -a ' + filtered_string + \
53 | ' -o ' + snp_indel_outdir + '/snp_final.vcf'
54 | run_process(command)
55 |
56 | command = 'rm ' + snp_indel_outdir + '/*.tbi '
57 | run_process(command)
58 |
59 | generate_tab_csv_snp_summary(read_vcf(snp_indel_outdir + '/snp_final.vcf'), snp_indel_outdir)
60 |
61 | if len(vcf_lists[1]) != 0:
62 | structural_variant_outdir = os.path.join(args.out, 'structural_variant')
63 | if not os.path.isdir(structural_variant_outdir):
64 | os.makedirs(structural_variant_outdir)
65 |
66 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tCombining SV VCF files...', file=output)
67 | vcf_string = ' '.join(vcf_lists[1])
68 | command = 'ls ' + vcf_string + ' > ' + structural_variant_outdir + '/vcf_list'
69 | run_process(command)
70 |
71 | command = 'SURVIVOR merge ' + structural_variant_outdir + \
72 | '/vcf_list 1000 ' + str(args.sv_consensus) + ' 1 1 0 ' + str(args.minlen_sv) + ' ' \
73 | + structural_variant_outdir + '/combined_sv.vcf'
74 | run_process(command)
75 |
76 | command = 'rm ' + structural_variant_outdir + '/vcf_list'
77 | run_process(command)
78 |
79 | generate_tab_csv_sv_summary(read_vcf(structural_variant_outdir + '/combined_sv.vcf'), structural_variant_outdir)
--------------------------------------------------------------------------------
/variantdetective/fragment_lengths.py:
--------------------------------------------------------------------------------
1 | """
2 | This module contains a class for describing fragment length distributions
3 | (described by the gamma distribution) and related functions.
4 |
5 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
6 | https://github.com/OLF-Bioinformatics/VariantDetective
7 | Portions Copyright (C) 2018 Ryan Wick (rrwick@gmail.com)
8 | https://github.com/rrwick/Badread
9 | """
10 |
11 | import numpy as np
12 | import sys
13 |
14 | ### NEEDED
15 | class FragmentLengths(object):
16 | def __init__(self, mean, stdev, output=sys.stderr):
17 | self.mean = mean
18 | self.stdev = stdev
19 | if self.stdev == 0:
20 | self.gamma_k, self.gamma_t = None, None
21 | else: # gamma distribution
22 | gamma_a, gamma_b, self.gamma_k, self.gamma_t = gamma_parameters(mean, stdev)
23 |
24 | def get_fragment_length(self):
25 | if self.stdev == 0:
26 | return int(round(self.mean))
27 | else: # gamma distribution
28 | fragment_length = int(round(np.random.gamma(self.gamma_k, self.gamma_t)))
29 | return max(fragment_length, 1)
30 |
31 | def gamma_parameters(gamma_mean, gamma_stdev):
32 | # Shape and rate parametrisation:
33 | gamma_a = (gamma_mean ** 2) / (gamma_stdev ** 2)
34 | gamma_b = gamma_mean / (gamma_stdev ** 2)
35 |
36 | # Shape and scale parametrisation:
37 | gamma_k = (gamma_mean ** 2) / (gamma_stdev ** 2)
38 | gamma_t = (gamma_stdev ** 2) / gamma_mean
39 |
40 | return gamma_a, gamma_b, gamma_k, gamma_t
--------------------------------------------------------------------------------
/variantdetective/main.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains the main (entry-point) function into VariantDetective.
3 | It can be run using the variantdetective.py script.
4 |
5 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
6 | https://github.com/OLF-Bioinformatics/VariantDetective
7 | """
8 |
9 | import argparse
10 | import os
11 | import pathlib
12 | import shutil
13 | import sys
14 | import datetime
15 |
16 | from variantdetective.tools import get_new_filename
17 | from .validate_inputs import validate_inputs
18 | from .version import __version__
19 |
20 | def main(output=sys.stderr):
21 | check_python_version()
22 | args = parse_args(sys.argv[1:])
23 |
24 | if args.subparser_name == 'structural_variant':
25 | check_structural_variant_args(args)
26 |
27 | elif args.subparser_name == 'snp_indel':
28 | check_snp_indel_args(args)
29 |
30 | elif args.subparser_name == 'all_variants':
31 | check_all_variants_args(args)
32 |
33 | elif args.subparser_name == 'combine_variants':
34 | check_combine_variants_args(args)
35 |
36 | create_outdir(args)
37 | copy_inputs(args)
38 | #print(str(datetime.datetime.now().replace(microsecond=0)) + '\tValidating input files...', file=output)
39 | validate_inputs(args, output=output)
40 |
41 | def parse_args(args):
42 | parser = argparse.ArgumentParser(description='VariantDetective: Identify single nucleotide variants (SNV), '
43 | 'insertions/deletions (indel) and/or structural variants (SV) from '
44 | 'FASTQ reads or FASTA genomic sequences.',
45 | formatter_class=NoSubparsersMetavarFormatter,
46 | add_help=False)
47 |
48 | help_args = parser.add_argument_group('Help')
49 | help_args.add_argument('-h', '--help', action='help',
50 | default=argparse.SUPPRESS,
51 | help='Show this help message and exit')
52 | help_args.add_argument('-v', '--version', action='version',
53 | version='VariantDetective v' + __version__,
54 | help="Show program version number and exit")
55 |
56 | subparsers = parser.add_subparsers(title='Commands', dest='subparser_name',
57 | metavar=None)
58 | all_variants_subparser(subparsers)
59 | structural_variant_subparser(subparsers)
60 | snp_indel_subparser(subparsers)
61 | combine_variants_subparser(subparsers)
62 |
63 | # If no arguments were used, print the base-level help.
64 | if len(args) == 0:
65 | parser.print_help(file=sys.stderr)
66 | sys.exit(1)
67 |
68 | return parser.parse_args(args)
69 |
70 |
71 | def structural_variant_subparser(subparsers):
72 | help = 'Identify structural variants (SV) from long reads (FASTQ) or genome sequence (FASTA). \
73 | If input is FASTA, long reads will be simulated to detect SVs.'
74 | definition = 'Identify structural variants (SV) from long reads (FASTQ) or genome sequence (FASTA). \
75 | If input is FASTA, long reads will be simulated to detect SVs.'
76 |
77 | group = subparsers.add_parser('structural_variant', description=definition,
78 | help=help,
79 | formatter_class=argparse.HelpFormatter,
80 | add_help=False)
81 |
82 | help_args = group.add_argument_group('Help')
83 | help_args.add_argument('-h', '--help', action='help', default=argparse.SUPPRESS,
84 | help='Show this help message and exit')
85 | help_args.add_argument('-v', '--version', action='version',
86 | version='V v' + __version__,
87 | help="Show program version number and exit")
88 |
89 | input_args = group.add_argument_group('Input')
90 | input_args.add_argument('-l', '--long', type=str, metavar='[FASTQ]',
91 | help="Path to long reads FASTQ file. Can't be combined with -g")
92 | input_args.add_argument('-g', '--genome', type=str, metavar='[FASTA]',
93 | help="Path to query genomic FASTA file. Can't be combined with -l")
94 | input_args.add_argument('-r', '--reference', type=str, required=True, metavar='[FASTA]',
95 | help='Path to reference genome in FASTA. Required')
96 |
97 | simulate_args = group.add_argument_group('Simulate')
98 | simulate_args.add_argument("--readcov", type=str, default='50x',
99 | help='Either an absolute value (e.g. 250M) or a relative depth (e.g. 50x) (default: %(default)s)')
100 | simulate_args.add_argument("--readlen", type=str, default='15000,13000',
101 | help='Fragment length distribution (mean,stdev) (default: %(default)s)')
102 |
103 | nanovar_args = group.add_argument_group('Structural Variant Call')
104 | nanovar_args.add_argument("--data_type_sv", type=str, default='ont', choices=['ont', 'pacbio'],
105 | help="Type of long-read data (ont or pacbio) (default: %(default)s)")
106 | nanovar_args.add_argument("--mincov_sv", type=int, default=2,
107 | help='Minimum number of reads required to call variant (default: %(default)i)')
108 | nanovar_args.add_argument("--minlen_sv", type=int, default=25,
109 | help='Minimum length of SV to be detected (default: %(default)i)')
110 | nanovar_args.add_argument("--minqual_sv", type=int, default=15,
111 | help='Minimum quality of SV to be filtered out from SVIM (default: %(default)i)')
112 | nanovar_args.add_argument("--sv_consensus", type=int, default=3,
113 | help='Specifies the minimum number of tools required to detect an SV to include it in the consensus list (default: %(default)i)')
114 |
115 |
116 | other_args = group.add_argument_group('Other')
117 | other_args.add_argument('-o', "--out", type=str, default='./',
118 | help='Output directory. Will be created if it does not exist')
119 | other_args.add_argument('-t', '--threads', type=int, default=1,
120 | help='Number of threads used for job (default: %(default)i)')
121 |
122 |
123 | def all_variants_subparser(subparsers):
124 | help = 'Identify structural variants (SV) from long reads (FASTQ) and SNPs/indels from short reads (FASTQ). \
125 | If genome sequence (FASTA) is provided instead, simulate reads and predict SV, SNPs and indels.'
126 | definition = 'Identify structural variants (SV) from long reads (FASTQ) and SNPs/indels from short reads (FASTQ). \
127 | If genome sequence (FASTA) is provided instead, simulate reads and predict SV, SNPs and indels.'
128 |
129 | group = subparsers.add_parser('all_variants', description=definition,
130 | help=help,
131 | formatter_class=argparse.HelpFormatter,
132 | add_help=False)
133 |
134 | help_args = group.add_argument_group('Help')
135 | help_args.add_argument('-h', '--help', action='help', default=argparse.SUPPRESS,
136 | help='Show this help message and exit')
137 | help_args.add_argument('-v', '--version', action='version',
138 | version='VariantDetective v' + __version__,
139 | help="Show program version number and exit")
140 |
141 | input_args = group.add_argument_group('Input')
142 | input_args.add_argument('-l', '--long', type=str, metavar='[FASTQ]',
143 | help="Path to long reads FASTQ file. Must be combined with -1 and -2")
144 | input_args.add_argument('-1', '--short1', type=str, metavar='[FASTQ]',
145 | help="Path to pair 1 of short reads FASTQ file. Must be combined with -l and -2")
146 | input_args.add_argument('-2', '--short2', type=str, metavar='[FASTQ]',
147 | help="Path to pair 2 of short reads FASTQ file. Must be combined with -l and -1")
148 | input_args.add_argument('-g', '--genome', type=str, metavar='[FASTA]',
149 | help="Path to query genomic FASTA file. Can't be combined with -l, -1 or -2")
150 | input_args.add_argument('-r', '--reference', type=str, required=True, metavar='[FASTA]',
151 | help='Path to reference genome in FASTA. Required')
152 |
153 | simulate_args = group.add_argument_group('Simulate')
154 | simulate_args.add_argument("--readcov", type=str, default='50x',
155 | help='Either an absolute value (e.g. 250M) or a relative depth (e.g. 50x) (default: %(default)s)')
156 | simulate_args.add_argument("--readlen", type=str, default='15000,13000',
157 | help='Fragment length distribution (mean,stdev) (default: %(default)s)')
158 |
159 | nanovar_args = group.add_argument_group('Structural Variant Call')
160 | nanovar_args.add_argument("--data_type_sv", type=str, default='ont', choices=['ont', 'pacbio'],
161 | help="Type of long-read data (ont or pacbio) (default: %(default)s)")
162 | nanovar_args.add_argument("--mincov_sv", type=int, default=2,
163 | help='Minimum number of reads required to call SV (default: %(default)i)')
164 | nanovar_args.add_argument("--minlen_sv", type=int, default=25,
165 | help='Minimum length of SV to be detected (default: %(default)i)')
166 | nanovar_args.add_argument("--minqual_sv", type=int, default=15,
167 | help='Minimum quality of SV to be filtered out with SVIM (default: %(default)i)')
168 | nanovar_args.add_argument("--sv_consensus", type=int, default=3,
169 | help='Specifies the minimum number of tools required to detect an SV to include it in the consensus list (default: %(default)i)')
170 |
171 | snp_args = group.add_argument_group('SNP/Indel Call')
172 | snp_args.add_argument("--mincov_snp", type=int, default=2,
173 | help='Minimum number of reads required to call SNP/Indel (default: %(default)i)')
174 | snp_args.add_argument("--minqual_snp", type=int, default=20,
175 | help='Minimum quality of SNP/Indel to be filtered out (default: %(default)i)')
176 | snp_args.add_argument("--assembler", type=str, default='bwa', choices=['bwa', 'minimap2'],
177 | help='Choose which assembler (bwa or minimap2) to use when using paired-end short reads (default: %(default)s)')
178 | snp_args.add_argument("--snp_consensus", type=int, default=2,
179 | help='Specifies the minimum number of tools required to detect an SNP or Indel to include it in the consensus list (default: %(default)i)')
180 | snp_args.add_argument("--custom_clair3_model", type=str,
181 | help='Path to custom model for Clair3 variant calling (such as ones from Rerio)')
182 |
183 | other_args = group.add_argument_group('Other')
184 | other_args.add_argument('-o', "--out", type=str, default='./',
185 | help='Output directory. Will be created if it does not exist')
186 | other_args.add_argument('-t', '--threads', type=int, default=1,
187 | help='Number of threads used for job (default: %(default)i)')
188 |
189 | def snp_indel_subparser(subparsers):
190 | help = 'Identify SNPs/indels from short reads (FASTQ). \
191 | If genome sequence (FASTA) is provided instead, simulate reads and predict SNPs and indels.'
192 | definition = 'Identify SNPs/indels from short reads (FASTQ). \
193 | If genome sequence (FASTA) is provided instead, simulate reads and predict SNPs and indels.'
194 |
195 | group = subparsers.add_parser('snp_indel', description=definition,
196 | help=help,
197 | formatter_class=argparse.HelpFormatter,
198 | add_help=False)
199 |
200 | help_args = group.add_argument_group('Help')
201 | help_args.add_argument('-h', '--help', action='help', default=argparse.SUPPRESS,
202 | help='Show this help message and exit')
203 | help_args.add_argument('-v', '--version', action='version',
204 | version='VariantDetective v' + __version__,
205 | help="Show program version number and exit")
206 |
207 | input_args = group.add_argument_group('Input')
208 | input_args.add_argument('-1', '--short1', type=str, metavar='[FASTQ]',
209 | help="Path to pair 1 of short reads FASTQ file. Must be combined with -2")
210 | input_args.add_argument('-2', '--short2', type=str, metavar='[FASTQ]',
211 | help="Path to pair 2 of short reads FASTQ file. Must be combined with -1")
212 | input_args.add_argument('-g', '--genome', type=str, metavar='[FASTA]',
213 | help="Path to query genomic FASTA file. Can't be combined with -1 or -2")
214 | input_args.add_argument('-r', '--reference', type=str, required=True, metavar='[FASTA]',
215 | help='Path to reference genome in FASTA. Required')
216 |
217 | simulate_args = group.add_argument_group('Simulate')
218 | simulate_args.add_argument("--readcov", type=str, default='50x',
219 | help='Either an absolute value (e.g. 250M) or a relative depth (e.g. 50x) (default: %(default)s)')
220 | simulate_args.add_argument("--readlen", type=str, default='15000,13000',
221 | help='Fragment length distribution (mean,stdev) (default: %(default)s)')
222 |
223 | snp_args = group.add_argument_group('SNP/Indel Call')
224 | snp_args.add_argument("--mincov_snp", type=int, default=2,
225 | help='Minimum number of reads required to call SNP/Indel (default: %(default)i)')
226 | snp_args.add_argument("--minqual_snp", type=int, default=20,
227 | help='Minimum quality of SNP/Indel to be filtered out (default: %(default)i)')
228 | snp_args.add_argument("--assembler", type=str, default='bwa', choices=['bwa', 'minimap2'],
229 | help='Choose which assembler (bwa or minimap2) to use when using paired-end short reads (default: %(default)s)')
230 | snp_args.add_argument("--snp_consensus", type=int, default=2,
231 | help='Specifies the minimum number of tools required to detect an SNP or Indel to include it in the consensus list (default: %(default)i)')
232 | snp_args.add_argument("--custom_clair3_model", type=str,
233 | help='Path to custom model for Clair3 variant calling (such as ones from Rerio)')
234 |
235 | other_args = group.add_argument_group('Other')
236 | other_args.add_argument('-o', "--out", type=str, default='./',
237 | help='Output directory. Will be created if it does not exist')
238 | other_args.add_argument('-t', '--threads', type=int, default=1,
239 | help='Number of threads used for job (default: %(default)i)')
240 |
241 | def combine_variants_subparser(subparsers):
242 | help = 'Combine VCF files predicted using other tools.'
243 | definition = 'Combine VCF files predicted using other tools.'
244 |
245 | group = subparsers.add_parser('combine_variants', description=definition,
246 | help=help,
247 | formatter_class=argparse.HelpFormatter,
248 | add_help=False)
249 | help_args = group.add_argument_group('Help')
250 | help_args.add_argument('-h', '--help', action='help', default=argparse.SUPPRESS,
251 | help='Show this help message and exit')
252 | help_args.add_argument('-v', '--version', action='version',
253 | version='VariantDetective v' + __version__,
254 | help="Show program version number and exit")
255 |
256 | input_args = group.add_argument_group('Input')
257 | input_args.add_argument('--snp_vcf', type=str, nargs='+',
258 | help="Path to SNP VCF files. Separate each VCF file path with a space.")
259 | input_args.add_argument("--snp_consensus", type=int,
260 | help='Specifies the minimum number of tools required to detect an SNP or Indel to include it in the consensus list.')
261 | input_args.add_argument('--sv_vcf', type=str, nargs='+',
262 | help="Path to SV VCF files. Separate each VCF file path with a space.")
263 | input_args.add_argument("--sv_consensus", type=int,
264 | help='Specifies the minimum number of tools required to detect an SV to include it in the consensus list.')
265 | input_args.add_argument("--minlen_sv", type=int, default=25,
266 | help='Minimum length of SV to be detected (default: %(default)i)')
267 |
268 | other_args = group.add_argument_group('Other')
269 | other_args.add_argument('-o', "--out", type=str, default='./',
270 | help='Output directory. Will be created if it does not exist')
271 | other_args.add_argument('-t', '--threads', type=int, default=1,
272 | help='Number of threads used for job (default: %(default)i)')
273 |
274 | def check_all_variants_args(args):
275 | if args.long is not None and not pathlib.Path(args.long).is_file():
276 | sys.exit(f'Error: input file {args.long} does not exist')
277 | if args.short1 is not None and not pathlib.Path(args.short1).is_file():
278 | sys.exit(f'Error: input file {args.short1} does not exist')
279 | if args.short2 is not None and not pathlib.Path(args.short2).is_file():
280 | sys.exit(f'Error: input file {args.short2} does not exist')
281 | if args.genome is not None and not pathlib.Path(args.genome).is_file():
282 | sys.exit(f'Error: input file {args.genome} does not exist')
283 | if not pathlib.Path(args.reference).is_file():
284 | sys.exit(f'Error: reference file {args.reference} does not exist')
285 |
286 | if args.long is None and args.short1 is None and args.short2 is None and args.genome is None:
287 | sys.exit("At least one input must be specified. Must use genomic FASTA (-g) or long read FASTQ (-l), short read pair 1 FASTQ (-1) and short read pair 2 FASTQ (-2).")
288 | if args.genome is not None and (args.long is not None or args.short1 is not None or args.short2 is not None):
289 | sys.exit("Cannot use FASTA (-g) with other inputs.")
290 | if args.genome is None and (args.long is None or args.short1 is None or args.short2 is None):
291 | sys.exit("Must use long read FASTQ (-l), short read pair 1 FASTQ (-1) and short read pair 2 FASTQ (-2) when calling all variants.")
292 | if args.mincov_sv < 1:
293 | sys.exit(f'Error: minimum coverage must be over 1')
294 | if args.minlen_sv < 1:
295 | sys.exit('Error: minimum length of SV must be over 1')
296 | if args.mincov_snp < 1:
297 | sys.exit(f'Error: minimum coverage must be over 1')
298 | if args.snp_consensus < 1 or args.snp_consensus > 3:
299 | sys.exit(f'Error: snp_consensus must be between 1 and 3')
300 | if args.sv_consensus < 1 or args.sv_consensus > 4:
301 | sys.exit(f'Error: sv_consensus must be between 1 and 4')
302 | if args.minqual_sv < 0:
303 | sys.exit(f'Error: minimum quality of SV must be over 0')
304 | if args.minqual_snp < 0:
305 | sys.exit(f'Error: minimum quality of SNP must be over 0')
306 | try:
307 | length_parameters = [float(x) for x in args.readlen.split(',')]
308 | args.mean_frag_length = length_parameters[0]
309 | args.frag_length_stdev = length_parameters[1]
310 | except (ValueError, IndexError):
311 | sys.exit('Error: could not parse --length values')
312 | if args.mean_frag_length <= 100:
313 | sys.exit(f'Error: mean read length must be at least 100')
314 | if args.frag_length_stdev < 0:
315 | sys.exit('Error: read length stdev cannot be negative')
316 |
317 |
318 | def check_structural_variant_args(args):
319 | if args.long is not None and not pathlib.Path(args.long).is_file():
320 | sys.exit(f'Error: input file {args.long} does not exist')
321 | if args.genome is not None and not pathlib.Path(args.genome).is_file():
322 | sys.exit(f'Error: input file {args.genome} does not exist')
323 | if not pathlib.Path(args.reference).is_file():
324 | sys.exit(f'Error: reference file {args.reference} does not exist')
325 | if args.long is None and args.genome is None:
326 | sys.exit("At least one input must be specified. Must use long-read FASTQ (-l) or genomic FASTA (-g).")
327 | if args.long is not None and args.genome is not None:
328 | sys.exit("Only one input can be specified. Can't use FASTQ (-l) and FASTA (-g) together.")
329 | if args.mincov_sv < 1:
330 | sys.exit(f'Error: minimum coverage must be over 1')
331 | if args.minlen_sv < 1:
332 | sys.exit('Error: minimum length of SV must be over 1')
333 | if args.sv_consensus < 1 or args.sv_consensus > 4:
334 | sys.exit(f'Error: sv_consensus must be between 1 and 4')
335 | if args.minqual_sv < 0:
336 | sys.exit(f'Error: minimum quality of SV must be over 0')
337 |
338 | try:
339 | length_parameters = [float(x) for x in args.readlen.split(',')]
340 | args.mean_frag_length = length_parameters[0]
341 | args.frag_length_stdev = length_parameters[1]
342 | except (ValueError, IndexError):
343 | sys.exit('Error: could not parse --length values')
344 | if args.mean_frag_length <= 100:
345 | sys.exit(f'Error: mean read length must be at least 100')
346 | if args.frag_length_stdev < 0:
347 | sys.exit('Error: read length stdev cannot be negative')
348 |
349 | def check_snp_indel_args(args):
350 | if args.short1 is not None and not pathlib.Path(args.short1).is_file():
351 | sys.exit(f'Error: input file {args.short1} does not exist')
352 | if args.short2 is not None and not pathlib.Path(args.short2).is_file():
353 | sys.exit(f'Error: input file {args.short2} does not exist')
354 | if args.genome is not None and not pathlib.Path(args.genome).is_file():
355 | sys.exit(f'Error: input file {args.genome} does not exist')
356 | if not pathlib.Path(args.reference).is_file():
357 | sys.exit(f'Error: reference file {args.reference} does not exist')
358 |
359 | if args.short1 is None and args.short2 is None and args.genome is None:
360 | sys.exit("At least one input must be specified. Must use genomic FASTA (-g) or short read pair 1 FASTQ (-1) and short read pair 2 FASTQ (-2).")
361 | if args.genome is not None and (args.short1 is not None or args.short2 is not None):
362 | sys.exit("Cannot use FASTA (-g) with other inputs.")
363 | if args.genome is None and (args.short1 is None or args.short2 is None):
364 | sys.exit("Must use short read pair 1 FASTQ (-1) and short read pair 2 FASTQ (-2) when calling SNPs and indels.")
365 | if args.mincov_snp < 1:
366 | sys.exit(f'Error: minimum coverage must be over 1')
367 | if args.snp_consensus < 1 or args.snp_consensus > 3:
368 | sys.exit(f'Error: snp_consensus must be between 1 and 3')
369 | if args.minqual_snp < 0:
370 | sys.exit(f'Error: minimum quality of SNP must be over 0')
371 |
372 | try:
373 | length_parameters = [float(x) for x in args.readlen.split(',')]
374 | args.mean_frag_length = length_parameters[0]
375 | args.frag_length_stdev = length_parameters[1]
376 | except (ValueError, IndexError):
377 | sys.exit('Error: could not parse --length values')
378 | if args.mean_frag_length <= 100:
379 | sys.exit(f'Error: mean read length must be at least 100')
380 | if args.frag_length_stdev < 0:
381 | sys.exit('Error: read length stdev cannot be negative')
382 |
383 | def check_combine_variants_args(args):
384 | if args.snp_vcf is not None:
385 | num_vcf = len(args.snp_vcf)
386 | if num_vcf == 1:
387 | sys.exit('Error: must have more than 1 VCF to create consensus set.')
388 | if args.snp_consensus is None:
389 | sys.exit('Error: must specify the minimum number of tools required to include SNP in the consensus list using --snp_consensus.')
390 | if args.snp_consensus > num_vcf:
391 | sys.exit('Error: minimum number of consensus VCF files is larger than number of VCF files provided.')
392 | for vcf_file in args.snp_vcf:
393 | if not pathlib.Path(vcf_file).is_file():
394 | sys.exit(f'Error: VCF file {vcf_file} does not exist.')
395 |
396 | if args.sv_vcf is not None:
397 | num_vcf = len(args.sv_vcf)
398 | if num_vcf == 1:
399 | sys.exit('Error: must have more than 1 VCF to create consensus set.')
400 | if args.sv_consensus is None:
401 | sys.exit('Error: must specify the minimum number of tools required to include SNP in the consensus list using --snp_consensus.')
402 | if args.sv_consensus > num_vcf:
403 | sys.exit('Error: minimum number of consensus VCF files is larger than number of VCF files provided.')
404 | for vcf_file in args.sv_vcf:
405 | if not pathlib.Path(vcf_file).is_file():
406 | sys.exit(f'Error: VCF file {vcf_file} does not exist.')
407 |
408 | def check_python_version():
409 | if sys.version_info.major < 3 or sys.version_info.minor < 6:
410 | sys.exit('Error: VariantDetective requires Python 3.6 or later')
411 |
412 | def copy_file(file1, file2):
413 | try:
414 | shutil.copyfile(file1, file2)
415 | except shutil.SameFileError:
416 | pass
417 |
418 | def copy_inputs(args):
419 | if args.subparser_name != 'combine_variants':
420 | copy_file(args.reference, get_new_filename(args.reference, args.out))
421 | if args.genome is not None:
422 | copy_file(args.genome, get_new_filename(args.genome, args.out))
423 | if args.subparser_name == 'structural_variant':
424 | if args.long is not None:
425 | copy_file(args.long, get_new_filename(args.long, args.out))
426 | elif args.subparser_name == 'snp_indel':
427 | if args.short1 is not None:
428 | copy_file(args.short1, get_new_filename(args.short1, args.out))
429 | if args.short2 is not None:
430 | copy_file(args.short2, get_new_filename(args.short2, args.out))
431 | else:
432 | if args.short1 is not None:
433 | copy_file(args.short1, get_new_filename(args.short1, args.out))
434 | if args.short2 is not None:
435 | copy_file(args.short2, get_new_filename(args.short2, args.out))
436 | if args.long is not None:
437 | copy_file(args.long, get_new_filename(args.long, args.out))
438 | else:
439 | if args.snp_vcf is not None:
440 | for vcf_file in args.snp_vcf:
441 | copy_file(vcf_file, get_new_filename(vcf_file, args.out))
442 | if args.sv_vcf is not None:
443 | for vcf_file in args.sv_vcf:
444 | copy_file(vcf_file, get_new_filename(vcf_file, args.out))
445 |
446 | def create_outdir(args):
447 | if not os.path.isdir(args.out):
448 | os.makedirs(args.out)
449 |
450 | class NoSubparsersMetavarFormatter(argparse.HelpFormatter):
451 | """
452 | This is a custom formatter class for argparse. It allows for some custom
453 | formatting, in particular for the help texts when dealing with subparsers
454 | action. It removes subparsers metavar and help line in subcommand argument
455 | group, and removes extra indentation of those subcommands.
456 | https://stackoverflow.com/questions/11070268/
457 | """
458 |
459 | def _format_action(self, action):
460 | result = super()._format_action(action)
461 | if isinstance(action, argparse._SubParsersAction):
462 | # fix indentation on first line
463 | return "%*s%s" % (self._current_indent, "", result.lstrip())
464 | return result
465 | def _format_action_invocation(self, action):
466 | if isinstance(action, argparse._SubParsersAction):
467 |
468 | # remove metavar and help line
469 | return ""
470 | return super()._format_action_invocation(action)
471 | def _iter_indented_subactions(self, action):
472 | if isinstance(action, argparse._SubParsersAction):
473 | try:
474 | get_subactions = action._get_subactions
475 | except AttributeError:
476 | pass
477 | else:
478 | # remove indentation
479 | yield from get_subactions()
480 | else:
481 | yield from super()._iter_indented_subactions(action)
482 |
483 | if __name__ == '__main__':
484 | main()
485 |
--------------------------------------------------------------------------------
/variantdetective/simulate.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains code required to run the long read simulation of VariantDetective.
3 |
4 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
5 | https://github.com/OLF-Bioinformatics/VariantDetective
6 | Portions Copyright (C) 2018 Ryan Wick (rrwick@gmail.com)
7 | https://github.com/rrwick/Badread
8 | """
9 | import datetime
10 | import multiprocessing
11 | import os
12 | import random
13 | import sys
14 | import uuid
15 | from .simulate_tools import load_fasta, reverse_complement, random_chance
16 | from .fragment_lengths import FragmentLengths
17 | from .version import __version__
18 |
19 | def simulate(args, input_fasta, output=sys.stderr):
20 | split_path = os.path.splitext(input_fasta)
21 | if split_path[1] == ".gz":
22 | output_name = os.path.splitext(split_path[0])[0] + ".fastq"
23 | else:
24 | output_name = split_path[0] + ".fastq"
25 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tSimulating long-reads from genomic sequence...', file=output)
26 | ref_seqs, ref_depths, ref_circular = load_reference(input_fasta)
27 | rev_comp_ref_seqs = {name: reverse_complement(seq) for name, seq in ref_seqs.items()}
28 | frag_lengths = FragmentLengths(args.mean_frag_length, args.frag_length_stdev, output)
29 | adjust_depths(ref_seqs, ref_depths, ref_circular, frag_lengths, args)
30 | ref_contigs, ref_contig_weights = get_ref_contig_weights(ref_seqs, ref_depths)
31 | ref_size = sum(len(x) for x in ref_seqs.values())
32 | target_size = get_target_size(ref_size, args.readcov)
33 | process_size = target_size
34 | output_file = open(output_name, 'w')
35 |
36 | process_list = []
37 | p = multiprocessing.Process(target= generate_reads,
38 | args = [process_size, frag_lengths, ref_seqs,
39 | rev_comp_ref_seqs, ref_contigs,
40 | ref_contig_weights, ref_circular, output_file])
41 | p.start()
42 | process_list.append(p)
43 | for process in process_list:
44 | process.join()
45 |
46 | output_file.close()
47 | return output_name
48 |
49 | def generate_reads(target_size, frag_lengths, ref_seqs, rev_comp_ref_seqs,
50 | ref_contigs, ref_contig_weights, ref_circular, output_file):
51 | total_size = 0
52 | count = 0
53 |
54 | while total_size < target_size:
55 | fragment, info = build_fragment(frag_lengths, ref_seqs, rev_comp_ref_seqs, ref_contigs,
56 | ref_contig_weights, ref_circular)
57 | quals = 'S'*len(fragment)
58 |
59 | if len(fragment) == 0:
60 | continue
61 |
62 | info.append(f'length={len(fragment)}')
63 |
64 | read_name = uuid.UUID(int=random.getrandbits(128))
65 | info = ' '.join(info)
66 | print(f'@{read_name} {info}\n'+ fragment + '\n+\n' + quals, file=output_file)
67 |
68 |
69 | total_size += len(fragment)
70 | count += 1
71 |
72 | def adjust_depths(ref_seqs, ref_depths, ref_circular, frag_lengths, args):
73 | sampled_lengths = [frag_lengths.get_fragment_length() for x in range(100000)]
74 | total = sum(sampled_lengths)
75 | for ref_name, ref_seq in ref_seqs.items():
76 | ref_len = len(ref_seq)
77 | ref_circ = ref_circular[ref_name]
78 |
79 | # Circular plasmids may have to have their depth increased due compensate for misses.
80 | if ref_circ:
81 | passing_total = sum(length for length in sampled_lengths if length <= ref_len)
82 | if passing_total == 0:
83 | sys.exit('Error: fragment length distribution incompatible with reference lengths.')
84 | adjustment = total / passing_total
85 | ref_depths[ref_name] *= adjustment
86 |
87 | # Linear plasmids may have to have their depth increased due compensate for truncations.
88 | if not ref_circ:
89 | passing_total = sum(min(ref_len, length) for length in sampled_lengths)
90 | adjustment = total / passing_total
91 | ref_depths[ref_name] *= adjustment
92 |
93 | def build_fragment(frag_lengths, ref_seqs, rev_comp_ref_seqs, ref_contigs, ref_contig_weights,
94 | ref_circular):
95 | info = []
96 | frag_seq, frag_info = get_fragment(frag_lengths, ref_seqs, rev_comp_ref_seqs,
97 | ref_contigs, ref_contig_weights, ref_circular)
98 | info.append(','.join(frag_info))
99 | return frag_seq, info
100 |
101 | def get_fragment(frag_lengths, ref_seqs, rev_comp_ref_seqs, ref_contigs, ref_contig_weights,
102 | ref_circular):
103 | fragment_length = frag_lengths.get_fragment_length()
104 |
105 | # The get_real_fragment function can return nothing so we try repeatedly
106 | # until we get a result.
107 | for _ in range(1000):
108 | seq, info = get_real_fragment(fragment_length, ref_seqs, rev_comp_ref_seqs, ref_contigs,
109 | ref_contig_weights, ref_circular)
110 | if seq != '':
111 | return seq, info
112 | sys.exit('Error: failed to generate any sequence fragments - are your read lengths '
113 | 'incompatible with your reference contig lengths?')
114 |
115 | def get_real_fragment(fragment_length, ref_seqs, rev_comp_ref_seqs, ref_contigs,
116 | ref_contig_weights, ref_circular):
117 | if len(ref_contigs) == 1:
118 | contig = ref_contigs[0]
119 | else:
120 | contig = random.choices(ref_contigs, weights=ref_contig_weights)[0]
121 | info = [contig]
122 | if random_chance(0.5):
123 | seq = ref_seqs[contig]
124 | info.append('+strand')
125 | else:
126 | seq = rev_comp_ref_seqs[contig]
127 | info.append('-strand')
128 |
129 | # If the reference contig is linear and the fragment length is long enough, then we just
130 | # return the entire fragment, start to end.
131 | if fragment_length >= len(seq) and not ref_circular[contig]:
132 | info.append('0-' + str(len(seq)))
133 | return seq, info
134 |
135 | # If the reference contig is circular and the fragment length is too long, then we fail to get
136 | # the read.
137 | if fragment_length > len(seq) and ref_circular[contig]:
138 | return '', ''
139 |
140 | start_pos = random.randint(0, len(seq)-1)
141 | end_pos = start_pos + fragment_length
142 |
143 | info.append(f'{start_pos}-{end_pos}')
144 |
145 | # For circular contigs, we may have to loop the read around the contig.
146 | if ref_circular[contig]:
147 | if end_pos <= len(seq):
148 | return seq[start_pos:end_pos], info
149 | else:
150 | looped_end_pos = end_pos - len(seq)
151 | assert looped_end_pos > 0
152 | return seq[start_pos:] + seq[:looped_end_pos], info
153 |
154 | # For linear contigs, we don't care if the ending position is off the end - that will just
155 | # result in the read ending at the sequence end (and being shorter than the fragment
156 | # length).
157 | else:
158 | return seq[start_pos:end_pos], info
159 |
160 | def get_ref_contig_weights(ref_seqs, ref_depths):
161 | ref_contigs = [x[0] for x in ref_depths.items()]
162 | ref_contig_weights = [x[1] * len(ref_seqs[x[0]]) for x in ref_depths.items()]
163 | return ref_contigs, ref_contig_weights
164 |
165 | def get_target_size(ref_size, coverage):
166 | try:
167 | return int(coverage)
168 | except ValueError:
169 | pass
170 | coverage = coverage.lower()
171 | try:
172 | last_char = coverage[-1]
173 | value = float(coverage[:-1])
174 | if last_char == 'x':
175 | return int(round(value * ref_size))
176 | elif last_char == 'g':
177 | return int(round(value * 1000000000))
178 | elif last_char == 'm':
179 | return int(round(value * 1000000))
180 | elif last_char == 'k':
181 | return int(round(value * 1000))
182 | except (ValueError, IndexError):
183 | pass
184 |
185 | def load_reference(reference):
186 | ref_seqs, ref_depths, ref_circular = load_fasta(reference)
187 | return ref_seqs, ref_depths, ref_circular
--------------------------------------------------------------------------------
/variantdetective/simulate_tools.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains functions that are used in the long read simulation
3 | component of VariantDetective.
4 |
5 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
6 | https://github.com/OLF-Bioinformatics/VariantDetective
7 | Portions Copyright (C) 2018 Ryan Wick (rrwick@gmail.com)
8 | https://github.com/rrwick/Badread
9 | """
10 |
11 | import collections
12 | import gzip
13 | import os
14 | import random
15 | import re
16 | import sys
17 |
18 | def complement_base(base):
19 | try:
20 | return REV_COMP_DICT[base]
21 | except KeyError:
22 | return 'N'
23 |
24 | REV_COMP_DICT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G', 'a': 't', 't': 'a',
25 | 'g': 'c', 'c': 'g', 'R': 'Y', 'Y': 'R', 'S': 'S', 'W': 'W',
26 | 'K': 'M', 'M': 'K', 'B': 'V', 'V': 'B', 'D': 'H', 'H': 'D',
27 | 'N': 'N', 'r': 'y', 'y': 'r', 's': 's', 'w': 'w', 'k': 'm',
28 | 'm': 'k', 'b': 'v', 'v': 'b', 'd': 'h', 'h': 'd', 'n': 'n',
29 | '.': '.', '-': '-', '?': '?'}
30 |
31 | def get_compression_type(filename):
32 | """
33 | Attempts to guess the compression (if any) on a file using the first few bytes.
34 | http://stackoverflow.com/questions/13044562
35 | """
36 | magic_dict = {'gz': (b'\x1f', b'\x8b', b'\x08'),
37 | 'bz2': (b'\x42', b'\x5a', b'\x68'),
38 | 'zip': (b'\x50', b'\x4b', b'\x03', b'\x04')}
39 | max_len = max(len(x) for x in magic_dict)
40 |
41 | unknown_file = open(filename, 'rb')
42 | file_start = unknown_file.read(max_len)
43 | unknown_file.close()
44 | compression_type = 'plain'
45 | for file_type, magic_bytes in magic_dict.items():
46 | if file_start.startswith(magic_bytes):
47 | compression_type = file_type
48 | if compression_type == 'bz2':
49 | sys.exit('Error: cannot use bzip2 format - use gzip instead')
50 | if compression_type == 'zip':
51 | sys.exit('Error: cannot use zip format - use gzip instead')
52 | return compression_type
53 |
54 | def get_open_func(filename):
55 | if get_compression_type(filename) == 'gz':
56 | return gzip.open
57 | else: # plain text
58 | return open
59 |
60 | def get_sequence_file_type(filename):
61 | """
62 | Determines whether a file is FASTA.
63 | """
64 | if not os.path.isfile(filename):
65 | sys.exit('Error: could not find {}'.format(filename))
66 | if get_compression_type(filename) == 'gz':
67 | open_func = gzip.open
68 | else: # plain text
69 | open_func = open
70 | with open_func(filename, 'rt') as seq_file:
71 | try:
72 | first_char = seq_file.read(1)
73 | except UnicodeDecodeError:
74 | first_char = ''
75 | if first_char == '>':
76 | return 'FASTA'
77 | else:
78 | raise ValueError('File is not FASTA')
79 |
80 | def load_fasta(filename):
81 | if get_sequence_file_type(filename) != 'FASTA':
82 | sys.exit('Error: {} is not FASTA format'.format(filename))
83 | fasta_seqs = collections.OrderedDict()
84 | depths, circular = {}, {}
85 | p = re.compile(r'depth=([\d.]+)')
86 | with get_open_func(filename)(filename, 'rt') as fasta_file:
87 | name = ''
88 | sequence = []
89 | for line in fasta_file:
90 | line = line.strip()
91 | if not line:
92 | continue
93 | if line[0] == '>': # Header line = start of new contig
94 | if name:
95 | fasta_seqs[name.split()[0]] = ''.join(sequence)
96 | sequence = []
97 | name = line[1:]
98 | short_name = name.split()[0]
99 | if 'depth=' in name.lower():
100 | try:
101 | depths[short_name] = float(p.search(name.lower()).group(1))
102 | except (ValueError, AttributeError):
103 | depths[short_name] = 1.0
104 | else:
105 | depths[short_name] = 1.0
106 | circular[short_name] = 'circular=true' in name.lower()
107 | else:
108 | sequence.append(line)
109 | if name:
110 | fasta_seqs[name.split()[0]] = ''.join(sequence)
111 | return fasta_seqs, depths, circular
112 |
113 | def random_chance(chance):
114 | assert 0.0 <= chance <= 1.0
115 | return random.random() < chance
116 |
117 | def reverse_complement(seq):
118 | return ''.join([complement_base(x) for x in seq][::-1])
119 |
--------------------------------------------------------------------------------
/variantdetective/snp_indel.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains code for the snp_indel subcommand.
3 |
4 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
5 | https://github.com/OLF-Bioinformatics/VariantDetective
6 | """
7 |
8 | import os
9 | import sys
10 | import io
11 | import pandas as pd
12 | import datetime
13 | import psutil
14 | import subprocess
15 | from .tools import get_new_filename, run_process, read_vcf, generate_tab_csv_snp_summary
16 |
17 | def snp_indel(args, snp_input, output=sys.stderr):
18 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tStarting snp_indel tool', file=output)
19 | reference = get_new_filename(args.reference, args.out)
20 | snp_indel_outdir = os.path.join(args.out, 'snp_indel')
21 | haplotypecaller_outdir = os.path.join(snp_indel_outdir, 'haplotypecaller')
22 | freebayes_outdir = os.path.join(snp_indel_outdir, 'freebayes')
23 | clair3_outdir = os.path.join(snp_indel_outdir, 'clair3')
24 |
25 | if not os.path.isdir(snp_indel_outdir):
26 | os.makedirs(snp_indel_outdir)
27 | if not os.path.isdir(haplotypecaller_outdir):
28 | os.makedirs(haplotypecaller_outdir)
29 | if not os.path.isdir(freebayes_outdir):
30 | os.makedirs(freebayes_outdir)
31 | if not os.path.isdir(clair3_outdir):
32 | os.makedirs(clair3_outdir)
33 |
34 | # Map reads if using short reads
35 | if isinstance(snp_input, list):
36 | bam_file_dir = snp_indel_outdir
37 | rgpl = 'ILLUMINA'
38 | if args.assembler == 'minimap2':
39 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning minimap2...', file=output)
40 | command = 'minimap2 -t ' + str(args.threads) + ' -ax sr '
41 | elif args.assembler == 'bwa':
42 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning bwa...', file=output)
43 | command = 'bwa index ' + reference
44 | run_process(command)
45 | command = 'bwa mem -t ' + str(args.threads) + ' '
46 | command += reference + ' ' + snp_input[0] + ' ' + snp_input[1] + \
47 | ' | samtools view -Sb - -@ ' + str(args.threads) + \
48 | ' | samtools sort -n - -@ ' + str(args.threads) + \
49 | ' | samtools fixmate -m - - -@ ' + str(args.threads) + \
50 | ' | samtools sort - -@ ' + str(args.threads) + \
51 | ' | samtools markdup -r - -@ ' + str(args.threads) + ' ' + \
52 | bam_file_dir + '/alignment.sorted.bam'
53 | run_process(command)
54 | elif args.subparser_name == 'snp_indel':
55 | bam_file_dir = snp_indel_outdir
56 | rgpl = 'ONT'
57 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning minimap2...', file=output)
58 | command = 'minimap2 -t ' + str(args.threads) + ' -ax map-ont ' + \
59 | reference + ' ' + snp_input + \
60 | ' | samtools view -Sb - -@ ' + str(args.threads) + \
61 | ' | samtools sort - -@ ' + str(args.threads) + \
62 | ' -o ' + bam_file_dir + '/alignment.sorted.bam'
63 | run_process(command)
64 | else:
65 | bam_file_dir = os.path.join(args.out, 'structural_variant')
66 | rgpl = 'ONT'
67 |
68 | # Run Picard
69 | reference_base = os.path.splitext(reference)[0]
70 | dict_file = reference_base + ".dict"
71 | if os.path.exists(dict_file):
72 | os.remove(dict_file)
73 | command = 'picard CreateSequenceDictionary R=' + reference
74 | run_process(command)
75 | command = 'picard AddOrReplaceReadGroups I=' + bam_file_dir + '/alignment.sorted.bam O=' + \
76 | snp_indel_outdir + '/alignment.rg.sorted.bam RGID=1 RGLB=SAMPLE RGSM=SAMPLE RGPU=SAMPLE RGPL=' + rgpl
77 | run_process(command)
78 | command = 'samtools index ' + snp_indel_outdir + '/alignment.rg.sorted.bam'
79 | run_process(command)
80 | try:
81 | command = 'rm ' + snp_indel_outdir + '/alignment.sorted.bam'
82 | run_process(command)
83 | except:
84 | pass
85 | try:
86 | command = 'samtools faidx ' + reference
87 | run_process(command)
88 | except:
89 | pass
90 |
91 |
92 | # Run Freebayes
93 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning Freebayes...', file=output)
94 | command = 'freebayes-parallel <(fasta_generate_regions.py ' + reference + '.fai 100000) ' + \
95 | str(args.threads) + ' -f ' + reference + ' ' + \
96 | snp_indel_outdir + '/alignment.rg.sorted.bam -p 1 > ' + \
97 | freebayes_outdir + '/freebayes.vcf'
98 | run_process(command)
99 |
100 | command = 'vcffilter -f "QUAL > ' + str(args.minqual_snp) + '" ' + freebayes_outdir + '/freebayes.vcf > ' + \
101 | freebayes_outdir + '/freebayes.filt.vcf'
102 | run_process(command)
103 |
104 | command = 'bgzip -c -@ ' + str(args.threads) + ' ' + \
105 | freebayes_outdir + '/freebayes.filt.vcf > ' + \
106 | freebayes_outdir + '/freebayes.filt.vcf.gz'
107 | run_process(command)
108 |
109 | command = 'tabix -p vcf ' + freebayes_outdir + '/freebayes.filt.vcf.gz'
110 | run_process(command)
111 |
112 | # Run HaplotypeCaller
113 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning HaplotypeCaller...', file=output)
114 | total_memory_gb = psutil.virtual_memory().total / (1024 ** 3)
115 | ninety_percent_memory_gb = 0.9 * total_memory_gb
116 | xmx_value = f"-Xmx{int(ninety_percent_memory_gb)}G"
117 | xss_value = f"-Xss{int(ninety_percent_memory_gb // 10)}M" # Assuming we use 10% of Xmx for Xss, and convert to MB
118 | if xss_value == "-Xss0M":
119 | xss_value = "-Xss1M"
120 | command = f'gatk HaplotypeCaller --java-options "{xmx_value} {xss_value}" -R ' + reference + \
121 | ' -I ' + snp_indel_outdir + '/alignment.rg.sorted.bam' + \
122 | ' -O ' + haplotypecaller_outdir + '/haplotypecaller.vcf' + \
123 | ' -ploidy 1'
124 | run_process(command)
125 | command = 'vcffilter -f "QD > ' + str(args.minqual_snp) + '" ' + haplotypecaller_outdir + '/haplotypecaller.vcf > ' + \
126 | haplotypecaller_outdir + '/haplotypecaller.filt.vcf'
127 | run_process(command)
128 |
129 | command = 'bgzip -c -@ ' + str(args.threads) + ' ' + \
130 | haplotypecaller_outdir + '/haplotypecaller.filt.vcf > ' + \
131 | haplotypecaller_outdir + '/haplotypecaller.filt.vcf.gz'
132 | run_process(command)
133 |
134 | command = 'tabix -p vcf ' + haplotypecaller_outdir + '/haplotypecaller.filt.vcf.gz'
135 | run_process(command)
136 |
137 | # Run Clair3
138 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning Clair3...', file=output)
139 | #model_path = pkg_resources.resource_filename('variantdetective', 'clair3_models/ilmn')
140 |
141 | if args.custom_clair3_model is not None:
142 | model_path = args.custom_clair3_model
143 |
144 | else:
145 | command = "dirname $(which run_clair3.sh)"
146 | # Run the command and capture the output
147 | try:
148 | bin_output = subprocess.check_output(command, shell=True, universal_newlines=True).strip()
149 | except subprocess.CalledProcessError as e:
150 | print("Error:", e)
151 | model_path = bin_output + "/models/ilmn"
152 |
153 | command = 'run_clair3.sh -f ' + reference + \
154 | ' -b ' + snp_indel_outdir + '/alignment.rg.sorted.bam' + \
155 | ' -o ' + clair3_outdir + \
156 | ' -p "ilmn" -m ' + model_path + ' --include_all_ctgs ' + \
157 | ' --no_phasing_for_fa --haploid_precise -t ' + str(args.threads)
158 | run_process(command)
159 |
160 | command = 'mv ' + clair3_outdir + '/merge_output.vcf.gz ' + clair3_outdir + '/clair3.vcf.gz'
161 | run_process(command)
162 |
163 | command = 'gunzip -f ' + clair3_outdir + '/clair3.vcf.gz'
164 | run_process(command)
165 |
166 | command = 'vcffilter -f "QUAL > ' + str(args.minqual_snp) + ' & FILTER = PASS" ' + clair3_outdir + '/clair3.vcf > ' + \
167 | clair3_outdir + '/clair3.filt.vcf'
168 | run_process(command)
169 |
170 | command = 'bgzip -c -@ ' + str(args.threads) + ' ' + \
171 | clair3_outdir + '/clair3.filt.vcf > ' + \
172 | clair3_outdir + '/clair3.filt.vcf.gz'
173 | run_process(command)
174 |
175 | command = 'tabix -p vcf ' + clair3_outdir + '/clair3.filt.vcf.gz'
176 | run_process(command)
177 |
178 | # Combine Variants
179 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tCombining variants...', file=output)
180 | def has_variants(vcf_file):
181 | """Check if a VCF file has variants."""
182 | command = f"zcat {vcf_file} | grep -v '#' | wc -l"
183 | try:
184 | result = subprocess.check_output(command, shell=True, universal_newlines=True).strip()
185 | return int(result) > 0
186 | except subprocess.CalledProcessError as e:
187 | print("Error:", e)
188 | return False
189 |
190 | def create_vcf_if_not_exists(source_file, dest_file, snp_indel_outdir):
191 | if not os.path.exists(dest_file):
192 | command = f"zcat {snp_indel_outdir}/{source_file} | grep '#' | bgzip -c > {snp_indel_outdir}/{dest_file}"
193 | run_process(command)
194 |
195 | def run_tabix(vcf_file, snp_indel_outdir):
196 | command = 'tabix -p vcf ' + snp_indel_outdir + '/' + vcf_file
197 | run_process(command)
198 |
199 | vcf_files = [
200 | freebayes_outdir + '/freebayes.filt.vcf.gz',
201 | haplotypecaller_outdir + '/haplotypecaller.filt.vcf.gz',
202 | clair3_outdir + '/clair3.filt.vcf.gz'
203 | ]
204 |
205 | # Filter out files with no variants
206 | valid_vcf_files = [file for file in vcf_files if has_variants(file)]
207 |
208 | if len(valid_vcf_files) > 0:
209 | command = 'vcf-isec -p ' + snp_indel_outdir + '/snp_ ' + ' '.join(valid_vcf_files)
210 | run_process(command)
211 | if len(valid_vcf_files) == 3:
212 | source_vcf = 'snp_0_1_2.vcf.gz'
213 | dest_vcfs = ['snp_0_1.vcf.gz', 'snp_0_2.vcf.gz', 'snp_1_2.vcf.gz',
214 | 'snp_0.vcf.gz', 'snp_1.vcf.gz', 'snp_2.vcf.gz']
215 | run_tabix(source_vcf, snp_indel_outdir)
216 | for dest_vcf in dest_vcfs:
217 | create_vcf_if_not_exists(source_vcf, dest_vcf, snp_indel_outdir)
218 | run_tabix(dest_vcf, snp_indel_outdir)
219 | elif len(valid_vcf_files) == 2:
220 | source_vcf = 'snp_0_1.vcf.gz'
221 | dest_vcfs = ['snp_0_1_2.vcf.gz', 'snp_0_2.vcf.gz', 'snp_1_2.vcf.gz',
222 | 'snp_0.vcf.gz', 'snp_1.vcf.gz', 'snp_2.vcf.gz']
223 | run_tabix(source_vcf, snp_indel_outdir)
224 | for dest_vcf in dest_vcfs:
225 | create_vcf_if_not_exists(source_vcf, dest_vcf, snp_indel_outdir)
226 | run_tabix(dest_vcf, snp_indel_outdir)
227 | elif len(valid_vcf_files) == 1:
228 | source_vcf = 'snp_0.vcf.gz'
229 | dest_vcfs = ['snp_0_1_2.vcf.gz', 'snp_0_1.vcf.gz', 'snp_0_2.vcf.gz',
230 | 'snp_1_2.vcf.gz', 'snp_1.vcf.gz', 'snp_2.vcf.gz']
231 | run_tabix(source_vcf, snp_indel_outdir)
232 | for dest_vcf in dest_vcfs:
233 | create_vcf_if_not_exists(source_vcf, dest_vcf, snp_indel_outdir)
234 | run_tabix(dest_vcf, snp_indel_outdir)
235 | if args.snp_consensus == 3:
236 | command = 'gunzip -c ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz > ' + \
237 | snp_indel_outdir + '/snp_final.vcf'
238 | run_process(command)
239 | elif args.snp_consensus == 2:
240 | command = 'bcftools concat -a ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + \
241 | snp_indel_outdir + '/snp_0_1.vcf.gz ' + \
242 | snp_indel_outdir + '/snp_0_2.vcf.gz ' + \
243 | snp_indel_outdir + '/snp_1_2.vcf.gz ' + \
244 | '-o ' + snp_indel_outdir + '/snp_final.vcf'
245 | run_process(command)
246 | elif args.snp_consensus == 1:
247 | command = 'bcftools concat -a ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + \
248 | snp_indel_outdir + '/snp_0_1.vcf.gz ' + \
249 | snp_indel_outdir + '/snp_0_2.vcf.gz ' + \
250 | snp_indel_outdir + '/snp_1_2.vcf.gz ' + \
251 | snp_indel_outdir + '/snp_0.vcf.gz ' + \
252 | snp_indel_outdir + '/snp_1.vcf.gz ' + \
253 | snp_indel_outdir + '/snp_2.vcf.gz ' + \
254 | '-o ' + snp_indel_outdir + '/snp_final.vcf'
255 | run_process(command)
256 | if len(valid_vcf_files) == 3:
257 | command = 'mv ' + snp_indel_outdir + '/snp_0.vcf.gz ' + snp_indel_outdir + '/freebayes.unique.vcf.gz ; ' + \
258 | 'mv ' + snp_indel_outdir + '/snp_1.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.unique.vcf.gz ; ' + \
259 | 'mv ' + snp_indel_outdir + '/snp_2.vcf.gz ' + snp_indel_outdir + '/clair3.unique.vcf.gz ; ' + \
260 | 'mv ' + snp_indel_outdir + '/snp_0_1.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.vcf.gz ; ' + \
261 | 'mv ' + snp_indel_outdir + '/snp_0_2.vcf.gz ' + snp_indel_outdir + '/freebayes.clair3.vcf.gz ; ' + \
262 | 'mv ' + snp_indel_outdir + '/snp_1_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.clair3.vcf.gz ; ' + \
263 | 'mv ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.clair3.vcf.gz ; ' + \
264 | 'rm ' + snp_indel_outdir + '/*.tbi ' + snp_indel_outdir + '/snp__README'
265 | elif len(valid_vcf_files) == 2:
266 | missing_vcf = list(set(vcf_files) - set(valid_vcf_files))
267 | if freebayes_outdir + '/freebayes.filt.vcf.gz' in missing_vcf:
268 | command = 'mv ' + snp_indel_outdir + '/snp_0.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.unique.vcf.gz ; ' + \
269 | 'mv ' + snp_indel_outdir + '/snp_1.vcf.gz ' + snp_indel_outdir + '/clair3.unique.vcf.gz ; ' + \
270 | 'mv ' + snp_indel_outdir + '/snp_2.vcf.gz ' + snp_indel_outdir + '/freebayes.unique.vcf.gz ; ' + \
271 | 'mv ' + snp_indel_outdir + '/snp_0_1.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.clair3.vcf.gz ; ' + \
272 | 'mv ' + snp_indel_outdir + '/snp_0_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.vcf.gz ; ' + \
273 | 'mv ' + snp_indel_outdir + '/snp_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.clair3.vcf.gz ; ' + \
274 | 'mv ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.clair3.vcf.gz ; ' + \
275 | 'rm ' + snp_indel_outdir + '/*.tbi ' + snp_indel_outdir + '/snp__README'
276 | elif haplotypecaller_outdir + '/haplotypecaller.filt.vcf.gz' in missing_vcf:
277 | command = 'mv ' + snp_indel_outdir + '/snp_0.vcf.gz ' + snp_indel_outdir + '/freebayes.unique.vcf.gz ; ' + \
278 | 'mv ' + snp_indel_outdir + '/snp_1.vcf.gz ' + snp_indel_outdir + '/clair3.unique.vcf.gz ; ' + \
279 | 'mv ' + snp_indel_outdir + '/snp_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.unique.vcf.gz ; ' + \
280 | 'mv ' + snp_indel_outdir + '/snp_0_1.vcf.gz ' + snp_indel_outdir + '/freebayes.clair3.vcf.gz ; ' + \
281 | 'mv ' + snp_indel_outdir + '/snp_0_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.vcf.gz ; ' + \
282 | 'mv ' + snp_indel_outdir + '/snp_1_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.clair3.vcf.gz ; ' + \
283 | 'mv ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.clair3.vcf.gz ; ' + \
284 | 'rm ' + snp_indel_outdir + '/*.tbi ' + snp_indel_outdir + '/snp__README'
285 | elif clair3_outdir + '/clair3.filt.vcf.gz'in missing_vcf:
286 | command = 'mv ' + snp_indel_outdir + '/snp_0.vcf.gz ' + snp_indel_outdir + '/freebayes.unique.vcf.gz ; ' + \
287 | 'mv ' + snp_indel_outdir + '/snp_1.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.unique.vcf.gz ; ' + \
288 | 'mv ' + snp_indel_outdir + '/snp_2.vcf.gz ' + snp_indel_outdir + '/clair3.unique.vcf.gz ; ' + \
289 | 'mv ' + snp_indel_outdir + '/snp_0_1.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.vcf.gz ; ' + \
290 | 'mv ' + snp_indel_outdir + '/snp_0_2.vcf.gz ' + snp_indel_outdir + '/freebayes.clair3.vcf.gz ; ' + \
291 | 'mv ' + snp_indel_outdir + '/snp_1_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.clair3.vcf.gz ; ' + \
292 | 'mv ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.clair3.vcf.gz ; ' + \
293 | 'rm ' + snp_indel_outdir + '/*.tbi ' + snp_indel_outdir + '/snp__README'
294 | elif len(valid_vcf_files) == 1:
295 | if freebayes_outdir + '/freebayes.filt.vcf.gz' in valid_vcf_files:
296 | command = 'mv ' + snp_indel_outdir + '/snp_0.vcf.gz ' + snp_indel_outdir + '/freebayes.unique.vcf.gz ; ' + \
297 | 'mv ' + snp_indel_outdir + '/snp_1.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.unique.vcf.gz ; ' + \
298 | 'mv ' + snp_indel_outdir + '/snp_2.vcf.gz ' + snp_indel_outdir + '/clair3.unique.vcf.gz ; ' + \
299 | 'mv ' + snp_indel_outdir + '/snp_0_1.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.vcf.gz ; ' + \
300 | 'mv ' + snp_indel_outdir + '/snp_0_2.vcf.gz ' + snp_indel_outdir + '/freebayes.clair3.vcf.gz ; ' + \
301 | 'mv ' + snp_indel_outdir + '/snp_1_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.clair3.vcf.gz ; ' + \
302 | 'mv ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.clair3.vcf.gz ; ' + \
303 | 'rm ' + snp_indel_outdir + '/*.tbi ' + snp_indel_outdir + '/snp__README'
304 | elif haplotypecaller_outdir + '/haplotypecaller.filt.vcf.gz' in valid_vcf_files:
305 | command = 'mv ' + snp_indel_outdir + '/snp_0.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.unique.vcf.gz ; ' + \
306 | 'mv ' + snp_indel_outdir + '/snp_1.vcf.gz ' + snp_indel_outdir + '/freebayes.unique.vcf.gz ; ' + \
307 | 'mv ' + snp_indel_outdir + '/snp_2.vcf.gz ' + snp_indel_outdir + '/clair3.unique.vcf.gz ; ' + \
308 | 'mv ' + snp_indel_outdir + '/snp_0_1.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.vcf.gz ; ' + \
309 | 'mv ' + snp_indel_outdir + '/snp_0_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.clair3.vcf.gz ; ' + \
310 | 'mv ' + snp_indel_outdir + '/snp_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.clair3.vcf.gz ; ' + \
311 | 'mv ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.clair3.vcf.gz ; ' + \
312 | 'rm ' + snp_indel_outdir + '/*.tbi ' + snp_indel_outdir + '/snp__README'
313 | elif clair3_outdir + '/clair3.filt.vcf.gz'in valid_vcf_files:
314 | command = 'mv ' + snp_indel_outdir + '/snp_0.vcf.gz ' + snp_indel_outdir + '/clair3.unique.vcf.gz ; ' + \
315 | 'mv ' + snp_indel_outdir + '/snp_1.vcf.gz ' + snp_indel_outdir + '/freebayes.unique.vcf.gz ; ' + \
316 | 'mv ' + snp_indel_outdir + '/snp_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.unique.vcf.gz ; ' + \
317 | 'mv ' + snp_indel_outdir + '/snp_0_1.vcf.gz ' + snp_indel_outdir + '/freebayes.clair3.vcf.gz ; ' + \
318 | 'mv ' + snp_indel_outdir + '/snp_0_2.vcf.gz ' + snp_indel_outdir + '/haplotypecaller.clair3.vcf.gz ; ' + \
319 | 'mv ' + snp_indel_outdir + '/snp_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.vcf.gz ; ' + \
320 | 'mv ' + snp_indel_outdir + '/snp_0_1_2.vcf.gz ' + snp_indel_outdir + '/freebayes.haplotypecaller.clair3.vcf.gz ; ' + \
321 | 'rm ' + snp_indel_outdir + '/*.tbi ' + snp_indel_outdir + '/snp__README'
322 | run_process(command)
323 | else:
324 | command = 'gunzip -c ' + vcf_files[0] + ' > ' + snp_indel_outdir + '/snp_final.vcf'
325 | run_process(command)
326 |
327 | generate_tab_csv_snp_summary(read_vcf(snp_indel_outdir + '/snp_final.vcf'), snp_indel_outdir)
--------------------------------------------------------------------------------
/variantdetective/structural_variant.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains code for the structural_variant subcommand.
3 |
4 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
5 | https://github.com/OLF-Bioinformatics/VariantDetective
6 | """
7 |
8 | import os
9 | import sys
10 | import io
11 | import pandas as pd
12 | import datetime
13 | from .tools import get_new_filename, run_process, read_vcf, generate_tab_csv_sv_summary
14 |
15 | def structural_variant(args, input_reads, output=sys.stderr):
16 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tStarting structural_variant tool', file=output)
17 | reference = get_new_filename(args.reference, args.out)
18 | structural_variant_outdir = os.path.join(args.out, 'structural_variant')
19 | nanovar_outdir = os.path.join(structural_variant_outdir, 'nanovar')
20 | nanosv_outdir = os.path.join(structural_variant_outdir, 'nanosv')
21 | svim_outdir = os.path.join(structural_variant_outdir, 'svim')
22 | cutesv_outdir = os.path.join(structural_variant_outdir, 'cutesv')
23 |
24 | if not os.path.isdir(structural_variant_outdir):
25 | os.makedirs(structural_variant_outdir)
26 | if not os.path.isdir(nanovar_outdir):
27 | os.makedirs(nanovar_outdir)
28 | if not os.path.isdir(nanosv_outdir):
29 | os.makedirs(nanosv_outdir)
30 | if not os.path.isdir(svim_outdir):
31 | os.makedirs(svim_outdir)
32 | if not os.path.isdir(cutesv_outdir):
33 | os.makedirs(cutesv_outdir)
34 |
35 | # Run NanoVar
36 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning NanoVar...', file=output)
37 | if args.long is not None and args.data_type_sv == "pacbio":
38 | command = 'nanovar -t ' + str(args.threads) + ' ' + \
39 | input_reads + ' ' + \
40 | reference + ' ' + \
41 | nanovar_outdir + \
42 | ' -x pacbio-clr' + \
43 | ' -c ' + str(args.mincov_sv) + \
44 | ' -l ' + str(args.minlen_sv)
45 | else:
46 | command = 'nanovar -t ' + str(args.threads) + ' ' + \
47 | input_reads + ' ' + \
48 | reference + ' ' + \
49 | nanovar_outdir + \
50 | ' -c ' + str(args.mincov_sv) + \
51 | ' -l ' + str(args.minlen_sv)
52 | run_process(command)
53 |
54 | # Sort bam file
55 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tSorting BAM file...', file=output)
56 | command = 'samtools sort -@ ' + str(args.threads) + ' ' + \
57 | nanovar_outdir + '/*-mm.bam > ' + \
58 | structural_variant_outdir + '/alignment.sorted.bam'
59 | run_process(command)
60 |
61 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tIndexing sorted BAM file...', file=output)
62 | command = 'samtools index ' + structural_variant_outdir + '/alignment.sorted.bam'
63 | run_process(command)
64 |
65 | # Run NanoSV
66 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning NanoSV...', file=output)
67 | command = 'samtools faidx ' + reference
68 | run_process(command)
69 |
70 | command = 'cut -f 1,2 ' + reference + '.fai > ' + nanosv_outdir + '/chrom.sizes'
71 | run_process(command)
72 |
73 | command = 'bedtools random -l 1 -g ' + nanosv_outdir + '/chrom.sizes > ' + nanosv_outdir + '/reference.bed'
74 | run_process(command)
75 |
76 | command = 'NanoSV -t ' + str(args.threads) + \
77 | ' -o ' + nanosv_outdir + '/variants.vcf ' + \
78 | ' -s samtools' + \
79 | ' -b ' + nanosv_outdir + '/reference.bed ' + \
80 | structural_variant_outdir + '/alignment.sorted.bam'
81 | run_process(command)
82 |
83 | # Run SVIM
84 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning SVIM...', file=output)
85 | command = 'svim alignment ' + svim_outdir + ' ' + \
86 | structural_variant_outdir + '/alignment.sorted.bam ' + \
87 | reference + \
88 | ' --min_sv_size ' + str(args.minlen_sv)
89 | run_process(command)
90 |
91 | command = "bcftools view -i 'QUAL >= " + str(args.minqual_sv) + "' " + \
92 | svim_outdir + '/variants.vcf > ' + \
93 | svim_outdir + '/variants.filt.vcf'
94 | run_process(command)
95 |
96 | # Run CuteSV
97 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning CuteSV...', file=output)
98 | command = 'cuteSV ' + structural_variant_outdir + '/alignment.sorted.bam ' + \
99 | reference + ' ' + \
100 | cutesv_outdir + '/variants.vcf ' + \
101 | cutesv_outdir + \
102 | ' -t ' + str(args.threads) + \
103 | ' -s ' + str(args.mincov_sv) + \
104 | ' -l ' + str(args.minlen_sv) + \
105 | ' -L -1'
106 | run_process(command)
107 |
108 | # Run SURVIVOR
109 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tRunning SURVIVOR...', file=output)
110 | command = 'ls ' + nanovar_outdir + '/*pass.vcf ' + \
111 | cutesv_outdir + '/variants.vcf ' + \
112 | nanosv_outdir + '/variants.vcf ' + \
113 | svim_outdir + '/variants.filt.vcf > ' + \
114 | structural_variant_outdir + '/vcf_list'
115 | run_process(command)
116 |
117 | command = 'SURVIVOR merge ' + structural_variant_outdir + \
118 | '/vcf_list 1000 ' + str(args.sv_consensus) + ' 1 1 0 ' + str(args.minlen_sv) + ' ' \
119 | + structural_variant_outdir + '/combined_sv.vcf'
120 | run_process(command)
121 |
122 | generate_tab_csv_sv_summary(read_vcf(structural_variant_outdir + '/combined_sv.vcf'), structural_variant_outdir)
123 | command = 'rm ' + structural_variant_outdir + '/vcf_list'
124 | run_process(command)
125 |
126 | return structural_variant_outdir + '/alignment.sorted.bam'
127 |
--------------------------------------------------------------------------------
/variantdetective/tools.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains miscellaneous tools that are used in various
3 | components of VariantDetective.
4 |
5 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
6 | https://github.com/OLF-Bioinformatics/VariantDetective
7 | """
8 |
9 | import gzip
10 | import io
11 | import os
12 | import pandas as pd
13 | import statistics
14 | from subprocess import Popen, PIPE, STDOUT
15 |
16 | def get_fasta_info(open_func, file):
17 | count = 0
18 | with open_func(file, 'rt') as seq_file:
19 | for line in seq_file:
20 | if line.startswith('>'):
21 | count += 1
22 | return count
23 |
24 |
25 | def get_fastq_info(open_func, file):
26 | num_lines = sum(1 for i in open_func(file, 'rb'))
27 | if num_lines % 4 != 0:
28 | raise ValueError('File might be corrupted, unexpected number of lines was found')
29 | else:
30 | with open_func(file, 'rt') as seq_file:
31 | length_list = []
32 | for i in range(int(num_lines/4)):
33 | next(seq_file)
34 | length_list.append(len(next(seq_file).strip()))
35 | next(seq_file)
36 | next(seq_file)
37 | count = int(num_lines/4)
38 | min_value = min(length_list)
39 | max_value = max(length_list)
40 | median = int(statistics.median(length_list))
41 | average = int(statistics.mean(length_list))
42 | return count, min_value, max_value, median, average
43 |
44 |
45 | def get_input_type(open_func, file):
46 | with open_func(file, 'rt') as seq_file:
47 | try:
48 | first_char = seq_file.read(1)
49 | except UnicodeDecodeError:
50 | first_char = ''
51 | if first_char == '>':
52 | return 'FASTA'
53 | elif first_char == '@':
54 | return 'FASTQ'
55 | else:
56 | raise ValueError('File is not FASTA or FASTQ')
57 |
58 | def get_new_filename(file, out):
59 | basename = os.path.basename(file)
60 | new_filename = os.path.join(out, basename)
61 | return new_filename
62 |
63 |
64 | def get_open_function(file_extension):
65 | if file_extension == ".gz":
66 | open_func = gzip.open
67 | else:
68 | open_func = open
69 | return open_func
70 |
71 | def run_process(command):
72 | process = Popen([command],
73 | universal_newlines=True, stdout=PIPE, stderr=PIPE, shell=True, executable='/bin/bash')
74 | output, error = process.communicate()
75 |
76 | if process.returncode != 0:
77 | raise Exception(error)
78 |
79 | def read_vcf(path):
80 | with open(path, 'r') as f:
81 | lines = [l for l in f if not l.startswith('##')]
82 | return pd.read_csv(
83 | io.StringIO(''.join(lines)),
84 | dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
85 | 'QUAL': str, 'FILTER': str, 'INFO': str},
86 | sep='\t'
87 | ).rename(columns={'#CHROM': 'CHROM'})
88 |
89 | def generate_tab_csv_snp_summary(vcf, output_dir):
90 | if len(vcf) > 0:
91 | CHROM = vcf.iloc[:,0]
92 | POS = vcf.iloc[:,1]
93 | FORMAT_ID = vcf.iloc[:,8].str.split(':', expand=True)
94 | FORMAT = vcf.iloc[:,9].str.split(':', expand=True)
95 | INFO = vcf.iloc[:,7].str.split(';', expand=True)
96 | REF = vcf.iloc[:,3]
97 | ALT = vcf.iloc[:,4]
98 | SUPPORT = pd.Series([None] * len(vcf), name='SUPPORT')
99 | TYPE = INFO.iloc[:,40].str.split('=',expand=True).iloc[:,1]
100 | TYPE.name = 'TYPE'
101 | for i, v in TYPE.items():
102 | try:
103 | AD_index =list(FORMAT_ID.iloc[i]).index('AD')
104 | RAW_SUPPORT = FORMAT.iloc[i,AD_index].split(",")
105 | SUPPORT[i] = "REF=" + RAW_SUPPORT[0] + ";ALT=" + RAW_SUPPORT[1]
106 | except ValueError:
107 | AF_index =list(FORMAT_ID.iloc[i]).index('AF')
108 | DP_index =list(FORMAT_ID.iloc[i]).index('DP')
109 | REF_COUNT = round(int(FORMAT.iloc[i,DP_index]) - (float(FORMAT.iloc[i,AF_index]) * int(FORMAT.iloc[i,DP_index])))
110 | ALT_COUNT = round(float(FORMAT.iloc[i,AF_index]) * int(FORMAT.iloc[i,DP_index]))
111 | SUPPORT[i] = "REF=" + str(REF_COUNT) + ";ALT=" + str(ALT_COUNT)
112 | if (v == None):
113 | if (len(REF[i]) < len(ALT[i])):
114 | TYPE[i] = 'ins'
115 | elif (len(REF[i]) > len(ALT[i])):
116 | TYPE[i] = 'del'
117 | elif (len(REF[i]) == 1):
118 | TYPE[i] = 'snp'
119 | else:
120 | TYPE[i] = 'complex'
121 | TAB_DATA = pd.concat([CHROM, POS, TYPE, REF, ALT, SUPPORT], axis=1)
122 | VARIANT_TYPES = pd.Series(['SNP', 'DEL', 'INS', 'MNP', 'COMPLEX','TOTAL'], name = 'TYPE')
123 | VARIANT_DATA = pd.Series([(TYPE=='snp').sum(), (TYPE=='del').sum(), (TYPE=='ins').sum(), (TYPE=='mnp').sum(), (TYPE=='complex').sum(), len(TYPE)], name = 'COUNT')
124 | SUMMARY_DATA = pd.concat([VARIANT_TYPES, VARIANT_DATA], axis=1)
125 | TAB_DATA.to_csv(output_dir + '/snp_final.csv', index=False)
126 | TAB_DATA.to_csv(output_dir + '/snp_final.tab', sep='\t', index=False)
127 | SUMMARY_DATA.to_csv(output_dir + '/snp_final_summary.txt', sep='\t', index=False)
128 | else:
129 | column_names_str= "CHROM POS TYPE REF ALT SUPPORT"
130 | column_names = column_names_str.split()
131 | TAB_DATA = pd.DataFrame(columns=column_names)
132 | TAB_DATA.to_csv(output_dir + '/snp_final.csv', index=False)
133 | TAB_DATA.to_csv(output_dir + '/snp_final.tab', sep='\t', index=False)
134 | data = {"TYPE": ["SNP", "DEL", "INS", "MNP", "COMPLEX", "TOTAL"],
135 | "COUNT": [0, 0, 0, 0, 0, 0]}
136 | SUMMARY_DATA = pd.DataFrame(data)
137 | SUMMARY_DATA.to_csv(output_dir + '/snp_final_summary.txt', sep='\t', index=False)
138 |
139 |
140 | def generate_tab_csv_sv_summary(vcf, output_dir):
141 | CHROM = vcf.iloc[:,0]
142 | CHROM.name = 'REF_CHROM'
143 | START = vcf.iloc[:,1]
144 | START.name = 'REF_START'
145 | END = vcf.iloc[:,7].str.split(';', expand=True).iloc[:,6].str[4:]
146 | END.name = 'REF_STOP'
147 | SIZE = vcf.iloc[:,7].str.split(';', expand=True).iloc[:,2].str[6:]
148 | SIZE.name = 'SIZE'
149 | TYPE = vcf.iloc[:,7].str.split(';', expand=True).iloc[:,3].str[7:]
150 | TYPE.name = 'TYPE'
151 | INFO = vcf.iloc[:,4].copy()
152 | INFO.name = 'INFO'
153 | for i, v in INFO.items():
154 | if (TYPE[i] == "TRA"):
155 | if ("]" not in v and "[" not in v):
156 | INFO[i] = ''
157 | elif ("]" in v):
158 | INFO[i] = ']'+(v.split(']'))[1].split(']')[0]+']'
159 | elif ("[" in v):
160 | INFO[i] = '['+(v.split('['))[1].split('[')[0]+'['
161 | else:
162 | INFO[i] = ''
163 | else:
164 | INFO[i] = ''
165 |
166 | TAB_DATA = pd.concat([CHROM, START, END, SIZE, TYPE, INFO], axis=1)
167 |
168 | VARIANT_TYPES = pd.Series(['TRANSLOCATION', 'INVERSION', 'DELETION', 'INSERTION', 'DUPLICATION','TOTAL'], name = 'TYPE')
169 | VARIANT_DATA = pd.Series([(TYPE=='TRA').sum(), (TYPE=='INV').sum(), (TYPE=='DEL').sum(), (TYPE=='INS').sum(), (TYPE=='DUP').sum(), len(TYPE)], name = 'COUNT')
170 | SUMMARY_DATA = pd.concat([VARIANT_TYPES, VARIANT_DATA], axis=1)
171 |
172 | TAB_DATA.to_csv(output_dir + '/combined_sv.csv', index=False)
173 | TAB_DATA.to_csv(output_dir + '/combined_sv.tab', sep='\t', index=False)
174 | SUMMARY_DATA.to_csv(output_dir + '/combined_sv_summary.txt', sep='\t', index=False)
--------------------------------------------------------------------------------
/variantdetective/validate_inputs.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains code needed to validate the inputs used
3 | to ensure they are in the right format for VariantDetective.
4 |
5 | Copyright (C) 2024 Phil Charron (phil.charron@inspection.gc.ca)
6 | https://github.com/OLF-Bioinformatics/VariantDetective
7 | """
8 |
9 | import datetime
10 | import os
11 | import sys
12 |
13 | from .combine_variants import combine_variants
14 | from .simulate import simulate
15 | from .snp_indel import snp_indel
16 | from .structural_variant import structural_variant
17 | from .tools import get_fasta_info, get_fastq_info, get_input_type, get_new_filename, get_open_function
18 |
19 | def validate_inputs(args, output=sys.stderr):
20 | input_file_type = []
21 | snp_vcf_list = []
22 | sv_vcf_list = []
23 | #actual_file_type = []
24 |
25 | if 'genome' in args and args.genome is not None:
26 | genome_file = get_new_filename(args.genome, args.out)
27 | input_file_type.append("Genomic FASTA")
28 | #actual_file_type.append(check_input(genome_file, output))
29 | #actual_file_type.append("Genomic FASTA")
30 | if 'long' in args and args.long is not None:
31 | long_file = get_new_filename(args.long, args.out)
32 | input_file_type.append("Long-read FASTQ")
33 | #actual_file_type.append(check_input(long_file, output))
34 | #actual_file_type.append("Long-read FASTQ")
35 | if 'short1' in args and args.short1 is not None:
36 | short1_file = get_new_filename(args.short1, args.out)
37 | input_file_type.append("Short-read FASTQ")
38 | #actual_file_type.append(check_input(short1_file, output))
39 | #actual_file_type.append("Short-read FASTQ")
40 | if 'short2' in args and args.short2 is not None:
41 | short2_file = get_new_filename(args.short2, args.out)
42 | input_file_type.append("Short-read FASTQ")
43 | #actual_file_type.append(check_input(short2_file, output))
44 | #actual_file_type.append("Short-read FASTQ")
45 | if 'snp_vcf' in args and args.snp_vcf is not None:
46 | for vcf_file in args.snp_vcf:
47 | snp_vcf_list.append(get_new_filename(vcf_file, args.out))
48 | input_file_type.append("SNP VCF")
49 | if 'sv_vcf' in args and args.sv_vcf is not None:
50 | for vcf_file in args.sv_vcf:
51 | sv_vcf_list.append(get_new_filename(vcf_file, args.out))
52 | input_file_type.append("SV VCF")
53 |
54 |
55 | #if input_file_type == actual_file_type:
56 | if 'Genomic FASTA' in input_file_type:
57 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tStarting genome pipeline')
58 | sim_file = simulate(args, genome_file, output=sys.stderr)
59 | if args.subparser_name == "structural_variant":
60 | long_bam_file = structural_variant(args, sim_file, output=sys.stderr)
61 | if args.subparser_name == "snp_indel":
62 | snp_indel(args, sim_file, output=sys.stderr)
63 | if args.subparser_name == "all_variants":
64 | long_bam_file = structural_variant(args, sim_file, output=sys.stderr)
65 | snp_indel(args, long_bam_file, output=sys.stderr)
66 | elif 'Long-read FASTQ' in input_file_type and 'Short-read FASTQ' in input_file_type:
67 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tStarting short and long read pipeline')
68 | short_inputs = [short1_file, short2_file]
69 | long_bam_file = structural_variant(args, long_file, output=sys.stderr)
70 | snp_indel(args, short_inputs, output=sys.stderr)
71 | elif 'Long-read FASTQ' in input_file_type:
72 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tStarting long read pipeline')
73 | long_bam_file = structural_variant(args, long_file, output=sys.stderr)
74 | elif 'Short-read FASTQ' in input_file_type:
75 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tStarting short read pipeline')
76 | short_inputs = [short1_file, short2_file]
77 | snp_indel(args, short_inputs, output=sys.stderr)
78 | elif 'SNP VCF' or 'SV VCF' in input_file_type:
79 | print(str(datetime.datetime.now().replace(microsecond=0)) + '\tStarting combine variants tool')
80 | vcf_lists = [snp_vcf_list, sv_vcf_list]
81 | combine_variants(args, vcf_lists, output=sys.stderr)
82 |
83 | #else:
84 | # for i in range(len(input_file_type)):
85 | # if input_file_type[i] == "Long-read FASTQ":
86 | # if actual_file_type[i] == "Genomic FASTA":
87 | # message = 'Input file was supposed to be long-read FASTQ but genomic FASTA was detected.'
88 | # elif actual_file_type[i] == "Short-read FASTQ":
89 | # message = 'Input file was supposed to be long-read FASTQ but short-read FASTQ was detected.'
90 | # elif input_file_type[i] == "Genomic FASTA":
91 | # if actual_file_type[i] == "Long-read FASTQ":
92 | # message = 'Input file was supposed to be genomic FASTA but long-read FASTQ was detected.'
93 | # elif actual_file_type[i] == "Short-read FASTQ":
94 | # message = 'Input file was supposed to be genomic FASTA but short-read FASTQ was detected.'
95 | # elif input_file_type[i] == "Short-read FASTQ":
96 | # if actual_file_type[i] == "Long-read FASTQ":
97 | # message = 'Input file was supposed to be short-read FASTQ but long-read FASTQ was detected.'
98 | # elif actual_file_type[i] == "Genomic FASTA":
99 | # message = 'Input file was supposed to be short-read FASTQ but genomic FASTA was detected.'
100 | # message = message + ' Please verify inputs or use appropriate tool and parameters.'
101 | # raise Exception(message)
102 |
103 | def check_input(file, output=sys.stderr):
104 | file_extension = os.path.splitext(file)
105 | open_func = get_open_function(file_extension[1])
106 | file_type = get_input_type(open_func, file)
107 |
108 | if file_type == 'FASTQ':
109 | count, min_value, max_value, median, average = get_fastq_info(open_func, file)
110 | if average > 301:
111 | actual_file_type = "Long-read FASTQ"
112 | print("Input file type:\tLong-read FASTQ", file=output)
113 | elif average > 0:
114 | actual_file_type = "Short-read FASTQ"
115 | print("Input file type:\tShort-read FASTQ", file=output)
116 | else:
117 | raise Exception('Average length of reads is 0')
118 | print("Number of reads:\t{}".format(count), file=output)
119 |
120 | elif file_type == 'FASTA':
121 | count = get_fasta_info(open_func, file)
122 | actual_file_type = "Genomic FASTA"
123 |
124 | return actual_file_type
125 |
126 |
127 |
128 |
129 |
130 |
131 |
--------------------------------------------------------------------------------
/variantdetective/version.py:
--------------------------------------------------------------------------------
1 | __version__ = '1.0.1'
2 |
--------------------------------------------------------------------------------