├── README.md ├── LICENSE └── combiSV2.3.pl /README.md: -------------------------------------------------------------------------------- 1 | # combiSV 2 | 3 | Combine structural variation outputs from long sequencing reads into a superior call set 4 | 5 | **Last updates: 22/04/22 version 2.3** 6 | - Now includes the REF and ALT sequences in the combined VCF 7 | **22/04/22** 8 | - combiSV now reports the END position and the allele depth calls (DR and DV), which is needed for compatibility with SVanna 9 | **09/11/21** 10 | - Only one of Sniffles, pbsv, SVIM or cuteSV is mandatory to run combiSV 11 | - Improved precision 12 | 13 | ### Getting help 14 | 15 | Any issues/requests/problems/comments that are not yet addressed on this page can be posted on [GitHub issues](https://github.com/ndierckx/Sim-it/issues) and I will try to reply the same day. 16 | 17 | Or you can contact me directly through the following email address: 18 | 19 | nicolasdierckxsens at hotmail dot com 20 | 21 | 22 | ### Cite 23 | 24 | Dierckxsens, N., Li, T., Vermeesch, J.R. et al. A benchmark of structural variation detection by long reads through a realistic simulated model. Genome Biology, 22, 342 (2021). https://doi.org/10.1186/s13059-021-02551-4 25 | 26 | 27 | ### Prerequisites 28 | 29 | Perl
30 | 31 | ### Instructions 32 | 33 | Usage: 34 | 35 | perl combiSV2.3.pl -pbsv -sniffles -cutesv -nanovar -svim -nanosv -c -o 36 | 37 | 38 | ### Output 39 | 40 | #### 1. output_name.vcf: 41 | This is the combined standard VCF output 42 | 43 | #### 2. simplified_output_name.vcf: 44 | This is a simplified VCF output that can be used as input for Sim-it 45 | 46 | #### 3. SVIM/Sniffles/pbsv/NanoVar/NanoSV_output_name.vcf: 47 | For each VCF input, an output file of the SVs that were retained is given. 48 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /combiSV2.3.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | ###################################################### 3 | # SOFTWARE COPYRIGHT NOTICE AGREEMENT # 4 | # Copyright (C) {2020-2022} {Nicolas Dierckxsens} # 5 | # All Rights Reserved # 6 | # See file LICENSE for details. 
# 7 | ###################################################### 8 | # combiSV 2.3 9 | # nicolasdierckxsens@hotmail.com 10 | use strict; 11 | use Getopt::Long; 12 | use File::Basename; 13 | 14 | print "\n\n-----------------------------------------------"; 15 | print "\ncombiSV\n"; 16 | print "Version 2.3\n"; 17 | print "Author: Nicolas Dierckxsens, (c) 2020-2024\n"; 18 | print "-----------------------------------------------\n\n"; 19 | 20 | my $input_pbsv = ""; 21 | my $input_sniffles = ""; 22 | my $input_cutesv = ""; 23 | my $input_nanovar = ""; 24 | my $input_svim = ""; 25 | my $output_file = ""; 26 | my $output_file2 = ""; 27 | my $high_recall = ""; 28 | my $input_nanosv = ""; 29 | my $count_tools = "2"; 30 | my $min_coverage = '3'; 31 | my $length_match = '800'; 32 | my $min_SV_length = '50'; 33 | 34 | GetOptions ( 35 | "pbsv=s" => \$input_pbsv, 36 | "sniffles=s" => \$input_sniffles, 37 | "cutesv=s" => \$input_cutesv, 38 | "nanovar=s" => \$input_nanovar, 39 | "svim=s" => \$input_svim, 40 | "nanosv=s" => \$input_nanosv, 41 | "c=s" => \$min_coverage, 42 | "o=s" => \$output_file, 43 | ) or die "Incorrect usage!\n"; 44 | 45 | if ($input_sniffles eq "" && $input_cutesv eq "" && $input_pbsv eq "" && $input_svim eq "") 46 | { 47 | print "\n\nUsage: perl combiSV2.3.pl -pbsv -sniffles -cutesv -nanovar -svim -nanosv \n\n"; 48 | print "\nOPTIONAL ARGUMENTS\n"; 49 | print "-c minimum coverage of variation allele [default = 3]\n"; 50 | print "-o name of the output files\n"; 51 | } 52 | 53 | my $total_tools = '0'; 54 | if ($min_coverage eq "") 55 | { 56 | $min_coverage = '3'; 57 | } 58 | if ($min_coverage =~ m/^\d+$/) 59 | { 60 | } 61 | else 62 | { 63 | die "\n\nWARNING: minimum coverage of variation allele has to be an integer\n\n"; 64 | } 65 | if ($min_coverage < 2) 66 | { 67 | $min_coverage = '2'; 68 | print "\n\nWARNING: minimum coverage of variation allele too low, has been set to 2\n\n"; 69 | } 70 | if ($input_pbsv ne "") 71 | { 72 | open(INPUT_PBSV, $input_pbsv) or die 
"\n\nCan't open pbsv's vcf file $input_pbsv, $!\n\n"; 73 | $total_tools++; 74 | } 75 | if ($input_sniffles ne "") 76 | { 77 | open(INPUT_SNIFFLES, $input_sniffles) or die "\n\nCan't open Sniffles' vcf file $input_sniffles, $!\n\n"; 78 | $total_tools++; 79 | } 80 | if ($input_cutesv ne "") 81 | { 82 | open(INPUT_CUTESV, $input_cutesv) or die "\n\nCan't open cuteSV's vcf file $input_cutesv, $!\n\n"; 83 | $total_tools++; 84 | } 85 | if ($input_nanovar ne "") 86 | { 87 | open(INPUT_NANOVAR, $input_nanovar) or die "\n\nCan't open NanoVar's vcf file $input_nanovar, $!\n\n"; 88 | $total_tools++; 89 | } 90 | if ($input_svim ne "") 91 | { 92 | open(INPUT_SVIM, $input_svim) or die "\n\nCan't open SVIM's vcf file $input_svim, $!\n\n"; 93 | $total_tools++; 94 | } 95 | if ($input_nanosv ne "") 96 | { 97 | open(INPUT_NANOSV, $input_nanosv) or die "\n\nCan't open NanoSV's vcf file $input_nanosv, $!\n\n"; 98 | $total_tools++; 99 | } 100 | 101 | if ($input_sniffles eq "" && $input_cutesv eq "" && $input_pbsv eq "" && $input_svim eq "") 102 | { 103 | die "\n\nError: A Sniffles, pbsv, SVIM or cuteSV input is mandatory.\n\n"; 104 | } 105 | 106 | if ($output_file eq "") 107 | { 108 | $output_file = "combiSV"; 109 | } 110 | 111 | my($filename, $dir, $suffix) = fileparse($output_file, ('.vcf')); 112 | 113 | if ($output_file eq "") 114 | { 115 | $output_file2 = "combiSV.vcf"; 116 | $output_file = "simplified_combiSV.vcf"; 117 | } 118 | else 119 | { 120 | $suffix = ".vcf"; 121 | $output_file2 = $dir.$filename.$suffix; 122 | $output_file = $dir.'simplified_'.$filename.$suffix; 123 | } 124 | 125 | if ($high_recall eq "") 126 | { 127 | $high_recall = "1"; 128 | } 129 | if ($high_recall ne "" && $high_recall ne "1" && $high_recall ne "2") 130 | { 131 | die "\n\nIncorrect usage of -s parameter, should be '1' or '2'\n\n"; 132 | } 133 | 134 | if ($total_tools > '5') 135 | { 136 | $count_tools = '3'; 137 | $high_recall = '2'; 138 | } 139 | elsif ($total_tools <= 5) 140 | { 141 | $count_tools = 
'2'; 142 | $high_recall = '2'; 143 | } 144 | 145 | 146 | open(COMBINED, ">" .$output_file) or die "\nCan't open file $output_file, $!\n"; 147 | open(COMBINED2, ">" .$output_file2) or die "\nCan't open file $output_file2, $!\n"; 148 | 149 | my %SVs; 150 | my %count; 151 | my %pbsv; 152 | my %sniffles; 153 | my %cutesv; 154 | my %nanovar; 155 | my %mixed_types; 156 | my $lenght_margin1 = 0.70; 157 | my $lenght_margin2 = 1.3; 158 | 159 | my $count = '0'; 160 | 161 | if ($input_pbsv ne "") 162 | { 163 | while (my $line = <INPUT_PBSV>) 164 | { 165 | chomp($line); 166 | my $first_nuc = substr $line, 0, 1; 167 | if ($first_nuc ne "#") 168 | { 169 | my @list = split /\t/, $line; 170 | 171 | my $info = $list[7]; 172 | my $REF = $list[3]; 173 | my $ALT = $list[4]; 174 | my $SVLEN = ""; 175 | my $END = ""; 176 | my $type = ""; 177 | my @info = split /;/, $info; 178 | my @HAP = split /:/, $list[9]; 179 | my @HAP2 = split /,/, $HAP[1]; 180 | my @HAP3 = split /,/, $HAP[3]; 181 | 182 | my $chr = $list[0]; 183 | if ($list[0] =~ m/chr(\d+|X|Y)/) 184 | { 185 | $chr = $1; 186 | } 187 | 188 | foreach my $info_tmp (@info) 189 | { 190 | my $first_five = substr $info_tmp, 0, 5; 191 | my $first_four = substr $info_tmp, 0, 4; 192 | if ($info_tmp =~ m/SVLEN=>*-*(\d+)/) 193 | { 194 | $SVLEN = $1; 195 | } 196 | elsif ($first_five eq "SVTYP") 197 | { 198 | $type = substr $info_tmp, 7; 199 | } 200 | elsif ($first_four eq "END=") 201 | { 202 | $END = substr $info_tmp, 4; 203 | } 204 | } 205 | if ($SVLEN eq "") 206 | { 207 | $SVLEN = "."; 208 | } 209 | if ($SVLEN < 0) 210 | { 211 | $SVLEN *= -1; 212 | } 213 | my $haplo = $HAP[0]; 214 | my @depth = split /,/, $HAP[1]; 215 | my $DR = $depth[0]; 216 | my $DV = $depth[1]; 217 | 218 | my $converted_line = $chr."\t".$list[1]."\t".$SVLEN."\t".$type."\t".$haplo."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT; 219 | 220 | if ($type ne "BND" && $type ne "cnv" && $list[6] eq "PASS" && ($type ne "DUP" || $high_recall eq "2" || $haplo eq "1/1") && ($HAP2[1] >= 
$min_coverage || 221 | ($HAP2[1] >= $min_coverage-1 && $HAP2[1]/$HAP[2] > 0.3)) && $haplo ne "0/0" && $HAP2[1]/$HAP[2] > 0.3 && ($SVLEN eq "." || $SVLEN > 45)) 222 | { 223 | $SVs{$chr}{$list[1]} = $converted_line; 224 | $count{$chr}{$list[1]} = '1'; 225 | if ($type eq "INV" || (($type eq "DEL" || $haplo eq "1/1") && $high_recall eq "2") && ($HAP2[1]+$HAP2[2]) > 9 && (($HAP3[2] > 0 && $HAP3[3] > 0) || $HAP3[2] eq "" || $HAP3[3] eq "")) 226 | { 227 | $count{$chr}{$list[1]} = '2'; 228 | } 229 | if (($type eq "INV" || $type eq "INS") && $haplo eq "1/1" && $total_tools < 5) 230 | { 231 | $count{$chr}{$list[1]} = $count_tools; 232 | } 233 | $pbsv{$chr}{$list[1]} = $line; 234 | } 235 | } 236 | } 237 | } 238 | 239 | my %pbsv2; 240 | my %cutesv2; 241 | my %sniffles2; 242 | my %nanovar2; 243 | my %svim2; 244 | 245 | if ($input_cutesv ne "") 246 | { 247 | CUTESV: while (my $line = <INPUT_CUTESV>) 248 | { 249 | chomp($line); 250 | my $first = substr $line, 0, 1; 251 | my $prev_match2 = ""; 252 | 253 | if ($first ne "#") 254 | { 255 | my @list = split /\t/, $line; 256 | my $pos = $list[1]; 257 | my $REF = $list[3]; 258 | my $ALT = $list[4]; 259 | my $length; 260 | my $type; 261 | my $v = '1'; 262 | my $min = '1'; 263 | my $END = ""; 264 | 265 | my $info = $list[7]; 266 | my @info = split /;/, $info; 267 | my @HAP = split /:/, $list[9]; 268 | my $DR = $HAP[1]; 269 | my $DV = $HAP[2]; 270 | 271 | foreach my $info_tmp (@info) 272 | { 273 | my $first_five = substr $info_tmp, 0, 5; 274 | my $first_four = substr $info_tmp, 0, 4; 275 | if ($first_five eq "SVLEN") 276 | { 277 | $length = substr $info_tmp, 6; 278 | } 279 | elsif ($first_five eq "SVTYP") 280 | { 281 | $type = substr $info_tmp, 7; 282 | } 283 | elsif ($first_four eq "END=") 284 | { 285 | $END = substr $info_tmp, 4; 286 | } 287 | } 288 | if ($length eq "") 289 | { 290 | $length = "."; 291 | } 292 | if ($length < 0) 293 | { 294 | $length *= -1; 295 | } 296 | my $chr = $list[0]; 297 | if ($list[0] =~ m/chr(\d+|X|Y)/) 298 | { 299 | $chr = 
$1; 300 | } 301 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$HAP[0]."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT; 302 | 303 | if (exists ($cutesv{$chr}{$list[1]})) 304 | { 305 | next CUTESV; 306 | } 307 | 308 | if ($list[6] eq "PASS" && ($length eq "." || $length > 45) && ($HAP[2] >= $min_coverage || ($HAP[2] >= $min_coverage-1 && $HAP[2]/($HAP[2]+$HAP[1]) > 0.3)) 309 | && $HAP[2]/($HAP[2]+$HAP[1]) > 0.12 && $type ne "BND" && $HAP[0] ne "./.") 310 | { 311 | my $prev_match = ""; 312 | if (exists($pbsv2{$chr}{$list[1]})) 313 | { 314 | $prev_match = "yes"; 315 | } 316 | 317 | if (exists($pbsv{$chr}{$list[1]}) && exists($SVs{$chr}{$list[1]}) && $prev_match eq "") 318 | { 319 | my $count_tmp = $count{$chr}{$list[1]}; 320 | 321 | my @list2 = split /\t/, $SVs{$chr}{$list[1]}; 322 | my $new_pos = $list2[1]; 323 | my $new_length = $list2[2]; 324 | my $new_type = $list2[3]; 325 | my $new_haplo = $list2[4]; 326 | my $new_END = $list2[5]; 327 | my $new_DR = $list2[6]; 328 | my $new_DV = $list2[7]; 329 | my $new_REF = $list2[8]; 330 | my $new_ALT = $list2[9]; 331 | 332 | $pbsv2{$chr}{$list[1]} = undef; 333 | 334 | $count{$chr}{$list[1]} = $count_tmp+1; 335 | if ($type eq "INS" && $list2[3] ne $type && $type ne ".") 336 | { 337 | $new_type = $type; 338 | } 339 | if ($HAP[0] ne "." 
&& ($type eq "INV" || $type eq "INS")) 340 | { 341 | $new_haplo = $HAP[0]; 342 | $new_DR = $DR; 343 | $new_DV = $DV; 344 | } 345 | if ($type eq "INV" || $list2[2] eq ".") 346 | { 347 | if ($length ne ".") 348 | { 349 | $new_length = $length; 350 | } 351 | if ($HAP[0] ne ".") 352 | { 353 | $new_haplo = $HAP[0]; 354 | $new_DR = $DR; 355 | $new_DV = $DV; 356 | } 357 | } 358 | delete $SVs{$chr}{$pos}; 359 | delete $count{$chr}{$pos}; 360 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT; 361 | $SVs{$list2[0]}{$new_pos} = $line2; 362 | $count{$chr}{$new_pos} = $count_tmp+1; 363 | $cutesv{$chr}{$list[1]} = $line; 364 | 365 | if ($type eq "INS") 366 | { 367 | $count{$chr}{$new_pos} = $count_tools; 368 | } 369 | next CUTESV; 370 | 371 | } 372 | elsif ($input_pbsv ne "") 373 | { 374 | my $score_prev = ""; 375 | my $pos_prev = ""; 376 | POS_ALMOST2da: 377 | while ($v < $length_match) 378 | { 379 | POS_ALMOST2d: my $pos_tmp = ($min*$v)+$pos; 380 | 381 | if (exists($pbsv2{$chr}{$pos_tmp})) 382 | { 383 | $prev_match2 = "yes"; 384 | } 385 | 386 | if (exists($pbsv{$chr}{$pos_tmp}) && exists($SVs{$chr}{$pos_tmp}) && $prev_match2 eq "") 387 | { 388 | my $count_tmp = $count{$chr}{$pos_tmp}; 389 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 390 | 391 | if (exists($cutesv{$chr}{$pos_tmp}) && $list2[3] eq $type && ($list2[2] > $length*$lenght_margin1 && $list2[2] < $length*$lenght_margin2)) 392 | { 393 | next CUTESV; 394 | } 395 | else 396 | { 397 | my $score = '0'; 398 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS")) 399 | { 400 | $score += 1; 401 | } 402 | my $g = (800-$v)/800; 403 | $score += $g; 404 | my $score_tmp = '0'; 405 | if ($list2[2] ne "." 
&& $length ne ".") 406 | { 407 | if ($list2[2] >= $length) 408 | { 409 | $score_tmp = 1-(($list2[2]-$length)/$length); 410 | if ($score_tmp < 0) 411 | { 412 | $score_tmp = '0'; 413 | } 414 | } 415 | else 416 | { 417 | $score_tmp = 1-(($length-$list2[2])/$length); 418 | if ($score_tmp < 0) 419 | { 420 | $score_tmp = '0'; 421 | } 422 | } 423 | } 424 | $score += $score_tmp; 425 | 426 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no") 427 | { 428 | $pos_tmp = $pos_prev; 429 | $count_tmp = $count{$chr}{$pos_tmp}; 430 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 431 | } 432 | 433 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no") 434 | { 435 | 436 | if ($score_prev eq "no") 437 | { 438 | print $SVs{$chr}{$pos_tmp}."\n"; 439 | print $line."\n"; 440 | } 441 | my $new_pos = $pos_tmp; 442 | my $new_length = $list2[2]; 443 | my $new_type = $list2[3]; 444 | my $new_haplo = $list2[4]; 445 | my $new_END = $list2[5]; 446 | my $new_DR = $list2[6]; 447 | my $new_DV = $list2[7]; 448 | my $new_REF = $list2[8]; 449 | my $new_ALT = $list2[9]; 450 | 451 | if ($type eq "INS" && $list2[3] ne $type) 452 | { 453 | $new_type = $type; 454 | } 455 | if ($HAP[0] ne "." 
&& ($type eq "INV" || $type eq "INS")) 456 | { 457 | $new_haplo = $HAP[0]; 458 | $new_DR = $DR; 459 | $new_DV = $DV; 460 | } 461 | if ($type eq "INV" || $list2[2] eq ".") 462 | { 463 | if ($length ne ".") 464 | { 465 | $new_length = $length; 466 | $new_END = $END; 467 | } 468 | $new_pos = $pos; 469 | } 470 | 471 | delete $SVs{$chr}{$pos_tmp}; 472 | delete $count{$chr}{$pos_tmp}; 473 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT; 474 | $SVs{$list2[0]}{$new_pos} = $line2; 475 | $count{$chr}{$new_pos} = $count_tmp+1; 476 | $cutesv{$chr}{$new_pos} = $line; 477 | $pbsv2{$chr}{$new_pos} = undef; 478 | 479 | if ($type eq "INS") 480 | { 481 | $count{$chr}{$new_pos} = $count_tools; 482 | } 483 | 484 | if ($new_pos eq $pos) 485 | { 486 | if (exists($pbsv{$chr}{$pos_tmp})) 487 | { 488 | my $tmp = $pbsv{$chr}{$pos_tmp}; 489 | delete $pbsv{$chr}{$pos_tmp}; 490 | $pbsv{$chr}{$new_pos} = $tmp; 491 | } 492 | if (exists($cutesv{$chr}{$pos_tmp})) 493 | { 494 | my $tmp = $cutesv{$chr}{$pos_tmp}; 495 | delete $cutesv{$chr}{$pos_tmp}; 496 | $cutesv{$chr}{$new_pos} = $tmp; 497 | } 498 | } 499 | next CUTESV; 500 | } 501 | elsif ($min eq '1') 502 | { 503 | $min = '-1'; 504 | if ($score > $score_prev) 505 | { 506 | $score_prev = $score; 507 | $pos_prev = $pos_tmp; 508 | } 509 | goto POS_ALMOST2d; 510 | } 511 | else 512 | { 513 | $min = '1'; 514 | if ($score > $score_prev) 515 | { 516 | $score_prev = $score; 517 | $pos_prev = $pos_tmp; 518 | } 519 | } 520 | $v++; 521 | goto POS_ALMOST2d; 522 | } 523 | } 524 | elsif ($min eq '1') 525 | { 526 | $min = '-1'; 527 | goto POS_ALMOST2d; 528 | } 529 | else 530 | { 531 | $min = '1'; 532 | } 533 | $v++; 534 | } 535 | if ($score_prev ne "" && $score_prev ne "no" && $score_prev > 1.6) 536 | { 537 | $score_prev = "no"; 538 | $v = '1'; 539 | #goto POS_ALMOST2da; 540 | } 541 | } 542 | if ($HAP[2]/($HAP[2]+$HAP[1]) > 0.2 && ($total_tools eq "2" || 
($type ne "INV" && $type ne "BND"))) 543 | { 544 | $SVs{$chr}{$list[1]} = $converted_line; 545 | $count{$chr}{$list[1]} = '1'; 546 | $cutesv{$chr}{$list[1]} = $line; 547 | if ($type ne "INV" && $type ne "BND" && $type ne "DUP" && $HAP[0] ne "" && $HAP[0] ne "./." && $prev_match2 eq "") 548 | { 549 | $count{$chr}{$list[1]} = '2'; 550 | } 551 | } 552 | } 553 | } 554 | } 555 | } 556 | 557 | undef %pbsv2; 558 | undef %cutesv2; 559 | 560 | if ($input_sniffles ne "") 561 | { 562 | SNIFFLES: while (my $line = <INPUT_SNIFFLES>) 563 | { 564 | chomp($line); 565 | my $first = substr $line, 0, 1; 566 | if ($first ne "#") 567 | { 568 | my @list = split /\t/, $line; 569 | my $pos = $list[1]; 570 | my $REF = $list[3]; 571 | my $ALT = $list[4]; 572 | my $length; 573 | my $type; 574 | my $v = '1'; 575 | my $min = '1'; 576 | my $END = ""; 577 | 578 | my $info = $list[7]; 579 | my @info = split /;/, $info; 580 | my @HAP = split /:/, $list[9]; 581 | my $DR = $HAP[1]; 582 | my $DV = $HAP[2]; 583 | 584 | foreach my $info_tmp (@info) 585 | { 586 | my $first_five = substr $info_tmp, 0, 5; 587 | my $first_four = substr $info_tmp, 0, 4; 588 | if ($first_five eq "SVLEN") 589 | { 590 | $length = substr $info_tmp, 6; 591 | } 592 | elsif ($first_five eq "SVTYP") 593 | { 594 | $type = substr $info_tmp, 7; 595 | } 596 | elsif ($first_four eq "END=") 597 | { 598 | $END = substr $info_tmp, 4; 599 | } 600 | } 601 | if ($length eq "") 602 | { 603 | $length = "."; 604 | } 605 | if ($length < 0) 606 | { 607 | $length *= -1; 608 | } 609 | my $chr = $list[0]; 610 | if ($list[0] =~ m/chr(\d+|X|Y)/) 611 | { 612 | $chr = $1; 613 | } 614 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$HAP[0]."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT; 615 | 616 | if (exists ($sniffles{$chr}{$list[1]})) 617 | { 618 | next SNIFFLES; 619 | } 620 | 621 | if ($list[6] eq "PASS" && ($length eq "." 
|| $length > 45) && ($HAP[2] >= $min_coverage || ($HAP[2] >= $min_coverage-1 && $HAP[2]/($HAP[2]+$HAP[1]) > 0.3)) 622 | && $HAP[2]/($HAP[2]+$HAP[1]) > 0.1 && ($HAP[0] ne "0/0" || $input_cutesv eq "")) 623 | { 624 | my $prev_match_pbsv = ""; 625 | my $prev_match_cutesv = ""; 626 | if (exists($pbsv2{$chr}{$list[1]})) 627 | { 628 | $prev_match_pbsv = "yes"; 629 | } 630 | if (exists($cutesv2{$chr}{$list[1]})) 631 | { 632 | $prev_match_cutesv = "yes"; 633 | } 634 | 635 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq "")) 636 | { 637 | my @list2 = split /\t/, $SVs{$chr}{$list[1]}; 638 | 639 | if (exists($pbsv{$chr}{$list[1]})) 640 | { 641 | $pbsv2{$chr}{$list[1]} = undef; 642 | } 643 | if (exists($cutesv{$chr}{$list[1]})) 644 | { 645 | $cutesv2{$chr}{$list[1]} = undef; 646 | } 647 | 648 | my $no_pbsv = ""; 649 | if (exists($pbsv{$chr}{$list[1]})) 650 | { 651 | } 652 | else 653 | { 654 | $no_pbsv = "yes"; 655 | } 656 | 657 | my $count_tmp = $count{$chr}{$list[1]}; 658 | 659 | my $new_pos = $pos; 660 | my $new_length = $list2[2]; 661 | my $new_type = $list2[3]; 662 | my $new_haplo = $list2[4]; 663 | my $new_END = $list2[5]; 664 | my $new_DR = $list2[6]; 665 | my $new_DV = $list2[7]; 666 | my $new_REF = $list2[8]; 667 | my $new_ALT = $list2[9]; 668 | 669 | if (($list2[3] eq $type || $list2[3] eq "BND" || $list2[3] eq "." || ($type eq "." 
&& $list2[3] ne "BND") || ($list2[3] ne "DEL" && $type ne "DEL" && $list2[3] ne "BND"))) 670 | { 671 | $count{$chr}{$list[1]} = $count_tmp+1; 672 | 673 | if ($no_pbsv ne "") 674 | { 675 | $new_length = $length; 676 | $new_END = $END; 677 | $new_REF = $REF; 678 | $new_ALT = $ALT; 679 | } 680 | 681 | if ($type eq "INS" && $list2[3] ne $type && $type ne ".") 682 | { 683 | $new_type = $type; 684 | } 685 | if ($list2[3] eq "BND" || ($list2[3] eq "INV" && ($type eq "DEL" || $type eq "INS")) || ($list2[3] eq "DUP" && $type eq "INS") || $list2[3] eq ".") 686 | { 687 | if ($length ne ".") 688 | { 689 | $new_length = $length; 690 | $new_END = $END; 691 | } 692 | if ($type ne ".") 693 | { 694 | $new_type = $type; 695 | } 696 | } 697 | if ($type eq "INV" || $list2[2] eq ".") 698 | { 699 | if ($length ne ".") 700 | { 701 | $new_length = $length; 702 | $new_END = $END; 703 | $new_REF = $REF; 704 | $new_ALT = $ALT; 705 | } 706 | } 707 | 708 | delete $SVs{$chr}{$pos}; 709 | delete $count{$chr}{$pos}; 710 | 711 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT; 712 | $SVs{$list2[0]}{$new_pos} = $line2; 713 | $count{$chr}{$new_pos} = $count_tmp+1; 714 | $sniffles{$chr}{$new_pos} = $line; 715 | 716 | if ($type eq "INS") 717 | { 718 | $count{$chr}{$new_pos} = $count_tools; 719 | } 720 | next SNIFFLES; 721 | } 722 | } 723 | elsif ($input_pbsv ne "" || $input_cutesv ne "") 724 | { 725 | my $score_prev = ""; 726 | my $pos_prev = ""; 727 | POS_ALMOST2ca: 728 | while ($v < $length_match) 729 | { 730 | POS_ALMOST2c: my $pos_tmp = ($min*$v)+$pos; 731 | 732 | my $prev_match_pbsv2 = ""; 733 | my $prev_match_cutesv2 = ""; 734 | if (exists($pbsv2{$chr}{$pos_tmp})) 735 | { 736 | $prev_match_pbsv2 = "yes"; 737 | } 738 | if (exists($cutesv2{$chr}{$pos_tmp})) 739 | { 740 | $prev_match_cutesv2 = "yes"; 741 | } 742 | 743 | if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || 
(exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq "")) 744 | { 745 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 746 | my $count_tmp = $count{$chr}{$pos_tmp}; 747 | 748 | my $no_pbsv = ""; 749 | if (exists($pbsv{$chr}{$pos_tmp})) 750 | { 751 | } 752 | else 753 | { 754 | $no_pbsv = "yes"; 755 | } 756 | 757 | if (exists ($sniffles{$chr}{$pos_tmp}) && $list2[3] eq $type && ($list2[2] > $length*$lenght_margin1 && $list2[2] < $length*$lenght_margin2)) 758 | { 759 | next SNIFFLES; 760 | } 761 | else 762 | { 763 | if ($list2[3] eq $type || $list2[3] eq "BND" || $list2[3] eq "." || ($type eq "." && $list2[3] ne "BND") || 764 | ($list2[3] ne "DEL" && $type ne "DEL" && $list2[3] ne "BND")) 765 | { 766 | my $score = '0'; 767 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS")) 768 | { 769 | $score += 1; 770 | } 771 | my $g = (800-$v)/800; 772 | $score += $g; 773 | my $score_tmp = '0'; 774 | if ($list2[2] ne "." && $length ne ".") 775 | { 776 | if ($list2[2] >= $length) 777 | { 778 | $score_tmp = 1-(($list2[2]-$length)/$length); 779 | if ($score_tmp < 0) 780 | { 781 | $score_tmp = '0'; 782 | } 783 | } 784 | else 785 | { 786 | $score_tmp = 1-(($length-$list2[2])/$length); 787 | if ($score_tmp < 0) 788 | { 789 | $score_tmp = '0'; 790 | } 791 | } 792 | } 793 | $score += $score_tmp; 794 | 795 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no") 796 | { 797 | $pos_tmp = $pos_prev; 798 | $count_tmp = $count{$chr}{$pos_tmp}; 799 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 800 | } 801 | 802 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no") 803 | { 804 | 805 | if (exists($pbsv{$chr}{$pos_tmp})) 806 | { 807 | $pbsv2{$chr}{$pos_tmp} = undef; 808 | } 809 | if (exists($cutesv{$chr}{$pos_tmp})) 810 | { 811 | $cutesv2{$chr}{$pos_tmp} = undef; 812 | } 813 | 814 | my $new_pos = $pos_tmp; 815 | my $new_length = $list2[2]; 816 | my $new_type = $list2[3]; 817 | my 
$new_haplo = $list2[4]; 818 | my $new_END = $list2[5]; 819 | my $new_DR = $list2[6]; 820 | my $new_DV = $list2[7]; 821 | my $new_REF = $list2[8]; 822 | my $new_ALT = $list2[9]; 823 | 824 | if ($no_pbsv eq "yes") 825 | { 826 | $new_pos = $pos; 827 | $new_length = $length; 828 | $new_END = $END; 829 | $new_REF = $REF; 830 | $new_ALT = $ALT; 831 | } 832 | 833 | if ($type eq "INS" && $list2[3] ne $type && $type ne ".") 834 | { 835 | $new_type = $type; 836 | } 837 | if ($list2[3] eq "BND" || ($list2[3] eq "INV" && ($type eq "DEL" || $type eq "INS")) || $list2[3] eq "." || ($type eq "INS" && $list2[3] ne $type)) 838 | { 839 | if ($length ne ".") 840 | { 841 | $new_length = $length; 842 | $new_END = $END; 843 | } 844 | if ($type ne ".") 845 | { 846 | $new_type = $type; 847 | } 848 | } 849 | if ($type eq "INV" || $list2[2] eq ".") 850 | { 851 | if ($length ne ".") 852 | { 853 | $new_length = $length; 854 | $new_END = $END; 855 | $new_REF = $REF; 856 | $new_ALT = $ALT; 857 | } 858 | $new_pos = $pos; 859 | } 860 | 861 | delete $SVs{$chr}{$pos_tmp}; 862 | delete $count{$chr}{$pos_tmp}; 863 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT; 864 | $SVs{$list2[0]}{$new_pos} = $line2; 865 | $count{$chr}{$new_pos} = $count_tmp+1; 866 | $sniffles{$chr}{$pos_tmp} = $line; 867 | 868 | if ($type eq "INS") 869 | { 870 | $count{$chr}{$new_pos} = $count_tools; 871 | } 872 | if ($new_pos eq $pos) 873 | { 874 | if (exists($pbsv{$chr}{$pos_tmp})) 875 | { 876 | my $tmp = $pbsv{$chr}{$pos_tmp}; 877 | delete $pbsv{$chr}{$pos_tmp}; 878 | $pbsv{$chr}{$new_pos} = $tmp; 879 | } 880 | if (exists($cutesv{$chr}{$pos_tmp})) 881 | { 882 | my $tmp = $cutesv{$chr}{$pos_tmp}; 883 | delete $cutesv{$chr}{$pos_tmp}; 884 | $cutesv{$chr}{$new_pos} = $tmp; 885 | } 886 | if (exists($sniffles{$chr}{$pos_tmp})) 887 | { 888 | my $tmp = $sniffles{$chr}{$pos_tmp}; 889 | delete $sniffles{$chr}{$pos_tmp}; 890 | 
$sniffles{$chr}{$new_pos} = $tmp; 891 | } 892 | } 893 | next SNIFFLES; 894 | } 895 | elsif ($min eq '1') 896 | { 897 | $min = '-1'; 898 | if ($score > $score_prev) 899 | { 900 | $score_prev = $score; 901 | $pos_prev = $pos_tmp; 902 | } 903 | goto POS_ALMOST2ca; 904 | } 905 | else 906 | { 907 | $min = '1'; 908 | if ($score > $score_prev) 909 | { 910 | $score_prev = $score; 911 | $pos_prev = $pos_tmp; 912 | } 913 | } 914 | $v++; 915 | goto POS_ALMOST2ca; 916 | } 917 | next SNIFFLES; 918 | } 919 | } 920 | elsif ($min eq '1') 921 | { 922 | $min = '-1'; 923 | goto POS_ALMOST2c; 924 | } 925 | else 926 | { 927 | $min = '1'; 928 | } 929 | $v++; 930 | } 931 | if ($score_prev ne "" && $score_prev ne "no" && $score_prev > 1.6) 932 | { 933 | $score_prev = "no"; 934 | $v = '1'; 935 | goto POS_ALMOST2ca; 936 | } 937 | } 938 | 939 | $SVs{$chr}{$list[1]} = $converted_line; 940 | $count{$chr}{$list[1]} = '1'; 941 | $sniffles{$chr}{$list[1]} = $line; 942 | if (($type eq "INS" || $type eq "DEL") && $HAP[0] ne "0/0" && $HAP[0] ne "./." 
&& $HAP[0] ne "1/1" && ($HAP[2]+$HAP[1]) > 4) 943 | { 944 | $count{$chr}{$list[1]} = $count_tools; 945 | } 946 | elsif ($high_recall eq "2" && $HAP[0] ne "0/0" && $HAP[0] ne "1/1" && ($HAP[2]+$HAP[1]) > 4 && $input_cutesv eq "") 947 | { 948 | $count{$chr}{$list[1]} = '2'; 949 | $SVs{$chr}{$list[1]} = $converted_line; 950 | $sniffles{$chr}{$list[1]} = $line; 951 | } 952 | elsif ($high_recall eq "2" && $HAP[0] ne "0/0" && $HAP[0] ne "1/1" && $input_cutesv eq "") 953 | { 954 | $count{$chr}{$list[1]} = '2'; 955 | $SVs{$chr}{$list[1]} = $converted_line; 956 | $sniffles{$chr}{$list[1]} = $line; 957 | } 958 | } 959 | } 960 | } 961 | } 962 | 963 | undef %pbsv2; 964 | undef %cutesv2; 965 | 966 | if ($input_nanovar ne "") 967 | { 968 | NANOVAR: while (my $line = ) 969 | { 970 | chomp($line); 971 | my $score_prev = ""; 972 | my $pos_prev = ""; 973 | my $first_nuc = substr $line, 0, 1; 974 | if ($first_nuc ne "#") 975 | { 976 | my @list = split /\t/, $line; 977 | my $pos = $list[1]; 978 | my $REF = $list[3]; 979 | my $ALT = $list[4]; 980 | my $type; 981 | my $length; 982 | my $v = '1'; 983 | my $min = '1'; 984 | my $END = ""; 985 | 986 | my $info = $list[7]; 987 | my @info = split /;/, $info; 988 | my @HAP = split /:/, $list[9]; 989 | my @HAP_REF = split /,/, $HAP[1]; 990 | my @HAP2 = split /,/, $HAP[2]; 991 | my $not_save = ""; 992 | 993 | foreach my $info_tmp (@info) 994 | { 995 | my $first_five = substr $info_tmp, 0, 5; 996 | my $first_four = substr $info_tmp, 0, 4; 997 | if ($info_tmp =~ m/SVLEN=>*-*(\d+)/) 998 | { 999 | $length = $1; 1000 | } 1001 | elsif ($first_five eq "SVTYP") 1002 | { 1003 | $type = substr $info_tmp, 7; 1004 | } 1005 | elsif ($first_four eq "END=") 1006 | { 1007 | $END = substr $info_tmp, 4; 1008 | } 1009 | } 1010 | if ($length eq "") 1011 | { 1012 | $length = "."; 1013 | } 1014 | if ($length < 0) 1015 | { 1016 | $length *= -1; 1017 | } 1018 | my $chr = $list[0]; 1019 | if ($list[0] =~ m/chr(\d+|X|Y)/) 1020 | { 1021 | $chr = $1; 1022 | } 1023 | my 
@depth = split /,/, $HAP[2]; 1024 | my $DR = $depth[0]; 1025 | my $DV = $depth[1]; 1026 | 1027 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$HAP[0]."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT; 1028 | 1029 | if (exists ($nanovar{$chr}{$list[1]})) 1030 | { 1031 | next NANOVAR; 1032 | } 1033 | 1034 | if (($HAP2[1] >= $min_coverage || ($HAP2[1] >= $min_coverage-1 && $HAP2[0]/($HAP_REF[0]+$HAP2[0]) > 0.3)) && $list[6] eq "PASS" && $type ne "." 1035 | && $type ne "BND" && ($HAP2[0]/($HAP_REF[0]+$HAP2[0]) > 0.12 || $HAP_REF[0] eq ".") && ($length eq "." || $length > 45) && ($HAP[0] ne "./." || $count_tools < 4)) 1036 | { 1037 | my @list2 = split /\t/, $SVs{$chr}{$list[1]}; 1038 | 1039 | my $prev_match_pbsv = ""; 1040 | my $prev_match_cutesv = ""; 1041 | my $prev_match_sniffles = ""; 1042 | if (exists($pbsv2{$chr}{$list[1]})) 1043 | { 1044 | $prev_match_pbsv = "yes"; 1045 | } 1046 | if (exists($cutesv2{$chr}{$list[1]})) 1047 | { 1048 | $prev_match_cutesv = "yes"; 1049 | } 1050 | if (exists($sniffles2{$chr}{$list[1]})) 1051 | { 1052 | $prev_match_sniffles = "yes"; 1053 | } 1054 | 1055 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq "") || 1056 | (exists($sniffles{$chr}{$list[1]}) && $prev_match_sniffles eq "")) 1057 | { 1058 | if (exists($pbsv{$chr}{$list[1]})) 1059 | { 1060 | $pbsv2{$chr}{$list[1]} = undef; 1061 | } 1062 | if (exists($cutesv{$chr}{$list[1]})) 1063 | { 1064 | $cutesv2{$chr}{$list[1]} = undef; 1065 | } 1066 | if (exists($sniffles{$chr}{$list[1]})) 1067 | { 1068 | $sniffles2{$chr}{$list[1]} = undef; 1069 | } 1070 | 1071 | my $count_tmp = $count{$chr}{$list[1]}; 1072 | 1073 | if (($list2[3] eq $type || $list2[3] eq "." || $type eq "." 
|| ($list2[3] ne "DEL" && $type ne "DEL") || $type eq "INV" || $list2[3] eq "INV") 1074 | && ($list2[3] ne "BND" || exists($sniffles{$chr}{$list[1]}) || exists($cutesv{$chr}{$list[1]}))) 1075 | { 1076 | $count{$chr}{$list[1]} = $count_tmp+1; 1077 | 1078 | my $new_pos = $pos; 1079 | my $new_length = $list2[2]; 1080 | my $new_type = $list2[3]; 1081 | my $new_haplo = $list2[4]; 1082 | my $new_END = $list2[5]; 1083 | my $new_DR = $list2[6]; 1084 | my $new_DV = $list2[7]; 1085 | my $new_REF = $list2[8]; 1086 | my $new_ALT = $list2[9]; 1087 | 1088 | if (($list2[3] eq "BND" || ($list2[3] eq "INV" && ($type eq "DEL" || $type eq "INS")) || $list2[3] eq ".") && $type ne "DUP") 1089 | { 1090 | if ($type ne ".") 1091 | { 1092 | $new_type = $type; 1093 | } 1094 | } 1095 | if ($type eq "INV") 1096 | { 1097 | if ($length ne ".") 1098 | { 1099 | $new_length = $length; 1100 | $new_END = $END; 1101 | } 1102 | if ($type ne ".") 1103 | { 1104 | $new_type = $type; 1105 | } 1106 | if ($HAP[0] ne "./.") 1107 | { 1108 | $new_haplo = $HAP[0]; 1109 | $new_DR = $DR; 1110 | $new_DV = $DV; 1111 | $new_REF = $REF; 1112 | $new_ALT = $ALT; 1113 | } 1114 | } 1115 | if ($list2[2] eq ".") 1116 | { 1117 | $new_length = $length; 1118 | $new_END = $END; 1119 | } 1120 | if (exists($pbsv{$chr}{$pos})) 1121 | {} 1122 | else 1123 | { 1124 | if ($HAP[0] ne "./.") 1125 | { 1126 | $new_haplo = $HAP[0]; 1127 | $new_DR = $DR; 1128 | $new_DV = $DV; 1129 | $new_REF = $REF; 1130 | $new_ALT = $ALT; 1131 | } 1132 | } 1133 | if ($new_haplo eq "./.") 1134 | { 1135 | $new_haplo = $HAP[0]; 1136 | $new_DR = $DR; 1137 | $new_DV = $DV; 1138 | $new_REF = $REF; 1139 | $new_ALT = $ALT; 1140 | } 1141 | 1142 | delete $SVs{$chr}{$pos}; 1143 | delete $count{$chr}{$pos}; 1144 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT; 1145 | $SVs{$list2[0]}{$new_pos} = $line2; 1146 | $count{$chr}{$new_pos} = $count_tmp+1; 1147 | 
$nanovar{$chr}{$new_pos} = $line; 1148 | 1149 | if (($HAP[0] eq "0/1" || $HAP[0] eq "1/0") && $type eq "INS" && $length < 2000) 1150 | { 1151 | $count{$chr}{$new_pos} = $count_tmp+2; 1152 | } 1153 | } 1154 | next NANOVAR; 1155 | } 1156 | else 1157 | { 1158 | 1159 | POS_ALMOST2fa: 1160 | while ($v < $length_match) 1161 | { 1162 | POS_ALMOST2f: my $pos_tmp = ($min*$v)+$pos; 1163 | 1164 | my $prev_match_pbsv2 = ""; 1165 | my $prev_match_cutesv2 = ""; 1166 | my $prev_match_sniffles2 = ""; 1167 | if (exists($pbsv2{$chr}{$pos_tmp})) 1168 | { 1169 | $prev_match_pbsv2 = "yes"; 1170 | } 1171 | if (exists($cutesv2{$chr}{$pos_tmp})) 1172 | { 1173 | $prev_match_cutesv2 = "yes"; 1174 | } 1175 | if (exists($sniffles2{$chr}{$pos_tmp})) 1176 | { 1177 | $prev_match_sniffles2 = "yes"; 1178 | } 1179 | 1180 | if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || (exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq "") || 1181 | (exists($sniffles{$chr}{$pos_tmp}) && $prev_match_sniffles2 eq "")) 1182 | { 1183 | my $count_tmp = $count{$chr}{$pos_tmp}; 1184 | if (exists($nanovar{$chr}{$pos_tmp}) && $high_recall eq "gg") 1185 | { 1186 | if ($min eq '1') 1187 | { 1188 | $min = '-1'; 1189 | goto POS_ALMOST2f; 1190 | } 1191 | else 1192 | { 1193 | $min = '1'; 1194 | } 1195 | } 1196 | else 1197 | { 1198 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 1199 | 1200 | if (($list2[3] eq $type || ($list2[3] eq "BND" && $type eq "INV") || $list2[3] eq "." || $type eq "." 
|| ($list2[3] ne "DEL" && $type ne "DEL") 1201 | || $type eq "INV" || $list2[3] eq "INV") && ($list2[3] ne "BND" || exists($sniffles{$chr}{$pos_tmp}) || exists($cutesv{$chr}{$pos_tmp}))) 1202 | { 1203 | $count{$chr}{$pos_tmp} = $count_tmp+1; 1204 | 1205 | my $score = '0'; 1206 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS")) 1207 | { 1208 | $score += 1; 1209 | } 1210 | my $g = (800-$v)/800; 1211 | $score += $g; 1212 | my $score_tmp = '0'; 1213 | if ($list2[2] ne "." && $length ne ".") 1214 | { 1215 | if ($list2[2] >= $length) 1216 | { 1217 | $score_tmp = 1-(($list2[2]-$length)/$length); 1218 | if ($score_tmp < 0) 1219 | { 1220 | $score_tmp = '0'; 1221 | } 1222 | } 1223 | else 1224 | { 1225 | $score_tmp = 1-(($length-$list2[2])/$length); 1226 | if ($score_tmp < 0) 1227 | { 1228 | $score_tmp = '0'; 1229 | } 1230 | } 1231 | } 1232 | $score += $score_tmp; 1233 | 1234 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no") 1235 | { 1236 | $pos_tmp = $pos_prev; 1237 | $count_tmp = $count{$chr}{$pos_tmp}; 1238 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 1239 | } 1240 | 1241 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no") 1242 | { 1243 | if (exists($pbsv{$chr}{$pos_tmp})) 1244 | { 1245 | $pbsv2{$chr}{$pos_tmp} = undef; 1246 | } 1247 | if (exists($cutesv{$chr}{$pos_tmp})) 1248 | { 1249 | $cutesv2{$chr}{$pos_tmp} = undef; 1250 | } 1251 | if (exists($sniffles{$chr}{$pos_tmp})) 1252 | { 1253 | $sniffles2{$chr}{$pos_tmp} = undef; 1254 | } 1255 | 1256 | my $new_pos = $pos_tmp; 1257 | my $new_length = $list2[2]; 1258 | my $new_type = $list2[3]; 1259 | my $new_haplo = $list2[4]; 1260 | my $new_END = $list2[5]; 1261 | my $new_DR = $list2[6]; 1262 | my $new_DV = $list2[7]; 1263 | my $new_REF = $list2[8]; 1264 | my $new_ALT = $list2[9]; 1265 | 1266 | if (($list2[3] eq "BND" || $list2[3] eq ".") && $type ne "DUP") 1267 | { 1268 | if ($length ne ".") 1269 | { 1270 | 
$new_length = $length; 1271 | $new_END = $END; 1272 | } 1273 | if ($type ne ".") 1274 | { 1275 | $new_type = $type; 1276 | } 1277 | } 1278 | if ($type eq "INV") 1279 | { 1280 | $new_pos = $pos; 1281 | 1282 | if (exists($pbsv{$chr}{$pos_tmp})) 1283 | { 1284 | my $tmp = $pbsv{$chr}{$pos_tmp}; 1285 | delete $pbsv{$chr}{$pos_tmp}; 1286 | $pbsv{$chr}{$new_pos} = $tmp; 1287 | } 1288 | if (exists($cutesv{$chr}{$pos_tmp})) 1289 | { 1290 | my $tmp = $cutesv{$chr}{$pos_tmp}; 1291 | delete $cutesv{$chr}{$pos_tmp}; 1292 | $cutesv{$chr}{$new_pos} = $tmp; 1293 | } 1294 | if (exists($sniffles{$chr}{$pos_tmp})) 1295 | { 1296 | my $tmp = $sniffles{$chr}{$pos_tmp}; 1297 | delete $sniffles{$chr}{$pos_tmp}; 1298 | $sniffles{$chr}{$new_pos} = $tmp; 1299 | } 1300 | if (exists($nanovar{$chr}{$pos_tmp})) 1301 | { 1302 | my $tmp = $nanovar{$chr}{$pos_tmp}; 1303 | delete $nanovar{$chr}{$pos_tmp}; 1304 | $nanovar{$chr}{$new_pos} = $tmp; 1305 | } 1306 | 1307 | if ($length ne ".") 1308 | { 1309 | $new_length = $length; 1310 | $new_END = $END; 1311 | $new_REF = $REF; 1312 | $new_ALT = $ALT; 1313 | } 1314 | if ($type ne ".") 1315 | { 1316 | $new_type = $type; 1317 | } 1318 | if ($HAP[0] ne "./.") 1319 | { 1320 | $new_haplo = $HAP[0]; 1321 | $new_DR = $DR; 1322 | $new_DV = $DV; 1323 | } 1324 | } 1325 | if ($list2[2] eq ".") 1326 | { 1327 | $new_length = $length; 1328 | $new_END = $END; 1329 | $new_REF = $REF; 1330 | $new_ALT = $ALT; 1331 | } 1332 | if (exists($pbsv{$chr}{$pos_tmp})) 1333 | {} 1334 | else 1335 | { 1336 | if ($HAP[0] ne "./.") 1337 | { 1338 | $new_haplo = $HAP[0]; 1339 | $new_DR = $DR; 1340 | $new_DV = $DV; 1341 | } 1342 | } 1343 | if ($new_haplo eq "./.") 1344 | { 1345 | $new_haplo = $HAP[0]; 1346 | $new_DR = $DR; 1347 | $new_DV = $DV; 1348 | } 1349 | 1350 | delete $SVs{$chr}{$pos_tmp}; 1351 | delete $count{$chr}{$pos_tmp}; 1352 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT; 
1353 | $SVs{$list2[0]}{$new_pos} = $line2; 1354 | $count{$chr}{$new_pos} = $count_tmp+1; 1355 | $nanovar{$chr}{$new_pos} = $line; 1356 | 1357 | if (($HAP[0] eq "0/1" || $HAP[0] eq "1/0") && $type eq "INS" && $length < 2000) 1358 | { 1359 | $count{$chr}{$new_pos} = $count_tmp+2; 1360 | } 1361 | next NANOVAR; 1362 | } 1363 | elsif ($min eq '1') 1364 | { 1365 | $min = '-1'; 1366 | if ($score > $score_prev) 1367 | { 1368 | $score_prev = $score; 1369 | $pos_prev = $pos_tmp; 1370 | } 1371 | goto POS_ALMOST2fa; 1372 | } 1373 | else 1374 | { 1375 | $min = '1'; 1376 | if ($score > $score_prev) 1377 | { 1378 | $score_prev = $score; 1379 | $pos_prev = $pos_tmp; 1380 | } 1381 | } 1382 | $v++; 1383 | goto POS_ALMOST2fa; 1384 | } 1385 | } 1386 | next NANOVAR; 1387 | } 1388 | elsif ($min eq '1') 1389 | { 1390 | $min = '-1'; 1391 | goto POS_ALMOST2f; 1392 | } 1393 | else 1394 | { 1395 | $min = '1'; 1396 | } 1397 | $v++; 1398 | } 1399 | 1400 | } 1401 | 1402 | if ($type ne "BND" && $HAP[0] ne "./.") 1403 | { 1404 | $SVs{$chr}{$list[1]} = $converted_line; 1405 | $nanovar{$chr}{$list[1]} = $line; 1406 | $count{$chr}{$list[1]} = '1'; 1407 | if (($HAP[0] eq "0/1" || $HAP[0] eq "1/0") && $type eq "INS" && $pos_prev eq "") 1408 | { 1409 | $count{$chr}{$list[1]} = 2; 1410 | } 1411 | } 1412 | } 1413 | } 1414 | } 1415 | } 1416 | 1417 | undef %pbsv2; 1418 | undef %cutesv2; 1419 | undef %sniffles2; 1420 | 1421 | my %svim; 1422 | if ($input_svim ne "") 1423 | { 1424 | SVIM: while (my $line = ) 1425 | { 1426 | chomp($line); 1427 | my $first_nuc = substr $line, 0, 1; 1428 | if ($first_nuc ne "#") 1429 | { 1430 | my @list = split /\t/, $line; 1431 | my $pos = $list[1]; 1432 | my $REF = $list[3]; 1433 | my $ALT = $list[4]; 1434 | my $length; 1435 | my $type; 1436 | my $v = '1'; 1437 | my $min = '1'; 1438 | my $END = ""; 1439 | 1440 | my $info = $list[7]; 1441 | my @info = split /;/, $info; 1442 | my @HAP = split /:/, $list[9]; 1443 | 1444 | my @HAP_INFO = split /:/, $list[8]; 1445 | my $support = 
'0'; 1446 | 1447 | foreach my $info_tmp (@info) 1448 | { 1449 | my $first_five = substr $info_tmp, 0, 5; 1450 | my $first_four = substr $info_tmp, 0, 4; 1451 | if ($first_five eq "SVLEN") 1452 | { 1453 | $length = substr $info_tmp, 6; 1454 | } 1455 | elsif ($first_five eq "SVTYP") 1456 | { 1457 | $type = substr $info_tmp, 7; 1458 | if ($type eq "DUP:TANDEM" || $type eq "DUP:INT") 1459 | { 1460 | $type = "DUP"; 1461 | } 1462 | } 1463 | elsif ($first_five eq "SUPPO") 1464 | { 1465 | $support = substr $info_tmp, 8; 1466 | } 1467 | elsif ($first_four eq "END=") 1468 | { 1469 | $END = substr $info_tmp, 4; 1470 | } 1471 | } 1472 | if ($length eq "") 1473 | { 1474 | $length = "."; 1475 | } 1476 | if ($length < 0) 1477 | { 1478 | $length *= -1; 1479 | } 1480 | my $chr = $list[0]; 1481 | if ($list[0] =~ m/chr(\d+|X|Y)/) 1482 | { 1483 | $chr = $1; 1484 | } 1485 | my $hapi = $HAP[0]; 1486 | 1487 | if (exists ($svim{$chr}{$list[1]})) 1488 | { 1489 | next SVIM; 1490 | } 1491 | my $ratio = '0'; 1492 | 1493 | if ($HAP[1] > 0) 1494 | { 1495 | $ratio = $support/($HAP[1]) 1496 | } 1497 | 1498 | my @depth = split /,/, $HAP[3]; 1499 | my $DR = $depth[0]; 1500 | my $DV = $depth[1]; 1501 | 1502 | if ($list[6] eq "PASS" && ($support >= $min_coverage || ($support >= $min_coverage-1 && $ratio > 0.3)) && $type ne "BND" && $type ne "DUP" && $hapi ne "0/0" && $hapi ne "./." && 1503 | ($ratio > 0.2 || $HAP_INFO[1] eq "CN" || $HAP[1] eq ".") && ($length eq "." 
|| $length > 45)) 1504 | { 1505 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$hapi."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT; 1506 | my $prev_match_pbsv = ""; 1507 | my $prev_match_cutesv = ""; 1508 | my $prev_match_sniffles = ""; 1509 | my $prev_match_nanovar = ""; 1510 | 1511 | if (exists($pbsv2{$chr}{$list[1]})) 1512 | { 1513 | $prev_match_pbsv = "yes"; 1514 | } 1515 | if (exists($cutesv2{$chr}{$list[1]})) 1516 | { 1517 | $prev_match_cutesv = "yes"; 1518 | } 1519 | if (exists($sniffles2{$chr}{$list[1]})) 1520 | { 1521 | $prev_match_sniffles = "yes"; 1522 | } 1523 | if (exists($nanovar2{$chr}{$list[1]})) 1524 | { 1525 | $prev_match_nanovar = "yes"; 1526 | } 1527 | 1528 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq "") || 1529 | (exists($sniffles{$chr}{$list[1]}) && $prev_match_sniffles eq "") || (exists($nanovar{$chr}{$list[1]}) && $prev_match_nanovar eq "")) 1530 | { 1531 | if (exists($pbsv{$chr}{$list[1]})) 1532 | { 1533 | $pbsv2{$chr}{$list[1]} = undef; 1534 | } 1535 | if (exists($cutesv{$chr}{$list[1]})) 1536 | { 1537 | $cutesv2{$chr}{$list[1]} = undef; 1538 | } 1539 | if (exists($sniffles{$chr}{$list[1]})) 1540 | { 1541 | $sniffles2{$chr}{$list[1]} = undef; 1542 | } 1543 | if (exists($nanovar{$chr}{$list[1]})) 1544 | { 1545 | $nanovar2{$chr}{$list[1]} = undef; 1546 | } 1547 | 1548 | my $count_tmp = $count{$chr}{$list[1]}; 1549 | my @list2 = split /\t/, $SVs{$chr}{$list[1]}; 1550 | 1551 | my $new_pos = $pos; 1552 | my $new_length = $list2[2]; 1553 | my $new_type = $list2[3]; 1554 | my $new_haplo = $list2[4]; 1555 | my $new_END = $list2[5]; 1556 | my $new_REF = $list2[8]; 1557 | my $new_ALT = $list2[9]; 1558 | 1559 | if ($new_haplo eq "./.") 1560 | { 1561 | $new_haplo = $hapi; 1562 | } 1563 | 1564 | delete $SVs{$chr}{$pos}; 1565 | delete $count{$chr}{$pos}; 1566 | my $line2 = 
$list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$DR."\t".$DV."\t".$new_REF."\t".$new_ALT; 1567 | $SVs{$list2[0]}{$new_pos} = $line2; 1568 | $count{$chr}{$new_pos} = $count_tmp+1; 1569 | $svim{$chr}{$new_pos} = $line; 1570 | 1571 | if ($hapi eq "1/1") 1572 | { 1573 | $count{$chr}{$new_pos} = $count_tools; 1574 | } 1575 | 1576 | next SVIM; 1577 | } 1578 | else 1579 | { 1580 | my $score_prev = ""; 1581 | my $pos_prev = ""; 1582 | POS_ALMOST2ga: 1583 | while ($v < $length_match) 1584 | { 1585 | POS_ALMOST2g: my $pos_tmp = ($min*$v)+$pos; 1586 | 1587 | my $prev_match_pbsv2 = ""; 1588 | my $prev_match_cutesv2 = ""; 1589 | my $prev_match_sniffles2 = ""; 1590 | my $prev_match_nanovar2 = ""; 1591 | 1592 | if (exists($pbsv2{$chr}{$pos_tmp})) 1593 | { 1594 | $prev_match_pbsv2 = "yes"; 1595 | } 1596 | if (exists($cutesv2{$chr}{$pos_tmp})) 1597 | { 1598 | $prev_match_cutesv2 = "yes"; 1599 | } 1600 | if (exists($sniffles2{$chr}{$pos_tmp})) 1601 | { 1602 | $prev_match_sniffles2 = "yes"; 1603 | } 1604 | if (exists($nanovar2{$chr}{$pos_tmp})) 1605 | { 1606 | $prev_match_nanovar2 = "yes"; 1607 | } 1608 | 1609 | if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || (exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq "") || 1610 | (exists($sniffles{$chr}{$pos_tmp}) && $prev_match_sniffles2 eq "") || (exists($nanovar{$chr}{$pos_tmp}) && $prev_match_nanovar2 eq "")) 1611 | { 1612 | my $count_tmp = $count{$chr}{$pos_tmp}; 1613 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 1614 | 1615 | if ($count_tmp > 20) 1616 | { 1617 | if ($min eq '1') 1618 | { 1619 | $min = '-1'; 1620 | goto POS_ALMOST2g; 1621 | } 1622 | else 1623 | { 1624 | $min = '1'; 1625 | } 1626 | } 1627 | elsif (exists($svim{$chr}{$pos_tmp}) && $list2[3] eq $type && ($list2[2] > $length*$lenght_margin1 && $list2[2] < $length*$lenght_margin2)) 1628 | { 1629 | next SVIM; 1630 | } 1631 | elsif ($list2[3] ne "DUP") 1632 | { 1633 | my $score = '0'; 1634 | if ($list2[3] 
eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS")) 1635 | { 1636 | $score += 1; 1637 | } 1638 | my $g = (800-$v)/800; 1639 | $score += $g; 1640 | my $score_tmp = '0'; 1641 | if ($list2[2] ne "." && $length ne ".") 1642 | { 1643 | if ($list2[2] >= $length) 1644 | { 1645 | $score_tmp = 1-(($list2[2]-$length)/$length); 1646 | if ($score_tmp < 0) 1647 | { 1648 | $score_tmp = '0'; 1649 | } 1650 | } 1651 | else 1652 | { 1653 | $score_tmp = 1-(($length-$list2[2])/$length); 1654 | if ($score_tmp < 0) 1655 | { 1656 | $score_tmp = '0'; 1657 | } 1658 | } 1659 | } 1660 | $score += $score_tmp; 1661 | 1662 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no") 1663 | { 1664 | $pos_tmp = $pos_prev; 1665 | $count_tmp = $count{$chr}{$pos_tmp}; 1666 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp}; 1667 | } 1668 | 1669 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no") 1670 | { 1671 | if (exists($pbsv{$chr}{$pos_tmp})) 1672 | { 1673 | $pbsv2{$chr}{$pos_tmp} = undef; 1674 | } 1675 | if (exists($cutesv{$chr}{$pos_tmp})) 1676 | { 1677 | $cutesv2{$chr}{$pos_tmp} = undef; 1678 | } 1679 | if (exists($sniffles{$chr}{$pos_tmp})) 1680 | { 1681 | $sniffles2{$chr}{$pos_tmp} = undef; 1682 | } 1683 | if (exists($nanovar{$chr}{$pos_tmp})) 1684 | { 1685 | $nanovar2{$chr}{$pos_tmp} = undef; 1686 | } 1687 | 1688 | my $new_pos = $pos_tmp; 1689 | my $new_length = $list2[2]; 1690 | my $new_type = $list2[3]; 1691 | my $new_haplo = $list2[4]; 1692 | my $new_END = $list2[5]; 1693 | my $new_DR = $list2[6]; 1694 | my $new_DV = $list2[7]; 1695 | my $new_REF = $list2[8]; 1696 | my $new_ALT = $list2[9]; 1697 | 1698 | if (exists($sniffles{$chr}{$pos_tmp})) 1699 | {} 1700 | elsif (exists($cutesv{$chr}{$pos_tmp})) 1701 | {} 1702 | elsif (exists($pbsv{$chr}{$pos_tmp})) 1703 | {} 1704 | elsif ($type eq "INV") 1705 | { 1706 | $new_pos = $pos; 1707 | } 1708 | if ($new_haplo eq "./.") 1709 | { 1710 | $new_haplo = $hapi; 1711 | 
$new_DR = $DR; 1712 | $new_DV = $DV; 1713 | $new_ALT = $ALT; 1714 | $new_REF = $REF; 1715 | } 1716 | 1717 | delete $SVs{$chr}{$pos_tmp}; 1718 | delete $count{$chr}{$pos_tmp}; 1719 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT; 1720 | $SVs{$list2[0]}{$new_pos} = $line2; 1721 | $count{$chr}{$new_pos} = $count_tmp+1; 1722 | $svim{$chr}{$new_pos} = $line; 1723 | 1724 | if ($hapi eq "1/1") 1725 | { 1726 | $count{$chr}{$new_pos} = $count_tools; 1727 | } 1728 | 1729 | if ($new_pos eq $pos) 1730 | { 1731 | if (exists($pbsv{$chr}{$pos_tmp})) 1732 | { 1733 | my $tmp = $pbsv{$chr}{$pos_tmp}; 1734 | delete $pbsv{$chr}{$pos_tmp}; 1735 | $pbsv{$chr}{$new_pos} = $tmp; 1736 | } 1737 | if (exists($cutesv{$chr}{$pos_tmp})) 1738 | { 1739 | my $tmp = $cutesv{$chr}{$pos_tmp}; 1740 | delete $cutesv{$chr}{$pos_tmp}; 1741 | $cutesv{$chr}{$new_pos} = $tmp; 1742 | } 1743 | if (exists($sniffles{$chr}{$pos_tmp})) 1744 | { 1745 | my $tmp = $sniffles{$chr}{$pos_tmp}; 1746 | delete $sniffles{$chr}{$pos_tmp}; 1747 | $sniffles{$chr}{$new_pos} = $tmp; 1748 | } 1749 | if (exists($nanovar{$chr}{$pos_tmp})) 1750 | { 1751 | my $tmp = $nanovar{$chr}{$pos_tmp}; 1752 | delete $nanovar{$chr}{$pos_tmp}; 1753 | $nanovar{$chr}{$new_pos} = $tmp; 1754 | } 1755 | if (exists($svim{$chr}{$pos_tmp})) 1756 | { 1757 | my $tmp = $svim{$chr}{$pos_tmp}; 1758 | delete $svim{$chr}{$pos_tmp}; 1759 | $svim{$chr}{$new_pos} = $tmp; 1760 | } 1761 | } 1762 | next SVIM; 1763 | } 1764 | elsif ($min eq '1') 1765 | { 1766 | $min = '-1'; 1767 | if ($score > $score_prev) 1768 | { 1769 | $score_prev = $score; 1770 | $pos_prev = $pos_tmp; 1771 | } 1772 | goto POS_ALMOST2ga; 1773 | } 1774 | else 1775 | { 1776 | $min = '1'; 1777 | if ($score > $score_prev) 1778 | { 1779 | $score_prev = $score; 1780 | $pos_prev = $pos_tmp; 1781 | } 1782 | } 1783 | $v++; 1784 | goto POS_ALMOST2ga; 1785 | } 1786 | } 1787 | elsif ($min eq '1') 1788 | { 
1789 | $min = '-1'; 1790 | goto POS_ALMOST2g; 1791 | } 1792 | else 1793 | { 1794 | $min = '1'; 1795 | } 1796 | $v++; 1797 | } 1798 | } 1799 | 1800 | $svim{$chr}{$list[1]} = $line; 1801 | $SVs{$chr}{$list[1]} = $converted_line; 1802 | if ($hapi eq "1/1") 1803 | { 1804 | $count{$chr}{$list[1]} = $count_tools; 1805 | } 1806 | else 1807 | { 1808 | $count{$chr}{$list[1]} = '1'; 1809 | } 1810 | } 1811 | } 1812 | } 1813 | } 1814 | undef %pbsv2; 1815 | undef %cutesv2; 1816 | undef %sniffles2; 1817 | undef %nanovar2; 1818 | 1819 | my %nanosv; 1820 | my $nanosv_converted = ""; 1821 | if ($input_nanosv ne "") 1822 | { 1823 | NANOSV: while (my $line = ) 1824 | { 1825 | chomp($line); 1826 | my $first_nuc = substr $line, 0, 1; 1827 | if ($first_nuc ne "#") 1828 | { 1829 | my @list = split /\t/, $line; 1830 | my $pos = $list[1]; 1831 | my $REF = $list[3]; 1832 | my $ALT = $list[4]; 1833 | my $length; 1834 | my $type; 1835 | my $v = '1'; 1836 | my $min = '1'; 1837 | my $END = ""; 1838 | 1839 | my $qual = $list[6]; 1840 | my @qual = split /;/, $qual; 1841 | my $info = $list[7]; 1842 | my @info = split /;/, $info; 1843 | my @HAP = split /:/, $list[9]; 1844 | my @HAP_REF = split /,/, $HAP[1]; 1845 | my @HAP2 = split /,/, $HAP[2]; 1846 | 1847 | foreach my $info_tmp (@info) 1848 | { 1849 | my $first_five = substr $info_tmp, 0, 5; 1850 | my $first_four = substr $info_tmp, 0, 4; 1851 | if ($first_five eq "SVLEN") 1852 | { 1853 | $length = substr $info_tmp, 6; 1854 | } 1855 | elsif ($first_five eq "SVTYP") 1856 | { 1857 | $type = substr $info_tmp, 7; 1858 | } 1859 | elsif ($first_four eq "END=") 1860 | { 1861 | $END = substr $info_tmp, 4; 1862 | } 1863 | } 1864 | if ($length eq "") 1865 | { 1866 | $length = "."; 1867 | } 1868 | if ($length < 0) 1869 | { 1870 | $length *= -1; 1871 | } 1872 | my $chr = $list[0]; 1873 | if ($list[0] =~ m/chr(\d+|X|Y)/) 1874 | { 1875 | $chr = $1; 1876 | } 1877 | my $hapi = $HAP[0]; 1878 | 1879 | if ($nanosv_converted eq "yes") 1880 | { 1881 | $pos = $list[1];
1882 | $length = $list[2]; 1883 | $type = $list[3]; 1884 | $hapi = $list[4]; 1885 | } 1886 | my @depth1 = split /,/, $HAP[1]; 1887 | my @depth2 = split /,/, $HAP[2]; 1888 | my $DR = $depth1[0]; 1889 | my $DV = $depth2[0]; 1890 | 1891 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$hapi."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT; 1892 | 1893 | my $qual_check = ""; 1894 | my $CIPOS = ""; 1895 | my $CIEND = ""; 1896 | my $mapq = ""; 1897 | my $cluster = ""; 1898 | foreach my $qual_tmp (@qual) 1899 | { 1900 | if ($qual_tmp eq "LowQual") 1901 | { 1902 | $qual_check = "no"; 1903 | } 1904 | if ($qual_tmp eq "MapQual") 1905 | { 1906 | $mapq = "yes"; 1907 | } 1908 | if ($qual_tmp eq "SVcluster") 1909 | { 1910 | $cluster = "yes"; 1911 | } 1912 | if ($qual_tmp eq "CIPOS") 1913 | { 1914 | $CIPOS = "yes"; 1915 | } 1916 | if ($qual_tmp eq "CIEND") 1917 | { 1918 | $CIEND = "yes"; 1919 | } 1920 | } 1921 | if ($CIEND ne "yes" && $CIPOS ne "yes" && $mapq eq "yes" && $cluster eq "yes") 1922 | { 1923 | $qual_check = "no"; 1924 | } 1925 | 1926 | if (($HAP2[1] >= $min_coverage+1 ) && ($HAP2[0]/($HAP_REF[0]+$HAP2[0]) > 0.2 || $HAP_REF[0] eq ".") && $hapi ne "0/0" && ($length eq "." 
|| $length > 45) && $qual_check ne "no") 1927 | { 1928 | my $prev_match_pbsv = ""; 1929 | my $prev_match_cutesv = ""; 1930 | my $prev_match_sniffles = ""; 1931 | my $prev_match_nanovar = ""; 1932 | my $prev_match_svim = ""; 1933 | 1934 | if (exists($pbsv2{$chr}{$list[1]})) 1935 | { 1936 | $prev_match_pbsv = "yes"; 1937 | } 1938 | if (exists($cutesv2{$chr}{$list[1]})) 1939 | { 1940 | $prev_match_cutesv = "yes"; 1941 | } 1942 | if (exists($sniffles2{$chr}{$list[1]})) 1943 | { 1944 | $prev_match_sniffles = "yes"; 1945 | } 1946 | if (exists($nanovar2{$chr}{$list[1]})) 1947 | { 1948 | $prev_match_nanovar = "yes"; 1949 | } 1950 | if (exists($svim2{$chr}{$list[1]})) 1951 | { 1952 | $prev_match_svim = "yes"; 1953 | } 1954 | 1955 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq "") || 1956 | (exists($sniffles{$chr}{$list[1]}) && $prev_match_sniffles eq "") || (exists($nanovar{$chr}{$list[1]}) && $prev_match_nanovar eq "") 1957 | || (exists($svim{$chr}{$list[1]}) && $prev_match_svim eq "")) 1958 | 1959 | { 1960 | if (exists($pbsv{$chr}{$list[1]})) 1961 | { 1962 | $pbsv2{$chr}{$list[1]} = undef; 1963 | } 1964 | if (exists($cutesv{$chr}{$list[1]})) 1965 | { 1966 | $cutesv2{$chr}{$list[1]} = undef; 1967 | } 1968 | if (exists($sniffles{$chr}{$list[1]})) 1969 | { 1970 | $sniffles2{$chr}{$list[1]} = undef; 1971 | } 1972 | if (exists($nanovar{$chr}{$list[1]})) 1973 | { 1974 | $nanovar2{$chr}{$list[1]} = undef; 1975 | } 1976 | if (exists($svim{$chr}{$list[1]})) 1977 | { 1978 | $svim2{$chr}{$list[1]} = undef; 1979 | } 1980 | 1981 | my $count_tmp = $count{$chr}{$list[1]}; 1982 | my @list3 = split /\t/, $SVs{$chr}{$list[1]}; 1983 | 1984 | if ($list3[3] eq "DEL" && exists($sniffles{$chr}{$list[1]}) && $high_recall eq "1") 1985 | { 1986 | } 1987 | else 1988 | { 1989 | my @list2 = split /\t/, $SVs{$chr}{$pos}; 1990 | my $new_pos = $pos; 1991 | my $new_length = $list2[2]; 1992 | my $new_type = $list2[3]; 1993 | 
                        my $new_haplo = $list2[4];
                        my $new_END = $list2[5];
                        my $new_DR = $list2[6];
                        my $new_DV = $list2[7];
                        my $new_REF = $list2[8];
                        my $new_ALT = $list2[9];

                        if (exists($nanovar{$chr}{$pos}))
                        {}
                        elsif ($list2[3] eq "INV" && $length ne ".")
                        {
                            $new_length = $length;
                            $new_END = $END;
                            $new_ALT = $ALT;
                            $new_REF = $REF;
                        }
                        if (exists($pbsv{$chr}{$list[1]}))
                        {}
                        elsif ($hapi ne "./.")
                        {
                            $new_haplo = $hapi;
                            if ($new_DR eq '.' || $new_DR eq "")
                            {
                                $new_DR = $DR;
                            }
                            if ($new_DV eq '.' || $new_DV eq "")
                            {
                                $new_DV = $DV;
                            }
                        }

                        delete $SVs{$chr}{$pos};
                        delete $count{$chr}{$pos};
                        my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
                        $SVs{$list2[0]}{$new_pos} = $line2;
                        $count{$chr}{$new_pos} = $count_tmp+1;
                        $nanosv{$chr}{$new_pos} = $line;
                    }
                    next NANOSV;
                }
                else
                {
                    my $score_prev = "";
                    my $pos_prev = "";
                    POS_ALMOST2ha:
                    while ($v < $length_match)
                    {
                        POS_ALMOST2h: my $pos_tmp = ($min*$v)+$pos;

                        my $prev_match_pbsv2 = "";
                        my $prev_match_cutesv2 = "";
                        my $prev_match_sniffles2 = "";
                        my $prev_match_nanovar2 = "";
                        my $prev_match_svim2 = "";

                        if (exists($pbsv2{$chr}{$pos_tmp}))
                        {
                            $prev_match_pbsv2 = "yes";
                        }
                        if (exists($cutesv2{$chr}{$pos_tmp}))
                        {
                            $prev_match_cutesv2 = "yes";
                        }
                        if (exists($sniffles2{$chr}{$pos_tmp}))
                        {
                            $prev_match_sniffles2 = "yes";
                        }
                        if (exists($nanovar2{$chr}{$pos_tmp}))
                        {
                            $prev_match_nanovar2 = "yes";
                        }
                        if (exists($svim2{$chr}{$pos_tmp}))
                        {
                            $prev_match_svim2 = "yes";
                        }
                        if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || (exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq "") ||
                            (exists($sniffles{$chr}{$pos_tmp}) && $prev_match_sniffles2 eq "") || (exists($nanovar{$chr}{$pos_tmp}) && $prev_match_nanovar2 eq "")
                            || (exists($svim{$chr}{$pos_tmp}) && $prev_match_svim2 eq ""))
                        {
                            my $count_tmp = $count{$chr}{$pos_tmp};
                            my @list3 = split /\t/, $SVs{$chr}{$list[1]};

                            if ($list3[3] eq "DEL" && exists($sniffles{$list[0]}{$pos_tmp}) && $high_recall eq "1")
                            {
                            }
                            else
                            {
                                if ($count_tmp > 20)
                                {
                                    if ($min eq '1')
                                    {
                                        $min = '-1';
                                        goto POS_ALMOST2h;
                                    }
                                    else
                                    {
                                        $min = '1';
                                    }
                                }
                                else
                                {
                                    my @list2 = split /\t/, $SVs{$chr}{$pos_tmp};

                                    my $score = '0';
                                    if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS"))
                                    {
                                        $score += 1;
                                    }
                                    my $g = (800-$v)/800;
                                    $score += $g;
                                    my $score_tmp = '0';
                                    if ($list2[2] ne "."
                                        && $length ne ".")
                                    {
                                        if ($list2[2] >= $length)
                                        {
                                            $score_tmp = 1-(($list2[2]-$length)/$length);
                                            if ($score_tmp < 0)
                                            {
                                                $score_tmp = '0';
                                            }
                                        }
                                        else
                                        {
                                            $score_tmp = 1-(($length-$list2[2])/$length);
                                            if ($score_tmp < 0)
                                            {
                                                $score_tmp = '0';
                                            }
                                        }
                                    }
                                    $score += $score_tmp;

                                    if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no")
                                    {
                                        $pos_tmp = $pos_prev;
                                        $count_tmp = $count{$chr}{$pos_tmp};
                                        @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
                                    }

                                    if ($score > 1.5 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no")
                                    {
                                        if (exists($pbsv{$chr}{$pos_tmp}))
                                        {
                                            $pbsv2{$chr}{$pos_tmp} = undef;
                                        }
                                        if (exists($cutesv{$chr}{$pos_tmp}))
                                        {
                                            $cutesv2{$chr}{$pos_tmp} = undef;
                                        }
                                        if (exists($sniffles{$chr}{$pos_tmp}))
                                        {
                                            $sniffles2{$chr}{$pos_tmp} = undef;
                                        }
                                        if (exists($nanovar{$chr}{$pos_tmp}))
                                        {
                                            $nanovar2{$chr}{$pos_tmp} = undef;
                                        }
                                        if (exists($svim{$chr}{$pos_tmp}))
                                        {
                                            $svim2{$chr}{$pos_tmp} = undef;
                                        }

                                        my $new_pos = $pos_tmp;
                                        my $new_length = $list2[2];
                                        my $new_type = $list2[3];
                                        my $new_haplo = $list2[4];
                                        my $new_END = $list2[5];
                                        my $new_DR = $list2[6];
                                        my $new_DV = $list2[7];
                                        my $new_REF = $list2[8];
                                        my $new_ALT = $list2[9];

                                        if (exists($nanovar{$chr}{$pos_tmp}))
                                        {}
                                        elsif ($list2[3] eq "INV" && $length ne ".")
                                        {
                                            $new_length = $length;
                                            $new_END = $END;
                                            $new_ALT = $ALT;
                                            $new_REF = $REF;
                                        }
                                        if (exists($pbsv{$chr}{$list[1]}))
                                        {}
                                        elsif ($hapi ne "./.")
                                        {
                                            $new_haplo = $hapi;
                                            if ($new_DR eq '.'
                                                || $new_DR eq "")
                                            {
                                                $new_DR = $DR;
                                            }
                                            if ($new_DV eq '.' || $new_DV eq "")
                                            {
                                                $new_DV = $DV;
                                            }
                                        }
                                        if (exists($pbsv{$chr}{$list[1]}))
                                        {}
                                        else
                                        {
                                            $new_pos = $pos;
                                        }

                                        delete $SVs{$chr}{$pos_tmp};
                                        delete $count{$chr}{$pos_tmp};
                                        my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
                                        $SVs{$list2[0]}{$new_pos} = $line2;
                                        $count{$chr}{$new_pos} = $count_tmp+1;
                                        $nanosv{$chr}{$new_pos} = $line;

                                        if ($new_pos eq $pos)
                                        {
                                            if (exists($pbsv{$chr}{$pos_tmp}))
                                            {
                                                my $tmp = $pbsv{$chr}{$pos_tmp};
                                                delete $pbsv{$chr}{$pos_tmp};
                                                $pbsv{$chr}{$new_pos} = $tmp;
                                            }
                                            if (exists($cutesv{$chr}{$pos_tmp}))
                                            {
                                                my $tmp = $cutesv{$chr}{$pos_tmp};
                                                delete $cutesv{$chr}{$pos_tmp};
                                                $cutesv{$chr}{$new_pos} = $tmp;
                                            }
                                            if (exists($sniffles{$chr}{$pos_tmp}))
                                            {
                                                my $tmp = $sniffles{$chr}{$pos_tmp};
                                                delete $sniffles{$chr}{$pos_tmp};
                                                $sniffles{$chr}{$new_pos} = $tmp;
                                            }
                                            if (exists($nanovar{$chr}{$pos_tmp}))
                                            {
                                                my $tmp = $nanovar{$chr}{$pos_tmp};
                                                delete $nanovar{$chr}{$pos_tmp};
                                                $nanovar{$chr}{$new_pos} = $tmp;
                                            }
                                            if (exists($svim{$chr}{$pos_tmp}))
                                            {
                                                my $tmp = $svim{$chr}{$pos_tmp};
                                                delete $svim{$chr}{$pos_tmp};
                                                $svim{$chr}{$new_pos} = $tmp;
                                            }
                                            if (exists($nanosv{$chr}{$pos_tmp}))
                                            {
                                                my $tmp = $nanosv{$chr}{$pos_tmp};
                                                delete $nanosv{$chr}{$pos_tmp};
                                                $nanosv{$chr}{$new_pos} = $tmp;
                                            }
                                        }

                                        next NANOSV;
                                    }
                                    elsif ($min eq '1')
                                    {
                                        $min = '-1';
                                        if ($score > $score_prev)
                                        {
                                            $score_prev = $score;
                                            $pos_prev = $pos_tmp;
                                        }
                                        goto POS_ALMOST2ha;
                                    }
                                    else
                                    {
                                        $min = '1';
                                        if ($score > $score_prev)
                                        {
                                            $score_prev = $score;
                                            $pos_prev = $pos_tmp;
                                        }
                                    }
                                    $v++;
                                    goto POS_ALMOST2ha;
                                }
                            }
                        }
                        elsif ($min eq '1')
                        {
                            $min = '-1';
                            goto POS_ALMOST2h;
                        }
                        else
                        {
                            $min = '1';
                        }
                        $v++;
                    }
                }
            }
        }
    }
}


#Print SVs------------------------------------------------------------------

my $output_sniffles = $dir."Sniffles_".$filename.$suffix;
if ($input_sniffles ne "")
{
    open(SNIFFLES, ">" .$output_sniffles) or die "\nCan't open file $output_sniffles, $!\n";
}
my $output_pbsv = $dir."pbsv_".$filename.$suffix;
if ($input_pbsv ne "")
{
    open(PBSV, ">" .$output_pbsv) or die "\nCan't open file $output_pbsv, $!\n";
}
my $output_nanovar = $dir."NanoVar_".$filename.$suffix;
if ($input_nanovar ne "")
{
    open(NANOVAR, ">" .$output_nanovar) or die "\nCan't open file $output_nanovar, $!\n";
}
my $output_svim = $dir."SVIM_".$filename.$suffix;
if ($input_svim ne "")
{
    open(SVIM, ">" .$output_svim) or die "\nCan't open file $output_svim, $!\n";
}
my $output_nanosv = $dir."NanoSV_".$filename.$suffix;
if ($input_nanosv ne "")
{
    open(NANOSV, ">" .$output_nanosv) or die "\nCan't open file $output_nanosv, $!\n";
}
my $output_cutesv = $dir."cuteSV_".$filename.$suffix;
if ($input_cutesv ne "")
{
    open(CUTESV, ">" .$output_cutesv) or die "\nCan't open file $output_cutesv, $!\n";
}

my $datetime = localtime();
print COMBINED2 "##fileformat=VCFv4.2\n";
print COMBINED2 "##source=combiSV-v2.3\n";
print COMBINED2 "##fileDate=".$datetime."\n";
if ($input_cutesv ne "")
{
    open(INPUT_CUTESV2, $input_cutesv) or die "\n\nCan't open cuteSV's vcf file $input_cutesv, $!\n\n";
    while (my $line = <INPUT_CUTESV2>)
    {
        chomp($line);
        my $check_text = substr $line, 0, 9;
        if ($check_text eq "##contig=")
        {
            print COMBINED2 $line."\n";
        }
    }
    close INPUT_CUTESV2;
}

print COMBINED2 "##ALT=\n";
print COMBINED2 "##ALT=\n";
print COMBINED2 "##ALT=\n";
print COMBINED2 "##ALT=\n";
##ALT=
print COMBINED2 "##INFO=\n";
print COMBINED2 "##INFO=\n";
print COMBINED2 "##INFO=\n";
print COMBINED2 "##INFO=\n";
print COMBINED2 "##FORMAT=\n";
print COMBINED2 "##FORMAT=\n";
print COMBINED2 "##FORMAT=\n";
print COMBINED2 "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample\n";

print COMBINED "#CHROM\tPOS\tSVLENGTH\tTYPE\tVARHAP\n";
my %no_number;

my $INS_count = '0';
my $DEL_count = '0';
my $DUP_count = '0';
my $INV_count = '0';
my $BND_count = '0';
my $id_count = '1';

foreach my $chr2 (sort {$a <=> $b} keys %SVs)
{
    if ($chr2 =~ m/^\d+$/)
    {
        foreach my $pos2 (sort {$a <=> $b} keys %{$SVs{$chr2}})
        {
            my $SV_callers = "";
            my @list = split /\t/, $SVs{$chr2}{$pos2};
            if ($count{$chr2}{$pos2} >= $count_tools || ($count{$chr2}{$pos2} > 1 && $list[3] eq "INV" && $list[4] eq "1/1"))
            {
                $count++;
                if ($list[3] eq "INS")
                {
                    $INS_count++;
                }
                elsif ($list[3] eq "DEL")
                {
                    $DEL_count++;
                }
                elsif ($list[3] eq "DUP")
                {
                    $DUP_count++;
                }
                elsif ($list[3] eq "INV")
                {
                    $INV_count++;
                }
                elsif ($list[3] eq "BND")
                {
                    $BND_count++;
                }

                if ($input_sniffles ne "")
                {
                    if (exists($sniffles{$chr2}{$pos2}))
                    {
                        print
                            SNIFFLES $sniffles{$chr2}{$pos2}."\n";
                        $SV_callers = "Sniffles";
                    }
                }
                if ($input_pbsv ne "")
                {
                    if (exists($pbsv{$chr2}{$pos2}))
                    {
                        print PBSV $pbsv{$chr2}{$pos2}."\n";
                        if ($SV_callers eq "")
                        {
                            $SV_callers = "pbsv";
                        }
                        else
                        {
                            $SV_callers .= ",pbsv";
                        }
                    }
                }
                if ($input_cutesv ne "")
                {
                    if (exists($cutesv{$chr2}{$pos2}))
                    {
                        print CUTESV $cutesv{$chr2}{$pos2}."\n";
                        if ($SV_callers eq "")
                        {
                            $SV_callers = "cutesv";
                        }
                        else
                        {
                            $SV_callers .= ",cutesv";
                        }
                    }
                }
                if ($input_nanovar ne "")
                {
                    if (exists($nanovar{$chr2}{$pos2}))
                    {
                        print NANOVAR $nanovar{$chr2}{$pos2}."\n";
                        if ($SV_callers eq "")
                        {
                            $SV_callers = "NanoVar";
                        }
                        else
                        {
                            $SV_callers .= ",NanoVar";
                        }
                    }
                }
                if ($input_svim ne "")
                {
                    if (exists($svim{$chr2}{$pos2}))
                    {
                        print SVIM $svim{$chr2}{$pos2}."\n";
                        if ($SV_callers eq "")
                        {
                            $SV_callers = "SVIM";
                        }
                        else
                        {
                            $SV_callers .= ",SVIM";
                        }
                    }
                }
                if ($input_nanosv ne "")
                {
                    if (exists($nanosv{$chr2}{$pos2}))
                    {
                        print NANOSV $nanosv{$chr2}{$pos2}."\n";
                        if ($SV_callers eq "")
                        {
                            $SV_callers = "NanoSV";
                        }
                        else
                        {
                            $SV_callers .= ",NanoSV";
                        }
                    }
                }

                my @simplified_line = split /\t/, $SVs{$chr2}{$pos2};
                print COMBINED $simplified_line[0]."\t".$simplified_line[1]."\t".$simplified_line[2]."\t".$simplified_line[3]."\t".$simplified_line[4]."\n";
                my @list_tmp = split /\t/, $SVs{$chr2}{$pos2};
                print COMBINED2 $list_tmp[0]."\t".$list_tmp[1]."\tid.".$id_count."\t".$list_tmp[8]."\t".$list_tmp[9]."\t.\tPASS\tSVTYPE=".$list_tmp[3].
                    ";SVLEN=".$list_tmp[2].";END=".$list_tmp[5].";SVCALLERS=".$SV_callers."\tGT:DR:DV\t".$list_tmp[4].":".$list_tmp[6].":".$list_tmp[7]."\n";

                $id_count++;
                delete $SVs{$chr2}{$pos2};
            }
        }
    }
}
foreach my $chr2 (sort {$a <=> $b} keys %SVs)
{
    foreach my $pos2 (sort {$a <=> $b} keys %{$SVs{$chr2}})
    {
        my $SV_callers = "";
        my @list = split /\t/, $SVs{$chr2}{$pos2};
        if ($count{$chr2}{$pos2} >= $count_tools || ($count{$chr2}{$pos2} > 1 && $list[3] eq "INV" && $list[4] eq "1/1"))
        {
            $count++;

            if ($list[3] eq "INS")
            {
                $INS_count++;
            }
            elsif ($list[3] eq "DEL")
            {
                $DEL_count++;
            }
            elsif ($list[3] eq "DUP")
            {
                $DUP_count++;
            }
            elsif ($list[3] eq "INV")
            {
                $INV_count++;
            }
            elsif ($list[3] eq "BND")
            {
                $BND_count++;
            }

            if ($input_sniffles ne "")
            {
                if (exists($sniffles{$chr2}{$pos2}))
                {
                    print SNIFFLES $sniffles{$chr2}{$pos2}."\n";
                    $SV_callers = "Sniffles";
                }
            }
            if ($input_pbsv ne "")
            {
                if (exists($pbsv{$chr2}{$pos2}))
                {
                    print PBSV $pbsv{$chr2}{$pos2}."\n";
                    if ($SV_callers eq "")
                    {
                        $SV_callers = "pbsv";
                    }
                    else
                    {
                        $SV_callers .= ",pbsv";
                    }
                }
            }
            if ($input_cutesv ne "")
            {
                if (exists($cutesv{$chr2}{$pos2}))
                {
                    print CUTESV $cutesv{$chr2}{$pos2}."\n";
                    if ($SV_callers eq "")
                    {
                        $SV_callers = "cutesv";
                    }
                    else
                    {
                        $SV_callers .= ",cutesv";
                    }
                }
            }
            if ($input_nanovar ne "")
            {
                if (exists($nanovar{$chr2}{$pos2}))
                {
                    print NANOVAR $nanovar{$chr2}{$pos2}."\n";
                    if ($SV_callers eq "")
                    {
                        $SV_callers = "NanoVar";
                    }
                    else
                    {
                        $SV_callers .= ",NanoVar";
                    }
                }
            }
            if ($input_svim ne "")
            {
                if (exists($svim{$chr2}{$pos2}))
                {
                    print SVIM $svim{$chr2}{$pos2}."\n";
                    if ($SV_callers eq "")
                    {
                        $SV_callers = "SVIM";
                    }
                    else
                    {
                        $SV_callers .= ",SVIM";
                    }
                }
            }
            if ($input_nanosv ne "")
            {
                if (exists($nanosv{$chr2}{$pos2}))
                {
                    print NANOSV $nanosv{$chr2}{$pos2}."\n";
                    if ($SV_callers eq "")
                    {
                        $SV_callers = "NanoSV";
                    }
                    else
                    {
                        $SV_callers .= ",NanoSV";
                    }
                }
            }

            my @simplified_line = split /\t/, $SVs{$chr2}{$pos2};
            print COMBINED $simplified_line[0]."\t".$simplified_line[1]."\t".$simplified_line[2]."\t".$simplified_line[3]."\t".$simplified_line[4]."\n";
            my @list_tmp = split /\t/, $SVs{$chr2}{$pos2};
            print COMBINED2 $list_tmp[0]."\t".$list_tmp[1]."\tid.".$id_count."\t".$list_tmp[8]."\t".$list_tmp[9]."\t.\tPASS\tSVTYPE=".$list_tmp[3].
                ";SVLEN=".$list_tmp[2].";END=".$list_tmp[5].";SVCALLERS=".$SV_callers."\tGT:DR:DV\t".$list_tmp[4].":".$list_tmp[6].":".$list_tmp[7]."\n";

            $id_count++;
            delete $SVs{$chr2}{$pos2};
        }
    }
}
print "Combined SVs : ".$count."\n";
print "Insertions : ".$INS_count."\n";
print "Deletions : ".$DEL_count."\n";
print "Duplications : ".$DUP_count."\n";
print "Inversions : ".$INV_count."\n";
print "BND : ".$BND_count."\n\n";


close SNIFFLES;
close PBSV;
close CUTESV;
close NANOVAR;
close SVIM;
close NANOSV;
close INPUT_SNIFFLES;
close INPUT_CUTESV;
close INPUT_SVIM;
close INPUT_NANOVAR;
close INPUT_PBSV;
close COMBINED;
close COMBINED2;
close INPUT_NANOSV;

--------------------------------------------------------------------------------