├── README.md
├── LICENSE
└── combiSV2.3.pl
/README.md:
--------------------------------------------------------------------------------
1 | # combiSV
2 |
3 | Combine structural variation outputs from long sequencing reads into a superior call set
4 |
5 | **Last updates: 22/04/22 version 2.3**
6 | - Now includes the REF and ALT sequences in the combined VCF
7 | **22/04/22**
8 | - combiSV now reports the END position and the allele depth calls (DR and DV), which is needed for compatibility with SVanna
9 | **09/11/21**
10 | - At least one Sniffles, pbsv, SVIM or cuteSV input is required to run combiSV; the other callers are optional
11 | - Improved precision
12 |
13 | ### Getting help
14 |
15 | Any issues, requests, problems or comments that are not yet addressed on this page can be posted on [GitHub issues](https://github.com/ndierckx/Sim-it/issues) and I will try to reply the same day.
16 |
17 | Or you can contact me directly through the following email address:
18 |
19 | nicolasdierckxsens at hotmail dot com
20 |
21 |
22 | ### Cite
23 |
24 | Dierckxsens, N., Li, T., Vermeesch, J.R. et al. A benchmark of structural variation detection by long reads through a realistic simulated model. Genome Biology, 22, 342 (2021). https://doi.org/10.1186/s13059-021-02551-4
25 |
26 |
27 | ### Prerequisites
28 |
29 | Perl
30 |
31 | ### Instructions
32 |
33 | Usage:
34 |
35 | perl combiSV2.3.pl -pbsv -sniffles -cutesv -nanovar -svim -nanosv -c -o
36 |
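As a rough illustration of the usage line, the sketch below assembles one possible invocation. The VCF file names and the output name are placeholders (assumptions, not files shipped with combiSV), and the script name is taken from the repository tree. Only flags listed in the usage line are used; `-c 3` matches the default minimum coverage set in the script.

```shell
# Placeholder inputs (assumptions): replace with your own caller outputs.
PBSV_VCF="sample.pbsv.vcf"
SNIFFLES_VCF="sample.sniffles.vcf"
CUTESV_VCF="sample.cutesv.vcf"
OUT_NAME="sample_combined"

# Assemble the command from the flags shown in the usage line.
CMD="perl combiSV2.3.pl -pbsv $PBSV_VCF -sniffles $SNIFFLES_VCF -cutesv $CUTESV_VCF -c 3 -o $OUT_NAME"

# Print the command so it can be checked before anything runs.
echo "$CMD"

# Run it only when the script and all three inputs are present.
if [ -f combiSV2.3.pl ] && [ -f "$PBSV_VCF" ] && [ -f "$SNIFFLES_VCF" ] && [ -f "$CUTESV_VCF" ]; then
    $CMD
fi
```

Any subset of the caller flags should work as long as at least one of `-sniffles`, `-pbsv`, `-svim` or `-cutesv` is given.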
37 |
38 | ### Output
39 |
40 | #### 1. output_name.vcf:
41 | This is the combined call set in standard VCF format
42 |
43 | #### 2. simplified_output_name.vcf:
44 | This is a simplified VCF output that can be used as input for Sim-it
45 |
46 | #### 3. SVIM/Sniffles/pbsv/NanoVar/NanoSV_output_name.vcf:
47 | For each VCF input, an output file of the SVs that were retained is given.
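
The names of the first two outputs follow from the value given to `-o`. The sketch below mirrors, in shell, how the script appears to derive them (directory plus base name, with `simplified_` prefixed for the second file). The `-o` value is a hypothetical example, and the per-tool output names are not reproduced here.

```shell
# Hypothetical value passed to -o; a trailing .vcf is optional.
OUT_ARG="results/sample_combined.vcf"

# Split into directory and base name, dropping a trailing .vcf if present.
OUT_DIR=$(dirname "$OUT_ARG")
OUT_BASE=$(basename "$OUT_ARG" .vcf)

# Combined VCF and simplified VCF, written to the same directory.
COMBINED_VCF="${OUT_DIR}/${OUT_BASE}.vcf"
SIMPLIFIED_VCF="${OUT_DIR}/simplified_${OUT_BASE}.vcf"

echo "$COMBINED_VCF"
echo "$SIMPLIFIED_VCF"
```

When `-o` is omitted, the script falls back to the base name `combiSV`.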
48 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/combiSV2.3.pl:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env perl
2 | ######################################################
3 | # SOFTWARE COPYRIGHT NOTICE AGREEMENT #
4 | # Copyright (C) {2020-2022} {Nicolas Dierckxsens} #
5 | # All Rights Reserved #
6 | # See file LICENSE for details. #
7 | ######################################################
8 | # combiSV 2.3
9 | # nicolasdierckxsens@hotmail.com
10 | use strict;
11 | use Getopt::Long;
12 | use File::Basename;
13 |
14 | print "\n\n-----------------------------------------------";
15 | print "\ncombiSV\n";
16 | print "Version 2.3\n";
17 | print "Author: Nicolas Dierckxsens, (c) 2020-2024\n";
18 | print "-----------------------------------------------\n\n";
19 |
20 | my $input_pbsv = "";
21 | my $input_sniffles = "";
22 | my $input_cutesv = "";
23 | my $input_nanovar = "";
24 | my $input_svim = "";
25 | my $output_file = "";
26 | my $output_file2 = "";
27 | my $high_recall = "";
28 | my $input_nanosv = "";
29 | my $count_tools = "2";
30 | my $min_coverage = '3';
31 | my $length_match = '800';
32 | my $min_SV_length = '50';
33 |
34 | GetOptions (
35 | "pbsv=s" => \$input_pbsv,
36 | "sniffles=s" => \$input_sniffles,
37 | "cutesv=s" => \$input_cutesv,
38 | "nanovar=s" => \$input_nanovar,
39 | "svim=s" => \$input_svim,
40 | "nanosv=s" => \$input_nanosv,
41 | "c=s" => \$min_coverage,
42 | "o=s" => \$output_file,
43 | ) or die "Incorrect usage!\n";
44 |
45 | if ($input_sniffles eq "" && $input_cutesv eq "" && $input_pbsv eq "" && $input_svim eq "")
46 | {
47 | print "\n\nUsage: perl combiSV2.3.pl -pbsv -sniffles -cutesv -nanovar -svim -nanosv \n\n";
48 | print "\nOPTIONAL ARGUMENTS\n";
49 | print "-c minimum coverage of variation allele [default = 3]\n";
50 | print "-o name of the output files\n";
51 | }
52 |
53 | my $total_tools = '0';
54 | if ($min_coverage eq "")
55 | {
56 | $min_coverage = '3';
57 | }
58 | if ($min_coverage =~ m/^\d+$/)
59 | {
60 | }
61 | else
62 | {
63 | die "\n\nWARNING: minimum coverage of variation allele has to be an integer\n\n";
64 | }
65 | if ($min_coverage < 2)
66 | {
67 | $min_coverage = '2';
68 | print "\n\nWARNING: minimum coverage of variation allele too low, has been set to 2\n\n";
69 | }
70 | if ($input_pbsv ne "")
71 | {
72 | open(INPUT_PBSV, $input_pbsv) or die "\n\nCan't open pbsv's vcf file $input_pbsv, $!\n\n";
73 | $total_tools++;
74 | }
75 | if ($input_sniffles ne "")
76 | {
77 | open(INPUT_SNIFFLES, $input_sniffles) or die "\n\nCan't open Sniffles' vcf file $input_sniffles, $!\n\n";
78 | $total_tools++;
79 | }
80 | if ($input_cutesv ne "")
81 | {
82 | open(INPUT_CUTESV, $input_cutesv) or die "\n\nCan't open cuteSV's vcf file $input_cutesv, $!\n\n";
83 | $total_tools++;
84 | }
85 | if ($input_nanovar ne "")
86 | {
87 | open(INPUT_NANOVAR, $input_nanovar) or die "\n\nCan't open NanoVar's vcf file $input_nanovar, $!\n\n";
88 | $total_tools++;
89 | }
90 | if ($input_svim ne "")
91 | {
92 | open(INPUT_SVIM, $input_svim) or die "\n\nCan't open SVIM's vcf file $input_svim, $!\n\n";
93 | $total_tools++;
94 | }
95 | if ($input_nanosv ne "")
96 | {
97 | open(INPUT_NANOSV, $input_nanosv) or die "\n\nCan't open NanoSV's vcf file $input_nanosv, $!\n\n";
98 | $total_tools++;
99 | }
100 |
101 | if ($input_sniffles eq "" && $input_cutesv eq "" && $input_pbsv eq "" && $input_svim eq "")
102 | {
103 | die "\n\nError: A Sniffles, pbsv, SVIM or cuteSV input is mandatory.\n\n";
104 | }
105 |
106 | if ($output_file eq "")
107 | {
108 | $output_file = "combiSV";
109 | }
110 |
111 | my($filename, $dir, $suffix) = fileparse($output_file, ('.vcf'));
112 |
113 | if ($output_file eq "")
114 | {
115 | $output_file2 = "combiSV.vcf";
116 | $output_file = "simplified_combiSV.vcf";
117 | }
118 | else
119 | {
120 | $suffix = ".vcf";
121 | $output_file2 = $dir.$filename.$suffix;
122 | $output_file = $dir.'simplified_'.$filename.$suffix;
123 | }
124 |
125 | if ($high_recall eq "")
126 | {
127 | $high_recall = "1";
128 | }
129 | if ($high_recall ne "" && $high_recall ne "1" && $high_recall ne "2")
130 | {
131 | die "\n\nIncorrect usage of -s parameter, should be '1' or '2'\n\n";
132 | }
133 |
134 | if ($total_tools > '5')
135 | {
136 | $count_tools = '3';
137 | $high_recall = '2';
138 | }
139 | elsif ($total_tools <= 5)
140 | {
141 | $count_tools = '2';
142 | $high_recall = '2';
143 | }
144 |
145 |
146 | open(COMBINED, ">" .$output_file) or die "\nCan't open file $output_file, $!\n";
147 | open(COMBINED2, ">" .$output_file2) or die "\nCan't open file $output_file2, $!\n";
148 |
149 | my %SVs;
150 | my %count;
151 | my %pbsv;
152 | my %sniffles;
153 | my %cutesv;
154 | my %nanovar;
155 | my %mixed_types;
156 | my $lenght_margin1 = 0.70;
157 | my $lenght_margin2 = 1.3;
158 |
159 | my $count = '0';
160 |
161 | if ($input_pbsv ne "")
162 | {
163 | while (my $line = <INPUT_PBSV>)
164 | {
165 | chomp($line);
166 | my $first_nuc = substr $line, 0, 1;
167 | if ($first_nuc ne "#")
168 | {
169 | my @list = split /\t/, $line;
170 |
171 | my $info = $list[7];
172 | my $REF = $list[3];
173 | my $ALT = $list[4];
174 | my $SVLEN = "";
175 | my $END = "";
176 | my $type = "";
177 | my @info = split /;/, $info;
178 | my @HAP = split /:/, $list[9];
179 | my @HAP2 = split /,/, $HAP[1];
180 | my @HAP3 = split /,/, $HAP[3];
181 |
182 | my $chr = $list[0];
183 | if ($list[0] =~ m/chr(\d+|X|Y)/)
184 | {
185 | $chr = $1;
186 | }
187 |
188 | foreach my $info_tmp (@info)
189 | {
190 | my $first_five = substr $info_tmp, 0, 5;
191 | my $first_four = substr $info_tmp, 0, 4;
192 | if ($info_tmp =~ m/SVLEN=>*-*(\d+)/)
193 | {
194 | $SVLEN = $1;
195 | }
196 | elsif ($first_five eq "SVTYP")
197 | {
198 | $type = substr $info_tmp, 7;
199 | }
200 | elsif ($first_four eq "END=")
201 | {
202 | $END = substr $info_tmp, 4;
203 | }
204 | }
205 | if ($SVLEN eq "")
206 | {
207 | $SVLEN = ".";
208 | }
209 | if ($SVLEN < 0)
210 | {
211 | $SVLEN *= -1;
212 | }
213 | my $haplo = $HAP[0];
214 | my @depth = split /,/, $HAP[1];
215 | my $DR = $depth[0];
216 | my $DV = $depth[1];
217 |
218 | my $converted_line = $chr."\t".$list[1]."\t".$SVLEN."\t".$type."\t".$haplo."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT;
219 |
220 | if ($type ne "BND" && $type ne "cnv" && $list[6] eq "PASS" && ($type ne "DUP" || $high_recall eq "2" || $haplo eq "1/1") && ($HAP2[1] >= $min_coverage ||
221 | ($HAP2[1] >= $min_coverage-1 && $HAP2[1]/$HAP[2] > 0.3)) && $haplo ne "0/0" && $HAP2[1]/$HAP[2] > 0.3 && ($SVLEN eq "." || $SVLEN > 45))
222 | {
223 | $SVs{$chr}{$list[1]} = $converted_line;
224 | $count{$chr}{$list[1]} = '1';
225 | if ($type eq "INV" || (($type eq "DEL" || $haplo eq "1/1") && $high_recall eq "2") && ($HAP2[1]+$HAP2[2]) > 9 && (($HAP3[2] > 0 && $HAP3[3] > 0) || $HAP3[2] eq "" || $HAP3[3] eq ""))
226 | {
227 | $count{$chr}{$list[1]} = '2';
228 | }
229 | if (($type eq "INV" || $type eq "INS") && $haplo eq "1/1" && $total_tools < 5)
230 | {
231 | $count{$chr}{$list[1]} = $count_tools;
232 | }
233 | $pbsv{$chr}{$list[1]} = $line;
234 | }
235 | }
236 | }
237 | }
238 |
239 | my %pbsv2;
240 | my %cutesv2;
241 | my %sniffles2;
242 | my %nanovar2;
243 | my %svim2;
244 |
245 | if ($input_cutesv ne "")
246 | {
247 | CUTESV: while (my $line = <INPUT_CUTESV>)
248 | {
249 | chomp($line);
250 | my $first = substr $line, 0, 1;
251 | my $prev_match2 = "";
252 |
253 | if ($first ne "#")
254 | {
255 | my @list = split /\t/, $line;
256 | my $pos = $list[1];
257 | my $REF = $list[3];
258 | my $ALT = $list[4];
259 | my $length;
260 | my $type;
261 | my $v = '1';
262 | my $min = '1';
263 | my $END = "";
264 |
265 | my $info = $list[7];
266 | my @info = split /;/, $info;
267 | my @HAP = split /:/, $list[9];
268 | my $DR = $HAP[1];
269 | my $DV = $HAP[2];
270 |
271 | foreach my $info_tmp (@info)
272 | {
273 | my $first_five = substr $info_tmp, 0, 5;
274 | my $first_four = substr $info_tmp, 0, 4;
275 | if ($first_five eq "SVLEN")
276 | {
277 | $length = substr $info_tmp, 6;
278 | }
279 | elsif ($first_five eq "SVTYP")
280 | {
281 | $type = substr $info_tmp, 7;
282 | }
283 | elsif ($first_four eq "END=")
284 | {
285 | $END = substr $info_tmp, 4;
286 | }
287 | }
288 | if ($length eq "")
289 | {
290 | $length = ".";
291 | }
292 | if ($length < 0)
293 | {
294 | $length *= -1;
295 | }
296 | my $chr = $list[0];
297 | if ($list[0] =~ m/chr(\d+|X|Y)/)
298 | {
299 | $chr = $1;
300 | }
301 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$HAP[0]."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT;
302 |
303 | if (exists ($cutesv{$chr}{$list[1]}))
304 | {
305 | next CUTESV;
306 | }
307 |
308 | if ($list[6] eq "PASS" && ($length eq "." || $length > 45) && ($HAP[2] >= $min_coverage || ($HAP[2] >= $min_coverage-1 && $HAP[2]/($HAP[2]+$HAP[1]) > 0.3))
309 | && $HAP[2]/($HAP[2]+$HAP[1]) > 0.12 && $type ne "BND" && $HAP[0] ne "./.")
310 | {
311 | my $prev_match = "";
312 | if (exists($pbsv2{$chr}{$list[1]}))
313 | {
314 | $prev_match = "yes";
315 | }
316 |
317 | if (exists($pbsv{$chr}{$list[1]}) && exists($SVs{$chr}{$list[1]}) && $prev_match eq "")
318 | {
319 | my $count_tmp = $count{$chr}{$list[1]};
320 |
321 | my @list2 = split /\t/, $SVs{$chr}{$list[1]};
322 | my $new_pos = $list2[1];
323 | my $new_length = $list2[2];
324 | my $new_type = $list2[3];
325 | my $new_haplo = $list2[4];
326 | my $new_END = $list2[5];
327 | my $new_DR = $list2[6];
328 | my $new_DV = $list2[7];
329 | my $new_REF = $list2[8];
330 | my $new_ALT = $list2[9];
331 |
332 | $pbsv2{$chr}{$list[1]} = undef;
333 |
334 | $count{$chr}{$list[1]} = $count_tmp+1;
335 | if ($type eq "INS" && $list2[3] ne $type && $type ne ".")
336 | {
337 | $new_type = $type;
338 | }
339 | if ($HAP[0] ne "." && ($type eq "INV" || $type eq "INS"))
340 | {
341 | $new_haplo = $HAP[0];
342 | $new_DR = $DR;
343 | $new_DV = $DV;
344 | }
345 | if ($type eq "INV" || $list2[2] eq ".")
346 | {
347 | if ($length ne ".")
348 | {
349 | $new_length = $length;
350 | }
351 | if ($HAP[0] ne ".")
352 | {
353 | $new_haplo = $HAP[0];
354 | $new_DR = $DR;
355 | $new_DV = $DV;
356 | }
357 | }
358 | delete $SVs{$chr}{$pos};
359 | delete $count{$chr}{$pos};
360 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
361 | $SVs{$list2[0]}{$new_pos} = $line2;
362 | $count{$chr}{$new_pos} = $count_tmp+1;
363 | $cutesv{$chr}{$list[1]} = $line;
364 |
365 | if ($type eq "INS")
366 | {
367 | $count{$chr}{$new_pos} = $count_tools;
368 | }
369 | next CUTESV;
370 |
371 | }
372 | elsif ($input_pbsv ne "")
373 | {
374 | my $score_prev = "";
375 | my $pos_prev = "";
376 | POS_ALMOST2da:
377 | while ($v < $length_match)
378 | {
379 | POS_ALMOST2d: my $pos_tmp = ($min*$v)+$pos;
380 |
381 | if (exists($pbsv2{$chr}{$pos_tmp}))
382 | {
383 | $prev_match2 = "yes";
384 | }
385 |
386 | if (exists($pbsv{$chr}{$pos_tmp}) && exists($SVs{$chr}{$pos_tmp}) && $prev_match2 eq "")
387 | {
388 | my $count_tmp = $count{$chr}{$pos_tmp};
389 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
390 |
391 | if (exists($cutesv{$chr}{$pos_tmp}) && $list2[3] eq $type && ($list2[2] > $length*$lenght_margin1 && $list2[2] < $length*$lenght_margin2))
392 | {
393 | next CUTESV;
394 | }
395 | else
396 | {
397 | my $score = '0';
398 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS"))
399 | {
400 | $score += 1;
401 | }
402 | my $g = (800-$v)/800;
403 | $score += $g;
404 | my $score_tmp = '0';
405 | if ($list2[2] ne "." && $length ne ".")
406 | {
407 | if ($list2[2] >= $length)
408 | {
409 | $score_tmp = 1-(($list2[2]-$length)/$length);
410 | if ($score_tmp < 0)
411 | {
412 | $score_tmp = '0';
413 | }
414 | }
415 | else
416 | {
417 | $score_tmp = 1-(($length-$list2[2])/$length);
418 | if ($score_tmp < 0)
419 | {
420 | $score_tmp = '0';
421 | }
422 | }
423 | }
424 | $score += $score_tmp;
425 |
426 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no")
427 | {
428 | $pos_tmp = $pos_prev;
429 | $count_tmp = $count{$chr}{$pos_tmp};
430 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
431 | }
432 |
433 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no")
434 | {
435 |
436 | if ($score_prev eq "no")
437 | {
438 | print $SVs{$chr}{$pos_tmp}."\n";
439 | print $line."\n";
440 | }
441 | my $new_pos = $pos_tmp;
442 | my $new_length = $list2[2];
443 | my $new_type = $list2[3];
444 | my $new_haplo = $list2[4];
445 | my $new_END = $list2[5];
446 | my $new_DR = $list2[6];
447 | my $new_DV = $list2[7];
448 | my $new_REF = $list2[8];
449 | my $new_ALT = $list2[9];
450 |
451 | if ($type eq "INS" && $list2[3] ne $type)
452 | {
453 | $new_type = $type;
454 | }
455 | if ($HAP[0] ne "." && ($type eq "INV" || $type eq "INS"))
456 | {
457 | $new_haplo = $HAP[0];
458 | $new_DR = $DR;
459 | $new_DV = $DV;
460 | }
461 | if ($type eq "INV" || $list2[2] eq ".")
462 | {
463 | if ($length ne ".")
464 | {
465 | $new_length = $length;
466 | $new_END = $END;
467 | }
468 | $new_pos = $pos;
469 | }
470 |
471 | delete $SVs{$chr}{$pos_tmp};
472 | delete $count{$chr}{$pos_tmp};
473 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
474 | $SVs{$list2[0]}{$new_pos} = $line2;
475 | $count{$chr}{$new_pos} = $count_tmp+1;
476 | $cutesv{$chr}{$new_pos} = $line;
477 | $pbsv2{$chr}{$new_pos} = undef;
478 |
479 | if ($type eq "INS")
480 | {
481 | $count{$chr}{$new_pos} = $count_tools;
482 | }
483 |
484 | if ($new_pos eq $pos)
485 | {
486 | if (exists($pbsv{$chr}{$pos_tmp}))
487 | {
488 | my $tmp = $pbsv{$chr}{$pos_tmp};
489 | delete $pbsv{$chr}{$pos_tmp};
490 | $pbsv{$chr}{$new_pos} = $tmp;
491 | }
492 | if (exists($cutesv{$chr}{$pos_tmp}))
493 | {
494 | my $tmp = $cutesv{$chr}{$pos_tmp};
495 | delete $cutesv{$chr}{$pos_tmp};
496 | $cutesv{$chr}{$new_pos} = $tmp;
497 | }
498 | }
499 | next CUTESV;
500 | }
501 | elsif ($min eq '1')
502 | {
503 | $min = '-1';
504 | if ($score > $score_prev)
505 | {
506 | $score_prev = $score;
507 | $pos_prev = $pos_tmp;
508 | }
509 | goto POS_ALMOST2d;
510 | }
511 | else
512 | {
513 | $min = '1';
514 | if ($score > $score_prev)
515 | {
516 | $score_prev = $score;
517 | $pos_prev = $pos_tmp;
518 | }
519 | }
520 | $v++;
521 | goto POS_ALMOST2d;
522 | }
523 | }
524 | elsif ($min eq '1')
525 | {
526 | $min = '-1';
527 | goto POS_ALMOST2d;
528 | }
529 | else
530 | {
531 | $min = '1';
532 | }
533 | $v++;
534 | }
535 | if ($score_prev ne "" && $score_prev ne "no" && $score_prev > 1.6)
536 | {
537 | $score_prev = "no";
538 | $v = '1';
539 | #goto POS_ALMOST2da;
540 | }
541 | }
542 | if ($HAP[2]/($HAP[2]+$HAP[1]) > 0.2 && ($total_tools eq "2" || ($type ne "INV" && $type ne "BND")))
543 | {
544 | $SVs{$chr}{$list[1]} = $converted_line;
545 | $count{$chr}{$list[1]} = '1';
546 | $cutesv{$chr}{$list[1]} = $line;
547 | if ($type ne "INV" && $type ne "BND" && $type ne "DUP" && $HAP[0] ne "" && $HAP[0] ne "./." && $prev_match2 eq "")
548 | {
549 | $count{$chr}{$list[1]} = '2';
550 | }
551 | }
552 | }
553 | }
554 | }
555 | }
556 |
557 | undef %pbsv2;
558 | undef %cutesv2;
559 |
560 | if ($input_sniffles ne "")
561 | {
562 | SNIFFLES: while (my $line = <INPUT_SNIFFLES>)
563 | {
564 | chomp($line);
565 | my $first = substr $line, 0, 1;
566 | if ($first ne "#")
567 | {
568 | my @list = split /\t/, $line;
569 | my $pos = $list[1];
570 | my $REF = $list[3];
571 | my $ALT = $list[4];
572 | my $length;
573 | my $type;
574 | my $v = '1';
575 | my $min = '1';
576 | my $END = "";
577 |
578 | my $info = $list[7];
579 | my @info = split /;/, $info;
580 | my @HAP = split /:/, $list[9];
581 | my $DR = $HAP[1];
582 | my $DV = $HAP[2];
583 |
584 | foreach my $info_tmp (@info)
585 | {
586 | my $first_five = substr $info_tmp, 0, 5;
587 | my $first_four = substr $info_tmp, 0, 4;
588 | if ($first_five eq "SVLEN")
589 | {
590 | $length = substr $info_tmp, 6;
591 | }
592 | elsif ($first_five eq "SVTYP")
593 | {
594 | $type = substr $info_tmp, 7;
595 | }
596 | elsif ($first_four eq "END=")
597 | {
598 | $END = substr $info_tmp, 4;
599 | }
600 | }
601 | if ($length eq "")
602 | {
603 | $length = ".";
604 | }
605 | if ($length < 0)
606 | {
607 | $length *= -1;
608 | }
609 | my $chr = $list[0];
610 | if ($list[0] =~ m/chr(\d+|X|Y)/)
611 | {
612 | $chr = $1;
613 | }
614 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$HAP[0]."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT;
615 |
616 | if (exists ($sniffles{$chr}{$list[1]}))
617 | {
618 | next SNIFFLES;
619 | }
620 |
621 | if ($list[6] eq "PASS" && ($length eq "." || $length > 45) && ($HAP[2] >= $min_coverage || ($HAP[2] >= $min_coverage-1 && $HAP[2]/($HAP[2]+$HAP[1]) > 0.3))
622 | && $HAP[2]/($HAP[2]+$HAP[1]) > 0.1 && ($HAP[0] ne "0/0" || $input_cutesv eq ""))
623 | {
624 | my $prev_match_pbsv = "";
625 | my $prev_match_cutesv = "";
626 | if (exists($pbsv2{$chr}{$list[1]}))
627 | {
628 | $prev_match_pbsv = "yes";
629 | }
630 | if (exists($cutesv2{$chr}{$list[1]}))
631 | {
632 | $prev_match_cutesv = "yes";
633 | }
634 |
635 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq ""))
636 | {
637 | my @list2 = split /\t/, $SVs{$chr}{$list[1]};
638 |
639 | if (exists($pbsv{$chr}{$list[1]}))
640 | {
641 | $pbsv2{$chr}{$list[1]} = undef;
642 | }
643 | if (exists($cutesv{$chr}{$list[1]}))
644 | {
645 | $cutesv2{$chr}{$list[1]} = undef;
646 | }
647 |
648 | my $no_pbsv = "";
649 | if (exists($pbsv{$chr}{$list[1]}))
650 | {
651 | }
652 | else
653 | {
654 | $no_pbsv = "yes";
655 | }
656 |
657 | my $count_tmp = $count{$chr}{$list[1]};
658 |
659 | my $new_pos = $pos;
660 | my $new_length = $list2[2];
661 | my $new_type = $list2[3];
662 | my $new_haplo = $list2[4];
663 | my $new_END = $list2[5];
664 | my $new_DR = $list2[6];
665 | my $new_DV = $list2[7];
666 | my $new_REF = $list2[8];
667 | my $new_ALT = $list2[9];
668 |
669 | if (($list2[3] eq $type || $list2[3] eq "BND" || $list2[3] eq "." || ($type eq "." && $list2[3] ne "BND") || ($list2[3] ne "DEL" && $type ne "DEL" && $list2[3] ne "BND")))
670 | {
671 | $count{$chr}{$list[1]} = $count_tmp+1;
672 |
673 | if ($no_pbsv ne "")
674 | {
675 | $new_length = $length;
676 | $new_END = $END;
677 | $new_REF = $REF;
678 | $new_ALT = $ALT;
679 | }
680 |
681 | if ($type eq "INS" && $list2[3] ne $type && $type ne ".")
682 | {
683 | $new_type = $type;
684 | }
685 | if ($list2[3] eq "BND" || ($list2[3] eq "INV" && ($type eq "DEL" || $type eq "INS")) || ($list2[3] eq "DUP" && $type eq "INS") || $list2[3] eq ".")
686 | {
687 | if ($length ne ".")
688 | {
689 | $new_length = $length;
690 | $new_END = $END;
691 | }
692 | if ($type ne ".")
693 | {
694 | $new_type = $type;
695 | }
696 | }
697 | if ($type eq "INV" || $list2[2] eq ".")
698 | {
699 | if ($length ne ".")
700 | {
701 | $new_length = $length;
702 | $new_END = $END;
703 | $new_REF = $REF;
704 | $new_ALT = $ALT;
705 | }
706 | }
707 |
708 | delete $SVs{$chr}{$pos};
709 | delete $count{$chr}{$pos};
710 |
711 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
712 | $SVs{$list2[0]}{$new_pos} = $line2;
713 | $count{$chr}{$new_pos} = $count_tmp+1;
714 | $sniffles{$chr}{$new_pos} = $line;
715 |
716 | if ($type eq "INS")
717 | {
718 | $count{$chr}{$new_pos} = $count_tools;
719 | }
720 | next SNIFFLES;
721 | }
722 | }
723 | elsif ($input_pbsv ne "" || $input_cutesv ne "")
724 | {
725 | my $score_prev = "";
726 | my $pos_prev = "";
727 | POS_ALMOST2ca:
728 | while ($v < $length_match)
729 | {
730 | POS_ALMOST2c: my $pos_tmp = ($min*$v)+$pos;
731 |
732 | my $prev_match_pbsv2 = "";
733 | my $prev_match_cutesv2 = "";
734 | if (exists($pbsv2{$chr}{$pos_tmp}))
735 | {
736 | $prev_match_pbsv2 = "yes";
737 | }
738 | if (exists($cutesv2{$chr}{$pos_tmp}))
739 | {
740 | $prev_match_cutesv2 = "yes";
741 | }
742 |
743 | if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || (exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq ""))
744 | {
745 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
746 | my $count_tmp = $count{$chr}{$pos_tmp};
747 |
748 | my $no_pbsv = "";
749 | if (exists($pbsv{$chr}{$pos_tmp}))
750 | {
751 | }
752 | else
753 | {
754 | $no_pbsv = "yes";
755 | }
756 |
757 | if (exists ($sniffles{$chr}{$pos_tmp}) && $list2[3] eq $type && ($list2[2] > $length*$lenght_margin1 && $list2[2] < $length*$lenght_margin2))
758 | {
759 | next SNIFFLES;
760 | }
761 | else
762 | {
763 | if ($list2[3] eq $type || $list2[3] eq "BND" || $list2[3] eq "." || ($type eq "." && $list2[3] ne "BND") ||
764 | ($list2[3] ne "DEL" && $type ne "DEL" && $list2[3] ne "BND"))
765 | {
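766 | my $score = '0'; # similarity score: +1 for a compatible type, plus proximity and length-agreement terms (each at most 1)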
767 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS"))
768 | {
769 | $score += 1;
770 | }
771 | my $g = (800-$v)/800; # proximity term: decreases as the offset $v from $pos grows
772 | $score += $g;
773 | my $score_tmp = '0';
774 | if ($list2[2] ne "." && $length ne ".")
775 | {
776 | if ($list2[2] >= $length)
777 | {
778 | $score_tmp = 1-(($list2[2]-$length)/$length);
779 | if ($score_tmp < 0)
780 | {
781 | $score_tmp = '0';
782 | }
783 | }
784 | else
785 | {
786 | $score_tmp = 1-(($length-$list2[2])/$length);
787 | if ($score_tmp < 0)
788 | {
789 | $score_tmp = '0';
790 | }
791 | }
792 | }
793 | $score += $score_tmp;
794 |
795 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no")
796 | {
797 | $pos_tmp = $pos_prev;
798 | $count_tmp = $count{$chr}{$pos_tmp};
799 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
800 | }
801 |
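802 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no") # merge on a strong match, when an earlier candidate scored higher, or during the relaxed second pass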
803 | {
804 |
805 | if (exists($pbsv{$chr}{$pos_tmp}))
806 | {
807 | $pbsv2{$chr}{$pos_tmp} = undef;
808 | }
809 | if (exists($cutesv{$chr}{$pos_tmp}))
810 | {
811 | $cutesv2{$chr}{$pos_tmp} = undef;
812 | }
813 |
814 | my $new_pos = $pos_tmp;
815 | my $new_length = $list2[2];
816 | my $new_type = $list2[3];
817 | my $new_haplo = $list2[4];
818 | my $new_END = $list2[5];
819 | my $new_DR = $list2[6];
820 | my $new_DV = $list2[7];
821 | my $new_REF = $list2[8];
822 | my $new_ALT = $list2[9];
823 |
824 | if ($no_pbsv eq "yes")
825 | {
826 | $new_pos = $pos;
827 | $new_length = $length;
828 | $new_END = $END;
829 | $new_REF = $REF;
830 | $new_ALT = $ALT;
831 | }
832 |
833 | if ($type eq "INS" && $list2[3] ne $type && $type ne ".")
834 | {
835 | $new_type = $type;
836 | }
837 | if ($list2[3] eq "BND" || ($list2[3] eq "INV" && ($type eq "DEL" || $type eq "INS")) || $list2[3] eq "." || ($type eq "INS" && $list2[3] ne $type))
838 | {
839 | if ($length ne ".")
840 | {
841 | $new_length = $length;
842 | $new_END = $END;
843 | }
844 | if ($type ne ".")
845 | {
846 | $new_type = $type;
847 | }
848 | }
849 | if ($type eq "INV" || $list2[2] eq ".")
850 | {
851 | if ($length ne ".")
852 | {
853 | $new_length = $length;
854 | $new_END = $END;
855 | $new_REF = $REF;
856 | $new_ALT = $ALT;
857 | }
858 | $new_pos = $pos;
859 | }
860 |
861 | delete $SVs{$chr}{$pos_tmp};
862 | delete $count{$chr}{$pos_tmp};
863 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
864 | $SVs{$list2[0]}{$new_pos} = $line2;
865 | $count{$chr}{$new_pos} = $count_tmp+1;
866 | $sniffles{$chr}{$pos_tmp} = $line;
867 |
868 | if ($type eq "INS")
869 | {
870 | $count{$chr}{$new_pos} = $count_tools;
871 | }
872 | if ($new_pos eq $pos)
873 | {
874 | if (exists($pbsv{$chr}{$pos_tmp}))
875 | {
876 | my $tmp = $pbsv{$chr}{$pos_tmp};
877 | delete $pbsv{$chr}{$pos_tmp};
878 | $pbsv{$chr}{$new_pos} = $tmp;
879 | }
880 | if (exists($cutesv{$chr}{$pos_tmp}))
881 | {
882 | my $tmp = $cutesv{$chr}{$pos_tmp};
883 | delete $cutesv{$chr}{$pos_tmp};
884 | $cutesv{$chr}{$new_pos} = $tmp;
885 | }
886 | if (exists($sniffles{$chr}{$pos_tmp}))
887 | {
888 | my $tmp = $sniffles{$chr}{$pos_tmp};
889 | delete $sniffles{$chr}{$pos_tmp};
890 | $sniffles{$chr}{$new_pos} = $tmp;
891 | }
892 | }
893 | next SNIFFLES;
894 | }
895 | elsif ($min eq '1')
896 | {
897 | $min = '-1';
898 | if ($score > $score_prev)
899 | {
900 | $score_prev = $score;
901 | $pos_prev = $pos_tmp;
902 | }
903 | goto POS_ALMOST2ca;
904 | }
905 | else
906 | {
907 | $min = '1';
908 | if ($score > $score_prev)
909 | {
910 | $score_prev = $score;
911 | $pos_prev = $pos_tmp;
912 | }
913 | }
914 | $v++;
915 | goto POS_ALMOST2ca;
916 | }
917 | next SNIFFLES;
918 | }
919 | }
920 | elsif ($min eq '1')
921 | {
922 | $min = '-1';
923 | goto POS_ALMOST2c;
924 | }
925 | else
926 | {
927 | $min = '1';
928 | }
929 | $v++;
930 | }
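931 | if ($score_prev ne "" && $score_prev ne "no" && $score_prev > 1.6) # nothing scored above 2: rescan and accept the first compatible candidate if the best score exceeded 1.6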
932 | {
933 | $score_prev = "no";
934 | $v = '1';
935 | goto POS_ALMOST2ca;
936 | }
937 | }
938 |
939 | $SVs{$chr}{$list[1]} = $converted_line;
940 | $count{$chr}{$list[1]} = '1';
941 | $sniffles{$chr}{$list[1]} = $line;
942 | if (($type eq "INS" || $type eq "DEL") && $HAP[0] ne "0/0" && $HAP[0] ne "./." && $HAP[0] ne "1/1" && ($HAP[2]+$HAP[1]) > 4)
943 | {
944 | $count{$chr}{$list[1]} = $count_tools;
945 | }
946 | elsif ($high_recall eq "2" && $HAP[0] ne "0/0" && $HAP[0] ne "1/1" && ($HAP[2]+$HAP[1]) > 4 && $input_cutesv eq "")
947 | {
948 | $count{$chr}{$list[1]} = '2';
949 | $SVs{$chr}{$list[1]} = $converted_line;
950 | $sniffles{$chr}{$list[1]} = $line;
951 | }
952 | elsif ($high_recall eq "2" && $HAP[0] ne "0/0" && $HAP[0] ne "1/1" && $input_cutesv eq "")
953 | {
954 | $count{$chr}{$list[1]} = '2';
955 | $SVs{$chr}{$list[1]} = $converted_line;
956 | $sniffles{$chr}{$list[1]} = $line;
957 | }
958 | }
959 | }
960 | }
961 | }
962 |
963 | undef %pbsv2;
964 | undef %cutesv2;
965 |
966 | if ($input_nanovar ne "")
967 | {
968 | NANOVAR: while (my $line = )
969 | {
970 | chomp($line);
971 | my $score_prev = "";
972 | my $pos_prev = "";
973 | my $first_nuc = substr $line, 0, 1;
974 | if ($first_nuc ne "#")
975 | {
976 | my @list = split /\t/, $line;
977 | my $pos = $list[1];
978 | my $REF = $list[3];
979 | my $ALT = $list[4];
980 | my $type;
981 | my $length;
982 | my $v = '1';
983 | my $min = '1';
984 | my $END = "";
985 |
986 | my $info = $list[7];
987 | my @info = split /;/, $info;
988 | my @HAP = split /:/, $list[9];
989 | my @HAP_REF = split /,/, $HAP[1];
990 | my @HAP2 = split /,/, $HAP[2];
991 | my $not_save = "";
992 |
993 | foreach my $info_tmp (@info)
994 | {
995 | my $first_five = substr $info_tmp, 0, 5;
996 | my $first_four = substr $info_tmp, 0, 4;
997 | if ($info_tmp =~ m/SVLEN=>*-*(\d+)/)
998 | {
999 | $length = $1;
1000 | }
1001 | elsif ($first_five eq "SVTYP")
1002 | {
1003 | $type = substr $info_tmp, 7;
1004 | }
1005 | elsif ($first_four eq "END=")
1006 | {
1007 | $END = substr $info_tmp, 4;
1008 | }
1009 | }
1010 | if (!defined($length) || $length eq "")
1011 | {
1012 | $length = ".";
1013 | }
1014 | if ($length ne "." && $length < 0)
1015 | {
1016 | $length *= -1;
1017 | }
1018 | my $chr = $list[0];
1019 | if ($list[0] =~ m/chr(\d+|X|Y)/)
1020 | {
1021 | $chr = $1;
1022 | }
1023 | my @depth = split /,/, $HAP[2];
1024 | my $DR = $depth[0];
1025 | my $DV = $depth[1];
1026 |
1027 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$HAP[0]."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT;
1028 |
1029 | if (exists ($nanovar{$chr}{$list[1]}))
1030 | {
1031 | next NANOVAR;
1032 | }
1033 |
1034 | if (($HAP2[1] >= $min_coverage || ($HAP2[1] >= $min_coverage-1 && $HAP2[0]/($HAP_REF[0]+$HAP2[0]) > 0.3)) && $list[6] eq "PASS" && $type ne "."
1035 | && $type ne "BND" && ($HAP2[0]/($HAP_REF[0]+$HAP2[0]) > 0.12 || $HAP_REF[0] eq ".") && ($length eq "." || $length > 45) && ($HAP[0] ne "./." || $count_tools < 4))
1036 | {
1037 | my @list2 = split /\t/, $SVs{$chr}{$list[1]};
1038 |
1039 | my $prev_match_pbsv = "";
1040 | my $prev_match_cutesv = "";
1041 | my $prev_match_sniffles = "";
1042 | if (exists($pbsv2{$chr}{$list[1]}))
1043 | {
1044 | $prev_match_pbsv = "yes";
1045 | }
1046 | if (exists($cutesv2{$chr}{$list[1]}))
1047 | {
1048 | $prev_match_cutesv = "yes";
1049 | }
1050 | if (exists($sniffles2{$chr}{$list[1]}))
1051 | {
1052 | $prev_match_sniffles = "yes";
1053 | }
1054 |
1055 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq "") ||
1056 | (exists($sniffles{$chr}{$list[1]}) && $prev_match_sniffles eq ""))
1057 | {
1058 | if (exists($pbsv{$chr}{$list[1]}))
1059 | {
1060 | $pbsv2{$chr}{$list[1]} = undef;
1061 | }
1062 | if (exists($cutesv{$chr}{$list[1]}))
1063 | {
1064 | $cutesv2{$chr}{$list[1]} = undef;
1065 | }
1066 | if (exists($sniffles{$chr}{$list[1]}))
1067 | {
1068 | $sniffles2{$chr}{$list[1]} = undef;
1069 | }
1070 |
1071 | my $count_tmp = $count{$chr}{$list[1]};
1072 |
1073 | if (($list2[3] eq $type || $list2[3] eq "." || $type eq "." || ($list2[3] ne "DEL" && $type ne "DEL") || $type eq "INV" || $list2[3] eq "INV")
1074 | && ($list2[3] ne "BND" || exists($sniffles{$chr}{$list[1]}) || exists($cutesv{$chr}{$list[1]})))
1075 | {
1076 | $count{$chr}{$list[1]} = $count_tmp+1;
1077 |
1078 | my $new_pos = $pos;
1079 | my $new_length = $list2[2];
1080 | my $new_type = $list2[3];
1081 | my $new_haplo = $list2[4];
1082 | my $new_END = $list2[5];
1083 | my $new_DR = $list2[6];
1084 | my $new_DV = $list2[7];
1085 | my $new_REF = $list2[8];
1086 | my $new_ALT = $list2[9];
1087 |
1088 | if (($list2[3] eq "BND" || ($list2[3] eq "INV" && ($type eq "DEL" || $type eq "INS")) || $list2[3] eq ".") && $type ne "DUP")
1089 | {
1090 | if ($type ne ".")
1091 | {
1092 | $new_type = $type;
1093 | }
1094 | }
1095 | if ($type eq "INV")
1096 | {
1097 | if ($length ne ".")
1098 | {
1099 | $new_length = $length;
1100 | $new_END = $END;
1101 | }
1102 | if ($type ne ".")
1103 | {
1104 | $new_type = $type;
1105 | }
1106 | if ($HAP[0] ne "./.")
1107 | {
1108 | $new_haplo = $HAP[0];
1109 | $new_DR = $DR;
1110 | $new_DV = $DV;
1111 | $new_REF = $REF;
1112 | $new_ALT = $ALT;
1113 | }
1114 | }
1115 | if ($list2[2] eq ".")
1116 | {
1117 | $new_length = $length;
1118 | $new_END = $END;
1119 | }
1120 | if (exists($pbsv{$chr}{$pos}))
1121 | {}
1122 | else
1123 | {
1124 | if ($HAP[0] ne "./.")
1125 | {
1126 | $new_haplo = $HAP[0];
1127 | $new_DR = $DR;
1128 | $new_DV = $DV;
1129 | $new_REF = $REF;
1130 | $new_ALT = $ALT;
1131 | }
1132 | }
1133 | if ($new_haplo eq "./.")
1134 | {
1135 | $new_haplo = $HAP[0];
1136 | $new_DR = $DR;
1137 | $new_DV = $DV;
1138 | $new_REF = $REF;
1139 | $new_ALT = $ALT;
1140 | }
1141 |
1142 | delete $SVs{$chr}{$pos};
1143 | delete $count{$chr}{$pos};
1144 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
1145 | $SVs{$list2[0]}{$new_pos} = $line2;
1146 | $count{$chr}{$new_pos} = $count_tmp+1;
1147 | $nanovar{$chr}{$new_pos} = $line;
1148 |
1149 | if (($HAP[0] eq "0/1" || $HAP[0] eq "1/0") && $type eq "INS" && $length < 2000)
1150 | {
1151 | $count{$chr}{$new_pos} = $count_tmp+2;
1152 | }
1153 | }
1154 | next NANOVAR;
1155 | }
1156 | else
1157 | {
1158 |
1159 | POS_ALMOST2fa:
1160 | while ($v < $length_match)
1161 | {
1162 | POS_ALMOST2f: my $pos_tmp = ($min*$v)+$pos;
1163 |
1164 | my $prev_match_pbsv2 = "";
1165 | my $prev_match_cutesv2 = "";
1166 | my $prev_match_sniffles2 = "";
1167 | if (exists($pbsv2{$chr}{$pos_tmp}))
1168 | {
1169 | $prev_match_pbsv2 = "yes";
1170 | }
1171 | if (exists($cutesv2{$chr}{$pos_tmp}))
1172 | {
1173 | $prev_match_cutesv2 = "yes";
1174 | }
1175 | if (exists($sniffles2{$chr}{$pos_tmp}))
1176 | {
1177 | $prev_match_sniffles2 = "yes";
1178 | }
1179 |
1180 | if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || (exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq "") ||
1181 | (exists($sniffles{$chr}{$pos_tmp}) && $prev_match_sniffles2 eq ""))
1182 | {
1183 | my $count_tmp = $count{$chr}{$pos_tmp};
1184 | if (exists($nanovar{$chr}{$pos_tmp}) && $high_recall eq "gg")
1185 | {
1186 | if ($min eq '1')
1187 | {
1188 | $min = '-1';
1189 | goto POS_ALMOST2f;
1190 | }
1191 | else
1192 | {
1193 | $min = '1';
1194 | }
1195 | }
1196 | else
1197 | {
1198 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
1199 |
1200 | if (($list2[3] eq $type || ($list2[3] eq "BND" && $type eq "INV") || $list2[3] eq "." || $type eq "." || ($list2[3] ne "DEL" && $type ne "DEL")
1201 | || $type eq "INV" || $list2[3] eq "INV") && ($list2[3] ne "BND" || exists($sniffles{$chr}{$pos_tmp}) || exists($cutesv{$chr}{$pos_tmp})))
1202 | {
1203 | $count{$chr}{$pos_tmp} = $count_tmp+1;
1204 |
1205 | my $score = '0';
1206 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS"))
1207 | {
1208 | $score += 1;
1209 | }
1210 | my $g = (800-$v)/800;
1211 | $score += $g;
1212 | my $score_tmp = '0';
1213 | if ($list2[2] ne "." && $length ne ".")
1214 | {
1215 | if ($list2[2] >= $length)
1216 | {
1217 | $score_tmp = 1-(($list2[2]-$length)/$length);
1218 | if ($score_tmp < 0)
1219 | {
1220 | $score_tmp = '0';
1221 | }
1222 | }
1223 | else
1224 | {
1225 | $score_tmp = 1-(($length-$list2[2])/$length);
1226 | if ($score_tmp < 0)
1227 | {
1228 | $score_tmp = '0';
1229 | }
1230 | }
1231 | }
1232 | $score += $score_tmp;
1233 |
1234 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no")
1235 | {
1236 | $pos_tmp = $pos_prev;
1237 | $count_tmp = $count{$chr}{$pos_tmp};
1238 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
1239 | }
1240 |
1241 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no")
1242 | {
1243 | if (exists($pbsv{$chr}{$pos_tmp}))
1244 | {
1245 | $pbsv2{$chr}{$pos_tmp} = undef;
1246 | }
1247 | if (exists($cutesv{$chr}{$pos_tmp}))
1248 | {
1249 | $cutesv2{$chr}{$pos_tmp} = undef;
1250 | }
1251 | if (exists($sniffles{$chr}{$pos_tmp}))
1252 | {
1253 | $sniffles2{$chr}{$pos_tmp} = undef;
1254 | }
1255 |
1256 | my $new_pos = $pos_tmp;
1257 | my $new_length = $list2[2];
1258 | my $new_type = $list2[3];
1259 | my $new_haplo = $list2[4];
1260 | my $new_END = $list2[5];
1261 | my $new_DR = $list2[6];
1262 | my $new_DV = $list2[7];
1263 | my $new_REF = $list2[8];
1264 | my $new_ALT = $list2[9];
1265 |
1266 | if (($list2[3] eq "BND" || $list2[3] eq ".") && $type ne "DUP")
1267 | {
1268 | if ($length ne ".")
1269 | {
1270 | $new_length = $length;
1271 | $new_END = $END;
1272 | }
1273 | if ($type ne ".")
1274 | {
1275 | $new_type = $type;
1276 | }
1277 | }
1278 | if ($type eq "INV")
1279 | {
1280 | $new_pos = $pos;
1281 |
1282 | if (exists($pbsv{$chr}{$pos_tmp}))
1283 | {
1284 | my $tmp = $pbsv{$chr}{$pos_tmp};
1285 | delete $pbsv{$chr}{$pos_tmp};
1286 | $pbsv{$chr}{$new_pos} = $tmp;
1287 | }
1288 | if (exists($cutesv{$chr}{$pos_tmp}))
1289 | {
1290 | my $tmp = $cutesv{$chr}{$pos_tmp};
1291 | delete $cutesv{$chr}{$pos_tmp};
1292 | $cutesv{$chr}{$new_pos} = $tmp;
1293 | }
1294 | if (exists($sniffles{$chr}{$pos_tmp}))
1295 | {
1296 | my $tmp = $sniffles{$chr}{$pos_tmp};
1297 | delete $sniffles{$chr}{$pos_tmp};
1298 | $sniffles{$chr}{$new_pos} = $tmp;
1299 | }
1300 | if (exists($nanovar{$chr}{$pos_tmp}))
1301 | {
1302 | my $tmp = $nanovar{$chr}{$pos_tmp};
1303 | delete $nanovar{$chr}{$pos_tmp};
1304 | $nanovar{$chr}{$new_pos} = $tmp;
1305 | }
1306 |
1307 | if ($length ne ".")
1308 | {
1309 | $new_length = $length;
1310 | $new_END = $END;
1311 | $new_REF = $REF;
1312 | $new_ALT = $ALT;
1313 | }
1314 | if ($type ne ".")
1315 | {
1316 | $new_type = $type;
1317 | }
1318 | if ($HAP[0] ne "./.")
1319 | {
1320 | $new_haplo = $HAP[0];
1321 | $new_DR = $DR;
1322 | $new_DV = $DV;
1323 | }
1324 | }
1325 | if ($list2[2] eq ".")
1326 | {
1327 | $new_length = $length;
1328 | $new_END = $END;
1329 | $new_REF = $REF;
1330 | $new_ALT = $ALT;
1331 | }
1332 | if (exists($pbsv{$chr}{$pos_tmp}))
1333 | {}
1334 | else
1335 | {
1336 | if ($HAP[0] ne "./.")
1337 | {
1338 | $new_haplo = $HAP[0];
1339 | $new_DR = $DR;
1340 | $new_DV = $DV;
1341 | }
1342 | }
1343 | if ($new_haplo eq "./.")
1344 | {
1345 | $new_haplo = $HAP[0];
1346 | $new_DR = $DR;
1347 | $new_DV = $DV;
1348 | }
1349 |
1350 | delete $SVs{$chr}{$pos_tmp};
1351 | delete $count{$chr}{$pos_tmp};
1352 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
1353 | $SVs{$list2[0]}{$new_pos} = $line2;
1354 | $count{$chr}{$new_pos} = $count_tmp+1;
1355 | $nanovar{$chr}{$new_pos} = $line;
1356 |
1357 | if (($HAP[0] eq "0/1" || $HAP[0] eq "1/0") && $type eq "INS" && $length < 2000)
1358 | {
1359 | $count{$chr}{$new_pos} = $count_tmp+2;
1360 | }
1361 | next NANOVAR;
1362 | }
1363 | elsif ($min eq '1')
1364 | {
1365 | $min = '-1';
1366 | if ($score > $score_prev)
1367 | {
1368 | $score_prev = $score;
1369 | $pos_prev = $pos_tmp;
1370 | }
1371 | goto POS_ALMOST2fa;
1372 | }
1373 | else
1374 | {
1375 | $min = '1';
1376 | if ($score > $score_prev)
1377 | {
1378 | $score_prev = $score;
1379 | $pos_prev = $pos_tmp;
1380 | }
1381 | }
1382 | $v++;
1383 | goto POS_ALMOST2fa;
1384 | }
1385 | }
1386 | next NANOVAR;
1387 | }
1388 | elsif ($min eq '1')
1389 | {
1390 | $min = '-1';
1391 | goto POS_ALMOST2f;
1392 | }
1393 | else
1394 | {
1395 | $min = '1';
1396 | }
1397 | $v++;
1398 | }
1399 |
1400 | }
1401 |
1402 | if ($type ne "BND" && $HAP[0] ne "./.")
1403 | {
1404 | $SVs{$chr}{$list[1]} = $converted_line;
1405 | $nanovar{$chr}{$list[1]} = $line;
1406 | $count{$chr}{$list[1]} = '1';
1407 | if (($HAP[0] eq "0/1" || $HAP[0] eq "1/0") && $type eq "INS" && $pos_prev eq "")
1408 | {
1409 | $count{$chr}{$list[1]} = 2;
1410 | }
1411 | }
1412 | }
1413 | }
1414 | }
1415 | }
1416 |
1417 | undef %pbsv2;
1418 | undef %cutesv2;
1419 | undef %sniffles2;
1420 |
1421 | my %svim;
1422 | if ($input_svim ne "")
1423 | {
1424 | SVIM: while (my $line = )
1425 | {
1426 | chomp($line);
1427 | my $first_nuc = substr $line, 0, 1;
1428 | if ($first_nuc ne "#")
1429 | {
1430 | my @list = split /\t/, $line;
1431 | my $pos = $list[1];
1432 | my $REF = $list[3];
1433 | my $ALT = $list[4];
1434 | my $length;
1435 | my $type;
1436 | my $v = '1';
1437 | my $min = '1';
1438 | my $END = "";
1439 |
1440 | my $info = $list[7];
1441 | my @info = split /;/, $info;
1442 | my @HAP = split /:/, $list[9];
1443 |
1444 | my @HAP_INFO = split /:/, $list[8];
1445 | my $support = '0';
1446 |
1447 | foreach my $info_tmp (@info)
1448 | {
1449 | my $first_five = substr $info_tmp, 0, 5;
1450 | my $first_four = substr $info_tmp, 0, 4;
1451 | if ($first_five eq "SVLEN")
1452 | {
1453 | $length = substr $info_tmp, 6;
1454 | }
1455 | elsif ($first_five eq "SVTYP")
1456 | {
1457 | $type = substr $info_tmp, 7;
1458 | if ($type eq "DUP:TANDEM" || $type eq "DUP:INT")
1459 | {
1460 | $type = "DUP";
1461 | }
1462 | }
1463 | elsif ($first_five eq "SUPPO")
1464 | {
1465 | $support = substr $info_tmp, 8;
1466 | }
1467 | elsif ($first_four eq "END=")
1468 | {
1469 | $END = substr $info_tmp, 4;
1470 | }
1471 | }
1472 | if (!defined($length) || $length eq "")
1473 | {
1474 | $length = ".";
1475 | }
1476 | if ($length ne "." && $length < 0)
1477 | {
1478 | $length *= -1;
1479 | }
1480 | my $chr = $list[0];
1481 | if ($list[0] =~ m/chr(\d+|X|Y)/)
1482 | {
1483 | $chr = $1;
1484 | }
1485 | my $hapi = $HAP[0];
1486 |
1487 | if (exists ($svim{$chr}{$list[1]}))
1488 | {
1489 | next SVIM;
1490 | }
1491 | my $ratio = '0';
1492 |
1493 | if ($HAP[1] > 0)
1494 | {
1495 | $ratio = $support/($HAP[1]);
1496 | }
1497 |
1498 | my @depth = split /,/, $HAP[3];
1499 | my $DR = $depth[0];
1500 | my $DV = $depth[1];
1501 |
1502 | if ($list[6] eq "PASS" && ($support >= $min_coverage || ($support >= $min_coverage-1 && $ratio > 0.3)) && $type ne "BND" && $type ne "DUP" && $hapi ne "0/0" && $hapi ne "./." &&
1503 | ($ratio > 0.2 || $HAP_INFO[1] eq "CN" || $HAP[1] eq ".") && ($length eq "." || $length > 45))
1504 | {
1505 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$hapi."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT;
1506 | my $prev_match_pbsv = "";
1507 | my $prev_match_cutesv = "";
1508 | my $prev_match_sniffles = "";
1509 | my $prev_match_nanovar = "";
1510 |
1511 | if (exists($pbsv2{$chr}{$list[1]}))
1512 | {
1513 | $prev_match_pbsv = "yes";
1514 | }
1515 | if (exists($cutesv2{$chr}{$list[1]}))
1516 | {
1517 | $prev_match_cutesv = "yes";
1518 | }
1519 | if (exists($sniffles2{$chr}{$list[1]}))
1520 | {
1521 | $prev_match_sniffles = "yes";
1522 | }
1523 | if (exists($nanovar2{$chr}{$list[1]}))
1524 | {
1525 | $prev_match_nanovar = "yes";
1526 | }
1527 |
1528 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq "") ||
1529 | (exists($sniffles{$chr}{$list[1]}) && $prev_match_sniffles eq "") || (exists($nanovar{$chr}{$list[1]}) && $prev_match_nanovar eq ""))
1530 | {
1531 | if (exists($pbsv{$chr}{$list[1]}))
1532 | {
1533 | $pbsv2{$chr}{$list[1]} = undef;
1534 | }
1535 | if (exists($cutesv{$chr}{$list[1]}))
1536 | {
1537 | $cutesv2{$chr}{$list[1]} = undef;
1538 | }
1539 | if (exists($sniffles{$chr}{$list[1]}))
1540 | {
1541 | $sniffles2{$chr}{$list[1]} = undef;
1542 | }
1543 | if (exists($nanovar{$chr}{$list[1]}))
1544 | {
1545 | $nanovar2{$chr}{$list[1]} = undef;
1546 | }
1547 |
1548 | my $count_tmp = $count{$chr}{$list[1]};
1549 | my @list2 = split /\t/, $SVs{$chr}{$list[1]};
1550 |
1551 | my $new_pos = $pos;
1552 | my $new_length = $list2[2];
1553 | my $new_type = $list2[3];
1554 | my $new_haplo = $list2[4];
1555 | my $new_END = $list2[5];
1556 | my $new_REF = $list2[8];
1557 | my $new_ALT = $list2[9];
1558 |
1559 | if ($new_haplo eq "./.")
1560 | {
1561 | $new_haplo = $hapi;
1562 | }
1563 |
1564 | delete $SVs{$chr}{$pos};
1565 | delete $count{$chr}{$pos};
1566 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$DR."\t".$DV."\t".$new_REF."\t".$new_ALT;
1567 | $SVs{$list2[0]}{$new_pos} = $line2;
1568 | $count{$chr}{$new_pos} = $count_tmp+1;
1569 | $svim{$chr}{$new_pos} = $line;
1570 |
1571 | if ($hapi eq "1/1")
1572 | {
1573 | $count{$chr}{$new_pos} = $count_tools;
1574 | }
1575 |
1576 | next SVIM;
1577 | }
1578 | else
1579 | {
1580 | my $score_prev = "";
1581 | my $pos_prev = "";
1582 | POS_ALMOST2ga:
1583 | while ($v < $length_match)
1584 | {
1585 | POS_ALMOST2g: my $pos_tmp = ($min*$v)+$pos;
1586 |
1587 | my $prev_match_pbsv2 = "";
1588 | my $prev_match_cutesv2 = "";
1589 | my $prev_match_sniffles2 = "";
1590 | my $prev_match_nanovar2 = "";
1591 |
1592 | if (exists($pbsv2{$chr}{$pos_tmp}))
1593 | {
1594 | $prev_match_pbsv2 = "yes";
1595 | }
1596 | if (exists($cutesv2{$chr}{$pos_tmp}))
1597 | {
1598 | $prev_match_cutesv2 = "yes";
1599 | }
1600 | if (exists($sniffles2{$chr}{$pos_tmp}))
1601 | {
1602 | $prev_match_sniffles2 = "yes";
1603 | }
1604 | if (exists($nanovar2{$chr}{$pos_tmp}))
1605 | {
1606 | $prev_match_nanovar2 = "yes";
1607 | }
1608 |
1609 | if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || (exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq "") ||
1610 | (exists($sniffles{$chr}{$pos_tmp}) && $prev_match_sniffles2 eq "") || (exists($nanovar{$chr}{$pos_tmp}) && $prev_match_nanovar2 eq ""))
1611 | {
1612 | my $count_tmp = $count{$chr}{$pos_tmp};
1613 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
1614 |
1615 | if ($count_tmp > 20)
1616 | {
1617 | if ($min eq '1')
1618 | {
1619 | $min = '-1';
1620 | goto POS_ALMOST2g;
1621 | }
1622 | else
1623 | {
1624 | $min = '1';
1625 | }
1626 | }
1627 | elsif (exists($svim{$chr}{$pos_tmp}) && $list2[3] eq $type && ($list2[2] > $length*$lenght_margin1 && $list2[2] < $length*$lenght_margin2))
1628 | {
1629 | next SVIM;
1630 | }
1631 | elsif ($list2[3] ne "DUP")
1632 | {
1633 | my $score = '0';
1634 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS"))
1635 | {
1636 | $score += 1;
1637 | }
1638 | my $g = (800-$v)/800;
1639 | $score += $g;
1640 | my $score_tmp = '0';
1641 | if ($list2[2] ne "." && $length ne ".")
1642 | {
1643 | if ($list2[2] >= $length)
1644 | {
1645 | $score_tmp = 1-(($list2[2]-$length)/$length);
1646 | if ($score_tmp < 0)
1647 | {
1648 | $score_tmp = '0';
1649 | }
1650 | }
1651 | else
1652 | {
1653 | $score_tmp = 1-(($length-$list2[2])/$length);
1654 | if ($score_tmp < 0)
1655 | {
1656 | $score_tmp = '0';
1657 | }
1658 | }
1659 | }
1660 | $score += $score_tmp;
1661 |
1662 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no")
1663 | {
1664 | $pos_tmp = $pos_prev;
1665 | $count_tmp = $count{$chr}{$pos_tmp};
1666 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
1667 | }
1668 |
1669 | if ($score > 2 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no")
1670 | {
1671 | if (exists($pbsv{$chr}{$pos_tmp}))
1672 | {
1673 | $pbsv2{$chr}{$pos_tmp} = undef;
1674 | }
1675 | if (exists($cutesv{$chr}{$pos_tmp}))
1676 | {
1677 | $cutesv2{$chr}{$pos_tmp} = undef;
1678 | }
1679 | if (exists($sniffles{$chr}{$pos_tmp}))
1680 | {
1681 | $sniffles2{$chr}{$pos_tmp} = undef;
1682 | }
1683 | if (exists($nanovar{$chr}{$pos_tmp}))
1684 | {
1685 | $nanovar2{$chr}{$pos_tmp} = undef;
1686 | }
1687 |
1688 | my $new_pos = $pos_tmp;
1689 | my $new_length = $list2[2];
1690 | my $new_type = $list2[3];
1691 | my $new_haplo = $list2[4];
1692 | my $new_END = $list2[5];
1693 | my $new_DR = $list2[6];
1694 | my $new_DV = $list2[7];
1695 | my $new_REF = $list2[8];
1696 | my $new_ALT = $list2[9];
1697 |
1698 | if (exists($sniffles{$chr}{$pos_tmp}))
1699 | {}
1700 | elsif (exists($cutesv{$chr}{$pos_tmp}))
1701 | {}
1702 | elsif (exists($pbsv{$chr}{$pos_tmp}))
1703 | {}
1704 | elsif ($type eq "INV")
1705 | {
1706 | $new_pos = $pos;
1707 | }
1708 | if ($new_haplo eq "./.")
1709 | {
1710 | $new_haplo = $hapi;
1711 | $new_DR = $DR;
1712 | $new_DV = $DV;
1713 | $new_ALT = $ALT;
1714 | $new_REF = $REF;
1715 | }
1716 |
1717 | delete $SVs{$chr}{$pos_tmp};
1718 | delete $count{$chr}{$pos_tmp};
1719 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
1720 | $SVs{$list2[0]}{$new_pos} = $line2;
1721 | $count{$chr}{$new_pos} = $count_tmp+1;
1722 | $svim{$chr}{$new_pos} = $line;
1723 |
1724 | if ($hapi eq "1/1")
1725 | {
1726 | $count{$chr}{$new_pos} = $count_tools;
1727 | }
1728 |
1729 | if ($new_pos eq $pos)
1730 | {
1731 | if (exists($pbsv{$chr}{$pos_tmp}))
1732 | {
1733 | my $tmp = $pbsv{$chr}{$pos_tmp};
1734 | delete $pbsv{$chr}{$pos_tmp};
1735 | $pbsv{$chr}{$new_pos} = $tmp;
1736 | }
1737 | if (exists($cutesv{$chr}{$pos_tmp}))
1738 | {
1739 | my $tmp = $cutesv{$chr}{$pos_tmp};
1740 | delete $cutesv{$chr}{$pos_tmp};
1741 | $cutesv{$chr}{$new_pos} = $tmp;
1742 | }
1743 | if (exists($sniffles{$chr}{$pos_tmp}))
1744 | {
1745 | my $tmp = $sniffles{$chr}{$pos_tmp};
1746 | delete $sniffles{$chr}{$pos_tmp};
1747 | $sniffles{$chr}{$new_pos} = $tmp;
1748 | }
1749 | if (exists($nanovar{$chr}{$pos_tmp}))
1750 | {
1751 | my $tmp = $nanovar{$chr}{$pos_tmp};
1752 | delete $nanovar{$chr}{$pos_tmp};
1753 | $nanovar{$chr}{$new_pos} = $tmp;
1754 | }
1755 | if (exists($svim{$chr}{$pos_tmp}))
1756 | {
1757 | my $tmp = $svim{$chr}{$pos_tmp};
1758 | delete $svim{$chr}{$pos_tmp};
1759 | $svim{$chr}{$new_pos} = $tmp;
1760 | }
1761 | }
1762 | next SVIM;
1763 | }
1764 | elsif ($min eq '1')
1765 | {
1766 | $min = '-1';
1767 | if ($score > $score_prev)
1768 | {
1769 | $score_prev = $score;
1770 | $pos_prev = $pos_tmp;
1771 | }
1772 | goto POS_ALMOST2ga;
1773 | }
1774 | else
1775 | {
1776 | $min = '1';
1777 | if ($score > $score_prev)
1778 | {
1779 | $score_prev = $score;
1780 | $pos_prev = $pos_tmp;
1781 | }
1782 | }
1783 | $v++;
1784 | goto POS_ALMOST2ga;
1785 | }
1786 | }
1787 | elsif ($min eq '1')
1788 | {
1789 | $min = '-1';
1790 | goto POS_ALMOST2g;
1791 | }
1792 | else
1793 | {
1794 | $min = '1';
1795 | }
1796 | $v++;
1797 | }
1798 | }
1799 |
1800 | $svim{$chr}{$list[1]} = $line;
1801 | $SVs{$chr}{$list[1]} = $converted_line;
1802 | if ($hapi eq "1/1")
1803 | {
1804 | $count{$chr}{$list[1]} = $count_tools;
1805 | }
1806 | else
1807 | {
1808 | $count{$chr}{$list[1]} = '1';
1809 | }
1810 | }
1811 | }
1812 | }
1813 | }
1814 | undef %pbsv2;
1815 | undef %cutesv2;
1816 | undef %sniffles2;
1817 | undef %nanovar2;
1818 |
1819 | my %nanosv;
1820 | my $nanosv_converted = "";
1821 | if ($input_nanosv ne "")
1822 | {
1823 | NANOSV: while (my $line = )
1824 | {
1825 | chomp($line);
1826 | my $first_nuc = substr $line, 0, 1;
1827 | if ($first_nuc ne "#")
1828 | {
1829 | my @list = split /\t/, $line;
1830 | my $pos = $list[1];
1831 | my $REF = $list[3];
1832 | my $ALT = $list[4];
1833 | my $length;
1834 | my $type;
1835 | my $v = '1';
1836 | my $min = '1';
1837 | my $END = "";
1838 |
1839 | my $qual = $list[6];
1840 | my @qual = split /;/, $qual;
1841 | my $info = $list[7];
1842 | my @info = split /;/, $info;
1843 | my @HAP = split /:/, $list[9];
1844 | my @HAP_REF = split /,/, $HAP[1];
1845 | my @HAP2 = split /,/, $HAP[2];
1846 |
1847 | foreach my $info_tmp (@info)
1848 | {
1849 | my $first_five = substr $info_tmp, 0, 5;
1850 | my $first_four = substr $info_tmp, 0, 4;
1851 | if ($first_five eq "SVLEN")
1852 | {
1853 | $length = substr $info_tmp, 6;
1854 | }
1855 | elsif ($first_five eq "SVTYP")
1856 | {
1857 | $type = substr $info_tmp, 7;
1858 | }
1859 | elsif ($first_four eq "END=")
1860 | {
1861 | $END = substr $info_tmp, 4;
1862 | }
1863 | }
1864 | if (!defined($length) || $length eq "")
1865 | {
1866 | $length = ".";
1867 | }
1868 | if ($length ne "." && $length < 0)
1869 | {
1870 | $length *= -1;
1871 | }
1872 | my $chr = $list[0];
1873 | if ($list[0] =~ m/chr(\d+|X|Y)/)
1874 | {
1875 | $chr = $1;
1876 | }
1877 | my $hapi = $HAP[0];
1878 |
1879 | if ($nanosv_converted eq "yes")
1880 | {
1881 | $pos = $list[1];
1882 | $length = $list[2];
1883 | $type = $list[3];
1884 | $hapi = $list[4];
1885 | }
1886 | my @depth1 = split /,/, $HAP[1];
1887 | my @depth2 = split /,/, $HAP[2];
1888 | my $DR = $depth1[0];
1889 | my $DV = $depth2[0];
1890 |
1891 | my $converted_line = $chr."\t".$list[1]."\t".$length."\t".$type."\t".$hapi."\t".$END."\t".$DR."\t".$DV."\t".$REF."\t".$ALT;
1892 |
1893 | my $qual_check = "";
1894 | my $CIPOS = "";
1895 | my $CIEND = "";
1896 | my $mapq = "";
1897 | my $cluster = "";
1898 | foreach my $qual_tmp (@qual)
1899 | {
1900 | if ($qual_tmp eq "LowQual")
1901 | {
1902 | $qual_check = "no";
1903 | }
1904 | if ($qual_tmp eq "MapQual")
1905 | {
1906 | $mapq = "yes";
1907 | }
1908 | if ($qual_tmp eq "SVcluster")
1909 | {
1910 | $cluster = "yes";
1911 | }
1912 | if ($qual_tmp eq "CIPOS")
1913 | {
1914 | $CIPOS = "yes";
1915 | }
1916 | if ($qual_tmp eq "CIEND")
1917 | {
1918 | $CIEND = "yes";
1919 | }
1920 | }
1921 | if ($CIEND ne "yes" && $CIPOS ne "yes" && $mapq eq "yes" && $cluster eq "yes")
1922 | {
1923 | $qual_check = "no";
1924 | }
1925 |
1926 | if (($HAP2[1] >= $min_coverage+1 ) && ($HAP2[0]/($HAP_REF[0]+$HAP2[0]) > 0.2 || $HAP_REF[0] eq ".") && $hapi ne "0/0" && ($length eq "." || $length > 45) && $qual_check ne "no")
1927 | {
1928 | my $prev_match_pbsv = "";
1929 | my $prev_match_cutesv = "";
1930 | my $prev_match_sniffles = "";
1931 | my $prev_match_nanovar = "";
1932 | my $prev_match_svim = "";
1933 |
1934 | if (exists($pbsv2{$chr}{$list[1]}))
1935 | {
1936 | $prev_match_pbsv = "yes";
1937 | }
1938 | if (exists($cutesv2{$chr}{$list[1]}))
1939 | {
1940 | $prev_match_cutesv = "yes";
1941 | }
1942 | if (exists($sniffles2{$chr}{$list[1]}))
1943 | {
1944 | $prev_match_sniffles = "yes";
1945 | }
1946 | if (exists($nanovar2{$chr}{$list[1]}))
1947 | {
1948 | $prev_match_nanovar = "yes";
1949 | }
1950 | if (exists($svim2{$chr}{$list[1]}))
1951 | {
1952 | $prev_match_svim = "yes";
1953 | }
1954 |
1955 | if ((exists($pbsv{$chr}{$list[1]}) && $prev_match_pbsv eq "") || (exists($cutesv{$chr}{$list[1]}) && $prev_match_cutesv eq "") ||
1956 | (exists($sniffles{$chr}{$list[1]}) && $prev_match_sniffles eq "") || (exists($nanovar{$chr}{$list[1]}) && $prev_match_nanovar eq "")
1957 | || (exists($svim{$chr}{$list[1]}) && $prev_match_svim eq ""))
1958 |
1959 | {
1960 | if (exists($pbsv{$chr}{$list[1]}))
1961 | {
1962 | $pbsv2{$chr}{$list[1]} = undef;
1963 | }
1964 | if (exists($cutesv{$chr}{$list[1]}))
1965 | {
1966 | $cutesv2{$chr}{$list[1]} = undef;
1967 | }
1968 | if (exists($sniffles{$chr}{$list[1]}))
1969 | {
1970 | $sniffles2{$chr}{$list[1]} = undef;
1971 | }
1972 | if (exists($nanovar{$chr}{$list[1]}))
1973 | {
1974 | $nanovar2{$chr}{$list[1]} = undef;
1975 | }
1976 | if (exists($svim{$chr}{$list[1]}))
1977 | {
1978 | $svim2{$chr}{$list[1]} = undef;
1979 | }
1980 |
1981 | my $count_tmp = $count{$chr}{$list[1]};
1982 | my @list3 = split /\t/, $SVs{$chr}{$list[1]};
1983 |
1984 | if ($list3[3] eq "DEL" && exists($sniffles{$chr}{$list[1]}) && $high_recall eq "1")
1985 | {
1986 | }
1987 | else
1988 | {
1989 | my @list2 = split /\t/, $SVs{$chr}{$pos};
1990 | my $new_pos = $pos;
1991 | my $new_length = $list2[2];
1992 | my $new_type = $list2[3];
1993 | my $new_haplo = $list2[4];
1994 | my $new_END = $list2[5];
1995 | my $new_DR = $list2[6];
1996 | my $new_DV = $list2[7];
1997 | my $new_REF = $list2[8];
1998 | my $new_ALT = $list2[9];
1999 |
2000 | if (exists($nanovar{$chr}{$pos}))
2001 | {}
2002 | elsif ($list2[3] eq "INV" && $length ne ".")
2003 | {
2004 | $new_length = $length;
2005 | $new_END = $END;
2006 | $new_ALT = $ALT;
2007 | $new_REF = $REF;
2008 | }
2009 | if (exists($pbsv{$chr}{$list[1]}))
2010 | {}
2011 | elsif ($hapi ne "./.")
2012 | {
2013 | $new_haplo = $hapi;
2014 | if ($new_DR eq '.' || $new_DR eq "")
2015 | {
2016 | $new_DR = $DR;
2017 | }
2018 | if ($new_DV eq '.' || $new_DV eq "")
2019 | {
2020 | $new_DV = $DV;
2021 | }
2022 | }
2023 |
2024 | delete $SVs{$chr}{$pos};
2025 | delete $count{$chr}{$pos};
2026 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
2027 | $SVs{$list2[0]}{$new_pos} = $line2;
2028 | $count{$chr}{$new_pos} = $count_tmp+1;
2029 | $nanosv{$chr}{$new_pos} = $line;
2030 | }
2031 | next NANOSV;
2032 | }
2033 | else
2034 | {
2035 | my $score_prev = "";
2036 | my $pos_prev = "";
2037 | POS_ALMOST2ha:
2038 | while ($v < $length_match)
2039 | {
2040 | POS_ALMOST2h: my $pos_tmp = ($min*$v)+$pos;
2041 |
2042 | my $prev_match_pbsv2 = "";
2043 | my $prev_match_cutesv2 = "";
2044 | my $prev_match_sniffles2 = "";
2045 | my $prev_match_nanovar2 = "";
2046 | my $prev_match_svim2 = "";
2047 |
2048 | if (exists($pbsv2{$chr}{$pos_tmp}))
2049 | {
2050 | $prev_match_pbsv2 = "yes";
2051 | }
2052 | if (exists($cutesv2{$chr}{$pos_tmp}))
2053 | {
2054 | $prev_match_cutesv2 = "yes";
2055 | }
2056 | if (exists($sniffles2{$chr}{$pos_tmp}))
2057 | {
2058 | $prev_match_sniffles2 = "yes";
2059 | }
2060 | if (exists($nanovar2{$chr}{$pos_tmp}))
2061 | {
2062 | $prev_match_nanovar2 = "yes";
2063 | }
2064 | if (exists($svim2{$chr}{$pos_tmp}))
2065 | {
2066 | $prev_match_svim2 = "yes";
2067 | }
2068 |
2069 | if ((exists($pbsv{$chr}{$pos_tmp}) && $prev_match_pbsv2 eq "") || (exists($cutesv{$chr}{$pos_tmp}) && $prev_match_cutesv2 eq "") ||
2070 | (exists($sniffles{$chr}{$pos_tmp}) && $prev_match_sniffles2 eq "") || (exists($nanovar{$chr}{$pos_tmp}) && $prev_match_nanovar2 eq "")
2071 | || (exists($svim{$chr}{$pos_tmp}) && $prev_match_svim2 eq ""))
2072 | {
2073 | my $count_tmp = $count{$chr}{$pos_tmp};
2074 | my @list3 = split /\t/, $SVs{$chr}{$pos_tmp};
2075 |
2076 | if ($list3[3] eq "DEL" && exists($sniffles{$chr}{$pos_tmp}) && $high_recall eq "1")
2077 | {
2078 | }
2079 | else
2080 | {
2081 | if ($count_tmp > 20)
2082 | {
2083 | if ($min eq '1')
2084 | {
2085 | $min = '-1';
2086 | goto POS_ALMOST2h;
2087 | }
2088 | else
2089 | {
2090 | $min = '1';
2091 | }
2092 | }
2093 | else
2094 | {
2095 | my @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
2096 |
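2097 | my $score = '0'; # match score = type agreement (+1) + proximity term + length agreement (0..1)
2098 | if ($list2[3] eq $type || ($type eq "INS" && $list2[3] eq "DUP") || ($type eq "DUP" && $list2[3] eq "INS"))
2099 | {
2100 | $score += 1;
2101 | }
2102 | my $g = (800-$v)/800; # proximity term: 1 at offset 0, decreasing linearly with the offset $v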
2103 | $score += $g;
2104 | my $score_tmp = '0';
2105 | if ($list2[2] ne "." && $length ne ".")
2106 | {
2107 | if ($list2[2] >= $length)
2108 | {
2109 | $score_tmp = 1-(($list2[2]-$length)/$length);
2110 | if ($score_tmp < 0)
2111 | {
2112 | $score_tmp = '0';
2113 | }
2114 | }
2115 | else
2116 | {
2117 | $score_tmp = 1-(($length-$list2[2])/$length);
2118 | if ($score_tmp < 0)
2119 | {
2120 | $score_tmp = '0';
2121 | }
2122 | }
2123 | }
2124 | $score += $score_tmp;
2125 |
2126 | if ($score_prev ne "" && $score_prev > $score && $score_prev ne "no")
2127 | {
2128 | $pos_tmp = $pos_prev;
2129 | $count_tmp = $count{$chr}{$pos_tmp};
2130 | @list2 = split /\t/, $SVs{$chr}{$pos_tmp};
2131 | }
2132 |
2133 | if ($score > 1.5 || ($score_prev ne "" && $score_prev > $score) || $score_prev eq "no")
2134 | {
2135 | if (exists($pbsv{$chr}{$pos_tmp}))
2136 | {
2137 | $pbsv2{$chr}{$pos_tmp} = undef;
2138 | }
2139 | if (exists($cutesv{$chr}{$pos_tmp}))
2140 | {
2141 | $cutesv2{$chr}{$pos_tmp} = undef;
2142 | }
2143 | if (exists($sniffles{$chr}{$pos_tmp}))
2144 | {
2145 | $sniffles2{$chr}{$pos_tmp} = undef;
2146 | }
2147 | if (exists($nanovar{$chr}{$pos_tmp}))
2148 | {
2149 | $nanovar2{$chr}{$pos_tmp} = undef;
2150 | }
2151 | if (exists($svim{$chr}{$pos_tmp}))
2152 | {
2153 | $svim2{$chr}{$pos_tmp} = undef;
2154 | }
2155 |
2156 | my $new_pos = $pos_tmp;
2157 | my $new_length = $list2[2];
2158 | my $new_type = $list2[3];
2159 | my $new_haplo = $list2[4];
2160 | my $new_END = $list2[5];
2161 | my $new_DR = $list2[6];
2162 | my $new_DV = $list2[7];
2163 | my $new_REF = $list2[8];
2164 | my $new_ALT = $list2[9];
2165 |
2166 | if (exists($nanovar{$chr}{$pos_tmp}))
2167 | {}
2168 | elsif ($list2[3] eq "INV" && $length ne ".")
2169 | {
2170 | $new_length = $length;
2171 | $new_END = $END;
2172 | $new_ALT = $ALT;
2173 | $new_REF = $REF;
2174 | }
2175 | if (exists($pbsv{$chr}{$list[1]}))
2176 | {}
2177 | elsif ($hapi ne "./.")
2178 | {
2179 | $new_haplo = $hapi;
2180 | if ($new_DR eq '.' || $new_DR eq "")
2181 | {
2182 | $new_DR = $DR;
2183 | }
2184 | if ($new_DV eq '.' || $new_DV eq "")
2185 | {
2186 | $new_DV = $DV;
2187 | }
2188 | }
2189 | if (exists($pbsv{$chr}{$list[1]}))
2190 | {}
2191 | else
2192 | {
2193 | $new_pos = $pos;
2194 | }
2195 |
2196 | delete $SVs{$chr}{$pos_tmp};
2197 | delete $count{$chr}{$pos_tmp};
2198 | my $line2 = $list2[0]."\t".$new_pos."\t".$new_length."\t".$new_type."\t".$new_haplo."\t".$new_END."\t".$new_DR."\t".$new_DV."\t".$new_REF."\t".$new_ALT;
2199 | $SVs{$list2[0]}{$new_pos} = $line2;
2200 | $count{$chr}{$new_pos} = $count_tmp+1;
2201 | $nanosv{$chr}{$new_pos} = $line;
2202 |
2203 | if ($new_pos eq $pos)
2204 | {
2205 | if (exists($pbsv{$chr}{$pos_tmp}))
2206 | {
2207 | my $tmp = $pbsv{$chr}{$pos_tmp};
2208 | delete $pbsv{$chr}{$pos_tmp};
2209 | $pbsv{$chr}{$new_pos} = $tmp;
2210 | }
2211 | if (exists($cutesv{$chr}{$pos_tmp}))
2212 | {
2213 | my $tmp = $cutesv{$chr}{$pos_tmp};
2214 | delete $cutesv{$chr}{$pos_tmp};
2215 | $cutesv{$chr}{$new_pos} = $tmp;
2216 | }
2217 | if (exists($sniffles{$chr}{$pos_tmp}))
2218 | {
2219 | my $tmp = $sniffles{$chr}{$pos_tmp};
2220 | delete $sniffles{$chr}{$pos_tmp};
2221 | $sniffles{$chr}{$new_pos} = $tmp;
2222 | }
2223 | if (exists($nanovar{$chr}{$pos_tmp}))
2224 | {
2225 | my $tmp = $nanovar{$chr}{$pos_tmp};
2226 | delete $nanovar{$chr}{$pos_tmp};
2227 | $nanovar{$chr}{$new_pos} = $tmp;
2228 | }
2229 | if (exists($svim{$chr}{$pos_tmp}))
2230 | {
2231 | my $tmp = $svim{$chr}{$pos_tmp};
2232 | delete $svim{$chr}{$pos_tmp};
2233 | $svim{$chr}{$new_pos} = $tmp;
2234 | }
2235 | if (exists($nanosv{$chr}{$pos_tmp}))
2236 | {
2237 | my $tmp = $nanosv{$chr}{$pos_tmp};
2238 | delete $nanosv{$chr}{$pos_tmp};
2239 | $nanosv{$chr}{$new_pos} = $tmp;
2240 | }
2241 | }
2242 |
2243 | next NANOSV;
2244 | }
2245 | elsif ($min eq '1')
2246 | {
2247 | $min = '-1';
2248 | if ($score > $score_prev)
2249 | {
2250 | $score_prev = $score;
2251 | $pos_prev = $pos_tmp;
2252 | }
2253 | goto POS_ALMOST2ha;
2254 | }
2255 | else
2256 | {
2257 | $min = '1';
2258 | if ($score > $score_prev)
2259 | {
2260 | $score_prev = $score;
2261 | $pos_prev = $pos_tmp;
2262 | }
2263 | }
2264 | $v++;
2265 | goto POS_ALMOST2ha;
2266 | }
2267 | }
2268 | }
2269 | elsif ($min eq '1')
2270 | {
2271 | $min = '-1';
2272 | goto POS_ALMOST2h;
2273 | }
2274 | else
2275 | {
2276 | $min = '1';
2277 | }
2278 | $v++;
2279 | }
2280 | }
2281 | }
2282 | }
2283 | }
2284 | }
2285 |
2286 |
2287 | #Print SVs------------------------------------------------------------------
2288 |
2289 | my $output_sniffles = $dir."Sniffles_".$filename.$suffix;
2290 | if ($input_sniffles ne "")
2291 | {
2292 | open(SNIFFLES, ">" .$output_sniffles) or die "\nCan't open file $output_sniffles, $!\n";
2293 | }
2294 | my $output_pbsv = $dir."pbsv_".$filename.$suffix;
2295 | if ($input_pbsv ne "")
2296 | {
2297 | open(PBSV, ">" .$output_pbsv) or die "\nCan't open file $output_pbsv, $!\n";
2298 | }
2299 | my $output_nanovar = $dir."NanoVar_".$filename.$suffix;
2300 | if ($input_nanovar ne "")
2301 | {
2302 | open(NANOVAR, ">" .$output_nanovar) or die "\nCan't open file $output_nanovar, $!\n";
2303 | }
2304 | my $output_svim = $dir."SVIM_".$filename.$suffix;
2305 | if ($input_svim ne "")
2306 | {
2307 | open(SVIM, ">" .$output_svim) or die "\nCan't open file $output_svim, $!\n";
2308 | }
2309 | my $output_nanosv = $dir."NanoSV_".$filename.$suffix;
2310 | if ($input_nanosv ne "")
2311 | {
2312 | open(NANOSV, ">" .$output_nanosv) or die "\nCan't open file $output_nanosv, $!\n";
2313 | }
2314 | my $output_cutesv = $dir."cuteSV_".$filename.$suffix;
2315 | if ($input_cutesv ne "")
2316 | {
2317 | open(CUTESV, ">" .$output_cutesv) or die "\nCan't open file $output_cutesv, $!\n";
2318 | }
2319 |
2320 | my $datetime = localtime();
2321 | print COMBINED2 "##fileformat=VCFv4.2\n";
2322 | print COMBINED2 "##source=combiSV-v2.3\n";
2323 | print COMBINED2 "##fileDate=".$datetime."\n";
2324 | if ($input_cutesv ne "")
2325 | {
2326 | open(INPUT_CUTESV2, $input_cutesv) or die "\n\nCan't open cuteSV's vcf file $input_cutesv, $!\n\n";
2327 | while (my $line = <INPUT_CUTESV2>)
2328 | {
2329 | chomp($line);
2330 | my $check_text = substr $line, 0, 9;
2331 | if ($check_text eq "##contig=")
2332 | {
2333 | print COMBINED2 $line."\n";
2334 | }
2335 | }
2336 | close INPUT_CUTESV2;
2337 | }
2338 |
2339 | print COMBINED2 "##ALT=\n";
2340 | print COMBINED2 "##ALT=\n";
2341 | print COMBINED2 "##ALT=\n";
2342 | print COMBINED2 "##ALT=\n";
2343 | ##ALT=
2344 | print COMBINED2 "##INFO=\n";
2345 | print COMBINED2 "##INFO=\n";
2346 | print COMBINED2 "##INFO=\n";
2347 | print COMBINED2 "##INFO=\n";
2348 | print COMBINED2 "##FORMAT=\n";
2349 | print COMBINED2 "##FORMAT=\n";
2350 | print COMBINED2 "##FORMAT=\n";
2351 | print COMBINED2 "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample\n";
2352 |
2353 | print COMBINED "#CHROM\tPOS\tSVLENGTH\tTYPE\tVARHAP\n";
2354 | my %no_number;
2355 |
2356 | my $INS_count = '0';
2357 | my $DEL_count = '0';
2358 | my $DUP_count = '0';
2359 | my $INV_count = '0';
2360 | my $BND_count = '0';
2361 | my $id_count = '1';
2362 |
2363 | foreach my $chr2 (sort {$a <=> $b} grep { /^\d+$/ } keys %SVs)
2364 | {
2365 | if ($chr2 =~ m/^\d+$/)
2366 | {
2367 | foreach my $pos2 (sort {$a <=> $b} keys %{$SVs{$chr2}})
2368 | {
2369 | my $SV_callers = "";
2370 | my @list = split /\t/, $SVs{$chr2}{$pos2};
2371 | if ($count{$chr2}{$pos2} >= $count_tools || ($count{$chr2}{$pos2} > 1 && $list[3] eq "INV" && $list[4] eq "1/1"))
2372 | {
2373 | $count++;
2374 | if ($list[3] eq "INS")
2375 | {
2376 | $INS_count++;
2377 | }
2378 | elsif ($list[3] eq "DEL")
2379 | {
2380 | $DEL_count++;
2381 | }
2382 | elsif ($list[3] eq "DUP")
2383 | {
2384 | $DUP_count++;
2385 | }
2386 | elsif ($list[3] eq "INV")
2387 | {
2388 | $INV_count++;
2389 | }
2390 | elsif ($list[3] eq "BND")
2391 | {
2392 | $BND_count++;
2393 | }
2394 |
2395 | if ($input_sniffles ne "")
2396 | {
2397 | if (exists($sniffles{$chr2}{$pos2}))
2398 | {
2399 | print SNIFFLES $sniffles{$chr2}{$pos2}."\n";
2400 | $SV_callers = "Sniffles";
2401 | }
2402 | }
2403 | if ($input_pbsv ne "")
2404 | {
2405 | if (exists($pbsv{$chr2}{$pos2}))
2406 | {
2407 | print PBSV $pbsv{$chr2}{$pos2}."\n";
2408 | if ($SV_callers eq "")
2409 | {
2410 | $SV_callers = "pbsv";
2411 | }
2412 | else
2413 | {
2414 | $SV_callers .= ",pbsv";
2415 | }
2416 | }
2417 | }
2418 | if ($input_cutesv ne "")
2419 | {
2420 | if (exists($cutesv{$chr2}{$pos2}))
2421 | {
2422 | print CUTESV $cutesv{$chr2}{$pos2}."\n";
2423 | if ($SV_callers eq "")
2424 | {
2425 | $SV_callers = "cutesv";
2426 | }
2427 | else
2428 | {
2429 | $SV_callers .= ",cutesv";
2430 | }
2431 | }
2432 | }
2433 | if ($input_nanovar ne "")
2434 | {
2435 | if (exists($nanovar{$chr2}{$pos2}))
2436 | {
2437 | print NANOVAR $nanovar{$chr2}{$pos2}."\n";
2438 | if ($SV_callers eq "")
2439 | {
2440 | $SV_callers = "NanoVar";
2441 | }
2442 | else
2443 | {
2444 | $SV_callers .= ",NanoVar";
2445 | }
2446 | }
2447 | }
2448 | if ($input_svim ne "")
2449 | {
2450 | if (exists($svim{$chr2}{$pos2}))
2451 | {
2452 | print SVIM $svim{$chr2}{$pos2}."\n";
2453 | if ($SV_callers eq "")
2454 | {
2455 | $SV_callers = "SVIM";
2456 | }
2457 | else
2458 | {
2459 | $SV_callers .= ",SVIM";
2460 | }
2461 | }
2462 | }
2463 | if ($input_nanosv ne "")
2464 | {
2465 | if (exists($nanosv{$chr2}{$pos2}))
2466 | {
2467 | print NANOSV $nanosv{$chr2}{$pos2}."\n";
2468 | if ($SV_callers eq "")
2469 | {
2470 | $SV_callers = "NanoSV";
2471 | }
2472 | else
2473 | {
2474 | $SV_callers .= ",NanoSV";
2475 | }
2476 | }
2477 | }
2478 |
2479 | my @simplified_line = split /\t/, $SVs{$chr2}{$pos2};
2480 | print COMBINED $simplified_line[0]."\t".$simplified_line[1]."\t".$simplified_line[2]."\t".$simplified_line[3]."\t".$simplified_line[4]."\n";
2481 | my @list_tmp = split /\t/, $SVs{$chr2}{$pos2};
2482 | print COMBINED2 $list_tmp[0]."\t".$list_tmp[1]."\tid.".$id_count."\t".$list_tmp[8]."\t".$list_tmp[9]."\t.\tPASS\tSVTYPE=".$list_tmp[3].
2483 | ";SVLEN=".$list_tmp[2].";END=".$list_tmp[5].";SVCALLERS=".$SV_callers."\tGT:DR:DV\t".$list_tmp[4].":".$list_tmp[6].":".$list_tmp[7]."\n";
2484 |
2485 | $id_count++;
2486 | delete $SVs{$chr2}{$pos2};
2487 | }
2488 | }
2489 | }
2490 | }
2491 | foreach my $chr2 (sort {$a cmp $b} keys %SVs)
2492 | {
2493 | foreach my $pos2 (sort {$a <=> $b} keys %{$SVs{$chr2}})
2494 | {
2495 | my $SV_callers = "";
2496 | my @list = split /\t/, $SVs{$chr2}{$pos2};
2497 | if ($count{$chr2}{$pos2} >= $count_tools || ($count{$chr2}{$pos2} > 1 && $list[3] eq "INV" && $list[4] eq "1/1"))
2498 | {
2499 | $count++;
2500 |
2501 | if ($list[3] eq "INS")
2502 | {
2503 | $INS_count++;
2504 | }
2505 | elsif ($list[3] eq "DEL")
2506 | {
2507 | $DEL_count++;
2508 | }
2509 | elsif ($list[3] eq "DUP")
2510 | {
2511 | $DUP_count++;
2512 | }
2513 | elsif ($list[3] eq "INV")
2514 | {
2515 | $INV_count++;
2516 | }
2517 | elsif ($list[3] eq "BND")
2518 | {
2519 | $BND_count++;
2520 | }
2521 |
2522 | if ($input_sniffles ne "")
2523 | {
2524 | if (exists($sniffles{$chr2}{$pos2}))
2525 | {
2526 | print SNIFFLES $sniffles{$chr2}{$pos2}."\n";
2527 | $SV_callers = "Sniffles";
2528 | }
2529 | }
2530 | if ($input_pbsv ne "")
2531 | {
2532 | if (exists($pbsv{$chr2}{$pos2}))
2533 | {
2534 | print PBSV $pbsv{$chr2}{$pos2}."\n";
2535 | if ($SV_callers eq "")
2536 | {
2537 | $SV_callers = "pbsv";
2538 | }
2539 | else
2540 | {
2541 | $SV_callers .= ",pbsv";
2542 | }
2543 | }
2544 | }
2545 | if ($input_cutesv ne "")
2546 | {
2547 | if (exists($cutesv{$chr2}{$pos2}))
2548 | {
2549 | print CUTESV $cutesv{$chr2}{$pos2}."\n";
2550 | if ($SV_callers eq "")
2551 | {
2552 | $SV_callers = "cutesv";
2553 | }
2554 | else
2555 | {
2556 | $SV_callers .= ",cutesv";
2557 | }
2558 | }
2559 | }
2560 | if ($input_nanovar ne "")
2561 | {
2562 | if (exists($nanovar{$chr2}{$pos2}))
2563 | {
2564 | print NANOVAR $nanovar{$chr2}{$pos2}."\n";
2565 | if ($SV_callers eq "")
2566 | {
2567 | $SV_callers = "NanoVar";
2568 | }
2569 | else
2570 | {
2571 | $SV_callers .= ",NanoVar";
2572 | }
2573 | }
2574 | }
2575 | if ($input_svim ne "")
2576 | {
2577 | if (exists($svim{$chr2}{$pos2}))
2578 | {
2579 | print SVIM $svim{$chr2}{$pos2}."\n";
2580 | if ($SV_callers eq "")
2581 | {
2582 | $SV_callers = "SVIM";
2583 | }
2584 | else
2585 | {
2586 | $SV_callers .= ",SVIM";
2587 | }
2588 | }
2589 | }
2590 | if ($input_nanosv ne "")
2591 | {
2592 | if (exists($nanosv{$chr2}{$pos2}))
2593 | {
2594 | print NANOSV $nanosv{$chr2}{$pos2}."\n";
2595 | if ($SV_callers eq "")
2596 | {
2597 | $SV_callers = "NanoSV";
2598 | }
2599 | else
2600 | {
2601 | $SV_callers .= ",NanoSV";
2602 | }
2603 | }
2604 | }
2605 |
2606 | my @simplified_line = split /\t/, $SVs{$chr2}{$pos2};
2607 | print COMBINED $simplified_line[0]."\t".$simplified_line[1]."\t".$simplified_line[2]."\t".$simplified_line[3]."\t".$simplified_line[4]."\n";
2608 | my @list_tmp = split /\t/, $SVs{$chr2}{$pos2};
2609 | print COMBINED2 $list_tmp[0]."\t".$list_tmp[1]."\tid.".$id_count."\t".$list_tmp[8]."\t".$list_tmp[9]."\t.\tPASS\tSVTYPE=".$list_tmp[3].
2610 | ";SVLEN=".$list_tmp[2].";END=".$list_tmp[5].";SVCALLERS=".$SV_callers."\tGT:DR:DV\t".$list_tmp[4].":".$list_tmp[6].":".$list_tmp[7]."\n";
2611 |
2612 | $id_count++;
2613 | delete $SVs{$chr2}{$pos2};
2614 | }
2615 | }
2616 | }
2617 | print "Combined SVs : ".$count."\n";
2618 | print "Insertions : ".$INS_count."\n";
2619 | print "Deletions : ".$DEL_count."\n";
2620 | print "Duplications : ".$DUP_count."\n";
2621 | print "Inversions : ".$INV_count."\n";
2622 | print "BND : ".$BND_count."\n\n";
2623 |
2624 |
2625 | close SNIFFLES;
2626 | close PBSV;
2627 | close CUTESV;
2628 | close NANOVAR;
2629 | close SVIM;
2630 | close NANOSV;
2631 | close INPUT_SNIFFLES;
2632 | close INPUT_CUTESV;
2633 | close INPUT_SVIM;
2634 | close INPUT_NANOVAR;
2635 | close INPUT_PBSV;
2636 | close COMBINED;
2637 | close COMBINED2;
2638 | close INPUT_NANOSV;
2639 |
--------------------------------------------------------------------------------
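
The near-position matching in the NanoSV block above scores each candidate record by type agreement, distance from the call, and length agreement, and accepts a candidate once the score exceeds 1.5. A minimal standalone sketch of that scoring is given here. `match_score` and its argument names are hypothetical and do not exist in combiSV; only the constants (800, the INS/DUP equivalence, clamping at 0) are taken from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of combiSV): scores how well an existing
# combined SV record matches a new call found $offset bases away.
sub match_score {
    my ($type, $length, $cand_type, $cand_length, $offset) = @_;

    my $score = 0;

    # +1 when the types agree; INS and DUP are treated as interchangeable.
    my %ins_dup = (INS => 1, DUP => 1);
    $score += 1 if $cand_type eq $type
               || ($ins_dup{$type} && $ins_dup{$cand_type});

    # Proximity term: 1 at the same position, falling linearly with the offset.
    $score += (800 - $offset) / 800;

    # Length agreement: 1 for identical lengths, clamped at 0.
    # Skipped when either length is unknown (".").
    if ($length ne "." && $cand_length ne ".") {
        my $agreement = 1 - abs($cand_length - $length) / $length;
        $score += $agreement > 0 ? $agreement : 0;
    }
    return $score;
}

# A 100 bp INS call against a 120 bp DUP record 8 bases away:
# 1 (type) + 0.99 (proximity) + 0.8 (length)
printf "%.3f\n", match_score("INS", 100, "DUP", 120, 8);   # prints 2.790
```

Folding the two length branches into a single `abs` keeps the arithmetic identical to the script while making the clamping explicit in one place.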