├── sampleInfo.example ├── README.md ├── clusterNumtsVcf.pl ├── clusterNumtsVcf.38.pl ├── refNumts.38.bed ├── refNumts.bed ├── dinumt.pl └── dinumt.cram.pl /sampleInfo.example: -------------------------------------------------------------------------------- 1 | sample pop filename median_insert_size median_absolute_deviation min_insert_size max_insert_size mean_insert_size standard_deviation read_pairs pair_orientation mean_coverage granular_third_quartile_coverage granular_median_coverage granular_first_quartile_coverage perc_bases_above_15 2 | HG00096 GBR /scratch/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam 182 13 1 242935653 179.814467 22.033335 69840414 FR 4.69 1 1 1 0.3 3 | HG00097 GBR /scratch/data/HG00097/alignment/HG00097.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam 298 47 1 236665975 311.391152 78.881308 143438038 FR 9.74 1 1 1 9.2 4 | HG00099 GBR /scratch/data/HG00099/alignment/HG00099.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam 272 39 1 242605424 281.291213 65.094938 119283666 FR 8.21 1 1 1 5.5 5 | HG00100 GBR /scratch/data/HG00100/alignment/HG00100.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam 401 56 1 242555614 397.279083 85.304412 192131793 FR 13.52 1 1 1 33.2 6 | HG00101 GBR /scratch/data/HG00101/alignment/HG00101.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam 355 26 1 242936105 354.052565 42.810666 103896826 FR 7.16 1 1 1 1.6 7 | HG00102 GBR /scratch/data/HG00102/alignment/HG00102.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam 348 24 1 245095606 345.448138 42.816054 101742959 FR 7.04 1 1 1 1.5 8 | HG00103 GBR /scratch/data/HG00103/alignment/HG00103.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam 171 14 1 236807482 169.641 22.076763 76995432 FR 5.15 1 1 1 0.5 9 | HG00105 GBR /scratch/data/HG00105/alignment/HG00105.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam 352 23 1 242894070 343.991933 53.387586 106239655 FR 7.25 1 1 1 1.7 10 | HG00106 GBR 
/scratch/data/HG00106/alignment/HG00106.mapped.ILLUMINA.bwa.GBR.low_coverage.20121211.bam 366 31 1 242551118 361.498437 58.568002 109688999 FR 7.70 1 1 1 2.4 11 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Discovery of Nuclear Mitochondrial Insertions (dinumt) 2 | ====================================================== 3 | 4 | Description 5 | ----------- 6 | 7 | This software is designed to identify and genotype nuclear insertions of mitochondrial origin from whole genome sequence data. It consists of two programs: dinumt (di-nu-mite), which identifies sites of insertions in a single sample and gnomit (geno-mite), which genotypes those sites across multiple samples. There is an additional program named clusterNumtsVcf which will merge sites identified in multiple samples into a single merged file for genotyping. 8 | 9 | Required third-party resources 10 | ------------------------------ 11 | 12 | A number of third party software packages are required by these programs: 13 | 14 | * samtools: http://samtools.sourceforge.net/ 15 | * exonerate: http://www.ebi.ac.uk/~guy/exonerate/ 16 | * vcftools: http://vcftools.sourceforge.net/ 17 | - you will need to make sure Vcf.pm is in your perl library path 18 | 19 | In addition, you will need: 20 | 21 | * reference genome in fasta format (e.g. hs37d5.fasta) 22 | * individual MT sequence (e.g. MT.fa or chrM.fa) 23 | * bed file of annotated numts in reference (refNumts.bed for hg19 is included in package) 24 | 25 | The genotyping step requires the use of a sample index file containing various sample-level information (mean insert size, coverage, etc). A template has been provided, and the relevant data can be obtained by using GATK (DepthOfCoverage walker) and Picard (CollectInsertSizeMetrics) or custom scripts. 
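The per-sample statistics in such an index can also be used to derive the window and coverage settings listed under Parameters. The following is a minimal sketch, assuming a whitespace-delimited index whose header row uses the column names from `sampleInfo.example`. The function name `suggest_dinumt_params` and the round-to-nearest step are illustrative assumptions and are not part of the package.

```shell
# Hypothetical helper (not part of dinumt): print suggested per-sample settings
# from a whitespace-delimited sample index whose header row names the columns
# sample, mean_insert_size, standard_deviation and mean_coverage.
suggest_dinumt_params() {
  awk '
    # Header row: remember the position of every named column.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    NF > 1 {
      mean = $(col["mean_insert_size"])
      sd   = $(col["standard_deviation"])
      cov  = $(col["mean_coverage"])
      inc_len = int(mean + 3 * sd + 0.5)   # mean + 3 * sd; rounding is an assumption
      max_cov = int(5 * cov + 0.5)         # 5 * mean coverage; rounding is an assumption
      printf "%s\t--len_cluster_include=%d\t--len_cluster_link=%d\t--max_read_cov=%d\n", $(col["sample"]), inc_len, 2 * inc_len, max_cov
    }
  ' "$1"
}

# Example: suggest_dinumt_params sampleInfo.example
```

Each output line pairs a sample name with option strings that can be passed to `dinumt.pl`. The multipliers follow the "typically calculated as" rules in the Parameters section and may need tuning for a given data set.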
26 | If you are running dinumt on CRAM files aligned to GRCh38, use the corresponding scripts in this folder (`dinumt.cram.pl`, `clusterNumtsVcf.38.pl`). 27 | 28 | Parameters 29 | ---------- 30 | 31 | Additional information about selected parameters: 32 | 33 | * `--len_cluster_include` : width of window to consider anchor reads as part of the same cluster, typically calculated as `mean_insert_size` + 3 * `standard_deviation` 34 | * `--len_cluster_link` : width of window to link two clusters of anchor reads in proper orientation, typically calculated as 2 * `len_cluster_include` 35 | * `--max_read_cov` : maximum read depth at potential breakpoint location, used to filter out noisy regions of the genome, typically calculated as 5 * `mean_coverage` 36 | * `--output_support` : output all sequence reads supporting an insertion event in SAM format to the file named in the `--support_filename` option 37 | * `--mask_filename` : bed file of all numts annotated in the reference sequence; one is provided for GRCh37, and additional versions can be obtained from the UCSC Genome Browser 38 | * `--min_map_qual` : minimum mapping quality required for an anchor read to be considered for cluster support; default is 10 but can be adjusted as needed 39 | 40 | Example workflow 41 | ---------------- 42 | An example workflow would be as follows: 43 | 44 | ~~~ 45 | dinumt.pl \ 46 | --mask_filename=refNumts.bed \ 47 | --input_filename=sample1.bam \ 48 | --reference=hs37d5.fa \ 49 | --min_reads_cluster=1 \ 50 | --include_mask \ 51 | --output_filename=sample1.vcf \ 52 | --prefix=sample1 \ 53 | --len_cluster_include=577 \ 54 | --len_cluster_link=1154 \ 55 | --insert_size=334.844984 \ 56 | --max_read_cov=29 \ 57 | --output_support \ 58 | --support_filename=sample1_support.sam 59 | ~~~ 60 | 61 | ~~~ 62 | grep ^# sample1.vcf > header.txt 63 | cat *vcf | grep -v ^# | vcf-sort.pl | clusterNumtsVcf.pl --reference=hs37d5.fa > data.txt 64 | cat header.txt data.txt > merged.vcf 65 | ~~~ 66 | 67 | (merged vcf can be split into smaller 
pieces with multiple sets of sites run in parallel, if need be) 68 | 69 | ~~~ 70 | gnomit.pl \ 71 | --input_filename=merged.vcf \ 72 | --mask_filename=refNumts.bed \ 73 | --info_filename=sampleInfo \ 74 | --output_filename=merged_geno.vcf \ 75 | --samtools=samtools \ 76 | --reference=hs37d5.fa \ 77 | --breakpoint \ 78 | --min_map_qual=13 \ 79 | --dir_tmp=/tmp \ 80 | --exonerate=exonerate \ 81 | --mt_filename=MT.fa 82 | ~~~ 83 | 84 | ## Citation 85 | * Dayama, Gargi, Weichen Zhou, Javier Prado-Martinez, Tomas Marques-Bonet, and Ryan E. Mills. 2020. [Characterization of Nuclear Mitochondrial Insertions in the Whole Genomes of Primates](https://academic.oup.com/nargab/article/2/4/lqaa089/5983420), 86 | NAR Genomics and Bioinformatics, 2020, lqaa089. `https://doi.org/10.1093/nargab/lqaa089` 87 | 88 | * Dayama, Gargi, Sarah B Emery, Jeffrey M Kidd, and Ryan E. Mills. 2014. [The genomic landscape of polymorphic human nuclear mitochondrial insertions](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4227756/pdf/gku1038.pdf), 89 | Nucleic Acids Research, 2014, gku1038. `https://doi.org/10.1093/nar/gku1038` 90 | 91 | 92 | Contact 93 | ------- 94 | Questions: Please contact Ryan Mills at remills@umich.edu 95 | 04/16/2014 -------------------------------------------------------------------------------- /clusterNumtsVcf.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | use strict; 4 | use warnings; 5 | use Getopt::Long; 6 | 7 | my %data = (); 8 | my %opts = (); 9 | my $version = "0.0.23"; 10 | 11 | $opts{reference} = "hs37d5.fa"; 12 | $opts{samtools} = "samtools"; 13 | 14 | my $optResult = GetOptions( 15 | "reference=s" => \$opts{reference}, 16 | "samtools=s" => \$opts{samtools} 17 | ); 18 | 19 | checkOptions( $optResult, \%opts, $version ); 20 | 21 | my $lastChr = ""; 22 | my $lastPos = 0; 23 | my $index = 0; 24 | my $cnt = 1; 25 | my $last = ""; 26 | my $sumLen = 0; 27 | my $sumSupport = 0; 28 | my $c = 0; 29 | my 
$len_mt = 16596; 30 | 31 | while (<>) { 32 | chomp; 33 | my ($chr, $pos, $id, $ref, $alt, $qual, $filter, $info) = split(/\t/); 34 | if (($chr eq $lastChr && $pos - $lastPos > 1000) || ($chr ne $lastChr && $lastChr ne "")) { 35 | $c++; 36 | $index = 0; 37 | } 38 | my ($end) = $info =~ /END=(\d+)/; 39 | my ($mlen) = $info =~ /MLEN=(\d+)/; 40 | my ($mstart) = $info =~ /MSTART=(\d+)/; 41 | my ($mend) = $info =~ /MEND=(\d+)/; 42 | $data{$c}[$index]{chr} = $chr; 43 | $data{$c}[$index]{pos} = $pos; 44 | $data{$c}[$index]{end} = $end; 45 | $data{$c}[$index]{id} = $id; 46 | $data{$c}[$index]{qual} = $qual; 47 | $data{$c}[$index]{filter} = $filter; 48 | $data{$c}[$index]{mlen} = (defined($mlen)) ? $mlen : "NA"; 49 | $data{$c}[$index]{mstart} = (defined($mstart)) ? $mstart : "NA"; 50 | $data{$c}[$index]{mend} = (defined($mend)) ? $mend : "NA"; 51 | 52 | $lastChr = $chr; 53 | 54 | $lastPos = $pos; 55 | $index++; 56 | } 57 | 58 | my $mergedId = 0; 59 | for (my $i=0; $i<=$c; $i++) { 60 | my $n = 0; 61 | my %seg = (); 62 | 63 | my @sortedStarts = sort { $a->{pos} <=> $b->{pos} } @{$data{$i}}; 64 | my @sortedEnds = sort { $a->{end} <=> $b->{end} } @{$data{$i}}; 65 | 66 | if ($#sortedStarts == 0 && $data{$i}[0]{id} =~ /DG196/) { next; } 67 | 68 | my $qual_sum = 0; 69 | my $mstart_min = 1e100; 70 | my $mend_max = 0; 71 | my %samples = (); 72 | my $filter = "LowQual"; 73 | my $chr = $sortedStarts[0]->{chr}; 74 | 75 | for (my $s=0; $s<=$#sortedStarts; $s++) { 76 | my $sample = $sortedStarts[$s]->{id}; 77 | if ($sortedStarts[$s]->{mstart} ne "NA") { 78 | if ($sortedStarts[$s]->{mstart} < $mstart_min) { $mstart_min = $sortedStarts[$s]->{mstart}; } 79 | if ($sortedStarts[$s]->{mend} > $mend_max) { $mend_max = $sortedStarts[$s]->{mend}; } 80 | } 81 | 82 | $sample =~ s/_.*//; 83 | $samples{$sample} = 1; 84 | $qual_sum += $sortedStarts[$s]->{qual}; 85 | if ($sortedStarts[$s]->{filter} eq "PASS") { $filter = "PASS"; } 86 | my $lastD = -1; 87 | if ($s > 0) { $lastD = $sortedStarts[$s - 
1]->{pos}; } 88 | for (my $e=0; $e < $s; $e++) { 89 | if ($sortedEnds[$e]->{end} >= $sortedStarts[$s - 1]->{pos} && 90 | $sortedEnds[$e]->{end} <= $sortedStarts[$s]->{pos} ) { 91 | if (!defined($seg{$lastD})) { 92 | $seg{$lastD}{end} = $sortedEnds[$e]->{end}; 93 | $seg{$lastD}{n} = $n; 94 | } 95 | $n--; 96 | $lastD = $sortedEnds[$e]->{end}; 97 | } 98 | } 99 | if ($lastD > -1) { 100 | $seg{$lastD}{end} = $sortedStarts[$s]->{pos}; 101 | $seg{$lastD}{n} = $n; 102 | } 103 | $n++; 104 | } 105 | 106 | if (scalar @sortedStarts == 1) { $seg{$sortedStarts[0]->{pos}}{end} = $sortedStarts[0]->{end}; $seg{$sortedStarts[0]->{pos}}{n} = 1; } 107 | $mergedId++; 108 | my %info = (); 109 | 110 | my @sortedSeg = sort { $seg{$b}{n} <=> $seg{$a}{n} } keys %seg; 111 | 112 | my $id = "MERGED_NUMT_$mergedId"; 113 | my $total = scalar @sortedStarts; 114 | my $qual = int($qual_sum / $total); 115 | my $pos = $sortedSeg[0]; 116 | my $end = $seg{$sortedSeg[0]}{end}; 117 | my $ciDelta = $end - $pos + 1; 118 | my $alt = ""; 119 | 120 | my $refline = `$opts{samtools} faidx $opts{reference} $chr:$pos-$pos`; 121 | my $ref = ( split( /\n/, $refline ) )[1]; 122 | if ( !defined($ref) ) { $ref = "N"; } 123 | 124 | $info{END} = $end; 125 | $info{SAMPLES} = join(",", sort keys %samples); 126 | $info{IMPRECISE} = undef; 127 | $info{CIPOS} = "0,$ciDelta"; 128 | $info{CIEND} = "-$ciDelta,0"; 129 | $info{SVTYPE} = "INS"; 130 | if ($mend_max > 0) { 131 | $info{MSTART} = $mstart_min; 132 | $info{MEND} = $mend_max; 133 | $info{MLEN} = ($mend_max - $mstart_min + 1); 134 | if (abs($mend_max - $len_mt + $mstart_min + 1) < $info{MLEN}) { $info{MLEN} = abs($mend_max - $len_mt + $mstart_min + 1); } 135 | } 136 | my $info = ""; 137 | my @sKeys = sort { $a cmp $b } keys %info; 138 | for ( my $i = 0 ; $i <= $#sKeys ; $i++ ) { 139 | if ( $i > 0 ) { $info .= ";"; } 140 | if ( defined( $info{ $sKeys[$i] } ) ) { 141 | $info .= "$sKeys[$i]=$info{$sKeys[$i]}"; 142 | } 143 | else { 144 | $info .= "$sKeys[$i]"; 145 | } 146 | } 147 | 
148 | print "$chr\t$pos\t$id\t$ref\t$alt\t$qual\t$filter\t$info\n"; 149 | } 150 | 151 | sub checkOptions { 152 | my $optResult = shift; 153 | my $opts = shift; 154 | my $version = shift; 155 | 156 | if ( !$optResult || $$opts{help} ) { 157 | usage($version); 158 | exit; 159 | } 160 | if (-t STDIN and not @ARGV) { 161 | print "\n***ERROR***\tNo files passed as arguments\n"; 162 | usage($version); 163 | exit; 164 | } 165 | } 166 | 167 | sub usage { 168 | my $version = shift; 169 | printf("\n"); 170 | printf( "%-9s %s\n", "Program:", "clusterNumtsVcf.pl" ); 171 | printf( "%-9s %s\n", "Version:", "$version" ); 172 | printf("\n"); 173 | printf( "%-9s %s\n", "Usage:", "clusterNumtsVcf.pl [options] " ); 174 | printf("\n"); 175 | printf( "%-9s %-35s %s\n", "Options:", "--samtools=[filename]", "Path to samtools" ); 176 | printf( "%-9s %-35s %s\n", "", "--reference=[filename]", "Reference sequence, indexed with samtools faidx (required)" ) ; 177 | printf("\n"); 178 | } 179 | -------------------------------------------------------------------------------- /clusterNumtsVcf.38.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | use strict; 4 | use warnings; 5 | use Getopt::Long; 6 | 7 | my %data = (); 8 | my %opts = (); 9 | my $version = "0.0.23"; 10 | 11 | $opts{reference} = "/home/arthurz/scratch_arthur/20.04.07.Numt.1kg/ref.GRCh38_full_analysis_set_plus_decoy_hla/GRCh38_full_analysis_set_plus_decoy_hla.fa"; 12 | $opts{samtools} = "samtools"; 13 | 14 | my $optResult = GetOptions( 15 | "reference=s" => \$opts{reference}, 16 | "samtools=s" => \$opts{samtools} 17 | ); 18 | 19 | checkOptions( $optResult, \%opts, $version ); 20 | 21 | my $lastChr = ""; 22 | my $lastPos = 0; 23 | my $index = 0; 24 | my $cnt = 1; 25 | my $last = ""; 26 | my $sumLen = 0; 27 | my $sumSupport = 0; 28 | my $c = 0; 29 | my $len_mt = 16596; 30 | 31 | while (<>) { 32 | chomp; 33 | my ($chr, $pos, $id, $ref, $alt, $qual, $filter, $info) = 
split(/\t/); 34 | if (($chr eq $lastChr && $pos - $lastPos > 1000) || ($chr ne $lastChr && $lastChr ne "")) { 35 | $c++; 36 | $index = 0; 37 | } 38 | my ($end) = $info =~ /END=(\d+)/; 39 | my ($mlen) = $info =~ /MLEN=(\d+)/; 40 | my ($mstart) = $info =~ /MSTART=(\d+)/; 41 | my ($mend) = $info =~ /MEND=(\d+)/; 42 | $data{$c}[$index]{chr} = $chr; 43 | $data{$c}[$index]{pos} = $pos; 44 | $data{$c}[$index]{end} = $end; 45 | $data{$c}[$index]{id} = $id; 46 | $data{$c}[$index]{qual} = $qual; 47 | $data{$c}[$index]{filter} = $filter; 48 | $data{$c}[$index]{mlen} = (defined($mlen)) ? $mlen : "NA"; 49 | $data{$c}[$index]{mstart} = (defined($mstart)) ? $mstart : "NA"; 50 | $data{$c}[$index]{mend} = (defined($mend)) ? $mend : "NA"; 51 | 52 | $lastChr = $chr; 53 | 54 | $lastPos = $pos; 55 | $index++; 56 | } 57 | 58 | my $mergedId = 0; 59 | for (my $i=0; $i<=$c; $i++) { 60 | my $n = 0; 61 | my %seg = (); 62 | 63 | my @sortedStarts = sort { $a->{pos} <=> $b->{pos} } @{$data{$i}}; 64 | my @sortedEnds = sort { $a->{end} <=> $b->{end} } @{$data{$i}}; 65 | 66 | if ($#sortedStarts == 0 && $data{$i}[0]{id} =~ /DG196/) { next; } 67 | 68 | my $qual_sum = 0; 69 | my $mstart_min = 1e100; 70 | my $mend_max = 0; 71 | my %samples = (); 72 | my $filter = "LowQual"; 73 | my $chr = $sortedStarts[0]->{chr}; 74 | 75 | for (my $s=0; $s<=$#sortedStarts; $s++) { 76 | my $sample = $sortedStarts[$s]->{id}; 77 | if ($sortedStarts[$s]->{mstart} ne "NA") { 78 | if ($sortedStarts[$s]->{mstart} < $mstart_min) { $mstart_min = $sortedStarts[$s]->{mstart}; } 79 | if ($sortedStarts[$s]->{mend} > $mend_max) { $mend_max = $sortedStarts[$s]->{mend}; } 80 | } 81 | 82 | $sample =~ s/_.*//; 83 | $samples{$sample} = 1; 84 | $qual_sum += $sortedStarts[$s]->{qual}; 85 | if ($sortedStarts[$s]->{filter} eq "PASS") { $filter = "PASS"; } 86 | my $lastD = -1; 87 | if ($s > 0) { $lastD = $sortedStarts[$s - 1]->{pos}; } 88 | for (my $e=0; $e < $s; $e++) { 89 | if ($sortedEnds[$e]->{end} >= $sortedStarts[$s - 1]->{pos} && 90 | 
$sortedEnds[$e]->{end} <= $sortedStarts[$s]->{pos} ) { 91 | if (!defined($seg{$lastD})) { 92 | $seg{$lastD}{end} = $sortedEnds[$e]->{end}; 93 | $seg{$lastD}{n} = $n; 94 | } 95 | $n--; 96 | $lastD = $sortedEnds[$e]->{end}; 97 | } 98 | } 99 | if ($lastD > -1) { 100 | $seg{$lastD}{end} = $sortedStarts[$s]->{pos}; 101 | $seg{$lastD}{n} = $n; 102 | } 103 | $n++; 104 | } 105 | 106 | if (scalar @sortedStarts == 1) { $seg{$sortedStarts[0]->{pos}}{end} = $sortedStarts[0]->{end}; $seg{$sortedStarts[0]->{pos}}{n} = 1; } 107 | $mergedId++; 108 | my %info = (); 109 | 110 | my @sortedSeg = sort { $seg{$b}{n} <=> $seg{$a}{n} } keys %seg; 111 | 112 | my $id = "MERGED_NUMT_$mergedId"; 113 | my $total = scalar @sortedStarts; 114 | my $qual = int($qual_sum / $total); 115 | my $pos = $sortedSeg[0]; 116 | my $end = $seg{$sortedSeg[0]}{end}; 117 | my $ciDelta = $end - $pos + 1; 118 | my $alt = ""; 119 | 120 | my $refline = `$opts{samtools} faidx $opts{reference} $chr:$pos-$pos`; 121 | my $ref = ( split( /\n/, $refline ) )[1]; 122 | if ( !defined($ref) ) { $ref = "N"; } 123 | 124 | $info{END} = $end; 125 | $info{SAMPLES} = join(",", sort keys %samples); 126 | $info{IMPRECISE} = undef; 127 | $info{CIPOS} = "0,$ciDelta"; 128 | $info{CIEND} = "-$ciDelta,0"; 129 | $info{SVTYPE} = "INS"; 130 | if ($mend_max > 0) { 131 | $info{MSTART} = $mstart_min; 132 | $info{MEND} = $mend_max; 133 | $info{MLEN} = ($mend_max - $mstart_min + 1); 134 | if (abs($mend_max - $len_mt + $mstart_min + 1) < $info{MLEN}) { $info{MLEN} = abs($mend_max - $len_mt + $mstart_min + 1); } 135 | } 136 | my $info = ""; 137 | my @sKeys = sort { $a cmp $b } keys %info; 138 | for ( my $i = 0 ; $i <= $#sKeys ; $i++ ) { 139 | if ( $i > 0 ) { $info .= ";"; } 140 | if ( defined( $info{ $sKeys[$i] } ) ) { 141 | $info .= "$sKeys[$i]=$info{$sKeys[$i]}"; 142 | } 143 | else { 144 | $info .= "$sKeys[$i]"; 145 | } 146 | } 147 | 148 | print "$chr\t$pos\t$id\t$ref\t$alt\t$qual\t$filter\t$info\n"; 149 | } 150 | 151 | sub checkOptions { 152 | my 
$optResult = shift; 153 | my $opts = shift; 154 | my $version = shift; 155 | 156 | if ( !$optResult || $$opts{help} ) { 157 | usage($version); 158 | exit; 159 | } 160 | if (-t STDIN and not @ARGV) { 161 | print "\n***ERROR***\tNo files passed as arguments\n"; 162 | usage($version); 163 | exit; 164 | } 165 | } 166 | 167 | sub usage { 168 | my $version = shift; 169 | printf("\n"); 170 | printf( "%-9s %s\n", "Program:", "clusterNumtsVcf.pl" ); 171 | printf( "%-9s %s\n", "Version:", "$version" ); 172 | printf("\n"); 173 | printf( "%-9s %s\n", "Usage:", "clusterNumtsVcf.pl [options] " ); 174 | printf("\n"); 175 | printf( "%-9s %-35s %s\n", "Options:", "--samtools=[filename]", "Path to samtools" ); 176 | printf( "%-9s %-35s %s\n", "", "--reference=[filename]", "Reference sequence, indexed with samtools faidx (required)" ) ; 177 | printf("\n"); 178 | } 179 | -------------------------------------------------------------------------------- /refNumts.38.bed: -------------------------------------------------------------------------------- 1 | chr1 629084 634924 HSA_NumtS_001_b1 2 | chr1 5554746 5554877 HSA_NumtS_002_b1 3 | chr1 5850258 5850468 HSA_NumtS_003_b1 4 | chr1 8909743 8909908 HSA_NumtS_004_b1 5 | chr1 9574629 9574829 HSA_NumtS_005_b1 6 | chr1 11142847 11142918 HSA_NumtS_006_b1 7 | chr1 11425262 11425772 HSA_NumtS_007_b1 8 | chr1 18595412 18595626 HSA_NumtS_008_b1 9 | chr1 50017096 50017505 HSA_NumtS_010_b1 10 | chr1 55372550 55373651 HSA_NumtS_011_b1 11 | chr1 76971249 76971415 HSA_NumtS_012_b1 12 | chr1 81080801 81081175 HSA_NumtS_013_b1 13 | chr1 93920175 93920414 HSA_NumtS_014_b1 14 | chr1 93920710 93920777 HSA_NumtS_014_b2 15 | chr1 93921981 93922276 HSA_NumtS_014_b3 16 | chr1 93924276 93924860 HSA_NumtS_014_b4 17 | chr1 93925390 93926074 HSA_NumtS_014_b5 18 | chr1 93926635 93927025 HSA_NumtS_014_b6 19 | chr1 93927476 93928308 HSA_NumtS_014_b7 20 | chr1 93933749 93934276 HSA_NumtS_014_b8 21 | chr1 93935038 93935228 HSA_NumtS_014_b9 22 | chr1 93936406 93937743 
HSA_NumtS_014_b10 23 | chr1 103621156 103621198 HSA_NumtS_015_b1 24 | chr1 106802629 106806125 HSA_NumtS_016_b1 25 | chr1 113576758 113576881 HSA_NumtS_017_b1 26 | chr1 147860672 147860783 HSA_NumtS_030_b1 27 | chr1 169474063 169474158 HSA_NumtS_031_b1 28 | chr1 172710531 172710602 HSA_NumtS_032_b1 29 | chr1 181422785 181423178 HSA_NumtS_033_b1 30 | chr1 190909727 190910060 HSA_NumtS_034_b1 31 | chr1 191961693 191961992 HSA_NumtS_035_b1 32 | chr1 205475336 205475504 HSA_NumtS_036_b1 33 | chr1 212504996 212505072 HSA_NumtS_037_b1 34 | chr1 215499796 215499834 HSA_NumtS_038_b1 35 | chr1 220455154 220455255 HSA_NumtS_039_b1 36 | chr1 226958158 226958258 HSA_NumtS_040_b1 37 | chr1 235520022 235520923 HSA_NumtS_041_b1 38 | chr1 235537396 235542389 HSA_NumtS_042_b1 39 | chr1 237940763 237946391 HSA_NumtS_043_b1 40 | chr1 237947076 237947319 HSA_NumtS_043_b2 41 | chr1 237947320 237951721 HSA_NumtS_043_b3 42 | chr1 240549886 240550017 HSA_NumtS_044_b1 43 | chr1 248904383 248904502 HSA_NumtS_045_b1 44 | chr10 2235689 2235860 HSA_NumtS_354_b1 45 | chr10 19746637 19747170 HSA_NumtS_355_b1 46 | chr10 19747173 19747516 HSA_NumtS_356_b1 47 | chr10 19747518 19748650 HSA_NumtS_357_b1 48 | chr10 20804290 20804576 HSA_NumtS_358_b1 49 | chr10 20841371 20841544 HSA_NumtS_359_b1 50 | chr10 25638517 25638591 HSA_NumtS_360_b1 51 | chr10 26873261 26873405 HSA_NumtS_361_b1 52 | chr10 27876748 27876809 HSA_NumtS_362_b1 53 | chr10 30565268 30565340 HSA_NumtS_363_b1 54 | chr10 33023099 33023226 HSA_NumtS_364_b1 55 | chr10 36214899 36215015 HSA_NumtS_365_b1 56 | chr10 36432876 36435162 HSA_NumtS_366_b1 57 | chr10 37600797 37602974 HSA_NumtS_367_b1 58 | chr10 38761813 38761953 HSA_NumtS_368_b1 59 | chr10 42210525 42210665 HSA_NumtS_369_b1 60 | chr10 46706106 46706544 HSA_NumtS_370_b1 61 | chr10 47633857 47634295 HSA_NumtS_371_b1 62 | chr10 47633857 47634295 HSA_NumtS_372_b1 63 | chr10 55597036 55597086 HSA_NumtS_373_b1 64 | chr10 55597816 55600081 HSA_NumtS_374_b1 65 | chr10 55600373 55600733 
HSA_NumtS_374_b2 66 | chr10 69155315 69155426 HSA_NumtS_375_b1 67 | chr10 69589879 69593360 HSA_NumtS_376_b1 68 | chr10 69594019 69594932 HSA_NumtS_376_b2 69 | chr10 69595258 69596450 HSA_NumtS_376_b3 70 | chr10 79411226 79411293 HSA_NumtS_377_b1 71 | chr10 89786624 89787110 HSA_NumtS_378_b1 72 | chr10 94774172 94774628 HSA_NumtS_379_b1 73 | chr10 94940259 94941121 HSA_NumtS_380_b1 74 | chr10 96650909 96651032 HSA_NumtS_381_b1 75 | chr10 100057383 100057915 HSA_NumtS_382_b1 76 | chr10 112894578 112894690 HSA_NumtS_383_b1 77 | chr10 119838008 119838180 HSA_NumtS_384_b1 78 | chr10 125819798 125819937 HSA_NumtS_385_b1 79 | chr10 130131136 130131204 HSA_NumtS_386_b1 80 | chr11 6501903 6502030 HSA_NumtS_387_b1 81 | chr11 10507887 10510280 HSA_NumtS_388_b1 82 | chr11 11239977 11240174 HSA_NumtS_389_b1 83 | chr11 31555109 31555305 HSA_NumtS_390_b1 84 | chr11 37541526 37542060 HSA_NumtS_391_b1 85 | chr11 39766246 39773242 HSA_NumtS_392_b1 86 | chr11 47323984 47324181 HSA_NumtS_393_b1 87 | chr11 48972953 48973463 HSA_NumtS_394_b1 88 | chr11 49812519 49813024 HSA_NumtS_395_b1 89 | chr11 55273486 55273984 HSA_NumtS_396_b1 90 | chr11 64187335 64187471 HSA_NumtS_397_b1 91 | chr11 73510661 73510823 HSA_NumtS_398_b1 92 | chr11 74566998 74567136 HSA_NumtS_399_b1 93 | chr11 77596175 77596256 HSA_NumtS_400_b1 94 | chr11 81551574 81557335 HSA_NumtS_401_b1 95 | chr11 87813548 87814107 HSA_NumtS_402_b1 96 | chr11 87814120 87814728 HSA_NumtS_403_b1 97 | chr11 87815033 87816474 HSA_NumtS_403_b2 98 | chr11 89908541 89909051 HSA_NumtS_404_b1 99 | chr11 89935493 89936003 HSA_NumtS_405_b1 100 | chr11 90098132 90098273 HSA_NumtS_406_b1 101 | chr11 103402016 103409577 HSA_NumtS_407_b1 102 | chr11 103409581 103409623 HSA_NumtS_408_b1 103 | chr11 103409663 103411124 HSA_NumtS_407_b2 104 | chr11 110876992 110877151 HSA_NumtS_409_b1 105 | chr11 123003606 123003677 HSA_NumtS_410_b1 106 | chr11 123139885 123140386 HSA_NumtS_411_b1 107 | chr11 125851931 125852041 HSA_NumtS_412_b1 108 | chr11 
125915516 125915586 HSA_NumtS_413_b1 109 | chr11 128066532 128066583 HSA_NumtS_414_b1 110 | chr12 4232638 4232692 HSA_NumtS_415_b1 111 | chr12 7620315 7620597 HSA_NumtS_416_b1 112 | chr12 7637414 7637645 HSA_NumtS_417_b1 113 | chr12 7646872 7647086 HSA_NumtS_418_b1 114 | chr12 9322708 9322986 HSA_NumtS_419_b1 115 | chr12 9408797 9409072 HSA_NumtS_420_b1 116 | chr12 22005832 22005936 HSA_NumtS_421_b1 117 | chr12 26572542 26572767 HSA_NumtS_422_b1 118 | chr12 31015446 31015677 HSA_NumtS_423_b1 119 | chr12 31246708 31247762 HSA_NumtS_424_b1 120 | chr12 31248048 31248973 HSA_NumtS_424_b2 121 | chr12 31249114 31249238 HSA_NumtS_424_b3 122 | chr12 40286289 40286661 HSA_NumtS_425_b1 123 | chr12 41363635 41363723 HSA_NumtS_426_b1 124 | chr12 41698558 41700251 HSA_NumtS_427_b1 125 | chr12 49817339 49817502 HSA_NumtS_428_b1 126 | chr12 62774010 62774077 HSA_NumtS_429_b1 127 | chr12 108178336 108178709 HSA_NumtS_430_b1 128 | chr12 114172419 114172480 HSA_NumtS_431_b1 129 | chr12 126583309 126583369 HSA_NumtS_432_b1 130 | chr12 126583370 126583416 HSA_NumtS_433_b1 131 | chr12 126583417 126583463 HSA_NumtS_434_b1 132 | chr12 126583462 126583510 HSA_NumtS_435_b1 133 | chr12 126583511 126583557 HSA_NumtS_436_b1 134 | chr12 126583558 126583604 HSA_NumtS_437_b1 135 | chr12 126583605 126583651 HSA_NumtS_438_b1 136 | chr12 126583652 126583698 HSA_NumtS_439_b1 137 | chr12 126583699 126583745 HSA_NumtS_440_b1 138 | chr12 126583746 126583792 HSA_NumtS_441_b1 139 | chr12 126583793 126583839 HSA_NumtS_442_b1 140 | chr12 126583840 126583886 HSA_NumtS_443_b1 141 | chr12 126583887 126583933 HSA_NumtS_444_b1 142 | chr12 126583934 126583980 HSA_NumtS_445_b1 143 | chr12 126583981 126584027 HSA_NumtS_446_b1 144 | chr12 126584028 126584074 HSA_NumtS_447_b1 145 | chr12 126584075 126584121 HSA_NumtS_448_b1 146 | chr12 126584122 126584168 HSA_NumtS_449_b1 147 | chr12 126584169 126584215 HSA_NumtS_450_b1 148 | chr12 126584216 126584262 HSA_NumtS_451_b1 149 | chr12 126584263 126584309 HSA_NumtS_452_b1 
150 | chr12 126584310 126584396 HSA_NumtS_453_b1 151 | chr12 130315599 130315801 HSA_NumtS_454_b1 152 | chr13 22113189 22113315 HSA_NumtS_455_b1 153 | chr13 23765814 23765933 HSA_NumtS_456_b1 154 | chr13 23765934 23766085 HSA_NumtS_457_b1 155 | chr13 32193340 32193484 HSA_NumtS_458_b1 156 | chr13 36065491 36065696 HSA_NumtS_459_b1 157 | chr13 40768367 40768422 HSA_NumtS_460_b1 158 | chr13 47570843 47571128 HSA_NumtS_461_b1 159 | chr13 49986944 49987121 HSA_NumtS_462_b1 160 | chr13 53891587 53891751 HSA_NumtS_463_b1 161 | chr13 55971634 55971759 HSA_NumtS_464_b1 162 | chr13 56688476 56688649 HSA_NumtS_465_b1 163 | chr13 57204381 57204904 HSA_NumtS_466_b1 164 | chr13 75592239 75592407 HSA_NumtS_467_b1 165 | chr13 84520256 84522173 HSA_NumtS_468_b1 166 | chr13 84522557 84524065 HSA_NumtS_468_b2 167 | chr13 88571651 88572210 HSA_NumtS_469_b1 168 | chr13 95692541 95692828 HSA_NumtS_470_b1 169 | chr13 95693122 95696453 HSA_NumtS_470_b2 170 | chr13 95696454 95696513 HSA_NumtS_470_b3 171 | chr13 96697685 96697830 HSA_NumtS_471_b1 172 | chr13 109424125 109424380 HSA_NumtS_472_b1 173 | chr14 23426464 23426531 HSA_NumtS_473_b1 174 | chr14 32484098 32485118 HSA_NumtS_474_b1 175 | chr14 51587368 51587768 HSA_NumtS_475_b1 176 | chr14 78854879 78854951 HSA_NumtS_476_b1 177 | chr14 83587506 83589092 HSA_NumtS_477_b1 178 | chr14 84171352 84172840 HSA_NumtS_478_b1 179 | chr14 84172843 84173800 HSA_NumtS_479_b1 180 | chr14 84173792 84175691 HSA_NumtS_479_b2 181 | chr14 84176009 84177082 HSA_NumtS_479_b3 182 | chr14 95199233 95199347 HSA_NumtS_480_b1 183 | chr15 34394594 34394991 HSA_NumtS_481_b1 184 | chr15 34540815 34541212 HSA_NumtS_482_b1 185 | chr15 35396242 35396412 HSA_NumtS_483_b1 186 | chr15 39201373 39201515 HSA_NumtS_484_b1 187 | chr15 40133770 40135192 HSA_NumtS_485_b1 188 | chr15 41157175 41157300 HSA_NumtS_486_b1 189 | chr15 46341391 46341598 HSA_NumtS_487_b1 190 | chr15 48473906 48473993 HSA_NumtS_488_b1 191 | chr15 57790218 57790276 HSA_NumtS_489_b1 192 | chr15 
58150296 58151595 HSA_NumtS_490_b1 193 | chr15 58151604 58152609 HSA_NumtS_490_b2 194 | chr15 58152662 58156198 HSA_NumtS_490_b3 195 | chr15 67040911 67040953 HSA_NumtS_491_b1 196 | chr15 88091357 88091408 HSA_NumtS_492_b1 197 | chr16 3367483 3369297 HSA_NumtS_493_b1 198 | chr16 3369598 3371388 HSA_NumtS_493_b2 199 | chr16 3371716 3372332 HSA_NumtS_493_b3 200 | chr16 3372637 3372740 HSA_NumtS_493_b4 201 | chr16 7029727 7029785 HSA_NumtS_494_b1 202 | chr16 10718483 10720013 HSA_NumtS_495_b1 203 | chr16 10720086 10725324 HSA_NumtS_495_b2 204 | chr16 13991926 13991999 HSA_NumtS_496_b1 205 | chr16 20721048 20722401 HSA_NumtS_497_b1 206 | chr16 49066600 49066743 HSA_NumtS_498_b1 207 | chr16 69358670 69358849 HSA_NumtS_499_b1 208 | chr16 70682712 70682766 HSA_NumtS_500_b1 209 | chr16 82119859 82119918 HSA_NumtS_501_b1 210 | chr16 82119919 82120186 HSA_NumtS_501_b2 211 | chr16 82120489 82122609 HSA_NumtS_501_b3 212 | chr16 83673020 83673513 HSA_NumtS_502_b1 213 | chr17 18983428 18983515 HSA_NumtS_503_b1 214 | chr17 18986188 18986352 HSA_NumtS_504_b1 215 | chr17 19597536 19597811 HSA_NumtS_505_b1 216 | chr17 19597812 19597869 HSA_NumtS_505_b2 217 | chr17 19598559 19600074 HSA_NumtS_505_b3 218 | chr17 19600386 19600960 HSA_NumtS_505_b4 219 | chr17 19601264 19601884 HSA_NumtS_505_b5 220 | chr17 19602214 19603548 HSA_NumtS_505_b6 221 | chr17 19603848 19604159 HSA_NumtS_505_b7 222 | chr17 19604480 19605933 HSA_NumtS_505_b8 223 | chr17 19746394 19746494 HSA_NumtS_506_b1 224 | chr17 22519195 22521400 HSA_NumtS_507_b1 225 | chr17 22521401 22521855 HSA_NumtS_508_b1 226 | chr17 22522035 22532515 HSA_NumtS_508_b2 227 | chr17 22532515 22532766 HSA_NumtS_509_b1 228 | chr17 28974348 28974426 HSA_NumtS_510_b1 229 | chr17 35654871 35655194 HSA_NumtS_511_b1 230 | chr17 43997716 43997783 HSA_NumtS_512_b1 231 | chr17 53105733 53106385 HSA_NumtS_513_b1 232 | chr17 54903101 54903174 HSA_NumtS_514_b1 233 | chr17 61127328 61127401 HSA_NumtS_515_b1 234 | chr17 63393355 63393585 HSA_NumtS_516_b1 
235 | chr17 68938714 68939123 HSA_NumtS_517_b1 236 | chr17 80617576 80617622 HSA_NumtS_518_b1 237 | chr18 2842232 2842354 HSA_NumtS_519_b1 238 | chr18 14136791 14136857 HSA_NumtS_520_b1 239 | chr18 34628925 34628995 HSA_NumtS_521_b1 240 | chr18 47853246 47853437 HSA_NumtS_522_b1 241 | chr18 50669528 50669878 HSA_NumtS_523_b1 242 | chr18 61874553 61874910 HSA_NumtS_524_b1 243 | chr18 64154887 64155067 HSA_NumtS_525_b1 244 | chr19 12095653 12096334 HSA_NumtS_526_b1 245 | chr19 12104884 12105189 HSA_NumtS_527_b1 246 | chr19 12202931 12203042 HSA_NumtS_528_b1 247 | chr19 12204190 12204544 HSA_NumtS_528_b2 248 | chr19 12502675 12502837 HSA_NumtS_529_b1 249 | chr19 12503138 12503769 HSA_NumtS_529_b2 250 | chr19 12505652 12506006 HSA_NumtS_529_b3 251 | chr19 12507551 12508103 HSA_NumtS_529_b4 252 | chr19 17459193 17459245 HSA_NumtS_530_b1 253 | chr19 27741809 27741878 HSA_NumtS_531_b1 254 | chr19 30022334 30022405 HSA_NumtS_532_b1 255 | chr19 37570479 37571215 HSA_NumtS_533_b1 256 | chr19 37571814 37571912 HSA_NumtS_534_b1 257 | chr19 37572646 37572764 HSA_NumtS_533_b2 258 | chr19 37572891 37573518 HSA_NumtS_533_b3 259 | chr19 37573877 37574617 HSA_NumtS_533_b4 260 | chr19 37575611 37575676 HSA_NumtS_533_b5 261 | chr19 37576256 37577724 HSA_NumtS_533_b6 262 | chr19 37579615 37580500 HSA_NumtS_533_b7 263 | chr19 42190702 42190936 HSA_NumtS_535_b1 264 | chr19 44146428 44146509 HSA_NumtS_536_b1 265 | chr19 44146507 44146567 HSA_NumtS_537_b1 266 | chr19 56691001 56691166 HSA_NumtS_538_b1 267 | chr19 56691507 56691812 HSA_NumtS_538_b2 268 | chr19 56922001 56922424 HSA_NumtS_539_b1 269 | chr2 9746713 9746865 HSA_NumtS_046_b1 270 | chr2 11256916 11256985 HSA_NumtS_047_b1 271 | chr2 15521167 15521282 HSA_NumtS_048_b1 272 | chr2 22317051 22317248 HSA_NumtS_049_b1 273 | chr2 33767471 33767526 HSA_NumtS_050_b1 274 | chr2 40784957 40785117 HSA_NumtS_051_b1 275 | chr2 49229628 49229899 HSA_NumtS_052_b1 276 | chr2 50588692 50589390 HSA_NumtS_053_b1 277 | chr2 56465022 56465191 
HSA_NumtS_054_b1 278 | chr2 63689410 63689621 HSA_NumtS_055_b1 279 | chr2 68260740 68261008 HSA_NumtS_056_b1 280 | chr2 75063748 75063853 HSA_NumtS_057_b1 281 | chr2 81666477 81666728 HSA_NumtS_058_b1 282 | chr2 82814979 82819966 HSA_NumtS_059_b1 283 | chr2 82819967 82820026 HSA_NumtS_059_b2 284 | chr2 82820346 82820376 HSA_NumtS_059_b3 285 | chr2 82820872 82821005 HSA_NumtS_059_b4 286 | chr2 85068805 85069034 HSA_NumtS_060_b1 287 | chr2 85724900 85724960 HSA_NumtS_061_b1 288 | chr2 87824890 87825365 HSA_NumtS_062_b1 289 | chr2 92006447 92006583 HSA_NumtS_063_b1 290 | chr2 94899009 94899498 HSA_NumtS_064_b1 291 | chr2 94899547 94899589 HSA_NumtS_065_b1 292 | chr2 94899591 94900970 HSA_NumtS_066_b1 293 | chr2 94901001 94901746 HSA_NumtS_067_b1 294 | chr2 94901807 94901958 HSA_NumtS_067_b2 295 | chr2 96680839 96680919 HSA_NumtS_068_b1 296 | chr2 102505774 102505829 HSA_NumtS_069_b1 297 | chr2 103598726 103598836 HSA_NumtS_070_b1 298 | chr2 108913568 108913709 HSA_NumtS_071_b1 299 | chr2 114490352 114490423 HSA_NumtS_072_b1 300 | chr2 116751185 116752312 HSA_NumtS_073_b1 301 | chr2 116752313 116752365 HSA_NumtS_073_b2 302 | chr2 117019928 117019947 HSA_NumtS_074_b1 303 | chr2 117020197 117020378 HSA_NumtS_074_b2 304 | chr2 117020379 117020454 HSA_NumtS_074_b3 305 | chr2 117020554 117020619 HSA_NumtS_074_b4 306 | chr2 117021213 117027027 HSA_NumtS_074_b5 307 | chr2 120210850 120212275 HSA_NumtS_075_b1 308 | chr2 120212661 120215600 HSA_NumtS_075_b2 309 | chr2 120215900 120216349 HSA_NumtS_075_b3 310 | chr2 120216555 120217227 HSA_NumtS_075_b4 311 | chr2 124680719 124681002 HSA_NumtS_076_b1 312 | chr2 125810817 125811092 HSA_NumtS_077_b1 313 | chr2 125811093 125811168 HSA_NumtS_077_b2 314 | chr2 125812120 125812539 HSA_NumtS_077_b3 315 | chr2 129090943 129091030 HSA_NumtS_078_b1 316 | chr2 130270396 130270539 HSA_NumtS_079_b1 317 | chr2 130270561 130270674 HSA_NumtS_080_b1 318 | chr2 130271810 130283331 HSA_NumtS_080_b2 319 | chr2 131369061 131373756 HSA_NumtS_081_b1 
320 | chr2 131379466 131379522 HSA_NumtS_082_b1 321 | chr2 131379541 131379607 HSA_NumtS_083_b1 322 | chr2 131379593 131383702 HSA_NumtS_084_b1 323 | chr2 131383703 131386318 HSA_NumtS_084_b2 324 | chr2 131386320 131386463 HSA_NumtS_085_b1 325 | chr2 132425100 132425175 HSA_NumtS_086_b1 326 | chr2 134526365 134526527 HSA_NumtS_087_b1 327 | chr2 140217001 140217971 HSA_NumtS_088_b1 328 | chr2 140219662 140224224 HSA_NumtS_088_b2 329 | chr2 140224905 140224965 HSA_NumtS_088_b3 330 | chr2 140224966 140225240 HSA_NumtS_088_b4 331 | chr2 143088658 143088874 HSA_NumtS_089_b1 332 | chr2 143090015 143091584 HSA_NumtS_089_b2 333 | chr2 143092265 143092340 HSA_NumtS_090_b1 334 | chr2 143092341 143099047 HSA_NumtS_090_b2 335 | chr2 143099378 143101345 HSA_NumtS_090_b3 336 | chr2 147265184 147265272 HSA_NumtS_091_b1 337 | chr2 148881726 148881857 HSA_NumtS_092_b1 338 | chr2 155263465 155264802 HSA_NumtS_093_b1 339 | chr2 155311086 155314353 HSA_NumtS_094_b1 340 | chr2 166414495 166414713 HSA_NumtS_095_b1 341 | chr2 174780950 174781096 HSA_NumtS_096_b1 342 | chr2 179739347 179739740 HSA_NumtS_097_b1 343 | chr2 196434786 196434846 HSA_NumtS_098_b1 344 | chr2 201212296 201214981 HSA_NumtS_099_b1 345 | chr2 201549268 201550264 HSA_NumtS_100_b1 346 | chr2 201552756 201555229 HSA_NumtS_100_b2 347 | chr2 201555524 201556227 HSA_NumtS_100_b3 348 | chr2 201557102 201558129 HSA_NumtS_100_b4 349 | chr2 202614210 202618290 HSA_NumtS_101_b1 350 | chr2 202618616 202618872 HSA_NumtS_101_b2 351 | chr2 202618883 202618995 HSA_NumtS_102_b1 352 | chr2 202618994 202619186 HSA_NumtS_103_b1 353 | chr2 202619189 202620037 HSA_NumtS_104_b1 354 | chr2 211773781 211774040 HSA_NumtS_105_b1 355 | chr2 211774382 211776629 HSA_NumtS_105_b2 356 | chr2 211777209 211780005 HSA_NumtS_106_b1 357 | chr2 215812668 215812790 HSA_NumtS_107_b1 358 | chr2 216879717 216879838 HSA_NumtS_108_b1 359 | chr2 217574801 217574855 HSA_NumtS_109_b1 360 | chr2 220048819 220049271 HSA_NumtS_110_b1 361 | chr2 226421703 226421794 
HSA_NumtS_111_b1 362 | chr2 226722269 226722426 HSA_NumtS_112_b1 363 | chr2 229028787 229028907 HSA_NumtS_113_b1 364 | chr2 235455990 235456247 HSA_NumtS_114_b1 365 | chr2 237520604 237521777 HSA_NumtS_115_b1 366 | chr2 237521783 237521838 HSA_NumtS_116_b1 367 | chr2 237521844 237521888 HSA_NumtS_117_b1 368 | chr2 237521905 237521960 HSA_NumtS_118_b1 369 | chr2 237521966 237522021 HSA_NumtS_119_b1 370 | chr2 237522027 237522082 HSA_NumtS_120_b1 371 | chr2 237522088 237522143 HSA_NumtS_121_b1 372 | chr2 237522149 237522204 HSA_NumtS_122_b1 373 | chr2 237522210 237522265 HSA_NumtS_123_b1 374 | chr2 237522271 237522326 HSA_NumtS_124_b1 375 | chr2 237522332 237522376 HSA_NumtS_125_b1 376 | chr2 237522393 237522448 HSA_NumtS_126_b1 377 | chr2 237522454 237522509 HSA_NumtS_127_b1 378 | chr2 237522515 237522559 HSA_NumtS_128_b1 379 | chr2 237522576 237522631 HSA_NumtS_129_b1 380 | chr2 237522637 237523909 HSA_NumtS_130_b1 381 | chr2 237523972 237524204 HSA_NumtS_131_b1 382 | chr20 2188409 2188483 HSA_NumtS_540_b1 383 | chr20 4728916 4729018 HSA_NumtS_541_b1 384 | chr20 5965180 5965503 HSA_NumtS_542_b1 385 | chr20 9168924 9168965 HSA_NumtS_543_b1 386 | chr20 13167312 13167354 HSA_NumtS_544_b1 387 | chr20 34688663 34689035 HSA_NumtS_545_b1 388 | chr20 57064054 57064123 HSA_NumtS_546_b1 389 | chr20 57357408 57361058 HSA_NumtS_547_b1 390 | chr20 57361440 57361470 HSA_NumtS_547_b2 391 | chr20 57361713 57361772 HSA_NumtS_547_b3 392 | chr20 57361773 57361810 HSA_NumtS_547_b4 393 | chr20 60414394 60414865 HSA_NumtS_548_b1 394 | chr21 14021580 14021646 HSA_NumtS_549_b1 395 | chr21 16470420 16470516 HSA_NumtS_550_b1 396 | chr21 24931381 24931477 HSA_NumtS_551_b1 397 | chr21 35890274 35891173 HSA_NumtS_552_b1 398 | chr21 35891778 35892124 HSA_NumtS_552_b2 399 | chr21 39611005 39611181 HSA_NumtS_553_b1 400 | chr21 42406711 42406933 HSA_NumtS_554_b1 401 | chr21 44470675 44470928 HSA_NumtS_555_b1 402 | chr21 44472099 44472495 HSA_NumtS_555_b2 403 | chr21 44472830 44473143 
HSA_NumtS_555_b3 404 | chr21 44473655 44473891 HSA_NumtS_555_b4 405 | chr21 44474210 44474375 HSA_NumtS_555_b5 406 | chr21 44474512 44475175 HSA_NumtS_555_b6 407 | chr21 45376188 45376384 HSA_NumtS_556_b1 408 | chr22 16876224 16876364 HSA_NumtS_557_b1 409 | chr22 32894883 32895223 HSA_NumtS_559_b1 410 | chr22 35885672 35885718 HSA_NumtS_560_b1 411 | chr22 36172427 36172988 HSA_NumtS_561_b1 412 | chr22 36173438 36174674 HSA_NumtS_561_b2 413 | chr22 36175121 36175858 HSA_NumtS_561_b3 414 | chr22 36175897 36176066 HSA_NumtS_561_b4 415 | chr22 36176194 36176681 HSA_NumtS_561_b5 416 | chr22 36176968 36177243 HSA_NumtS_561_b6 417 | chr22 36177832 36178526 HSA_NumtS_561_b7 418 | chr22 36178529 36178753 HSA_NumtS_561_b8 419 | chr22 36179064 36179549 HSA_NumtS_561_b9 420 | chr22 41764957 41765025 HSA_NumtS_562_b1 421 | chr22 46470266 46470333 HSA_NumtS_563_b1 422 | chr3 12164841 12164977 HSA_NumtS_132_b1 423 | chr3 12165300 12165633 HSA_NumtS_132_b2 424 | chr3 25467504 25467542 HSA_NumtS_133_b1 425 | chr3 28334886 28334940 HSA_NumtS_134_b1 426 | chr3 29797652 29797895 HSA_NumtS_135_b1 427 | chr3 40252147 40253767 HSA_NumtS_136_b1 428 | chr3 43229326 43229731 HSA_NumtS_137_b1 429 | chr3 63847026 63847100 HSA_NumtS_138_b1 430 | chr3 68658950 68659038 HSA_NumtS_139_b1 431 | chr3 68659056 68659131 HSA_NumtS_139_b2 432 | chr3 68659129 68659287 HSA_NumtS_140_b1 433 | chr3 72583300 72583536 HSA_NumtS_141_b1 434 | chr3 89586853 89587656 HSA_NumtS_142_b1 435 | chr3 89587785 89587870 HSA_NumtS_142_b2 436 | chr3 89587898 89588115 HSA_NumtS_142_b3 437 | chr3 89588249 89588418 HSA_NumtS_142_b4 438 | chr3 89588492 89588754 HSA_NumtS_142_b5 439 | chr3 89588825 89590096 HSA_NumtS_142_b6 440 | chr3 96617188 96618510 HSA_NumtS_143_b1 441 | chr3 96765136 96765284 HSA_NumtS_144_b1 442 | chr3 98844865 98844997 HSA_NumtS_145_b1 443 | chr3 106894139 106895900 HSA_NumtS_146_b1 444 | chr3 106896293 106898254 HSA_NumtS_146_b2 445 | chr3 106898599 106898879 HSA_NumtS_147_b1 446 | chr3 106898887 
106900816 HSA_NumtS_148_b1 447 | chr3 106900859 106901066 HSA_NumtS_149_b1 448 | chr3 106901068 106902522 HSA_NumtS_150_b1 449 | chr3 106902523 106903555 HSA_NumtS_151_b1 450 | chr3 120722023 120722645 HSA_NumtS_152_b1 451 | chr3 122688738 122688804 HSA_NumtS_153_b1 452 | chr3 127005476 127005601 HSA_NumtS_154_b1 453 | chr3 128989160 128989239 HSA_NumtS_155_b1 454 | chr3 150842071 150842227 HSA_NumtS_156_b1 455 | chr3 152919694 152919856 HSA_NumtS_157_b1 456 | chr3 153658803 153658987 HSA_NumtS_158_b1 457 | chr3 160947654 160948042 HSA_NumtS_159_b1 458 | chr3 166159903 166160123 HSA_NumtS_160_b1 459 | chr3 166160230 166160348 HSA_NumtS_160_b2 460 | chr3 166160405 166162064 HSA_NumtS_160_b3 461 | chr3 166226707 166226802 HSA_NumtS_161_b1 462 | chr3 166227122 166227543 HSA_NumtS_161_b2 463 | chr3 166232482 166233257 HSA_NumtS_162_b1 464 | chr3 169936797 169936868 HSA_NumtS_163_b1 465 | chr3 171534423 171534651 HSA_NumtS_164_b1 466 | chr3 176216490 176216547 HSA_NumtS_165_b1 467 | chr3 176966318 176966436 HSA_NumtS_166_b1 468 | chr3 177010720 177011028 HSA_NumtS_167_b1 469 | chr3 186058647 186058760 HSA_NumtS_168_b1 470 | chr3 187466181 187466251 HSA_NumtS_169_b1 471 | chr4 5404539 5404717 HSA_NumtS_170_b1 472 | chr4 12640294 12640635 HSA_NumtS_171_b1 473 | chr4 14505906 14506449 HSA_NumtS_172_b1 474 | chr4 17061877 17062167 HSA_NumtS_173_b1 475 | chr4 25717914 25718883 HSA_NumtS_174_b1 476 | chr4 25719212 25721057 HSA_NumtS_174_b2 477 | chr4 27730424 27730575 HSA_NumtS_175_b1 478 | chr4 30884397 30885182 HSA_NumtS_176_b1 479 | chr4 41023226 41023289 HSA_NumtS_177_b1 480 | chr4 46118149 46118215 HSA_NumtS_178_b1 481 | chr4 47772272 47772364 HSA_NumtS_179_b1 482 | chr4 49246020 49246683 HSA_NumtS_180_b1 483 | chr4 49246760 49246916 HSA_NumtS_180_b2 484 | chr4 49548563 49549436 HSA_NumtS_181_b1 485 | chr4 55328160 55328290 HSA_NumtS_182_b1 486 | chr4 64606380 64606595 HSA_NumtS_183_b1 487 | chr4 64606702 64607017 HSA_NumtS_184_b1 488 | chr4 64607023 64610541 
HSA_NumtS_184_b2 489 | chr4 64610543 64611882 HSA_NumtS_184_b3 490 | chr4 66065344 66065456 HSA_NumtS_185_b1 491 | chr4 68049637 68050446 HSA_NumtS_186_b1 492 | chr4 68572491 68572619 HSA_NumtS_187_b1 493 | chr4 78008530 78008761 HSA_NumtS_188_b1 494 | chr4 78187532 78187596 HSA_NumtS_189_b1 495 | chr4 81733389 81734046 HSA_NumtS_190_b1 496 | chr4 87676077 87676206 HSA_NumtS_191_b1 497 | chr4 89731797 89732017 HSA_NumtS_192_b1 498 | chr4 92701825 92702653 HSA_NumtS_193_b1 499 | chr4 93817332 93817509 HSA_NumtS_194_b1 500 | chr4 107374106 107374217 HSA_NumtS_195_b1 501 | chr4 110444593 110444658 HSA_NumtS_196_b1 502 | chr4 116296602 116296867 HSA_NumtS_197_b1 503 | chr4 116296868 116296948 HSA_NumtS_197_b2 504 | chr4 116296995 116297101 HSA_NumtS_197_b3 505 | chr4 116297741 116300312 HSA_NumtS_197_b4 506 | chr4 128081405 128081776 HSA_NumtS_198_b1 507 | chr4 155451635 155466470 HSA_NumtS_199_b1 508 | chr4 160044596 160044674 HSA_NumtS_200_b1 509 | chr4 162421374 162421541 HSA_NumtS_201_b1 510 | chr4 163229039 163229149 HSA_NumtS_202_b1 511 | chr4 179069437 179069591 HSA_NumtS_203_b1 512 | chr4 181237363 181237546 HSA_NumtS_204_b1 513 | chr4 185294150 185294213 HSA_NumtS_205_b1 514 | chr5 1370544 1370768 HSA_NumtS_206_b1 515 | chr5 5395380 5395937 HSA_NumtS_207_b1 516 | chr5 5395969 5396968 HSA_NumtS_208_b1 517 | chr5 8618577 8621019 HSA_NumtS_209_b1 518 | chr5 8621020 8621260 HSA_NumtS_209_b2 519 | chr5 8622089 8622573 HSA_NumtS_209_b3 520 | chr5 18457220 18457337 HSA_NumtS_210_b1 521 | chr5 29533605 29533677 HSA_NumtS_211_b1 522 | chr5 60761539 60762024 HSA_NumtS_212_b1 523 | chr5 73239368 73239452 HSA_NumtS_213_b1 524 | chr5 73775892 73775932 HSA_NumtS_214_b1 525 | chr5 80650022 80652368 HSA_NumtS_215_b1 526 | chr5 85237550 85238064 HSA_NumtS_216_b1 527 | chr5 87187732 87187787 HSA_NumtS_217_b1 528 | chr5 94183137 94183267 HSA_NumtS_218_b1 529 | chr5 94567456 94570918 HSA_NumtS_219_b1 530 | chr5 97677393 97679010 HSA_NumtS_220_b1 531 | chr5 97684851 97685278 
HSA_NumtS_220_b2 532 | chr5 98409694 98411780 HSA_NumtS_221_b1 533 | chr5 100045938 100055045 HSA_NumtS_222_b1 534 | chr5 106553363 106553532 HSA_NumtS_223_b1 535 | chr5 119127399 119127454 HSA_NumtS_224_b1 536 | chr5 121030987 121031317 HSA_NumtS_225_b1 537 | chr5 122338839 122338973 HSA_NumtS_226_b1 538 | chr5 123760203 123760533 HSA_NumtS_227_b1 539 | chr5 123760802 123761766 HSA_NumtS_227_b2 540 | chr5 134923309 134928527 HSA_NumtS_228_b1 541 | chr5 137305100 137305220 HSA_NumtS_229_b1 542 | chr5 163009429 163009619 HSA_NumtS_230_b1 543 | chr5 163009619 163009706 HSA_NumtS_231_b1 544 | chr5 166530419 166530461 HSA_NumtS_232_b1 545 | chr5 170635558 170635626 HSA_NumtS_233_b1 546 | chr6 1705978 1706165 HSA_NumtS_234_b1 547 | chr6 5477425 5477486 HSA_NumtS_235_b1 548 | chr6 24947835 24948393 HSA_NumtS_236_b1 549 | chr6 32706123 32706901 HSA_NumtS_237_b1 550 | chr6 42037656 42037726 HSA_NumtS_238_b1 551 | chr6 43491167 43491278 HSA_NumtS_239_b1 552 | chr6 51986979 51987073 HSA_NumtS_240_b1 553 | chr6 61574113 61574629 HSA_NumtS_241_b1 554 | chr6 74225483 74225542 HSA_NumtS_242_b1 555 | chr6 74819195 74819317 HSA_NumtS_243_b1 556 | chr6 88555292 88555420 HSA_NumtS_244_b1 557 | chr6 91726720 91727057 HSA_NumtS_245_b1 558 | chr6 91727058 91727162 HSA_NumtS_246_b1 559 | chr6 91727163 91727311 HSA_NumtS_245_b2 560 | chr6 91727310 91727542 HSA_NumtS_247_b1 561 | chr6 94446773 94447105 HSA_NumtS_248_b1 562 | chr6 94447105 94447442 HSA_NumtS_249_b1 563 | chr6 119159307 119159421 HSA_NumtS_250_b1 564 | chr6 125396572 125396756 HSA_NumtS_251_b1 565 | chr6 127312127 127312193 HSA_NumtS_252_b1 566 | chr6 132799651 132799767 HSA_NumtS_253_b1 567 | chr6 133150568 133150832 HSA_NumtS_254_b1 568 | chr6 141120896 141120955 HSA_NumtS_255_b1 569 | chr6 141120956 141121231 HSA_NumtS_255_b2 570 | chr6 143077801 143077943 HSA_NumtS_256_b1 571 | chr6 144728937 144729203 HSA_NumtS_257_b1 572 | chr6 153665572 153669780 HSA_NumtS_258_b1 573 | chr6 154516516 154516589 HSA_NumtS_259_b1 574 | 
chr6 156547836 156548053 HSA_NumtS_260_b1 575 | chr6 159835594 159835665 HSA_NumtS_261_b1 576 | chr6 161561294 161561344 HSA_NumtS_262_b1 577 | chr6 169186396 169187688 HSA_NumtS_263_b1 578 | chr7 22746881 22746936 HSA_NumtS_264_b1 579 | chr7 22746937 22747213 HSA_NumtS_264_b2 580 | chr7 22747524 22748838 HSA_NumtS_264_b3 581 | chr7 36228621 36228884 HSA_NumtS_265_b1 582 | chr7 37518747 37518835 HSA_NumtS_266_b1 583 | chr7 45251971 45252127 HSA_NumtS_267_b1 584 | chr7 46189015 46189089 HSA_NumtS_268_b1 585 | chr7 57166975 57174197 HSA_NumtS_269_b1 586 | chr7 57174238 57175065 HSA_NumtS_269_b2 587 | chr7 57175066 57175125 HSA_NumtS_269_b3 588 | chr7 57185765 57198470 HSA_NumtS_270_b1 589 | chr7 57198471 57198530 HSA_NumtS_270_b2 590 | chr7 63187065 63187130 HSA_NumtS_271_b1 591 | chr7 64104094 64107359 HSA_NumtS_272_b1 592 | chr7 64107457 64112585 HSA_NumtS_272_b2 593 | chr7 64112593 64112691 HSA_NumtS_272_b3 594 | chr7 67627832 67628353 HSA_NumtS_273_b1 595 | chr7 68097736 68097867 HSA_NumtS_274_b1 596 | chr7 68098114 68098352 HSA_NumtS_274_b2 597 | chr7 68266320 68266542 HSA_NumtS_275_b1 598 | chr7 68266578 68266859 HSA_NumtS_275_b2 599 | chr7 68736528 68736633 HSA_NumtS_276_b1 600 | chr7 69329981 69330259 HSA_NumtS_277_b1 601 | chr7 69330592 69331449 HSA_NumtS_277_b2 602 | chr7 69331741 69332726 HSA_NumtS_277_b3 603 | chr7 69333024 69334004 HSA_NumtS_277_b4 604 | chr7 69334304 69334595 HSA_NumtS_277_b5 605 | chr7 111085401 111085486 HSA_NumtS_278_b1 606 | chr7 112372646 112375580 HSA_NumtS_279_b1 607 | chr7 116306359 116306502 HSA_NumtS_280_b1 608 | chr7 117263549 117263879 HSA_NumtS_281_b1 609 | chr7 117263934 117264386 HSA_NumtS_282_b1 610 | chr7 117264395 117264530 HSA_NumtS_283_b1 611 | chr7 118896897 118897084 HSA_NumtS_284_b1 612 | chr7 133275456 133275654 HSA_NumtS_285_b1 613 | chr7 141173044 141173213 HSA_NumtS_286_b1 614 | chr7 141801337 141804665 HSA_NumtS_287_b1 615 | chr7 141804969 141805530 HSA_NumtS_287_b2 616 | chr7 142664441 142664453 
HSA_NumtS_288_b1 617 | chr7 142664454 142664691 HSA_NumtS_288_b2 618 | chr7 142665177 142667693 HSA_NumtS_288_b3 619 | chr7 145997333 145997428 HSA_NumtS_289_b1 620 | chr7 147053045 147053170 HSA_NumtS_290_b1 621 | chr7 150132001 150132160 HSA_NumtS_291_b1 622 | chr7 151425270 151425327 HSA_NumtS_292_b1 623 | chr7 153968663 153968820 HSA_NumtS_293_b1 624 | chr7 155074617 155074762 HSA_NumtS_294_b1 625 | chr8 13353397 13353521 HSA_NumtS_295_b1 626 | chr8 16057004 16057112 HSA_NumtS_296_b1 627 | chr8 18200916 18201345 HSA_NumtS_297_b1 628 | chr8 18849262 18849708 HSA_NumtS_298_b1 629 | chr8 20551196 20551406 HSA_NumtS_299_b1 630 | chr8 26951274 26951450 HSA_NumtS_300_b1 631 | chr8 27903245 27903719 HSA_NumtS_301_b1 632 | chr8 32805885 32805967 HSA_NumtS_302_b1 633 | chr8 33010609 33010720 HSA_NumtS_303_b1 634 | chr8 33010721 33010796 HSA_NumtS_303_b2 635 | chr8 33011450 33015866 HSA_NumtS_303_b3 636 | chr8 33016661 33017255 HSA_NumtS_303_b4 637 | chr8 36277610 36279696 HSA_NumtS_304_b1 638 | chr8 37353323 37353408 HSA_NumtS_305_b1 639 | chr8 40070590 40070647 HSA_NumtS_306_b1 640 | chr8 40070648 40070687 HSA_NumtS_307_b1 641 | chr8 46827238 46828119 HSA_NumtS_308_b1 642 | chr8 46828202 46831050 HSA_NumtS_308_b2 643 | chr8 46836022 46836126 HSA_NumtS_308_b3 644 | chr8 46836642 46836709 HSA_NumtS_308_b4 645 | chr8 46836829 46836869 HSA_NumtS_308_b5 646 | chr8 46836870 46839027 HSA_NumtS_308_b6 647 | chr8 48400687 48400781 HSA_NumtS_309_b1 648 | chr8 48400793 48400905 HSA_NumtS_310_b1 649 | chr8 52751743 52751957 HSA_NumtS_311_b1 650 | chr8 67580528 67580558 HSA_NumtS_312_b1 651 | chr8 67580804 67580863 HSA_NumtS_312_b2 652 | chr8 67580864 67583874 HSA_NumtS_312_b3 653 | chr8 67584187 67587086 HSA_NumtS_312_b4 654 | chr8 67587096 67588442 HSA_NumtS_312_b5 655 | chr8 69102973 69103231 HSA_NumtS_313_b1 656 | chr8 72985697 72985854 HSA_NumtS_314_b1 657 | chr8 76201763 76202139 HSA_NumtS_315_b1 658 | chr8 76645457 76645669 HSA_NumtS_316_b1 659 | chr8 78282214 78282468 
HSA_NumtS_317_b1 660 | chr8 97908037 97908155 HSA_NumtS_318_b1 661 | chr8 99495870 99495953 HSA_NumtS_319_b1 662 | chr8 103081963 103082084 HSA_NumtS_320_b1 663 | chr8 103082687 103084355 HSA_NumtS_320_b2 664 | chr8 103084765 103086701 HSA_NumtS_320_b3 665 | chr8 103087000 103087538 HSA_NumtS_320_b4 666 | chr8 103088415 103088930 HSA_NumtS_320_b5 667 | chr8 103089426 103090628 HSA_NumtS_320_b6 668 | chr8 103091121 103091463 HSA_NumtS_320_b7 669 | chr8 110933244 110935366 HSA_NumtS_321_b1 670 | chr8 111248647 111248908 HSA_NumtS_322_b1 671 | chr8 111249111 111249387 HSA_NumtS_323_b1 672 | chr8 120224313 120224440 HSA_NumtS_324_b1 673 | chr8 133515505 133515844 HSA_NumtS_325_b1 674 | chr8 133754807 133756648 HSA_NumtS_326_b1 675 | chr8 139651453 139651818 HSA_NumtS_327_b1 676 | chr9 5091067 5092005 HSA_NumtS_328_b1 677 | chr9 5092087 5098698 HSA_NumtS_329_b1 678 | chr9 5098996 5100917 HSA_NumtS_329_b2 679 | chr9 5107046 5110735 HSA_NumtS_329_b3 680 | chr9 18336644 18336733 HSA_NumtS_330_b1 681 | chr9 18826068 18826128 HSA_NumtS_331_b1 682 | chr9 33655881 33655893 HSA_NumtS_332_b1 683 | chr9 33655894 33655968 HSA_NumtS_332_b2 684 | chr9 33656611 33659130 HSA_NumtS_332_b3 685 | chr9 33916480 33916741 HSA_NumtS_333_b1 686 | chr9 34999144 34999510 HSA_NumtS_334_b1 687 | chr9 65222579 65222722 HSA_NumtS_335_b1 688 | chr9 63026134 63026277 HSA_NumtS_336_b1 689 | chr9 64737075 64737218 HSA_NumtS_337_b1 690 | chr9 65222580 65222723 HSA_NumtS_338_b1 691 | chr9 65799262 65799405 HSA_NumtS_339_b1 692 | chr9 77964854 77965344 HSA_NumtS_340_b1 693 | chr9 78741011 78742818 HSA_NumtS_341_b1 694 | chr9 78743087 78743508 HSA_NumtS_341_b2 695 | chr9 78744227 78744286 HSA_NumtS_341_b3 696 | chr9 78744287 78744442 HSA_NumtS_341_b4 697 | chr9 78810865 78810972 HSA_NumtS_342_b1 698 | chr9 80563554 80565930 HSA_NumtS_343_b1 699 | chr9 82427347 82428115 HSA_NumtS_344_b1 700 | chr9 90435761 90435837 HSA_NumtS_345_b1 701 | chr9 92108918 92109690 HSA_NumtS_346_b1 702 | chr9 92109986 92111770 
HSA_NumtS_346_b2 703 | chr9 92539063 92539686 HSA_NumtS_347_b1 704 | chr9 95877109 95877158 HSA_NumtS_348_b1 705 | chr9 101793454 101793877 HSA_NumtS_349_b1 706 | chr9 103065409 103065612 HSA_NumtS_350_b1 707 | chr9 109300229 109300444 HSA_NumtS_351_b1 708 | chr9 109300864 109301143 HSA_NumtS_351_b2 709 | chr9 129878857 129878931 HSA_NumtS_352_b1 710 | chr9 133417316 133417432 HSA_NumtS_353_b1 711 | chrX 5168837 5170772 HSA_NumtS_564_b1 712 | chrX 5171069 5171231 HSA_NumtS_564_b2 713 | chrX 5171386 5171444 HSA_NumtS_564_b3 714 | chrX 55178235 55184027 HSA_NumtS_565_b1 715 | chrX 62838407 62838454 HSA_NumtS_566_b1 716 | chrX 62838897 62839033 HSA_NumtS_566_b2 717 | chrX 62839034 62839093 HSA_NumtS_566_b3 718 | chrX 62839625 62841817 HSA_NumtS_566_b4 719 | chrX 62842012 62844145 HSA_NumtS_566_b5 720 | chrX 102772553 102772697 HSA_NumtS_567_b1 721 | chrX 102772996 102773271 HSA_NumtS_567_b2 722 | chrX 102802913 102803041 HSA_NumtS_568_b1 723 | chrX 102804641 102805623 HSA_NumtS_568_b2 724 | chrX 102806440 102807943 HSA_NumtS_568_b3 725 | chrX 102808251 102808514 HSA_NumtS_568_b4 726 | chrX 102808682 102808721 HSA_NumtS_568_b5 727 | chrX 102809122 102810300 HSA_NumtS_568_b6 728 | chrX 126471704 126472452 HSA_NumtS_569_b1 729 | chrX 126472467 126472732 HSA_NumtS_570_b1 730 | chrX 126472731 126473284 HSA_NumtS_571_b1 731 | chrX 126729042 126730366 HSA_NumtS_572_b1 732 | chrX 143429276 143429308 HSA_NumtS_573_b1 733 | chrX 143429309 143429368 HSA_NumtS_573_b2 734 | chrX 143429926 143431536 HSA_NumtS_573_b3 735 | chrX 143431694 143432101 HSA_NumtS_573_b4 736 | chrX 143432440 143434118 HSA_NumtS_573_b5 737 | chrX 143434475 143435147 HSA_NumtS_573_b6 738 | chrY 4344781 4344851 HSA_NumtS_574_b1 739 | chrY 8363562 8365288 HSA_NumtS_575_b1 740 | chrY 8365576 8365845 HSA_NumtS_576_b1 741 | chrY 8366628 8367199 HSA_NumtS_577_b1 742 | chrY 8369519 8371813 HSA_NumtS_577_b2 743 | chrY 8371838 8372714 HSA_NumtS_577_b3 744 | chrY 9141896 9141961 HSA_NumtS_578_b1 745 | chrY 11132098 
11133053 HSA_NumtS_579_b1 746 | chrY 11133066 11134411 HSA_NumtS_580_b1 747 | chrY 11134403 11134452 HSA_NumtS_581_b1 748 | chrY 11134475 11134991 HSA_NumtS_582_b1 749 | chrY 11445424 11445564 HSA_NumtS_583_b1 750 | chrY 14068156 14068288 HSA_NumtS_584_b1 751 | chrY 18872102 18872247 HSA_NumtS_585_b1 752 | -------------------------------------------------------------------------------- /refNumts.bed: -------------------------------------------------------------------------------- 1 | chr1 564464 570304 HSA_NumtS_001_b1 2 | chr1 5614806 5614937 HSA_NumtS_002_b1 3 | chr1 5910318 5910528 HSA_NumtS_003_b1 4 | chr1 8969802 8969967 HSA_NumtS_004_b1 5 | chr1 9634687 9634887 HSA_NumtS_005_b1 6 | chr1 11202904 11202975 HSA_NumtS_006_b1 7 | chr1 11485319 11485829 HSA_NumtS_007_b1 8 | chr1 18921906 18922120 HSA_NumtS_008_b1 9 | chr1 38077348 38077421 HSA_NumtS_009_b1 10 | chr1 50482768 50483177 HSA_NumtS_010_b1 11 | chr1 55838223 55839324 HSA_NumtS_011_b1 12 | chr1 77436934 77437100 HSA_NumtS_012_b1 13 | chr1 81546486 81546860 HSA_NumtS_013_b1 14 | chr1 94385731 94385970 HSA_NumtS_014_b1 15 | chr1 94386266 94386333 HSA_NumtS_014_b2 16 | chr1 94387537 94387832 HSA_NumtS_014_b3 17 | chr1 94389832 94390416 HSA_NumtS_014_b4 18 | chr1 94390946 94391630 HSA_NumtS_014_b5 19 | chr1 94392191 94392581 HSA_NumtS_014_b6 20 | chr1 94393032 94393864 HSA_NumtS_014_b7 21 | chr1 94399305 94399832 HSA_NumtS_014_b8 22 | chr1 94400594 94400784 HSA_NumtS_014_b9 23 | chr1 94401962 94403299 HSA_NumtS_014_b10 24 | chr1 104163778 104163820 HSA_NumtS_015_b1 25 | chr1 107345251 107348747 HSA_NumtS_016_b1 26 | chr1 114119380 114119503 HSA_NumtS_017_b1 27 | chr1 142790231 142791177 HSA_NumtS_018_b1 28 | chr1 142791197 142792537 HSA_NumtS_019_b1 29 | chr1 142792529 142792578 HSA_NumtS_020_b1 30 | chr1 142792601 142793117 HSA_NumtS_021_b1 31 | chr1 143243224 143243741 HSA_NumtS_022_b1 32 | chr1 143243764 143243813 HSA_NumtS_023_b1 33 | chr1 143243808 143245150 HSA_NumtS_024_b1 34 | chr1 143245170 143246118 
HSA_NumtS_025_b1 35 | chr1 143342817 143343249 HSA_NumtS_026_b1 36 | chr1 143343269 143344612 HSA_NumtS_027_b1 37 | chr1 143344607 143344656 HSA_NumtS_028_b1 38 | chr1 143344679 143345193 HSA_NumtS_029_b1 39 | chr1 147332804 147332915 HSA_NumtS_030_b1 40 | chr1 169443301 169443396 HSA_NumtS_031_b1 41 | chr1 172679671 172679742 HSA_NumtS_032_b1 42 | chr1 181391921 181392314 HSA_NumtS_033_b1 43 | chr1 190878857 190879190 HSA_NumtS_034_b1 44 | chr1 191930823 191931122 HSA_NumtS_035_b1 45 | chr1 205444464 205444632 HSA_NumtS_036_b1 46 | chr1 212678338 212678414 HSA_NumtS_037_b1 47 | chr1 215673139 215673177 HSA_NumtS_038_b1 48 | chr1 220628496 220628597 HSA_NumtS_039_b1 49 | chr1 227145859 227145959 HSA_NumtS_040_b1 50 | chr1 235683320 235684221 HSA_NumtS_041_b1 51 | chr1 235700696 235705689 HSA_NumtS_042_b1 52 | chr1 238104063 238109691 HSA_NumtS_043_b1 53 | chr1 238110376 238110619 HSA_NumtS_043_b2 54 | chr1 238110620 238115021 HSA_NumtS_043_b3 55 | chr1 240713186 240713317 HSA_NumtS_044_b1 56 | chr1 249198582 249198701 HSA_NumtS_045_b1 57 | chr10 2277883 2278054 HSA_NumtS_354_b1 58 | chr10 20035566 20036099 HSA_NumtS_355_b1 59 | chr10 20036102 20036445 HSA_NumtS_356_b1 60 | chr10 20036447 20037579 HSA_NumtS_357_b1 61 | chr10 21093219 21093505 HSA_NumtS_358_b1 62 | chr10 21130300 21130473 HSA_NumtS_359_b1 63 | chr10 25927446 25927520 HSA_NumtS_360_b1 64 | chr10 27162190 27162334 HSA_NumtS_361_b1 65 | chr10 28165677 28165738 HSA_NumtS_362_b1 66 | chr10 30854197 30854269 HSA_NumtS_363_b1 67 | chr10 33312027 33312154 HSA_NumtS_364_b1 68 | chr10 36503827 36503943 HSA_NumtS_365_b1 69 | chr10 36721804 36724090 HSA_NumtS_366_b1 70 | chr10 37889725 37891902 HSA_NumtS_367_b1 71 | chr10 39054944 39055084 HSA_NumtS_368_b1 72 | chr10 42705973 42706113 HSA_NumtS_369_b1 73 | chr10 46843072 46843510 HSA_NumtS_370_b1 74 | chr10 47323877 47324315 HSA_NumtS_371_b1 75 | chr10 49032828 49033266 HSA_NumtS_372_b1 76 | chr10 57356796 57356846 HSA_NumtS_373_b1 77 | chr10 57357576 57359841 
HSA_NumtS_374_b1 78 | chr10 57360133 57360493 HSA_NumtS_374_b2 79 | chr10 70915071 70915182 HSA_NumtS_375_b1 80 | chr10 71349635 71353116 HSA_NumtS_376_b1 81 | chr10 71353775 71354688 HSA_NumtS_376_b2 82 | chr10 71355014 71356206 HSA_NumtS_376_b3 83 | chr10 81170982 81171049 HSA_NumtS_377_b1 84 | chr10 91546381 91546867 HSA_NumtS_378_b1 85 | chr10 96533929 96534385 HSA_NumtS_379_b1 86 | chr10 96700016 96700878 HSA_NumtS_380_b1 87 | chr10 98410666 98410789 HSA_NumtS_381_b1 88 | chr10 101817140 101817672 HSA_NumtS_382_b1 89 | chr10 114654337 114654449 HSA_NumtS_383_b1 90 | chr10 121597520 121597692 HSA_NumtS_384_b1 91 | chr10 127508367 127508506 HSA_NumtS_385_b1 92 | chr10 131929400 131929468 HSA_NumtS_386_b1 93 | chr11 6523133 6523260 HSA_NumtS_387_b1 94 | chr11 10529434 10531827 HSA_NumtS_388_b1 95 | chr11 11261524 11261721 HSA_NumtS_389_b1 96 | chr11 31576656 31576852 HSA_NumtS_390_b1 97 | chr11 37563076 37563610 HSA_NumtS_391_b1 98 | chr11 39787796 39794792 HSA_NumtS_392_b1 99 | chr11 47345535 47345732 HSA_NumtS_393_b1 100 | chr11 48994505 48995015 HSA_NumtS_394_b1 101 | chr11 49834071 49834576 HSA_NumtS_395_b1 102 | chr11 55040962 55041460 HSA_NumtS_396_b1 103 | chr11 63954807 63954943 HSA_NumtS_397_b1 104 | chr11 73221706 73221868 HSA_NumtS_398_b1 105 | chr11 74278043 74278181 HSA_NumtS_399_b1 106 | chr11 77307220 77307301 HSA_NumtS_400_b1 107 | chr11 81262616 81268377 HSA_NumtS_401_b1 108 | chr11 87524440 87524999 HSA_NumtS_402_b1 109 | chr11 87525012 87525620 HSA_NumtS_403_b1 110 | chr11 87525925 87527366 HSA_NumtS_403_b2 111 | chr11 89641709 89642219 HSA_NumtS_404_b1 112 | chr11 89668661 89669171 HSA_NumtS_405_b1 113 | chr11 89831300 89831441 HSA_NumtS_406_b1 114 | chr11 103272744 103280305 HSA_NumtS_407_b1 115 | chr11 103280309 103280351 HSA_NumtS_408_b1 116 | chr11 103280391 103281852 HSA_NumtS_407_b2 117 | chr11 110747716 110747875 HSA_NumtS_409_b1 118 | chr11 122874314 122874385 HSA_NumtS_410_b1 119 | chr11 123010593 123011094 HSA_NumtS_411_b1 120 | 
chr11 125721826 125721936 HSA_NumtS_412_b1 121 | chr11 125785411 125785481 HSA_NumtS_413_b1 122 | chr11 127936427 127936478 HSA_NumtS_414_b1 123 | chr12 4341804 4341858 HSA_NumtS_415_b1 124 | chr12 7772911 7773193 HSA_NumtS_416_b1 125 | chr12 7790010 7790241 HSA_NumtS_417_b1 126 | chr12 7799468 7799682 HSA_NumtS_418_b1 127 | chr12 9475304 9475582 HSA_NumtS_419_b1 128 | chr12 9561393 9561668 HSA_NumtS_420_b1 129 | chr12 22158766 22158870 HSA_NumtS_421_b1 130 | chr12 26725475 26725700 HSA_NumtS_422_b1 131 | chr12 31168380 31168611 HSA_NumtS_423_b1 132 | chr12 31399642 31400696 HSA_NumtS_424_b1 133 | chr12 31400982 31401907 HSA_NumtS_424_b2 134 | chr12 31402048 31402172 HSA_NumtS_424_b3 135 | chr12 40680091 40680463 HSA_NumtS_425_b1 136 | chr12 41757437 41757525 HSA_NumtS_426_b1 137 | chr12 42092360 42094053 HSA_NumtS_427_b1 138 | chr12 50211122 50211285 HSA_NumtS_428_b1 139 | chr12 63167790 63167857 HSA_NumtS_429_b1 140 | chr12 108572113 108572486 HSA_NumtS_430_b1 141 | chr12 114610224 114610285 HSA_NumtS_431_b1 142 | chr12 127067855 127067915 HSA_NumtS_432_b1 143 | chr12 127067916 127067962 HSA_NumtS_433_b1 144 | chr12 127067963 127068009 HSA_NumtS_434_b1 145 | chr12 127068008 127068056 HSA_NumtS_435_b1 146 | chr12 127068057 127068103 HSA_NumtS_436_b1 147 | chr12 127068104 127068150 HSA_NumtS_437_b1 148 | chr12 127068151 127068197 HSA_NumtS_438_b1 149 | chr12 127068198 127068244 HSA_NumtS_439_b1 150 | chr12 127068245 127068291 HSA_NumtS_440_b1 151 | chr12 127068292 127068338 HSA_NumtS_441_b1 152 | chr12 127068339 127068385 HSA_NumtS_442_b1 153 | chr12 127068386 127068432 HSA_NumtS_443_b1 154 | chr12 127068433 127068479 HSA_NumtS_444_b1 155 | chr12 127068480 127068526 HSA_NumtS_445_b1 156 | chr12 127068527 127068573 HSA_NumtS_446_b1 157 | chr12 127068574 127068620 HSA_NumtS_447_b1 158 | chr12 127068621 127068667 HSA_NumtS_448_b1 159 | chr12 127068668 127068714 HSA_NumtS_449_b1 160 | chr12 127068715 127068761 HSA_NumtS_450_b1 161 | chr12 127068762 127068808 
HSA_NumtS_451_b1 162 | chr12 127068809 127068855 HSA_NumtS_452_b1 163 | chr12 127068856 127068942 HSA_NumtS_453_b1 164 | chr12 130800144 130800346 HSA_NumtS_454_b1 165 | chr13 22687328 22687454 HSA_NumtS_455_b1 166 | chr13 24339953 24340072 HSA_NumtS_456_b1 167 | chr13 24340073 24340224 HSA_NumtS_457_b1 168 | chr13 32767477 32767621 HSA_NumtS_458_b1 169 | chr13 36639628 36639833 HSA_NumtS_459_b1 170 | chr13 41342503 41342558 HSA_NumtS_460_b1 171 | chr13 48144978 48145263 HSA_NumtS_461_b1 172 | chr13 50561080 50561257 HSA_NumtS_462_b1 173 | chr13 54465722 54465886 HSA_NumtS_463_b1 174 | chr13 56545768 56545893 HSA_NumtS_464_b1 175 | chr13 57262610 57262783 HSA_NumtS_465_b1 176 | chr13 57778515 57779038 HSA_NumtS_466_b1 177 | chr13 76166375 76166543 HSA_NumtS_467_b1 178 | chr13 85094391 85096308 HSA_NumtS_468_b1 179 | chr13 85096692 85098200 HSA_NumtS_468_b2 180 | chr13 89223906 89224465 HSA_NumtS_469_b1 181 | chr13 96344795 96345082 HSA_NumtS_470_b1 182 | chr13 96345376 96348707 HSA_NumtS_470_b2 183 | chr13 96348708 96348767 HSA_NumtS_470_b3 184 | chr13 97349939 97350084 HSA_NumtS_471_b1 185 | chr13 110076472 110076727 HSA_NumtS_472_b1 186 | chr14 23895673 23895740 HSA_NumtS_473_b1 187 | chr14 32953304 32954324 HSA_NumtS_474_b1 188 | chr14 52054086 52054486 HSA_NumtS_475_b1 189 | chr14 79321222 79321294 HSA_NumtS_476_b1 190 | chr14 84053850 84055436 HSA_NumtS_477_b1 191 | chr14 84637696 84639184 HSA_NumtS_478_b1 192 | chr14 84639187 84640144 HSA_NumtS_479_b1 193 | chr14 84640136 84642035 HSA_NumtS_479_b2 194 | chr14 84642353 84643426 HSA_NumtS_479_b3 195 | chr14 95665570 95665684 HSA_NumtS_480_b1 196 | chr15 34686795 34687192 HSA_NumtS_481_b1 197 | chr15 34833016 34833413 HSA_NumtS_482_b1 198 | chr15 35688443 35688613 HSA_NumtS_483_b1 199 | chr15 39493574 39493716 HSA_NumtS_484_b1 200 | chr15 40425971 40427393 HSA_NumtS_485_b1 201 | chr15 41449373 41449498 HSA_NumtS_486_b1 202 | chr15 46633589 46633796 HSA_NumtS_487_b1 203 | chr15 48766103 48766190 HSA_NumtS_488_b1 
204 | chr15 58082416 58082474 HSA_NumtS_489_b1 205 | chr15 58442495 58443794 HSA_NumtS_490_b1 206 | chr15 58443803 58444808 HSA_NumtS_490_b2 207 | chr15 58444861 58448397 HSA_NumtS_490_b3 208 | chr15 67333249 67333291 HSA_NumtS_491_b1 209 | chr15 88634588 88634639 HSA_NumtS_492_b1 210 | chr16 3417483 3419297 HSA_NumtS_493_b1 211 | chr16 3419598 3421388 HSA_NumtS_493_b2 212 | chr16 3421716 3422332 HSA_NumtS_493_b3 213 | chr16 3422637 3422740 HSA_NumtS_493_b4 214 | chr16 7079728 7079786 HSA_NumtS_494_b1 215 | chr16 10812340 10813870 HSA_NumtS_495_b1 216 | chr16 10813943 10819181 HSA_NumtS_495_b2 217 | chr16 14085783 14085856 HSA_NumtS_496_b1 218 | chr16 20732370 20733723 HSA_NumtS_497_b1 219 | chr16 49100511 49100654 HSA_NumtS_498_b1 220 | chr16 69392573 69392752 HSA_NumtS_499_b1 221 | chr16 70716615 70716669 HSA_NumtS_500_b1 222 | chr16 82153464 82153523 HSA_NumtS_501_b1 223 | chr16 82153524 82153791 HSA_NumtS_501_b2 224 | chr16 82154094 82156214 HSA_NumtS_501_b3 225 | chr16 83706625 83707118 HSA_NumtS_502_b1 226 | chr17 18886741 18886828 HSA_NumtS_503_b1 227 | chr17 18889501 18889665 HSA_NumtS_504_b1 228 | chr17 19500849 19501124 HSA_NumtS_505_b1 229 | chr17 19501125 19501182 HSA_NumtS_505_b2 230 | chr17 19501872 19503387 HSA_NumtS_505_b3 231 | chr17 19503699 19504273 HSA_NumtS_505_b4 232 | chr17 19504577 19505197 HSA_NumtS_505_b5 233 | chr17 19505527 19506861 HSA_NumtS_505_b6 234 | chr17 19507161 19507472 HSA_NumtS_505_b7 235 | chr17 19507793 19509246 HSA_NumtS_505_b8 236 | chr17 19649707 19649807 HSA_NumtS_506_b1 237 | chr17 22018521 22020726 HSA_NumtS_507_b1 238 | chr17 22020727 22021181 HSA_NumtS_508_b1 239 | chr17 22021361 22031841 HSA_NumtS_508_b2 240 | chr17 22031841 22032092 HSA_NumtS_509_b1 241 | chr17 27301366 27301444 HSA_NumtS_510_b1 242 | chr17 33981890 33982213 HSA_NumtS_511_b1 243 | chr17 42075084 42075151 HSA_NumtS_512_b1 244 | chr17 51183094 51183746 HSA_NumtS_513_b1 245 | chr17 52980462 52980535 HSA_NumtS_514_b1 246 | chr17 59204689 59204762 
HSA_NumtS_515_b1 247 | chr17 61470716 61470946 HSA_NumtS_516_b1 248 | chr17 66934855 66935264 HSA_NumtS_517_b1 249 | chr17 78591376 78591422 HSA_NumtS_518_b1 250 | chr18 2842230 2842352 HSA_NumtS_519_b1 251 | chr18 14136790 14136856 HSA_NumtS_520_b1 252 | chr18 32208889 32208959 HSA_NumtS_521_b1 253 | chr18 45379617 45379808 HSA_NumtS_522_b1 254 | chr18 48195898 48196248 HSA_NumtS_523_b1 255 | chr18 59541786 59542143 HSA_NumtS_524_b1 256 | chr18 61822121 61822301 HSA_NumtS_525_b1 257 | chr19 12206468 12207149 HSA_NumtS_526_b1 258 | chr19 12215699 12216004 HSA_NumtS_527_b1 259 | chr19 12313746 12313857 HSA_NumtS_528_b1 260 | chr19 12315005 12315359 HSA_NumtS_528_b2 261 | chr19 12613489 12613651 HSA_NumtS_529_b1 262 | chr19 12613952 12614583 HSA_NumtS_529_b2 263 | chr19 12616466 12616820 HSA_NumtS_529_b3 264 | chr19 12618365 12618917 HSA_NumtS_529_b4 265 | chr19 17570002 17570054 HSA_NumtS_530_b1 266 | chr19 28232717 28232786 HSA_NumtS_531_b1 267 | chr19 30513241 30513312 HSA_NumtS_532_b1 268 | chr19 38061381 38062117 HSA_NumtS_533_b1 269 | chr19 38062716 38062814 HSA_NumtS_534_b1 270 | chr19 38063548 38063666 HSA_NumtS_533_b2 271 | chr19 38063793 38064420 HSA_NumtS_533_b3 272 | chr19 38064779 38065519 HSA_NumtS_533_b4 273 | chr19 38066513 38066578 HSA_NumtS_533_b5 274 | chr19 38067158 38068626 HSA_NumtS_533_b6 275 | chr19 38070516 38071401 HSA_NumtS_533_b7 276 | chr19 42694854 42695088 HSA_NumtS_535_b1 277 | chr19 44650581 44650662 HSA_NumtS_536_b1 278 | chr19 44650660 44650720 HSA_NumtS_537_b1 279 | chr19 57202369 57202534 HSA_NumtS_538_b1 280 | chr19 57202875 57203180 HSA_NumtS_538_b2 281 | chr19 57433369 57433792 HSA_NumtS_539_b1 282 | chr2 9886842 9886994 HSA_NumtS_046_b1 283 | chr2 11397042 11397111 HSA_NumtS_047_b1 284 | chr2 15661291 15661406 HSA_NumtS_048_b1 285 | chr2 22539923 22540120 HSA_NumtS_049_b1 286 | chr2 33992538 33992593 HSA_NumtS_050_b1 287 | chr2 41012097 41012257 HSA_NumtS_051_b1 288 | chr2 49456767 49457038 HSA_NumtS_052_b1 289 | chr2 50815830 
50816528 HSA_NumtS_053_b1 290 | chr2 56692157 56692326 HSA_NumtS_054_b1 291 | chr2 63916544 63916755 HSA_NumtS_055_b1 292 | chr2 68487872 68488140 HSA_NumtS_056_b1 293 | chr2 75290875 75290980 HSA_NumtS_057_b1 294 | chr2 81893601 81893852 HSA_NumtS_058_b1 295 | chr2 83042103 83047090 HSA_NumtS_059_b1 296 | chr2 83047091 83047150 HSA_NumtS_059_b2 297 | chr2 83047470 83047500 HSA_NumtS_059_b3 298 | chr2 83047996 83048129 HSA_NumtS_059_b4 299 | chr2 85295928 85296157 HSA_NumtS_060_b1 300 | chr2 85952023 85952083 HSA_NumtS_061_b1 301 | chr2 88124409 88124884 HSA_NumtS_062_b1 302 | chr2 92194473 92194609 HSA_NumtS_063_b1 303 | chr2 95564754 95565243 HSA_NumtS_064_b1 304 | chr2 95565292 95565334 HSA_NumtS_065_b1 305 | chr2 95565336 95566715 HSA_NumtS_066_b1 306 | chr2 95566746 95567491 HSA_NumtS_067_b1 307 | chr2 95567552 95567703 HSA_NumtS_067_b2 308 | chr2 97346576 97346656 HSA_NumtS_068_b1 309 | chr2 103122233 103122288 HSA_NumtS_069_b1 310 | chr2 104215184 104215294 HSA_NumtS_070_b1 311 | chr2 109530024 109530165 HSA_NumtS_071_b1 312 | chr2 115247929 115248000 HSA_NumtS_072_b1 313 | chr2 117508761 117509888 HSA_NumtS_073_b1 314 | chr2 117509889 117509941 HSA_NumtS_073_b2 315 | chr2 117777504 117777523 HSA_NumtS_074_b1 316 | chr2 117777773 117777954 HSA_NumtS_074_b2 317 | chr2 117777955 117778030 HSA_NumtS_074_b3 318 | chr2 117778130 117778195 HSA_NumtS_074_b4 319 | chr2 117778789 117784603 HSA_NumtS_074_b5 320 | chr2 120968426 120969851 HSA_NumtS_075_b1 321 | chr2 120970237 120973176 HSA_NumtS_075_b2 322 | chr2 120973476 120973925 HSA_NumtS_075_b3 323 | chr2 120974131 120974803 HSA_NumtS_075_b4 324 | chr2 125438296 125438579 HSA_NumtS_076_b1 325 | chr2 126568394 126568669 HSA_NumtS_077_b1 326 | chr2 126568670 126568745 HSA_NumtS_077_b2 327 | chr2 126569697 126570116 HSA_NumtS_077_b3 328 | chr2 129848516 129848603 HSA_NumtS_078_b1 329 | chr2 131027969 131028112 HSA_NumtS_079_b1 330 | chr2 131028134 131028247 HSA_NumtS_080_b1 331 | chr2 131029383 131040904 
HSA_NumtS_080_b2 332 | chr2 132126634 132131329 HSA_NumtS_081_b1 333 | chr2 132137039 132137095 HSA_NumtS_082_b1 334 | chr2 132137114 132137180 HSA_NumtS_083_b1 335 | chr2 132137166 132141275 HSA_NumtS_084_b1 336 | chr2 132141276 132143891 HSA_NumtS_084_b2 337 | chr2 132143893 132144036 HSA_NumtS_085_b1 338 | chr2 133182673 133182748 HSA_NumtS_086_b1 339 | chr2 135283936 135284098 HSA_NumtS_087_b1 340 | chr2 140974570 140975540 HSA_NumtS_088_b1 341 | chr2 140977231 140981793 HSA_NumtS_088_b2 342 | chr2 140982474 140982534 HSA_NumtS_088_b3 343 | chr2 140982535 140982809 HSA_NumtS_088_b4 344 | chr2 143846227 143846443 HSA_NumtS_089_b1 345 | chr2 143847584 143849153 HSA_NumtS_089_b2 346 | chr2 143849834 143849909 HSA_NumtS_090_b1 347 | chr2 143849910 143856616 HSA_NumtS_090_b2 348 | chr2 143856947 143858914 HSA_NumtS_090_b3 349 | chr2 148022752 148022840 HSA_NumtS_091_b1 350 | chr2 149639295 149639426 HSA_NumtS_092_b1 351 | chr2 156119977 156121314 HSA_NumtS_093_b1 352 | chr2 156167598 156170865 HSA_NumtS_094_b1 353 | chr2 167271005 167271223 HSA_NumtS_095_b1 354 | chr2 175645678 175645824 HSA_NumtS_096_b1 355 | chr2 180604074 180604467 HSA_NumtS_097_b1 356 | chr2 197299510 197299570 HSA_NumtS_098_b1 357 | chr2 202077019 202079704 HSA_NumtS_099_b1 358 | chr2 202413991 202414987 HSA_NumtS_100_b1 359 | chr2 202417479 202419952 HSA_NumtS_100_b2 360 | chr2 202420247 202420950 HSA_NumtS_100_b3 361 | chr2 202421825 202422852 HSA_NumtS_100_b4 362 | chr2 203478933 203483013 HSA_NumtS_101_b1 363 | chr2 203483339 203483595 HSA_NumtS_101_b2 364 | chr2 203483606 203483718 HSA_NumtS_102_b1 365 | chr2 203483717 203483909 HSA_NumtS_103_b1 366 | chr2 203483912 203484760 HSA_NumtS_104_b1 367 | chr2 212638506 212638765 HSA_NumtS_105_b1 368 | chr2 212639107 212641354 HSA_NumtS_105_b2 369 | chr2 212641934 212644730 HSA_NumtS_106_b1 370 | chr2 216677391 216677513 HSA_NumtS_107_b1 371 | chr2 217744440 217744561 HSA_NumtS_108_b1 372 | chr2 218439524 218439578 HSA_NumtS_109_b1 373 | chr2 
220913540 220913992 HSA_NumtS_110_b1 374 | chr2 227286419 227286510 HSA_NumtS_111_b1 375 | chr2 227586985 227587142 HSA_NumtS_112_b1 376 | chr2 229893503 229893623 HSA_NumtS_113_b1 377 | chr2 236364634 236364891 HSA_NumtS_114_b1 378 | chr2 238429247 238430420 HSA_NumtS_115_b1 379 | chr2 238430426 238430481 HSA_NumtS_116_b1 380 | chr2 238430487 238430531 HSA_NumtS_117_b1 381 | chr2 238430548 238430603 HSA_NumtS_118_b1 382 | chr2 238430609 238430664 HSA_NumtS_119_b1 383 | chr2 238430670 238430725 HSA_NumtS_120_b1 384 | chr2 238430731 238430786 HSA_NumtS_121_b1 385 | chr2 238430792 238430847 HSA_NumtS_122_b1 386 | chr2 238430853 238430908 HSA_NumtS_123_b1 387 | chr2 238430914 238430969 HSA_NumtS_124_b1 388 | chr2 238430975 238431019 HSA_NumtS_125_b1 389 | chr2 238431036 238431091 HSA_NumtS_126_b1 390 | chr2 238431097 238431152 HSA_NumtS_127_b1 391 | chr2 238431158 238431202 HSA_NumtS_128_b1 392 | chr2 238431219 238431274 HSA_NumtS_129_b1 393 | chr2 238431280 238432552 HSA_NumtS_130_b1 394 | chr2 238432615 238432847 HSA_NumtS_131_b1 395 | chr20 2169055 2169129 HSA_NumtS_540_b1 396 | chr20 4709562 4709664 HSA_NumtS_541_b1 397 | chr20 5945826 5946149 HSA_NumtS_542_b1 398 | chr20 9149571 9149612 HSA_NumtS_543_b1 399 | chr20 13147959 13148001 HSA_NumtS_544_b1 400 | chr20 33276467 33276839 HSA_NumtS_545_b1 401 | chr20 55639110 55639179 HSA_NumtS_546_b1 402 | chr20 55932464 55936114 HSA_NumtS_547_b1 403 | chr20 55936496 55936526 HSA_NumtS_547_b2 404 | chr20 55936769 55936828 HSA_NumtS_547_b3 405 | chr20 55936829 55936866 HSA_NumtS_547_b4 406 | chr20 58989452 58989923 HSA_NumtS_548_b1 407 | chr21 15393901 15393967 HSA_NumtS_549_b1 408 | chr21 17842740 17842836 HSA_NumtS_550_b1 409 | chr21 26303695 26303791 HSA_NumtS_551_b1 410 | chr21 37262572 37263471 HSA_NumtS_552_b1 411 | chr21 37264076 37264422 HSA_NumtS_552_b2 412 | chr21 40982932 40983108 HSA_NumtS_553_b1 413 | chr21 43826820 43827042 HSA_NumtS_554_b1 414 | chr21 45890558 45890811 HSA_NumtS_555_b1 415 | chr21 45891982 
45892378 HSA_NumtS_555_b2 416 | chr21 45892713 45893026 HSA_NumtS_555_b3 417 | chr21 45893538 45893774 HSA_NumtS_555_b4 418 | chr21 45894093 45894258 HSA_NumtS_555_b5 419 | chr21 45894395 45895058 HSA_NumtS_555_b6 420 | chr21 46796103 46796299 HSA_NumtS_556_b1 421 | chr22 17357114 17357254 HSA_NumtS_557_b1 422 | chr22 24347895 24348398 HSA_NumtS_558_b1 423 | chr22 24349293 24349718 HSA_NumtS_558_b2 424 | chr22 33290869 33291209 HSA_NumtS_559_b1 425 | chr22 36281719 36281765 HSA_NumtS_560_b1 426 | chr22 36568475 36569036 HSA_NumtS_561_b1 427 | chr22 36569486 36570722 HSA_NumtS_561_b2 428 | chr22 36571169 36571906 HSA_NumtS_561_b3 429 | chr22 36571945 36572114 HSA_NumtS_561_b4 430 | chr22 36572242 36572729 HSA_NumtS_561_b5 431 | chr22 36573016 36573291 HSA_NumtS_561_b6 432 | chr22 36573880 36574574 HSA_NumtS_561_b7 433 | chr22 36574577 36574801 HSA_NumtS_561_b8 434 | chr22 36575112 36575597 HSA_NumtS_561_b9 435 | chr22 42160961 42161029 HSA_NumtS_562_b1 436 | chr22 46866163 46866230 HSA_NumtS_563_b1 437 | chr3 12206341 12206477 HSA_NumtS_132_b1 438 | chr3 12206800 12207133 HSA_NumtS_132_b2 439 | chr3 25508995 25509033 HSA_NumtS_133_b1 440 | chr3 28376377 28376431 HSA_NumtS_134_b1 441 | chr3 29839143 29839386 HSA_NumtS_135_b1 442 | chr3 40293638 40295258 HSA_NumtS_136_b1 443 | chr3 43270818 43271223 HSA_NumtS_137_b1 444 | chr3 63832702 63832776 HSA_NumtS_138_b1 445 | chr3 68708101 68708189 HSA_NumtS_139_b1 446 | chr3 68708207 68708282 HSA_NumtS_139_b2 447 | chr3 68708280 68708438 HSA_NumtS_140_b1 448 | chr3 72632451 72632687 HSA_NumtS_141_b1 449 | chr3 89636003 89636806 HSA_NumtS_142_b1 450 | chr3 89636935 89637020 HSA_NumtS_142_b2 451 | chr3 89637048 89637265 HSA_NumtS_142_b3 452 | chr3 89637399 89637568 HSA_NumtS_142_b4 453 | chr3 89637642 89637904 HSA_NumtS_142_b5 454 | chr3 89637975 89639246 HSA_NumtS_142_b6 455 | chr3 96336032 96337354 HSA_NumtS_143_b1 456 | chr3 96483980 96484128 HSA_NumtS_144_b1 457 | chr3 98563709 98563841 HSA_NumtS_145_b1 458 | chr3 106612986 
106614747 HSA_NumtS_146_b1 459 | chr3 106615140 106617101 HSA_NumtS_146_b2 460 | chr3 106617446 106617726 HSA_NumtS_147_b1 461 | chr3 106617734 106619663 HSA_NumtS_148_b1 462 | chr3 106619706 106619913 HSA_NumtS_149_b1 463 | chr3 106619915 106621369 HSA_NumtS_150_b1 464 | chr3 106621370 106622402 HSA_NumtS_151_b1 465 | chr3 120440870 120441492 HSA_NumtS_152_b1 466 | chr3 122407585 122407651 HSA_NumtS_153_b1 467 | chr3 126724319 126724444 HSA_NumtS_154_b1 468 | chr3 128708003 128708082 HSA_NumtS_155_b1 469 | chr3 150559858 150560014 HSA_NumtS_156_b1 470 | chr3 152637483 152637645 HSA_NumtS_157_b1 471 | chr3 153376592 153376776 HSA_NumtS_158_b1 472 | chr3 160665442 160665830 HSA_NumtS_159_b1 473 | chr3 165877691 165877911 HSA_NumtS_160_b1 474 | chr3 165878018 165878136 HSA_NumtS_160_b2 475 | chr3 165878193 165879852 HSA_NumtS_160_b3 476 | chr3 165944495 165944590 HSA_NumtS_161_b1 477 | chr3 165944910 165945331 HSA_NumtS_161_b2 478 | chr3 165950270 165951045 HSA_NumtS_162_b1 479 | chr3 169654585 169654656 HSA_NumtS_163_b1 480 | chr3 171252212 171252440 HSA_NumtS_164_b1 481 | chr3 175934278 175934335 HSA_NumtS_165_b1 482 | chr3 176684106 176684224 HSA_NumtS_166_b1 483 | chr3 176728508 176728816 HSA_NumtS_167_b1 484 | chr3 185776436 185776549 HSA_NumtS_168_b1 485 | chr3 187183969 187184039 HSA_NumtS_169_b1 486 | chr4 5406266 5406444 HSA_NumtS_170_b1 487 | chr4 12641918 12642259 HSA_NumtS_171_b1 488 | chr4 14507530 14508073 HSA_NumtS_172_b1 489 | chr4 17063500 17063790 HSA_NumtS_173_b1 490 | chr4 25719536 25720505 HSA_NumtS_174_b1 491 | chr4 25720834 25722679 HSA_NumtS_174_b2 492 | chr4 27732046 27732197 HSA_NumtS_175_b1 493 | chr4 30886019 30886804 HSA_NumtS_176_b1 494 | chr4 41025243 41025306 HSA_NumtS_177_b1 495 | chr4 46120166 46120232 HSA_NumtS_178_b1 496 | chr4 47774289 47774381 HSA_NumtS_179_b1 497 | chr4 49248037 49248700 HSA_NumtS_180_b1 498 | chr4 49248777 49248933 HSA_NumtS_180_b2 499 | chr4 49550580 49551453 HSA_NumtS_181_b1 500 | chr4 56194327 56194457 
HSA_NumtS_182_b1 501 | chr4 65472098 65472313 HSA_NumtS_183_b1 502 | chr4 65472420 65472735 HSA_NumtS_184_b1 503 | chr4 65472741 65476259 HSA_NumtS_184_b2 504 | chr4 65476261 65477600 HSA_NumtS_184_b3 505 | chr4 66931062 66931174 HSA_NumtS_185_b1 506 | chr4 68915355 68916164 HSA_NumtS_186_b1 507 | chr4 69438209 69438337 HSA_NumtS_187_b1 508 | chr4 78929684 78929915 HSA_NumtS_188_b1 509 | chr4 79108686 79108750 HSA_NumtS_189_b1 510 | chr4 82654543 82655200 HSA_NumtS_190_b1 511 | chr4 88597229 88597358 HSA_NumtS_191_b1 512 | chr4 90652948 90653168 HSA_NumtS_192_b1 513 | chr4 93622976 93623804 HSA_NumtS_193_b1 514 | chr4 94738483 94738660 HSA_NumtS_194_b1 515 | chr4 108295263 108295374 HSA_NumtS_195_b1 516 | chr4 111365749 111365814 HSA_NumtS_196_b1 517 | chr4 117217758 117218023 HSA_NumtS_197_b1 518 | chr4 117218024 117218104 HSA_NumtS_197_b2 519 | chr4 117218151 117218257 HSA_NumtS_197_b3 520 | chr4 117218897 117221468 HSA_NumtS_197_b4 521 | chr4 129002560 129002931 HSA_NumtS_198_b1 522 | chr4 156372787 156387622 HSA_NumtS_199_b1 523 | chr4 160965748 160965826 HSA_NumtS_200_b1 524 | chr4 163342526 163342693 HSA_NumtS_201_b1 525 | chr4 164150191 164150301 HSA_NumtS_202_b1 526 | chr4 179990591 179990745 HSA_NumtS_203_b1 527 | chr4 182158516 182158699 HSA_NumtS_204_b1 528 | chr4 186215304 186215367 HSA_NumtS_205_b1 529 | chr5 1370659 1370883 HSA_NumtS_206_b1 530 | chr5 5395493 5396050 HSA_NumtS_207_b1 531 | chr5 5396082 5397081 HSA_NumtS_208_b1 532 | chr5 8618689 8621131 HSA_NumtS_209_b1 533 | chr5 8621132 8621372 HSA_NumtS_209_b2 534 | chr5 8622201 8622685 HSA_NumtS_209_b3 535 | chr5 18457329 18457446 HSA_NumtS_210_b1 536 | chr5 29533712 29533784 HSA_NumtS_211_b1 537 | chr5 60057366 60057851 HSA_NumtS_212_b1 538 | chr5 72535195 72535279 HSA_NumtS_213_b1 539 | chr5 73071717 73071757 HSA_NumtS_214_b1 540 | chr5 79945841 79948187 HSA_NumtS_215_b1 541 | chr5 84533368 84533882 HSA_NumtS_216_b1 542 | chr5 86483549 86483604 HSA_NumtS_217_b1 543 | chr5 93518842 93518972 
HSA_NumtS_218_b1 544 | chr5 93903161 93906623 HSA_NumtS_219_b1 545 | chr5 97013097 97014714 HSA_NumtS_220_b1 546 | chr5 97020555 97020982 HSA_NumtS_220_b2 547 | chr5 97745398 97747484 HSA_NumtS_221_b1 548 | chr5 99381642 99390749 HSA_NumtS_222_b1 549 | chr5 105889064 105889233 HSA_NumtS_223_b1 550 | chr5 118463094 118463149 HSA_NumtS_224_b1 551 | chr5 120366682 120367012 HSA_NumtS_225_b1 552 | chr5 121674534 121674668 HSA_NumtS_226_b1 553 | chr5 123095897 123096227 HSA_NumtS_227_b1 554 | chr5 123096496 123097460 HSA_NumtS_227_b2 555 | chr5 134258999 134264217 HSA_NumtS_228_b1 556 | chr5 136640789 136640909 HSA_NumtS_229_b1 557 | chr5 162436435 162436625 HSA_NumtS_230_b1 558 | chr5 162436625 162436712 HSA_NumtS_231_b1 559 | chr5 165957424 165957466 HSA_NumtS_232_b1 560 | chr5 170062562 170062630 HSA_NumtS_233_b1 561 | chr6 1706212 1706399 HSA_NumtS_234_b1 562 | chr6 5477658 5477719 HSA_NumtS_235_b1 563 | chr6 24948063 24948621 HSA_NumtS_236_b1 564 | chr6 32673900 32674678 HSA_NumtS_237_b1 565 | chr6 42005394 42005464 HSA_NumtS_238_b1 566 | chr6 43458905 43459016 HSA_NumtS_239_b1 567 | chr6 51851777 51851871 HSA_NumtS_240_b1 568 | chr6 62284018 62284534 HSA_NumtS_241_b1 569 | chr6 74935199 74935258 HSA_NumtS_242_b1 570 | chr6 75528911 75529033 HSA_NumtS_243_b1 571 | chr6 89265011 89265139 HSA_NumtS_244_b1 572 | chr6 92436438 92436775 HSA_NumtS_245_b1 573 | chr6 92436776 92436880 HSA_NumtS_246_b1 574 | chr6 92436881 92437029 HSA_NumtS_245_b2 575 | chr6 92437028 92437260 HSA_NumtS_247_b1 576 | chr6 95156491 95156823 HSA_NumtS_248_b1 577 | chr6 95156823 95157160 HSA_NumtS_249_b1 578 | chr6 119480472 119480586 HSA_NumtS_250_b1 579 | chr6 125717718 125717902 HSA_NumtS_251_b1 580 | chr6 127633272 127633338 HSA_NumtS_252_b1 581 | chr6 133120790 133120906 HSA_NumtS_253_b1 582 | chr6 133471707 133471971 HSA_NumtS_254_b1 583 | chr6 141442033 141442092 HSA_NumtS_255_b1 584 | chr6 141442093 141442368 HSA_NumtS_255_b2 585 | chr6 143398938 143399080 HSA_NumtS_256_b1 586 | chr6 
145050073 145050339 HSA_NumtS_257_b1 587 | chr6 153986707 153990915 HSA_NumtS_258_b1 588 | chr6 154837650 154837723 HSA_NumtS_259_b1 589 | chr6 156868970 156869187 HSA_NumtS_260_b1 590 | chr6 160256626 160256697 HSA_NumtS_261_b1 591 | chr6 161982326 161982376 HSA_NumtS_262_b1 592 | chr6 169586491 169587783 HSA_NumtS_263_b1 593 | chr7 22786500 22786555 HSA_NumtS_264_b1 594 | chr7 22786556 22786832 HSA_NumtS_264_b2 595 | chr7 22787143 22788457 HSA_NumtS_264_b3 596 | chr7 36268230 36268493 HSA_NumtS_265_b1 597 | chr7 37558350 37558438 HSA_NumtS_266_b1 598 | chr7 45291570 45291726 HSA_NumtS_267_b1 599 | chr7 46228613 46228687 HSA_NumtS_268_b1 600 | chr7 57234682 57241904 HSA_NumtS_269_b1 601 | chr7 57241945 57242772 HSA_NumtS_269_b2 602 | chr7 57242773 57242832 HSA_NumtS_269_b3 603 | chr7 57253472 57266177 HSA_NumtS_270_b1 604 | chr7 57266178 57266237 HSA_NumtS_270_b2 605 | chr7 62647443 62647508 HSA_NumtS_271_b1 606 | chr7 63564472 63567737 HSA_NumtS_272_b1 607 | chr7 63567835 63572963 HSA_NumtS_272_b2 608 | chr7 63572971 63573069 HSA_NumtS_272_b3 609 | chr7 67092819 67093340 HSA_NumtS_273_b1 610 | chr7 67562723 67562854 HSA_NumtS_274_b1 611 | chr7 67563101 67563339 HSA_NumtS_274_b2 612 | chr7 67731307 67731529 HSA_NumtS_275_b1 613 | chr7 67731565 67731846 HSA_NumtS_275_b2 614 | chr7 68201515 68201620 HSA_NumtS_276_b1 615 | chr7 68794967 68795245 HSA_NumtS_277_b1 616 | chr7 68795578 68796435 HSA_NumtS_277_b2 617 | chr7 68796727 68797712 HSA_NumtS_277_b3 618 | chr7 68798010 68798990 HSA_NumtS_277_b4 619 | chr7 68799290 68799581 HSA_NumtS_277_b5 620 | chr7 110725457 110725542 HSA_NumtS_278_b1 621 | chr7 112012701 112015635 HSA_NumtS_279_b1 622 | chr7 115946413 115946556 HSA_NumtS_280_b1 623 | chr7 116903603 116903933 HSA_NumtS_281_b1 624 | chr7 116903988 116904440 HSA_NumtS_282_b1 625 | chr7 116904449 116904584 HSA_NumtS_283_b1 626 | chr7 118536951 118537138 HSA_NumtS_284_b1 627 | chr7 132960211 132960409 HSA_NumtS_285_b1 628 | chr7 140872844 140873013 HSA_NumtS_286_b1 
629 | chr7 141501137 141504465 HSA_NumtS_287_b1 630 | chr7 141504769 141505330 HSA_NumtS_287_b2 631 | chr7 142372273 142372285 HSA_NumtS_288_b1 632 | chr7 142372286 142372523 HSA_NumtS_288_b2 633 | chr7 142373009 142375525 HSA_NumtS_288_b3 634 | chr7 145694426 145694521 HSA_NumtS_289_b1 635 | chr7 146750137 146750262 HSA_NumtS_290_b1 636 | chr7 149829090 149829249 HSA_NumtS_291_b1 637 | chr7 151122356 151122413 HSA_NumtS_292_b1 638 | chr7 153665748 153665905 HSA_NumtS_293_b1 639 | chr7 154866327 154866472 HSA_NumtS_294_b1 640 | chr8 13210906 13211030 HSA_NumtS_295_b1 641 | chr8 15914513 15914621 HSA_NumtS_296_b1 642 | chr8 18058425 18058854 HSA_NumtS_297_b1 643 | chr8 18706772 18707218 HSA_NumtS_298_b1 644 | chr8 20408707 20408917 HSA_NumtS_299_b1 645 | chr8 26808791 26808967 HSA_NumtS_300_b1 646 | chr8 27760762 27761236 HSA_NumtS_301_b1 647 | chr8 32663403 32663485 HSA_NumtS_302_b1 648 | chr8 32868127 32868238 HSA_NumtS_303_b1 649 | chr8 32868239 32868314 HSA_NumtS_303_b2 650 | chr8 32868968 32873384 HSA_NumtS_303_b3 651 | chr8 32874179 32874773 HSA_NumtS_303_b4 652 | chr8 36135128 36137214 HSA_NumtS_304_b1 653 | chr8 37210841 37210926 HSA_NumtS_305_b1 654 | chr8 39928109 39928166 HSA_NumtS_306_b1 655 | chr8 39928167 39928206 HSA_NumtS_307_b1 656 | chr8 47738860 47739741 HSA_NumtS_308_b1 657 | chr8 47739824 47742672 HSA_NumtS_308_b2 658 | chr8 47747644 47747748 HSA_NumtS_308_b3 659 | chr8 47748264 47748331 HSA_NumtS_308_b4 660 | chr8 47748451 47748491 HSA_NumtS_308_b5 661 | chr8 47748492 47750649 HSA_NumtS_308_b6 662 | chr8 49313247 49313341 HSA_NumtS_309_b1 663 | chr8 49313353 49313465 HSA_NumtS_310_b1 664 | chr8 53664303 53664517 HSA_NumtS_311_b1 665 | chr8 68492763 68492793 HSA_NumtS_312_b1 666 | chr8 68493039 68493098 HSA_NumtS_312_b2 667 | chr8 68493099 68496109 HSA_NumtS_312_b3 668 | chr8 68496422 68499321 HSA_NumtS_312_b4 669 | chr8 68499331 68500677 HSA_NumtS_312_b5 670 | chr8 70015208 70015466 HSA_NumtS_313_b1 671 | chr8 73897932 73898089 HSA_NumtS_314_b1 
672 | chr8 77113998 77114374 HSA_NumtS_315_b1 673 | chr8 77557692 77557904 HSA_NumtS_316_b1 674 | chr8 79194449 79194703 HSA_NumtS_317_b1 675 | chr8 98920265 98920383 HSA_NumtS_318_b1 676 | chr8 100508098 100508181 HSA_NumtS_319_b1 677 | chr8 104094191 104094312 HSA_NumtS_320_b1 678 | chr8 104094915 104096583 HSA_NumtS_320_b2 679 | chr8 104096993 104098929 HSA_NumtS_320_b3 680 | chr8 104099228 104099766 HSA_NumtS_320_b4 681 | chr8 104100643 104101158 HSA_NumtS_320_b5 682 | chr8 104101654 104102856 HSA_NumtS_320_b6 683 | chr8 104103349 104103691 HSA_NumtS_320_b7 684 | chr8 111945473 111947595 HSA_NumtS_321_b1 685 | chr8 112260876 112261137 HSA_NumtS_322_b1 686 | chr8 112261340 112261616 HSA_NumtS_323_b1 687 | chr8 121236552 121236679 HSA_NumtS_324_b1 688 | chr8 134527748 134528087 HSA_NumtS_325_b1 689 | chr8 134767050 134768891 HSA_NumtS_326_b1 690 | chr8 140663696 140664061 HSA_NumtS_327_b1 691 | chr9 5091067 5092005 HSA_NumtS_328_b1 692 | chr9 5092087 5098698 HSA_NumtS_329_b1 693 | chr9 5098996 5100917 HSA_NumtS_329_b2 694 | chr9 5107046 5110735 HSA_NumtS_329_b3 695 | chr9 18336642 18336731 HSA_NumtS_330_b1 696 | chr9 18826066 18826126 HSA_NumtS_331_b1 697 | chr9 33655879 33655891 HSA_NumtS_332_b1 698 | chr9 33655892 33655966 HSA_NumtS_332_b2 699 | chr9 33656609 33659128 HSA_NumtS_332_b3 700 | chr9 33916478 33916739 HSA_NumtS_333_b1 701 | chr9 34999141 34999507 HSA_NumtS_334_b1 702 | chr9 42779850 42779993 HSA_NumtS_335_b1 703 | chr9 66931106 66931249 HSA_NumtS_336_b1 704 | chr9 69749493 69749636 HSA_NumtS_337_b1 705 | chr9 70116186 70116329 HSA_NumtS_338_b1 706 | chr9 70366639 70366782 HSA_NumtS_339_b1 707 | chr9 80579770 80580260 HSA_NumtS_340_b1 708 | chr9 81355927 81357734 HSA_NumtS_341_b1 709 | chr9 81358003 81358424 HSA_NumtS_341_b2 710 | chr9 81359143 81359202 HSA_NumtS_341_b3 711 | chr9 81359203 81359358 HSA_NumtS_341_b4 712 | chr9 81425781 81425888 HSA_NumtS_342_b1 713 | chr9 83178469 83180845 HSA_NumtS_343_b1 714 | chr9 85042262 85043030 HSA_NumtS_344_b1 
715 | chr9 93198043 93198119 HSA_NumtS_345_b1 716 | chr9 94871200 94871972 HSA_NumtS_346_b1 717 | chr9 94872268 94874052 HSA_NumtS_346_b2 718 | chr9 95301345 95301968 HSA_NumtS_347_b1 719 | chr9 98639391 98639440 HSA_NumtS_348_b1 720 | chr9 104555736 104556159 HSA_NumtS_349_b1 721 | chr9 105827691 105827894 HSA_NumtS_350_b1 722 | chr9 112062509 112062724 HSA_NumtS_351_b1 723 | chr9 112063144 112063423 HSA_NumtS_351_b2 724 | chr9 132641136 132641210 HSA_NumtS_352_b1 725 | chr9 136282436 136282552 HSA_NumtS_353_b1 726 | chrX 5086878 5088813 HSA_NumtS_564_b1 727 | chrX 5089110 5089272 HSA_NumtS_564_b2 728 | chrX 5089427 5089485 HSA_NumtS_564_b3 729 | chrX 55204668 55210460 HSA_NumtS_565_b1 730 | chrX 62057877 62057924 HSA_NumtS_566_b1 731 | chrX 62058367 62058503 HSA_NumtS_566_b2 732 | chrX 62058504 62058563 HSA_NumtS_566_b3 733 | chrX 62059095 62061287 HSA_NumtS_566_b4 734 | chrX 62061482 62063615 HSA_NumtS_566_b5 735 | chrX 102027481 102027625 HSA_NumtS_567_b1 736 | chrX 102027924 102028199 HSA_NumtS_567_b2 737 | chrX 102057841 102057969 HSA_NumtS_568_b1 738 | chrX 102059569 102060551 HSA_NumtS_568_b2 739 | chrX 102061368 102062871 HSA_NumtS_568_b3 740 | chrX 102063179 102063442 HSA_NumtS_568_b4 741 | chrX 102063610 102063649 HSA_NumtS_568_b5 742 | chrX 102064050 102065228 HSA_NumtS_568_b6 743 | chrX 125605687 125606435 HSA_NumtS_569_b1 744 | chrX 125606450 125606715 HSA_NumtS_570_b1 745 | chrX 125606714 125607267 HSA_NumtS_571_b1 746 | chrX 125863025 125864349 HSA_NumtS_572_b1 747 | chrX 142517071 142517103 HSA_NumtS_573_b1 748 | chrX 142517104 142517163 HSA_NumtS_573_b2 749 | chrX 142517720 142519330 HSA_NumtS_573_b3 750 | chrX 142519488 142519895 HSA_NumtS_573_b4 751 | chrX 142520234 142521912 HSA_NumtS_573_b5 752 | chrX 142522269 142522941 HSA_NumtS_573_b6 753 | chrY 4212822 4212892 HSA_NumtS_574_b1 754 | chrY 8231603 8233329 HSA_NumtS_575_b1 755 | chrY 8233617 8233886 HSA_NumtS_576_b1 756 | chrY 8234669 8235240 HSA_NumtS_577_b1 757 | chrY 8237560 8239854 
HSA_NumtS_577_b2 758 | chrY 8239879 8240755 HSA_NumtS_577_b3 759 | chrY 8979505 8979570 HSA_NumtS_578_b1 760 | chrY 13287774 13288729 HSA_NumtS_579_b1 761 | chrY 13288742 13290087 HSA_NumtS_580_b1 762 | chrY 13290079 13290128 HSA_NumtS_581_b1 763 | chrY 13290151 13290667 HSA_NumtS_582_b1 764 | chrY 13601100 13601240 HSA_NumtS_583_b1 765 | chrY 16180036 16180168 HSA_NumtS_584_b1 766 | chrY 21033988 21034133 HSA_NumtS_585_b1 767 | chrY 59372560 59375560 custom_Y_1 768 | -------------------------------------------------------------------------------- /dinumt.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | use warnings; 4 | use strict; 5 | use Getopt::Long; 6 | 7 | my $version = "0.0.23"; 8 | 9 | #version update 10 | # 0.0.23 11 | # -fixed oversight on mask overlap to consider all possible overlaps of reference numt 12 | # 13 | # 0.0.22 14 | # -changed name to "dinumt" (dynumite!) 15 | # -added option --mt_names for discrete MT identifiers 16 | # 17 | # 0.0.21 18 | # -added option for Ensembl genomes (chrMT) 19 | # -fix usage of masking when --include_mask isn't present 20 | # -include reference parameter in vcf output 21 | # 22 | # 0.0.20 23 | # -added option to output GL information 24 | # 25 | # 0.0.19 26 | # -added option to output supporting reads to auxiliary file 27 | # 28 | # 0.0.18 29 | # -added mito position estimation 30 | # 31 | # 0.0.17 32 | # -implemented likelihood scoring for events 33 | # -added quality, evidence and depth filters 34 | # -implemented VCF format reporting 35 | # 36 | # 0.0.15 37 | # -bug fixes 38 | # 39 | # 0.0.14 40 | # -fixed bug with parsing read group information 41 | # -fixed bug where read group information was not being utilized in findBreakpoint() 42 | # 43 | # 0.0.13 44 | # -added option for minimum clip size to consider 45 | # -added option for maximum limit of putative breakpoints 46 | # -added estimated numt size and mitochondria coordinates to output report 
47 | # 48 | # 0.0.12 49 | # -added option to additionally attempt to cluster reads mapping in known numt regions 50 | # whose mates map elsewhere 51 | # -added prefix option for report 52 | # 53 | # 0.0.11 54 | # -added restriction that mates of linked clusters must be consistent with direct 55 | # or inverted sequence (incl changing dir to dnext in input_hash) 56 | # 57 | # 0.0.10 58 | # -added option for maximum read coverage when considering breakpoints 59 | # -moved mate quality filtering to getInput() 60 | # 61 | # 0.0.9 62 | # -changed default cluster reads to 1 63 | # -added option to consider total evidence from read pairs and breakpoints 64 | # -moved mask file comparison to getInput() 65 | # -added option for minimum mapping quality 66 | # -added filter in seqCluster() to remove low quality reads/clusters 67 | # 68 | # 0.0.8 69 | # -added option to restrict analysis to one or more read groups 70 | # -added option to use UCSC naming conventions (e.g. chrM instead of MT) 71 | # -added option to use genomes segregated by chromosome 72 | 73 | my %opts = (); 74 | 75 | $opts{len_cluster_include} = 600; 76 | $opts{len_cluster_link} = 800; 77 | $opts{filter_quality} = 50; 78 | $opts{filter_evidence} = 4; 79 | $opts{filter_depth} = 5; 80 | $opts{min_reads_cluster} = 1; 81 | $opts{min_clipped_seq} = 5; 82 | $opts{max_num_clipped} = 5; 83 | $opts{include_mask} = 0; 84 | $opts{min_evidence} = 4; 85 | $opts{min_map_qual} = 10; 86 | $opts{max_read_cov} = 200; 87 | $opts{mask_filename} = "numtS.bed"; 88 | $opts{samtools} = "samtools"; 89 | $opts{prefix} = "numt"; 90 | $opts{len_mt} = 16569; #eventually should be read in by BAM header 91 | $opts{ploidy} = 2; 92 | $opts{output_support} = 0; 93 | $opts{output_gl} = 0; 94 | my $optResult = GetOptions( 95 | "input_filename=s" => \$opts{input_filename}, 96 | "output_filename=s" => \$opts{output_filename}, 97 | "mask_filename=s" => \$opts{mask_filename}, 98 | "support_filename=s" => \$opts{support_filename}, 99 | 
"include_mask" => \$opts{include_mask}, 100 | "output_support" => \$opts{output_support}, 101 | "len_cluster_include=i" => \$opts{len_cluster_include}, 102 | "len_cluster_link=i" => \$opts{len_cluster_link}, 103 | "filter_quality=i" => \$opts{filter_quality}, 104 | "filter_evidence=i" => \$opts{filter_evidence}, 105 | "filter_depth=i" => \$opts{filter_evidence}, 106 | "min_reads_cluster=i" => \$opts{min_reads_cluster}, 107 | "min_evidence=i" => \$opts{min_evidence}, 108 | "min_clipped_seq=i" => \$opts{min_clipped_seq}, 109 | "max_num_clipped=i" => \$opts{max_num_clipped}, 110 | "min_map_qual=i" => \$opts{min_map_qual}, 111 | "max_read_cov=i" => \$opts{max_read_cov}, 112 | "mean_read_cov=f" => \$opts{mean_read_cov}, 113 | "insert_size=s" => \$opts{insert_size}, 114 | "read_groups=s" => \$opts{read_groups}, 115 | "mt_names=s" => \$opts{mt_names}, 116 | "by_chr_dir=s" => \$opts{by_chr_dir}, 117 | "reference=s" => \$opts{reference}, 118 | "prefix=s" => \$opts{prefix}, 119 | "output_gl" => \$opts{output_gl}, 120 | "ucsc" => \$opts{ucsc}, 121 | "ensembl" => \$opts{ensembl}, 122 | "help" => \$opts{help}, 123 | "verbose" => \$opts{verbose} 124 | ); 125 | 126 | checkOptions( $optResult, \%opts, $version ); 127 | 128 | my $seq_num = 0; 129 | my %seq_hash = (); 130 | 131 | my %sorted_hash = (); 132 | my %readgroup_hash = (); 133 | 134 | my $i = 1; 135 | my %infile_hash = (); 136 | my %group_hash = (); 137 | my %outfile_hash = (); 138 | my %mask_hash = (); 139 | my %mt_hash = (); 140 | 141 | if ( defined( $opts{read_groups} ) ) { 142 | my @rgs = split( /,/, $opts{read_groups} ); 143 | %readgroup_hash = map { $_, 1 } @rgs; 144 | } 145 | 146 | if (defined( $opts{mt_names} ) ) { 147 | my @mts = split (/,/, $opts{mt_names} ); 148 | %mt_hash = map { $_, 1 } @mts; 149 | } 150 | 151 | getInput( \%infile_hash, \%readgroup_hash, \%mask_hash, \%mt_hash ); 152 | seqCluster( \%infile_hash ); 153 | linkCluster( \%infile_hash ); 154 | mapCluster( \%infile_hash, \%outfile_hash, 
\%readgroup_hash ); 155 | findBreakpoint( \%outfile_hash, \%readgroup_hash, \%mask_hash ); 156 | scoreData( \%outfile_hash ); 157 | report( \%outfile_hash ); 158 | 159 | ################################################################################################################ 160 | sub scoreData { 161 | my ($outfile_hash) = @_; 162 | print "entering scoreData()\n" if $opts{verbose}; 163 | foreach my $group ( keys %$outfile_hash ) { 164 | print "Group: $group\n" if $opts{verbose}; 165 | my $sumGP = 0; 166 | my $numRef = $$outfile_hash{$group}{numRefRP}; 167 | my $numAlt = $$outfile_hash{$group}{numAltRP}; 168 | if ( $$outfile_hash{$group}{numAltSR} > 0 ) { 169 | $numRef += $$outfile_hash{$group}{numRefSR}; 170 | $numAlt += $$outfile_hash{$group}{numAltSR}; 171 | } 172 | print "\t$numRef\t$numAlt\n" if $opts{verbose}; 173 | 174 | foreach my $g ( 0 .. $opts{ploidy} ) { 175 | my $geno = $opts{ploidy} - $g; #need to reverse as calculation is reference allele based 176 | if ( $numAlt + $numRef > 0 && 1 / $opts{ploidy}**( $numAlt + $numRef ) > 0 ) { 177 | $$outfile_hash{$group}{gl}{$geno} = calcGl( $opts{ploidy}, $g, $numAlt + $numRef, $numRef, $$outfile_hash{$group}{avgQ} ); 178 | $$outfile_hash{$group}{gp}{$geno} = 10**$$outfile_hash{$group}{gl}{$geno}; 179 | $sumGP += $$outfile_hash{$group}{gp}{$geno}; 180 | print "\t$geno\t$$outfile_hash{$group}{gl}{$geno}\t$$outfile_hash{$group}{gp}{$geno}\n" if $opts{verbose}; 181 | } 182 | } 183 | print "\tsumGP: $sumGP\n" if $opts{verbose}; 184 | if ( $sumGP == 0 ) { 185 | foreach my $geno ( 0 .. $opts{ploidy} ) { 186 | $$outfile_hash{$group}{pl}{$geno} = 0; 187 | $$outfile_hash{$group}{gl}{$geno} = 0; 188 | } 189 | $$outfile_hash{$group}{gq} = 0; 190 | $$outfile_hash{$group}{gt} = "./."; 191 | $$outfile_hash{$group}{ft} = "NC"; 192 | } 193 | else { 194 | my $maxGP = 0; 195 | my $maxGeno = 0; 196 | foreach my $geno ( 0 .. 
$opts{ploidy} ) { 197 | if ( $$outfile_hash{$group}{gp}{$geno} == 0 ) { $$outfile_hash{$group}{gp}{$geno} = 1e-200; } 198 | $$outfile_hash{$group}{gp}{$geno} /= $sumGP; 199 | $$outfile_hash{$group}{pl}{$geno} = int( -10 * log10( $$outfile_hash{$group}{gp}{$geno} ) ); 200 | if ( $$outfile_hash{$group}{gp}{$geno} > $maxGP ) { $maxGP = $$outfile_hash{$group}{gp}{$geno}; $maxGeno = $geno; } 201 | } 202 | 203 | $maxGP = 1 - $$outfile_hash{$group}{gp}{0}; #calculate P(not 0/0 | data) 204 | if ( 1 - $maxGP == 0 ) { 205 | $$outfile_hash{$group}{gq} = 199; 206 | } 207 | else { 208 | $$outfile_hash{$group}{gq} = int( -10 * log10( 1 - $maxGP ) ); 209 | } 210 | my $gt = "0/0"; 211 | if ( $maxGeno == 1 ) { $gt = "0/1"; } 212 | elsif ( $maxGeno == 2 ) { $gt = "1/1"; } 213 | $$outfile_hash{$group}{gt} = $gt; 214 | 215 | my @filters = (); 216 | if ( $$outfile_hash{$group}{gq} < $opts{filter_quality} ) { 217 | push @filters, "q" . $opts{filter_quality}; 218 | } 219 | if ( $numAlt < $opts{filter_evidence} ) { 220 | push @filters, "e" . $opts{filter_evidence}; 221 | } 222 | if ( $numAlt + $numRef < $opts{filter_depth} ) { 223 | push @filters, "d" . $opts{filter_depth}; 224 | } 225 | $$outfile_hash{$group}{ft} = ( defined( $filters[0] ) ) ? 
join( ";", @filters ) : "PASS"; 226 | } 227 | } 228 | print "exiting scoreData()\n" if $opts{verbose}; 229 | } 230 | 231 | sub getDate { 232 | my ( $second, $minute, $hour, $dayOfMonth, $month, $yearOffset, $dayOfWeek, $dayOfYear, $daylightSavings ) = localtime(); 233 | my $year = 1900 + $yearOffset; 234 | $month++; 235 | my $fmonth = sprintf( "%.2d", $month ); 236 | my $fday = sprintf( "%.2d", $dayOfMonth ); 237 | return "$year$fmonth$fday"; 238 | } 239 | 240 | sub report { 241 | my ($outfile_hash) = @_; 242 | print "entering report()\n" if $opts{verbose}; 243 | 244 | #open output file 245 | if ( defined( $opts{output_filename} ) ) { 246 | open( foutname1, ">$opts{output_filename}" ) or die("error opening file $opts{output_filename}\n"); 247 | } 248 | else { 249 | open( foutname1, ">&", \*STDOUT ) or die; 250 | } 251 | 252 | if ( $opts{output_support} ) { 253 | 254 | #open support file 255 | open( support1, ">$opts{support_filename}" ) or die("could not open $opts{support_filename} for output, $!\n"); 256 | } 257 | 258 | my $filedate = getDate(); 259 | print foutname1 < 262 | ##ALT= 263 | ##FILTER= 264 | ##FILTER= 265 | ##FILTER= 266 | ##FORMAT= 267 | ##FORMAT= 268 | ##FORMAT= 269 | ##FORMAT= 270 | ##FORMAT= 271 | ##FORMAT= 272 | ##FORMAT= 273 | ##FORMAT= 274 | ##FORMAT= 275 | ##INFO= 276 | ##INFO= 277 | ##INFO= 278 | ##INFO= 279 | ##INFO= 280 | ##INFO= 281 | ##INFO= 282 | ##INFO= 283 | ##INFO= 284 | ##INFO= 285 | ##INFO= 286 | ##INFO= 287 | ##fileDate=$filedate 288 | ##reference=$opts{reference} 289 | ##source=dinumt-$version 290 | HEADER 291 | 292 | my @vars = sort { $$outfile_hash{$a}{chr} cmp $$outfile_hash{$b}{chr} || $$outfile_hash{$a}{leftBkpt} <=> $$outfile_hash{$b}{leftBkpt} } keys %$outfile_hash; 293 | if ( $opts{output_gl} ) { 294 | print foutname1 "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t$opts{prefix}\n"; 295 | } 296 | else { 297 | print foutname1 "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"; 298 | } 299 | 300 | my $index = 1; 301 | 
foreach my $group (@vars) { 302 | print "$group in report()\n" if $opts{verbose}; 303 | if ( $$outfile_hash{$group}{gt} eq "0/0" || $$outfile_hash{$group}{gt} eq "./." ) { 304 | print "\thomref or nc, skipping\n" if $opts{verbose}; 305 | next; 306 | } 307 | my $chrom = $$outfile_hash{$group}{chr}; 308 | unless ( $opts{ucsc} || $opts{ensembl} ) { 309 | $chrom =~ s/chr//g; 310 | } 311 | my $id = $opts{prefix} . "_$index"; 312 | my $alt = ""; 313 | my $qual = $$outfile_hash{$group}{gq}; 314 | my $filter = $$outfile_hash{$group}{ft}; 315 | my %info = (); 316 | my $pos = $$outfile_hash{$group}{leftBkpt} - 1; 317 | my $end = $$outfile_hash{$group}{rightBkpt}; 318 | my $ciDelta = $end - $pos + 1; 319 | 320 | $info{IMPRECISE} = undef; 321 | $info{CIPOS} = "0,$ciDelta"; 322 | $info{CIEND} = "-$ciDelta,0"; 323 | $info{END} = $end; 324 | $info{SVTYPE} = "INS"; 325 | 326 | if ( $$outfile_hash{$group}{m_len} ne "NA" ) { 327 | $info{MSTART} = $$outfile_hash{$group}{l_m_pos}; 328 | $info{MEND} = $$outfile_hash{$group}{r_m_pos}; 329 | $info{MLEN} = $$outfile_hash{$group}{m_len}; 330 | } 331 | 332 | my $refline = `$opts{samtools} faidx $opts{reference} $chrom:$pos-$pos`; 333 | my $ref = ( split( /\n/, $refline ) )[1]; 334 | if ( !defined($ref) ) { $ref = "N"; } 335 | 336 | my $format = "GT:FT:GL:GQ:PL"; 337 | my $info = ""; 338 | my @sKeys = sort { $a cmp $b } keys %info; 339 | for ( my $i = 0 ; $i <= $#sKeys ; $i++ ) { 340 | if ( $i > 0 ) { $info .= ";"; } 341 | if ( defined( $info{ $sKeys[$i] } ) ) { 342 | $info .= "$sKeys[$i]=$info{$sKeys[$i]}"; 343 | } 344 | else { 345 | $info .= "$sKeys[$i]"; 346 | } 347 | } 348 | if ( $opts{output_gl} ) { 349 | print foutname1 "$chrom\t$pos\t$id\t$ref\t$alt\t$qual\t$filter\t$info\t$format"; 350 | my @gls = (); 351 | my @pls = (); 352 | foreach my $geno ( 0 .. 
$opts{ploidy} ) { 353 | push @gls, sprintf( "%.2f", $$outfile_hash{$group}{gl}{$geno} ); 354 | push @pls, $$outfile_hash{$group}{pl}{$geno}; 355 | } 356 | my $gl = join( ",", @gls ); 357 | my $pl = join( ",", @pls ); 358 | print foutname1 "\t$$outfile_hash{$group}{gt}:$$outfile_hash{$group}{ft}:$gl:$$outfile_hash{$group}{gq}:$pl\n"; 359 | } 360 | else { 361 | print foutname1 "$chrom\t$pos\t$id\t$ref\t$alt\t$qual\t$filter\t$info\n"; 362 | } 363 | if ( $opts{output_support} ) { 364 | print support1 "$$outfile_hash{$group}{support}"; 365 | } 366 | $index++; 367 | } 368 | close(foutname1); 369 | if ( $opts{output_support} ) { close(support1); } 370 | 371 | print "exiting report()\n" if $opts{verbose}; 372 | } 373 | 374 | sub calcGl { 375 | my ( $m, $g, $k, $l, $e ) = @_; 376 | print "in calcGl():\n" if $opts{verbose}; 377 | print "\t$m\t$g\t$k\t$l\t$e\n" if $opts{verbose}; 378 | if ( 1 / $m**$k <= 0 ) { die "problem in calcGL 1, \t$m\t$g\t$k\t$l\t$e\n"; } 379 | my $gl = log10( 1 / $m**$k ); 380 | if ( ( ( $m - $g ) * $e ) + ( ( 1 - $e ) * $g ) <= 0 ) { die "problem in calcGL 2, \t$m\t$g\t$k\t$l\t$e\n"; } 381 | $gl += log10( ( ( $m - $g ) * $e ) + ( ( 1 - $e ) * $g ) ) for 1 .. $l; 382 | if ( ( $m - $g ) * ( 1 - $e ) + ( $g * $e ) <= 0 ) { die "problem in calcGL 3, \t$m\t$g\t$k\t$l\t$e\n"; } 383 | $gl += log10( ( $m - $g ) * ( 1 - $e ) + ( $g * $e ) ) for ( $l + 1 ) .. 
$k;
    return $gl;
}

sub log10 {
    my $n = shift;
    return log($n) / log(10);
}

sub getInput {
    my ( $infile_hash, $readgroup_hash, $mask_hash, $mt_hash ) = @_;
    my @input_lines = ();
    print "Reading input files...\n" if $opts{verbose};

    #open input file
    if ( $opts{by_chr_dir} ) {
        if (defined( $opts{mt_names} ) ) {
            foreach my $mt_name (keys %{$mt_hash}) {
                push @input_lines, "samtools view $opts{by_chr_dir}/$mt_name.*bam |";
            }
        }
        elsif ( $opts{ucsc} ) {
            push @input_lines, "samtools view $opts{by_chr_dir}/chrM.*bam |";
        }
        elsif ( $opts{ensembl} ) {
            push @input_lines, "samtools view $opts{by_chr_dir}/chrMT.*bam |";
        }
        else {
            push @input_lines, "samtools view $opts{by_chr_dir}/MT*bam |";
        }
    }
    else {
        if (defined( $opts{mt_names} ) ) {
            foreach my $mt_name (keys %{$mt_hash}) {
                push @input_lines, "samtools view $opts{input_filename} $mt_name |";
            }
        }
        elsif ( $opts{ucsc} ) {
            push @input_lines, "samtools view $opts{input_filename} chrM |";
        }
        elsif ( $opts{ensembl} ) {
            push @input_lines, "samtools view $opts{input_filename} chrMT |";
        }
        else {
            push @input_lines, "samtools view $opts{input_filename} MT |";
        }
    }

    #input mask coordinates
    if ( $opts{include_mask} ) {
        open( mask1, $opts{mask_filename} ) || die "Could not open $opts{mask_filename} for input, $!\n";
        while (<mask1>) {
            chomp;
            my ( $chr, $start, $end, $id ) = split(/\t/);
            $chr =~ s/chr//g;
            $$mask_hash{$chr}{$start} = $end;
            if ( $opts{include_mask} ) {
                if ( $opts{by_chr_dir} ) {
                    if ( $opts{ucsc} || $opts{ensembl} ) {
                        $chr = "chr" .
                            $chr;
                    }
                    push @input_lines, "samtools view $opts{by_chr_dir}/$chr*bam $chr:$start-$end |";
                }
                else {
                    if ( $opts{ucsc} || $opts{ensembl} ) {
                        $chr = "chr" . $chr;
                    }
                    push @input_lines, "samtools view $opts{input_filename} $chr:$start-$end |";
                }
            }
            $$mask_hash{$chr}{$start} = $end;
        }
        close mask1;
    }

    foreach my $input_line (@input_lines) {
        print "command: $input_line\n" if $opts{verbose};
        open( fname1, $input_line ) || die "error in opening file, $!\n";
        while ( my $line1 = <fname1> ) {
            $seq_num = $i;
            chomp($line1);
            my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split( /\t/, $line1 );
            my $pnextend = $pnext + length($seq);    #use first read length as proxy for paired read length

            my ($read_group) = $line1 =~ /RG:Z:(\S+)/;

            if ( $opts{read_groups} && !defined($read_group) ) { next; }
            elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; }

            if ( $rnext eq '=' || $rnext eq '*' ) { next; }
            if (defined( $opts{mt_names} ) ) {
                if (!defined($$mt_hash{$rname}) && defined($$mt_hash{$rnext}) ) { next; }
            }
            else {
                if ( $rname !~ /M/ && $rnext =~ /M/ ) { next; }
            }

            my $dnext = 0;
            if ( $flag & 32 ) {
                $dnext = 1;
            }

            my $dir = 0;
            if ( $flag & 16 ) {
                $dir = 1;
            }

            #Compare to masked regions
            my $isMaskOverlap = 0;
            if ( $opts{include_mask} ) {
                foreach my $maskStart ( keys %{ $$mask_hash{$rnext} } ) {
                    my $maskEnd = $$mask_hash{$rnext}{$maskStart};
                    if ( $pnext >= $maskStart && $pnext <= $maskEnd ) {
                        $isMaskOverlap = 1;
                        last;
                    }
                    elsif ($pnextend >= $maskStart && $pnextend <= $maskEnd) {
                        $isMaskOverlap = 1;
                        last;
                    }
                    elsif ($pnext <= $maskStart && $pnextend >= $maskEnd) {
                        $isMaskOverlap
                            = 1;
                        last;
                    }
                }
            }
            if ($isMaskOverlap) { next; }

            #get mate information
            if ( $opts{by_chr_dir} ) {
                open( SAM, "samtools view $opts{by_chr_dir}/$rnext.*bam $rnext:$pnext-$pnext | " ) || die "Could not find MT bam file in $opts{by_chr_dir}, $!\n";
            }
            else {
                open( SAM, "samtools view $opts{input_filename} $rnext:$pnext-$pnext | " ) || die "Could not open $opts{input_filename}, $!\n";
            }

            my $c_mapq = 0;
            my $cnext = 0;
            while (<SAM>) {
                chomp;
                my ( $m_qname, $m_flag, $m_rname, $m_pos, $m_mapq, $m_cigar, $m_rnext, $m_pnext, $m_tlen, $m_seq, $m_qual, $opt ) = split(/\t/);
                if ( $m_qname ne $qname ) { next; }
                $c_mapq = $m_mapq;
                $cnext = $m_cigar;
                #if ( $c_mapq < $opts{min_map_qual} ) {
                #    print "MAPQ Filtering:\t$_\n" if $opts{verbose};
                #}
            }
            close SAM;

            if ( $c_mapq < $opts{min_map_qual} ) {
                next;
            }

            $$infile_hash{$seq_num}{group} = 0;
            $$infile_hash{$seq_num}{seq} = $seq;

            $$infile_hash{$seq_num}{dir} = $dir;
            $$infile_hash{$seq_num}{qname} = $qname;
            $$infile_hash{$seq_num}{rname} = $rname;
            $$infile_hash{$seq_num}{pos} = $pos;
            $$infile_hash{$seq_num}{cigar} = $cigar;
            $$infile_hash{$seq_num}{cnext} = $cnext;
            $$infile_hash{$seq_num}{rnext} = $rnext;
            $$infile_hash{$seq_num}{pnext} = $pnext;
            $$infile_hash{$seq_num}{dnext} = $dnext;
            $$infile_hash{$seq_num}{tlen} = $tlen;
            $$infile_hash{$seq_num}{qual} = $qual;
            $$infile_hash{$seq_num}{line} = $line1;

            $i++;
        }
        close(fname1);
    }
}

sub findBreakpoint {
    my ( $outfile_hash, $readgroup_hash, $mask_hash ) = @_;
    print "entering findBreakpoint()\n" if $opts{verbose};
    foreach my $group ( keys %$outfile_hash ) {
        print "\tAssessing group $group for breakpoints\n" if $opts{verbose};
        my $l_pos =
            $$outfile_hash{$group}{l_pos};
        my $r_pos = $$outfile_hash{$group}{r_pos};
        my $chro = $$outfile_hash{$group}{chr};
        $$outfile_hash{$group}{numAltSR} = 0;
        $$outfile_hash{$group}{numRefSR} = 0;
        $$outfile_hash{$group}{leftBkpt} = $l_pos;
        $$outfile_hash{$group}{rightBkpt} = $r_pos;

        #Compare to masked regions
        my $isMaskOverlap = 0;
        if ( $opts{include_mask} ) {
            foreach my $maskStart ( keys %{ $$mask_hash{$chro} } ) {
                my $maskEnd = $$mask_hash{$chro}{$maskStart};
                if ( ( $l_pos >= $maskStart && $l_pos <= $maskEnd ) || ( $r_pos >= $maskStart && $r_pos <= $maskEnd ) || ( $l_pos <= $maskStart && $r_pos >= $maskEnd ) ) {
                    $isMaskOverlap = 1;
                    last;
                }
            }
        }
        if ($isMaskOverlap) { next; }
        my %clippedPos = ();

        #open input file
        if ( $opts{by_chr_dir} ) {
            open( SAM, "samtools view $opts{by_chr_dir}/$chro.*bam $chro:$l_pos-$r_pos |" ) || die "Could not find MT bam file in $opts{by_chr_dir}, $!\n";
        }
        else {
            open( SAM, "samtools view $opts{input_filename} $chro:$l_pos-$r_pos |" ) || die "Could not open $opts{input_filename}, $!\n";
        }

        my %cnt = ();
        while (<SAM>) {
            chomp;
            my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split(/\t/);
            my ($read_group) = $_ =~ /RG:Z:(\S+)/;

            if ( $opts{read_groups} && !defined($read_group) ) { next; }
            elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; }

            #Check for read positions outside max_read_cov
            my $break = 0;
            for ( my $p = 0 ; $p <= length($seq) ; $p++ ) {
                $cnt{ $pos + $p }++;
                if ( $cnt{ $pos + $p } > $opts{max_read_cov} ) { $break = 1; last; }
            }

            if ($break) {
                print "Read count has reached limit of $opts{max_read_cov}, removing group $group\n" if $opts{verbose};
                delete( $$outfile_hash{$group} );
                last;
            }

            #Mark Clipped Positions
            my ( $cPos, $clipside, $clipsize ) = getSoftClipInfo( $pos, $cigar, $qual );
            if ( $cPos > -1 ) {
                $clippedPos{$cPos}++;
            }
        }
        close SAM;

        if ( !defined( $$outfile_hash{$group} ) ) { next; }
        my %bkpts = ();
        my $numBkpts = 0;
        foreach my $cPos ( sort keys %clippedPos ) {
            if ( $clippedPos{$cPos} > 1 ) {
                $bkpts{$cPos} = $clippedPos{$cPos};
                $numBkpts++;
            }
        }

        my $num_bkpt_support = 0;
        my $num_ref_support = 0;
        my $leftBkpt = $l_pos;
        my $rightBkpt = $r_pos;

        if ( $numBkpts > 0 && scalar keys %clippedPos <= $opts{max_num_clipped} ) {

            #take two most prevalent breaks for now
            my @sorted = sort { $bkpts{$b} <=> $bkpts{$a} } keys %bkpts;
            if ( $numBkpts == 1 ) {
                $leftBkpt = $sorted[0];
                $rightBkpt = $leftBkpt + 1;
                $num_bkpt_support = $bkpts{ $sorted[0] };
                $num_ref_support = $cnt{ $sorted[0] } - $num_bkpt_support;
            }
            else {
                if ( $sorted[0] < $sorted[1] ) {
                    $leftBkpt = $sorted[0];
                    $rightBkpt = $sorted[1];
                }
                else {
                    $leftBkpt = $sorted[1];
                    $rightBkpt = $sorted[0];
                }
                $num_bkpt_support = $bkpts{ $sorted[0] } + $bkpts{ $sorted[1] };
                $num_ref_support = $cnt{ $sorted[0] } + $cnt{ $sorted[1] } - $num_bkpt_support;
            }
        }
        $$outfile_hash{$group}{leftBkpt} = $leftBkpt;
        $$outfile_hash{$group}{rightBkpt} = $rightBkpt;
        $$outfile_hash{$group}{numAltSR} = $num_bkpt_support;
        $$outfile_hash{$group}{numRefSR} = $num_ref_support;
    }
    print "exiting findBreakpoint()\n" if $opts{verbose};
}

sub seqCluster {
    my ($infile_hash) = @_;
    my $k = 0;
    my %d = ();

    $d{0}{k} = 0;
    $d{0}{pnext} = 0;
    $d{0}{last} = ();
    $d{0}{rnext} = 0;

    $d{1}{k} = 0;
    $d{1}{pnext} = 0;
    $d{1}{last} = ();
    $d{1}{rnext} =
0; 684 | 685 | my @sorted = sort { $$infile_hash{$a}->{rnext} cmp $$infile_hash{$b}->{rnext} || $$infile_hash{$a}->{pnext} <=> $$infile_hash{$b}->{pnext} } keys %{$infile_hash}; 686 | print scalar @sorted . " total reads to process for clustering\n" if $opts{verbose}; 687 | foreach my $c_seq_num (@sorted) { 688 | my $c_pnext = $$infile_hash{$c_seq_num}{pnext}; 689 | my $c_dnext = $$infile_hash{$c_seq_num}{dnext}; 690 | my $c_rnext = $$infile_hash{$c_seq_num}{rnext}; 691 | my $c_qname = $$infile_hash{$c_seq_num}{qname}; 692 | 693 | print "$c_dnext\n" if $opts{verbose}; 694 | if ( $c_pnext - $d{$c_dnext}{pnext} > $opts{len_cluster_include} || $d{$c_dnext}{k} == 0 || $c_rnext ne $d{$c_dnext}{rnext} ) { 695 | 696 | #print "c_pnext:$c_pnext \t d_pnext:$d{$c_dnext}{pnext} \t dir:$d{$c_dnext}{k}\n" if $opts{verbose}; 697 | if ( $d{$c_dnext}{k} > 0 && scalar @{ $d{$c_dnext}{last} } < $opts{min_reads_cluster} ) { 698 | foreach my $seq_num ( @{ $d{$c_dnext}{last} } ) { 699 | delete $$infile_hash{$seq_num}; 700 | } 701 | } 702 | $k++; 703 | $$infile_hash{$c_seq_num}{'group'} = $k; 704 | $d{$c_dnext}{k} = $k; 705 | $d{$c_dnext}{last} = (); 706 | } 707 | else { 708 | $$infile_hash{$c_seq_num}{'group'} = $d{$c_dnext}{k}; 709 | 710 | } 711 | print "$d{$c_dnext}{k}\t$c_qname\t$c_rnext\t$c_pnext" if $opts{verbose}; 712 | 713 | push @{ $d{$c_dnext}{last} }, $c_seq_num; 714 | $d{$c_dnext}{pnext} = $c_pnext; 715 | $d{$c_dnext}{rnext} = $c_rnext; 716 | 717 | #print "k:$k \t grp:$$infile_hash{$c_seq_num}{'group'} \t $d{$c_dnext}{k} \t pre_pos:$d{$c_dnext}{pnext}\n" if $opts{verbose}; 718 | #print "$$infile_hash{$c_seq_num}->{'group'}\n chr_num:$c_rnext\n" if $opts{verbose}; 719 | } 720 | 721 | } 722 | 723 | sub linkCluster { 724 | my ($infile_hash) = @_; 725 | 726 | #this can link multiple F's to a single leftmost R 727 | my @sorted = sort { $$infile_hash{$a}->{rnext} cmp $$infile_hash{$b}->{rnext} || $$infile_hash{$a}->{pnext} <=> $$infile_hash{$b}->{pnext} } keys %{$infile_hash}; 728 
| print scalar @sorted . " total reads to process for linking clusters\n" if $opts{verbose}; 729 | 730 | for ( my $c = 0 ; $c <= $#sorted ; $c++ ) { 731 | my $c_seq_num = $sorted[$c]; 732 | $$infile_hash{$c_seq_num}{link} = 0; 733 | my $c_pnext = $$infile_hash{$c_seq_num}{pnext}; 734 | my $c_dnext = $$infile_hash{$c_seq_num}{dnext}; 735 | my $c_rnext = $$infile_hash{$c_seq_num}{rnext}; 736 | my $c_dir = $$infile_hash{$c_seq_num}{dir}; 737 | 738 | if ( $c_dnext == 1 ) { next; } 739 | 740 | for ( my $d = $c + 1 ; $d <= $#sorted ; $d++ ) { 741 | my $d_seq_num = $sorted[$d]; 742 | my $d_pnext = $$infile_hash{$d_seq_num}{pnext}; 743 | my $d_dnext = $$infile_hash{$d_seq_num}{dnext}; 744 | my $d_rnext = $$infile_hash{$d_seq_num}{rnext}; 745 | my $d_dir = $$infile_hash{$d_seq_num}{dir}; 746 | 747 | if ( $d_dnext == 0 ) { next; } 748 | 749 | my $delta = $d_pnext - $c_pnext; 750 | 751 | if ( 752 | $delta < $opts{len_cluster_link} 753 | && $c_rnext eq $d_rnext 754 | && ( ( $c_dir == 0 && $c_dnext == 1 && $d_dnext == 0 && $d_dir == 1 ) 755 | || ( $c_dir == 0 && $c_dnext == 0 && $d_dnext == 1 && $d_dir == 1 ) 756 | || ( $c_dir == 1 && $c_dnext == 0 && $d_dnext == 1 && $d_dir == 0 ) ) 757 | ) 758 | { 759 | 760 | #0.0.11 - must have consistent orientation between left and right sides of insertions 761 | $$infile_hash{$c_seq_num}{link} = $$infile_hash{$d_seq_num}{group}; 762 | $$infile_hash{$d_seq_num}{link} = $$infile_hash{$c_seq_num}{group}; 763 | } 764 | elsif ( !defined( $$infile_hash{$c_seq_num}{link} ) ) { 765 | $$infile_hash{$c_seq_num}{link} = 0; 766 | } 767 | } 768 | } 769 | } 770 | 771 | sub mapCluster { 772 | my ( $infile_hash, $outfile_hash, $readgroup_hash ) = @_; 773 | 774 | my %l_linked_groups; 775 | my %linked_group_pnext; 776 | 777 | foreach my $key ( sort { $infile_hash{$a}->{group} <=> $infile_hash{$b}->{group} } keys %infile_hash ) { 778 | if ( ( $infile_hash{$key}{'group'} > 0 ) && ( $infile_hash{$key}{'dnext'} == 0 ) && ( $infile_hash{$key}{'link'} > 0 ) ) { 
779 | $l_linked_groups{ $infile_hash{$key}{'group'} } = $infile_hash{$key}{'link'}; 780 | } 781 | } 782 | 783 | my $i = 1; 784 | while ( my ( $group, $link ) = each(%l_linked_groups) ) { 785 | print "group = $group ;; link = $link \n" if $opts{verbose}; 786 | my @l_rnext = map { $infile_hash{$_}{'rnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 787 | my @l_rname = map { $infile_hash{$_}{'rname'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 788 | my @l_pos = map { $infile_hash{$_}{'pos'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 789 | my @l_dir = map { $infile_hash{$_}{'dir'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 790 | my @l_qname = map { $infile_hash{$_}{'qname'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 791 | my @l_pnext = map { $infile_hash{$_}{'pnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 792 | my @l_dnext = map { $infile_hash{$_}{'dnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 793 | my @l_cigar = map { $infile_hash{$_}{'cigar'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 794 | my @l_cnext = map { $infile_hash{$_}{'cnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 795 | my @r_rnext = map { $infile_hash{$_}{'rnext'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 796 | my @r_pos = map { $infile_hash{$_}{'pos'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 797 | my @r_dir = map { $infile_hash{$_}{'dir'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 798 | my @r_rname = map { $infile_hash{$_}{'rname'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 799 | my @r_pnext = map { $infile_hash{$_}{'pnext'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 800 | my @r_dnext = map { $infile_hash{$_}{'dnext'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 801 | my @r_qname = map { 
$infile_hash{$_}{'qname'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 802 | my @r_cigar = map { $infile_hash{$_}{'cigar'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 803 | my @r_cnext = map { $infile_hash{$_}{'cnext'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 804 | 805 | my @rc_pnext = (); 806 | my @lc_pnext = (); 807 | my $chr = $l_rnext[0]; 808 | 809 | #update right coordinates based on cigar length 810 | for ( my $c = 0 ; $c <= $#r_cnext ; $c++ ) { 811 | my $cigar = $r_cnext[$c]; 812 | $rc_pnext[$c] = $r_pnext[$c]; 813 | 814 | while ( $cigar =~ /(\d+)M/g ) { 815 | $rc_pnext[$c] += $1; 816 | } 817 | while ( $cigar =~ /(\d+)N/g ) { 818 | $rc_pnext[$c] += $1; 819 | } 820 | while ( $cigar =~ /(\d+)D/g ) { 821 | $rc_pnext[$c] += $1; 822 | } 823 | } 824 | 825 | #update left coordinates based on cigar length 826 | for ( my $c = 0 ; $c <= $#l_cnext ; $c++ ) { 827 | my $cigar = $l_cnext[$c]; 828 | $lc_pnext[$c] = $l_pnext[$c]; 829 | 830 | while ( $cigar =~ /(\d+)M/g ) { 831 | $lc_pnext[$c] += $1; 832 | } 833 | while ( $cigar =~ /(\d+)N/g ) { 834 | $lc_pnext[$c] += $1; 835 | } 836 | while ( $cigar =~ /(\d+)D/g ) { 837 | $lc_pnext[$c] += $1; 838 | } 839 | } 840 | 841 | my @s_l_pnext = sort { $a <=> $b } @l_pnext; 842 | my @s_r_pnext = sort { $a <=> $b } @r_pnext; 843 | 844 | my @s_lc_pnext = sort { $a <=> $b } @lc_pnext; 845 | my @s_rc_pnext = sort { $a <=> $b } @rc_pnext; 846 | 847 | my $l_brk_point = $s_l_pnext[$#s_l_pnext]; 848 | my $r_brk_point = $s_rc_pnext[0]; 849 | 850 | my $win_l_s = $s_l_pnext[0]; 851 | my $win_l_e = $s_lc_pnext[$#s_lc_pnext]; 852 | my $win_r_s = $s_r_pnext[0]; 853 | my $win_r_e = $s_rc_pnext[$#s_rc_pnext]; 854 | 855 | $linked_group_pnext{$i}{l_group} = $group; 856 | $linked_group_pnext{$i}{r_group} = $link; 857 | 858 | #Check for crossing clusters; occurs in regions of high coverage, but not ascertained 859 | #until next step so stopgap here 860 | if ( $l_brk_point > $r_brk_point ) { 861 
            my $temp = $l_brk_point;
            $l_brk_point = $r_brk_point;
            $r_brk_point = $temp;
        }

        my $commandLeft = "";
        my $commandRight = "";
        if ( $opts{by_chr_dir} ) {
            $commandLeft = "samtools view $opts{by_chr_dir}/$chr.*bam $chr:$win_l_s-$win_l_e |";
            $commandRight = "samtools view $opts{by_chr_dir}/$chr.*bam $chr:$win_r_s-$win_r_e |";
        }
        else {
            $commandLeft = "samtools view $opts{input_filename} $chr:$win_l_s-$win_l_e |";
            $commandRight = "samtools view $opts{input_filename} $chr:$win_r_s-$win_r_e |";
        }

        my $numRefRP = 0;
        my $numAltRP = scalar @l_pnext + scalar @r_pnext;
        my $sumE = 0;

        #left
        open( SAM, $commandLeft ) || die "Could not open sam file for input, $!\n";
        while (<SAM>) {
            chomp;
            my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split(/\t/);

            my $mapE = 10**( -1 * $mapq / 10 );
            my ($read_group) = $_ =~ /RG:Z:(\S+)/;

            if ( $opts{read_groups} && !defined($read_group) ) { next; }
            elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; }

            if ( $mapq < $opts{min_map_qual} ) { next; }
            my $dir = 0;    #F
            if ( $flag & 16 ) { $dir = 1; }    #R

            if ( $pos >= $win_l_s && $pos <= $win_l_e && $dir == 0 ) {
                $numRefRP++;
                $sumE += $mapE;
            }
        }
        close SAM;

        #right
        open( SAM, $commandRight ) || die "Could not open sam file for input, $!\n";
        while (<SAM>) {
            chomp;
            my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split(/\t/);

            my $mapE = 10**( -1 * $mapq / 10 );
            my ($read_group) = $_ =~ /RG:Z:(\S+)/;

            if ( $opts{read_groups} && !defined($read_group) ) { next; }
            elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; }

            if ( $mapq < $opts{min_map_qual} ) { next; }
my $dir = 0; #F 918 | if ( $flag & 16 ) { $dir = 1; } #R 919 | 920 | if ( $pos >= $win_r_s && $pos <= $win_r_e && $dir == 1 ) { 921 | $numRefRP++; 922 | $sumE += $mapE; 923 | } 924 | } 925 | close SAM; 926 | 927 | my $avgQ = $sumE / $numRefRP; #calculate average over all reads, ref and alt 928 | $numRefRP -= $numAltRP; #correct for alt reads 929 | 930 | #estimate mitochondrial coordinates from mated sequence alignments 931 | my $l_m_min = 1e10; 932 | my $l_m_max = 0; 933 | my $l_m_min_i = -1; 934 | my $l_m_max_i = -1; 935 | my $l_n_dir = -1; 936 | my $l_m_dir = -1; 937 | 938 | for ( my $i = 0 ; $i <= $#l_qname ; $i++ ) { 939 | if ( $l_rname[$i] !~ /M/ ) { next; } #don't include nuclear homologous regions 940 | if ( $l_pos[$i] < $l_m_min ) { 941 | $l_m_min = $l_pos[$i]; 942 | $l_m_min_i = $i; 943 | } 944 | if ( $l_pos[$i] > $l_m_max ) { 945 | $l_m_max = $l_pos[$i]; 946 | $l_m_max_i = $i; 947 | } 948 | $l_n_dir = $l_dnext[$i]; 949 | $l_m_dir = $l_dir[$i]; 950 | } 951 | my $r_m_min = 1e10; 952 | my $r_m_max = 0; 953 | my $r_m_min_i = -1; 954 | my $r_m_max_i = -1; 955 | my $r_n_dir = -1; 956 | my $r_m_dir = -1; 957 | 958 | for ( my $i = 0 ; $i <= $#r_qname ; $i++ ) { 959 | if ( $r_rname[$i] !~ /M/ ) { next; } #don't include nuclear homologous regions 960 | if ( $r_pos[$i] < $r_m_min ) { 961 | $r_m_min = $r_pos[$i]; 962 | $r_m_min_i = $i; 963 | } 964 | if ( $r_pos[$i] > $r_m_max ) { 965 | $r_m_max = $r_pos[$i]; 966 | $r_m_max_i = $i; 967 | } 968 | $r_n_dir = $r_dnext[$i]; 969 | $r_m_dir = $r_dir[$i]; 970 | } 971 | $$outfile_hash{$group}{l_m_pos} = "NA"; 972 | $$outfile_hash{$group}{r_m_pos} = "NA"; 973 | $$outfile_hash{$group}{m_len} = "NA"; 974 | 975 | if ( $l_m_dir > -1 && $r_m_dir > -1 ) { #have mitochondrial mappings 976 | if ( $l_n_dir == 0 && $l_m_dir == 1 && $r_m_dir == 0 && $r_n_dir == 1 ) { 977 | my $cigar = $r_cigar[$r_m_max_i]; 978 | while ( $cigar =~ /(\d+)M/g ) { 979 | $r_m_max += $1; 980 | } 981 | while ( $cigar =~ /(\d+)N/g ) { 982 | $r_m_max += $1; 983 | 
} 984 | while ( $cigar =~ /(\d+)D/g ) { 985 | $r_m_max += $1; 986 | } 987 | } 988 | elsif ( $l_n_dir == 0 && $l_m_dir == 0 && $r_m_dir == 1 && $r_n_dir == 1 ) { 989 | my $cigar = $l_cigar[$l_m_max_i]; 990 | while ( $cigar =~ /(\d+)M/g ) { 991 | $l_m_max += $1; 992 | } 993 | while ( $cigar =~ /(\d+)N/g ) { 994 | $l_m_max += $1; 995 | } 996 | while ( $cigar =~ /(\d+)D/g ) { 997 | $l_m_max += $1; 998 | } 999 | } 1000 | 1001 | $$outfile_hash{$group}{l_m_pos} = $l_m_min; 1002 | $$outfile_hash{$group}{r_m_pos} = $r_m_max; 1003 | if ( $r_m_max > $l_m_min ) { 1004 | $$outfile_hash{$group}{l_m_pos} = $l_m_min; 1005 | $$outfile_hash{$group}{r_m_pos} = $r_m_max; 1006 | } 1007 | else { 1008 | $$outfile_hash{$group}{l_m_pos} = $r_m_max; 1009 | $$outfile_hash{$group}{r_m_pos} = $l_m_max; 1010 | } 1011 | 1012 | #currently assumes smallest sequence possible due to circular nature of mt dna 1013 | $$outfile_hash{$group}{m_len} = $$outfile_hash{$group}{r_m_pos} - $$outfile_hash{$group}{l_m_pos} + 1; 1014 | my $lenAlt = $opts{len_mt} - $$outfile_hash{$group}{r_m_pos} + $$outfile_hash{$group}{l_m_pos} + 1; 1015 | if ( $lenAlt < $$outfile_hash{$group}{m_len} ) { $$outfile_hash{$group}{m_len} = $lenAlt; } 1016 | } 1017 | 1018 | $$outfile_hash{$group}{avgQ} = $avgQ; 1019 | $$outfile_hash{$group}{l_pos} = $l_brk_point; 1020 | $$outfile_hash{$group}{r_pos} = $r_brk_point; 1021 | $$outfile_hash{$group}{chr} = $chr; 1022 | $$outfile_hash{$group}{numRefRP} = $numRefRP; 1023 | $$outfile_hash{$group}{numAltRP} = $numAltRP; 1024 | 1025 | if ( $opts{output_support} ) { 1026 | foreach my $key ( grep { $infile_hash{$_}{group} == $group } keys %infile_hash ) { 1027 | $$outfile_hash{$group}{support} .= "$infile_hash{$key}{line}\n"; 1028 | } 1029 | foreach my $key ( grep { $infile_hash{$_}{group} == $link } keys %infile_hash ) { 1030 | $$outfile_hash{$group}{support} .= "$infile_hash{$key}{line}\n"; 1031 | } 1032 | } 1033 | $i++; 1034 | 1035 | if ( $opts{verbose} ) { 1036 | print 
"\t$chr\t$$outfile_hash{$group}{l_pos}\t$$outfile_hash{$group}{r_pos}\t$$outfile_hash{$group}{numRefRP}\t$$outfile_hash{$group}{numAltRP}\t$$outfile_hash{$group}{avgQ}\t$$outfile_hash{$group}{l_m_pos}\t$$outfile_hash{$group}{r_m_pos}\t$$outfile_hash{$group}{m_len}\n"; 1037 | } 1038 | } 1039 | } 1040 | 1041 | sub getSoftClipInfo { 1042 | my ( $pos, $cigar, $qual ) = @_; 1043 | my $clipside = ""; 1044 | my $clipsize = 0; 1045 | my $cPos = -1; 1046 | my $avgQual = -1; 1047 | 1048 | if ( $cigar =~ /^(\d+)S.*M.*?(\d+)S$/ ) { 1049 | if ( $1 > $2 ) { 1050 | $cPos = $pos; 1051 | $clipside = "l"; 1052 | $clipsize = $1; 1053 | 1054 | #consider breakpoint after leftmost soft clipped fragment 1055 | } 1056 | else { 1057 | $cPos = $pos - 1; 1058 | $clipside = "r"; 1059 | $clipsize = $2; 1060 | while ( $cigar =~ /(\d+)M/g ) { #have to take into account that a CIGAR may contain multiple M's 1061 | $cPos += $1; 1062 | } 1063 | while ( $cigar =~ /(\d+)I/g ) { 1064 | $cPos -= $1; 1065 | } 1066 | while ( $cigar =~ /(\d+)D/g ) { 1067 | $cPos += $1; 1068 | } 1069 | } 1070 | } 1071 | 1072 | #upstream soft clip only 1073 | elsif ( $cigar =~ /^(\d+)S.*M/ ) { 1074 | $cPos = $pos; 1075 | $clipside = "l"; 1076 | $clipsize = $1; 1077 | } 1078 | 1079 | #downstream soft clip only 1080 | elsif ( $cigar =~ /M.*?(\d+)S/ ) { 1081 | $cPos = $pos - 1; 1082 | $clipside = "r"; 1083 | $clipsize = $1; 1084 | while ( $cigar =~ /(\d+)M/g ) { #have to take into account that a CIGAR may contain multiple M's 1085 | $cPos += $1; 1086 | } 1087 | while ( $cigar =~ /(\d+)I/g ) { 1088 | $cPos -= $1; 1089 | } 1090 | while ( $cigar =~ /(\d+)D/g ) { 1091 | $cPos += $1; 1092 | } 1093 | } 1094 | 1095 | #Check quality of clipped sequence and alignment to reference 1096 | if ( $cPos > -1 ) { 1097 | my $clippedQuals = ""; 1098 | 1099 | if ( $clipside eq "r" ) { 1100 | $clippedQuals = substr( $qual, length($qual) - $clipsize - 1, $clipsize ); 1101 | } 1102 | else { 1103 | $clippedQuals = substr( $qual, 0, $clipsize ); 1104 
| } 1105 | 1106 | my $avgQualSum = 0; 1107 | my $avgQualNum = 0; 1108 | foreach my $qual ( split( //, $clippedQuals ) ) { 1109 | $avgQualSum += ord($qual) - 33; 1110 | $avgQualNum++; 1111 | } 1112 | $avgQual = $avgQualSum / $avgQualNum; 1113 | } 1114 | if ( $avgQual < 10 ) { $cPos = -1; } 1115 | if ( $clipsize < $opts{min_clipped_seq} ) { $cPos = -1; } 1116 | return ( $cPos, $clipside, $clipsize ); 1117 | } 1118 | 1119 | sub usage { 1120 | my $version = shift; 1121 | printf("\n"); 1122 | printf( "%-9s %s\n", "Program:", "dinumt.pl" ); 1123 | printf( "%-9s %s\n", "Version:", "$version" ); 1124 | printf("\n"); 1125 | printf( "%-9s %s\n", "Usage:", "dinumt.pl [options]" ); 1126 | printf("\n"); 1127 | printf( "%-9s %-35s %s\n", "Options:", "--input_filename=[filename]", "Input alignment file in BAM format" ); 1128 | printf( "%-9s %-35s %s\n", "", "--output_filename=[filename]", "Output file (default stdout)" ); 1129 | printf( "%-9s %-35s %s\n", "", "--mask_filename=[filename]", "Mask file for reference numts in BED format (optional)" ); 1130 | printf( "%-9s %-35s %s\n", "", "--include_mask", "Include aberrant reads mapped to mask regions in clustering" ); 1131 | printf( "%-9s %-35s %s\n", "", "--len_cluster_include=[integer]", "Maximum distance to be included in cluster (default 600)" ); 1132 | printf( "%-9s %-35s %s\n", "", "--len_cluster_link=[integer]", "Maximum distance to link clusters (default 800)" ); 1133 | printf( "%-9s %-35s %s\n", "", "--min_reads_cluster=[integer]", "Minimum number of reads to link a cluster (default 1)" ); 1134 | printf( "%-9s %-35s %s\n", "", "--min_evidence=[integer]", "Minimum evidence to consider an insertion event (default 4)" ); 1135 | printf( "%-9s %-35s %s\n", "", "--min_map_qual=[integer]", "Minimum mapping quality for read consideration (default 10)" ); 1136 | printf( "%-9s %-35s %s\n", "", "--max_read_cov=[integer]", "Maximum read coverage allowed for breakpoint searching (default 200)" ); 1137 | printf( "%-9s %-35s %s\n", "", 
"--min_clipped_seq=[integer]", "Minimum clipped sequence required to consider as putative breakpoint (default 5)" ); 1138 | printf( "%-9s %-35s %s\n", "", "--max_num_clipped=[integer]", "Maximum number of clipped sequences observed before removing from evidence consideration (default 5)" ); 1139 | printf( "%-9s %-35s %s\n", "", "--read_groups=[read_group1],...", "Limit analysis to specified read group(s)" ); 1140 | printf( "%-9s %-35s %s\n", "", "--mt_names=[mt_name1],...", "Limit analysis to specified mitochondrial sequence names" ); 1141 | printf( "%-9s %-35s %s\n", "", "--by_chr_dir=[directory]", "If set, expects to find chr specific BAM files in indicated directory" ); 1142 | printf( "%-9s %-35s %s\n", "", "--prefix=[string]", "Prepend label in report output" ); 1143 | printf( "%-9s %-35s %s\n", "", "--ucsc", "Use UCSC genome formatting (e.g. chrM)" ); 1144 | printf( "%-9s %-35s %s\n", "", "--ensembl", "Use Ensembl genome formatting (e.g. chrMT)" ); 1145 | printf( "%-9s %-35s %s\n", "", "--output_gl", "Output genotype likelihood information" ); 1146 | printf("\n"); 1147 | } 1148 | 1149 | sub checkOptions { 1150 | my $optResult = shift; 1151 | my $opts = shift; 1152 | my $version = shift; 1153 | 1154 | if ( !$optResult || $$opts{help} ) { 1155 | usage($version); 1156 | exit; 1157 | } 1158 | 1159 | if ( !defined( $$opts{input_filename} ) && !defined( $$opts{by_chr_dir} ) ) { 1160 | print "\n***ERROR***\t--input_filename or --by_chr_dir is required\n"; 1161 | usage($version); 1162 | exit; 1163 | } 1164 | elsif ( !defined( $$opts{by_chr_dir} ) && !-e $$opts{input_filename} ) { 1165 | print "\n***ERROR***\t--input_filename does not exist\n"; 1166 | usage($version); 1167 | exit; 1168 | } 1169 | elsif ( defined( $$opts{by_chr_dir} ) && !-d $$opts{by_chr_dir} ) { 1170 | print "\n***ERROR***\t--by_chr_dir does not exist\n"; 1171 | usage($version); 1172 | exit; 1173 | } 1174 | if ( defined( $$opts{mask_filename} ) && !-e ( $$opts{mask_filename} ) ) { 1175 | print 
"\n***ERROR***\t--mask_filename does not exist\n"; 1176 | usage($version); 1177 | exit; 1178 | } 1179 | if ( !defined( $$opts{reference} ) ) { 1180 | print "\n***ERROR***\t--reference is required\n"; 1181 | usage($version); 1182 | exit; 1183 | } 1184 | if ( $$opts{include_mask} && !defined( $$opts{mask_filename} ) ) { 1185 | print "\n***ERROR***\t--mask_filename is neccessary with --include_mask option\n"; 1186 | usage($version); 1187 | exit; 1188 | } 1189 | if ( $$opts{output_support} && !defined( $$opts{output_filename} ) ) { 1190 | print "\n***ERROR***\t--output_filename is neccessary with --output_support option\n"; 1191 | usage($version); 1192 | exit; 1193 | } 1194 | if ( $$opts{ucsc} && $$opts{ensembl} ) { 1195 | print "\n***ERROR***\t--ucsc and --ensembl are mutually exclusive options\n"; 1196 | usage($version); 1197 | exit; 1198 | } 1199 | } 1200 | -------------------------------------------------------------------------------- /dinumt.cram.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | use warnings; 4 | use strict; 5 | use Getopt::Long; 6 | 7 | my $version = "0.0.23"; 8 | 9 | #version update 10 | # 0.0.23 11 | # -fixed oversight on mask overlap to consider all possible overlaps of reference numt 12 | # 13 | # 0.0.22 14 | # -changed name to "dinumt" (dynumite!) 
15 | # -added option --mt_names for discrete MT identifiers 16 | # 17 | # 0.0.21 18 | # -added option for Ensembl genomes (chrMT) 19 | # -fix usage of masking when --include_mask isn't present 20 | # -include reference parameter in vcf output 21 | # 22 | # 0.0.20 23 | # -added option to output GL information 24 | # 25 | # 0.0.19 26 | # -added option to output supporting reads to auxiliary file 27 | # 28 | # 0.0.18 29 | # -added mito position estimation 30 | # 31 | # 0.0.17 32 | # -implemented likelihood scoring for events 33 | # -added quality, evidence and depth filters 34 | # -implemented VCF format reporting 35 | # 36 | # 0.0.15 37 | # -bug fixes 38 | # 39 | # 0.0.14 40 | # -fixed bug with parsing read group information 41 | # -fixed bug where read group information was not being utilized in findBreakpoint() 42 | # 43 | # 0.0.13 44 | # -added option for minimum clip size to consider 45 | # -added option for maximum limit of putative breakpoints 46 | # -added estimated numt size and mitochondria coordinates to output report 47 | # 48 | # 0.0.12 49 | # -added option to additionally attempt to cluster reads mapping in known numt regions 50 | # whose mates map elsewhere 51 | # -added prefix option for report 52 | # 53 | # 0.0.11 54 | # -added restriction that mates of linked clusters must be consistent with direct 55 | # or inverted sequence (incl changing dir to dnext in input_hash) 56 | # 57 | # 0.0.10 58 | # -added option for maximum read coverage when considering breakpoints 59 | # -moved mate quality filtering to getInput() 60 | # 61 | # 0.0.9 62 | # -changed default cluster reads to 1 63 | # -added option to consider total evidence from read pairs and breakpoints 64 | # -moved mask file comparison to getInput() 65 | # -added option for minimum mapping quality 66 | # -added filter in seqCluster() to remove low quality reads/clusters 67 | # 68 | # 0.0.8 69 | # -added option to restrict analysis to one or more read groups 70 | # -added option to use UCSC naming 
conventions (e.g. chrM instead of MT) 71 | # -added option to use genomes segregated by chromosome 72 | 73 | my %opts = (); 74 | 75 | $opts{len_cluster_include} = 600; 76 | $opts{len_cluster_link} = 800; 77 | $opts{filter_quality} = 50; 78 | $opts{filter_evidence} = 4; 79 | $opts{filter_depth} = 5; 80 | $opts{min_reads_cluster} = 1; 81 | $opts{min_clipped_seq} = 5; 82 | $opts{max_num_clipped} = 5; 83 | $opts{include_mask} = 0; 84 | $opts{min_evidence} = 4; 85 | $opts{min_map_qual} = 10; 86 | $opts{max_read_cov} = 200; 87 | $opts{mask_filename} = "numtS.bed"; 88 | $opts{samtools} = "samtools"; 89 | $opts{prefix} = "numt"; 90 | $opts{len_mt} = 16569; #eventually should be read in by BAM header 91 | $opts{ploidy} = 2; 92 | $opts{output_support} = 0; 93 | $opts{output_gl} = 0; 94 | my $optResult = GetOptions( 95 | "input_filename=s" => \$opts{input_filename}, 96 | "output_filename=s" => \$opts{output_filename}, 97 | "mask_filename=s" => \$opts{mask_filename}, 98 | "support_filename=s" => \$opts{support_filename}, 99 | "include_mask" => \$opts{include_mask}, 100 | "output_support" => \$opts{output_support}, 101 | "len_cluster_include=i" => \$opts{len_cluster_include}, 102 | "len_cluster_link=i" => \$opts{len_cluster_link}, 103 | "filter_quality=i" => \$opts{filter_quality}, 104 | "filter_evidence=i" => \$opts{filter_evidence}, 105 | "filter_depth=i" => \$opts{filter_depth}, 106 | "min_reads_cluster=i" => \$opts{min_reads_cluster}, 107 | "min_evidence=i" => \$opts{min_evidence}, 108 | "min_clipped_seq=i" => \$opts{min_clipped_seq}, 109 | "max_num_clipped=i" => \$opts{max_num_clipped}, 110 | "min_map_qual=i" => \$opts{min_map_qual}, 111 | "max_read_cov=i" => \$opts{max_read_cov}, 112 | "mean_read_cov=f" => \$opts{mean_read_cov}, 113 | "insert_size=s" => \$opts{insert_size}, 114 | "read_groups=s" => \$opts{read_groups}, 115 | "mt_names=s" => \$opts{mt_names}, 116 | "by_chr_dir=s" => \$opts{by_chr_dir}, 117 | "reference=s" => \$opts{reference}, 118 | "prefix=s" => 
\$opts{prefix}, 119 | "output_gl" => \$opts{output_gl}, 120 | "ucsc" => \$opts{ucsc}, 121 | "ensembl" => \$opts{ensembl}, 122 | "help" => \$opts{help}, 123 | "verbose" => \$opts{verbose} 124 | ); 125 | 126 | checkOptions( $optResult, \%opts, $version ); 127 | 128 | my $seq_num = 0; 129 | my %seq_hash = (); 130 | 131 | my %sorted_hash = (); 132 | my %readgroup_hash = (); 133 | 134 | my $i = 1; 135 | my %infile_hash = (); 136 | my %group_hash = (); 137 | my %outfile_hash = (); 138 | my %mask_hash = (); 139 | my %mt_hash = (); 140 | 141 | if ( defined( $opts{read_groups} ) ) { 142 | my @rgs = split( /,/, $opts{read_groups} ); 143 | %readgroup_hash = map { $_, 1 } @rgs; 144 | } 145 | 146 | if (defined( $opts{mt_names} ) ) { 147 | my @mts = split (/,/, $opts{mt_names} ); 148 | %mt_hash = map { $_, 1 } @mts; 149 | } 150 | 151 | getInput( \%infile_hash, \%readgroup_hash, \%mask_hash, \%mt_hash ); 152 | seqCluster( \%infile_hash ); 153 | linkCluster( \%infile_hash ); 154 | mapCluster( \%infile_hash, \%outfile_hash, \%readgroup_hash ); 155 | findBreakpoint( \%outfile_hash, \%readgroup_hash, \%mask_hash ); 156 | scoreData( \%outfile_hash ); 157 | report( \%outfile_hash ); 158 | 159 | ################################################################################################################ 160 | sub scoreData { 161 | my ($outfile_hash) = @_; 162 | print "entering scoreData()\n" if $opts{verbose}; 163 | foreach my $group ( keys %$outfile_hash ) { 164 | print "Group: $group\n" if $opts{verbose}; 165 | my $sumGP = 0; 166 | my $numRef = $$outfile_hash{$group}{numRefRP}; 167 | my $numAlt = $$outfile_hash{$group}{numAltRP}; 168 | if ( $$outfile_hash{$group}{numAltSR} > 0 ) { 169 | $numRef += $$outfile_hash{$group}{numRefSR}; 170 | $numAlt += $$outfile_hash{$group}{numAltSR}; 171 | } 172 | print "\t$numRef\t$numAlt\n" if $opts{verbose}; 173 | 174 | foreach my $g ( 0 .. 
$opts{ploidy} ) { 175 | my $geno = $opts{ploidy} - $g; #need to reverse as calculation is reference allele based 176 | if ( $numAlt + $numRef > 0 && 1 / $opts{ploidy}**( $numAlt + $numRef ) > 0 ) { 177 | $$outfile_hash{$group}{gl}{$geno} = calcGl( $opts{ploidy}, $g, $numAlt + $numRef, $numRef, $$outfile_hash{$group}{avgQ} ); 178 | $$outfile_hash{$group}{gp}{$geno} = 10**$$outfile_hash{$group}{gl}{$geno}; 179 | $sumGP += $$outfile_hash{$group}{gp}{$geno}; 180 | print "\t$geno\t$$outfile_hash{$group}{gl}{$geno}\t$$outfile_hash{$group}{gp}{$geno}\n" if $opts{verbose}; 181 | } 182 | } 183 | print "\tsumGP: $sumGP\n" if $opts{verbose}; 184 | if ( $sumGP == 0 ) { 185 | foreach my $geno ( 0 .. $opts{ploidy} ) { 186 | $$outfile_hash{$group}{pl}{$geno} = 0; 187 | $$outfile_hash{$group}{gl}{$geno} = 0; 188 | } 189 | $$outfile_hash{$group}{gq} = 0; 190 | $$outfile_hash{$group}{gt} = "./."; 191 | $$outfile_hash{$group}{ft} = "NC"; 192 | } 193 | else { 194 | my $maxGP = 0; 195 | my $maxGeno = 0; 196 | foreach my $geno ( 0 .. $opts{ploidy} ) { 197 | if ( $$outfile_hash{$group}{gp}{$geno} == 0 ) { $$outfile_hash{$group}{gp}{$geno} = 1e-200; } 198 | $$outfile_hash{$group}{gp}{$geno} /= $sumGP; 199 | $$outfile_hash{$group}{pl}{$geno} = int( -10 * log10( $$outfile_hash{$group}{gp}{$geno} ) ); 200 | if ( $$outfile_hash{$group}{gp}{$geno} > $maxGP ) { $maxGP = $$outfile_hash{$group}{gp}{$geno}; $maxGeno = $geno; } 201 | } 202 | 203 | $maxGP = 1 - $$outfile_hash{$group}{gp}{0}; #calculate P(not 0/0 | data) 204 | if ( 1 - $maxGP == 0 ) { 205 | $$outfile_hash{$group}{gq} = 199; 206 | } 207 | else { 208 | $$outfile_hash{$group}{gq} = int( -10 * log10( 1 - $maxGP ) ); 209 | } 210 | my $gt = "0/0"; 211 | if ( $maxGeno == 1 ) { $gt = "0/1"; } 212 | elsif ( $maxGeno == 2 ) { $gt = "1/1"; } 213 | $$outfile_hash{$group}{gt} = $gt; 214 | 215 | my @filters = (); 216 | if ( $$outfile_hash{$group}{gq} < $opts{filter_quality} ) { 217 | push @filters, "q" . 
$opts{filter_quality}; 218 | } 219 | if ( $numAlt < $opts{filter_evidence} ) { 220 | push @filters, "e" . $opts{filter_evidence}; 221 | } 222 | if ( $numAlt + $numRef < $opts{filter_depth} ) { 223 | push @filters, "d" . $opts{filter_depth}; 224 | } 225 | $$outfile_hash{$group}{ft} = ( defined( $filters[0] ) ) ? join( ";", @filters ) : "PASS"; 226 | } 227 | } 228 | print "exiting scoreData()\n" if $opts{verbose}; 229 | } 230 | 231 | sub getDate { 232 | my ( $second, $minute, $hour, $dayOfMonth, $month, $yearOffset, $dayOfWeek, $dayOfYear, $daylightSavings ) = localtime(); 233 | my $year = 1900 + $yearOffset; 234 | $month++; 235 | my $fmonth = sprintf( "%.2d", $month ); 236 | my $fday = sprintf( "%.2d", $dayOfMonth ); 237 | return "$year$fmonth$fday"; 238 | } 239 | 240 | sub report { 241 | my ($outfile_hash) = @_; 242 | print "entering report()\n" if $opts{verbose}; 243 | 244 | #open output file 245 | if ( defined( $opts{output_filename} ) ) { 246 | open( foutname1, ">$opts{output_filename}" ) or die("error opening file $opts{output_filename}\n"); 247 | } 248 | else { 249 | open( foutname1, ">&", \*STDOUT ) or die; 250 | } 251 | 252 | if ( $opts{output_support} ) { 253 | 254 | #open support file 255 | open( support1, ">$opts{support_filename}" ) or die("could not open $opts{support_filename} for output, $!\n"); 256 | } 257 | 258 | my $filedate = getDate(); 259 | print foutname1 <<HEADER; 262 | ##ALT= 263 | ##FILTER= 264 | ##FILTER= 265 | ##FILTER= 266 | ##FORMAT= 267 | ##FORMAT= 268 | ##FORMAT= 269 | ##FORMAT= 270 | ##FORMAT= 271 | ##FORMAT= 272 | ##FORMAT= 273 | ##FORMAT= 274 | ##FORMAT= 275 | ##INFO= 276 | ##INFO= 277 | ##INFO= 278 | ##INFO= 279 | ##INFO= 280 | ##INFO= 281 | ##INFO= 282 | ##INFO= 283 | ##INFO= 284 | ##INFO= 285 | ##INFO= 286 | ##INFO= 287 | ##fileDate=$filedate 288 | ##reference=$opts{reference} 289 | ##source=dinumt-$version 290 | HEADER 291 | 292 | my @vars = sort { $$outfile_hash{$a}{chr} cmp $$outfile_hash{$b}{chr} || $$outfile_hash{$a}{leftBkpt} <=> 
$$outfile_hash{$b}{leftBkpt} } keys %$outfile_hash; 293 | if ( $opts{output_gl} ) { 294 | print foutname1 "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t$opts{prefix}\n"; 295 | } 296 | else { 297 | print foutname1 "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"; 298 | } 299 | 300 | my $index = 1; 301 | foreach my $group (@vars) { 302 | print "$group in report()\n" if $opts{verbose}; 303 | if ( $$outfile_hash{$group}{gt} eq "0/0" || $$outfile_hash{$group}{gt} eq "./." ) { 304 | print "\thomref or nc, skipping\n" if $opts{verbose}; 305 | next; 306 | } 307 | my $chrom = $$outfile_hash{$group}{chr}; 308 | 309 | my $id = $opts{prefix} . "_$index"; 310 | my $alt = ""; 311 | my $qual = $$outfile_hash{$group}{gq}; 312 | my $filter = $$outfile_hash{$group}{ft}; 313 | my %info = (); 314 | my $pos = $$outfile_hash{$group}{leftBkpt} - 1; 315 | my $end = $$outfile_hash{$group}{rightBkpt}; 316 | my $ciDelta = $end - $pos + 1; 317 | 318 | $info{IMPRECISE} = undef; 319 | $info{CIPOS} = "0,$ciDelta"; 320 | $info{CIEND} = "-$ciDelta,0"; 321 | $info{END} = $end; 322 | $info{SVTYPE} = "INS"; 323 | 324 | if ( $$outfile_hash{$group}{m_len} ne "NA" ) { 325 | $info{MSTART} = $$outfile_hash{$group}{l_m_pos}; 326 | $info{MEND} = $$outfile_hash{$group}{r_m_pos}; 327 | $info{MLEN} = $$outfile_hash{$group}{m_len}; 328 | } 329 | 330 | my $refline = `$opts{samtools} faidx $opts{reference} $chrom:$pos-$pos`; 331 | my $ref = ( split( /\n/, $refline ) )[1]; 332 | if ( !defined($ref) ) { $ref = "N"; } 333 | 334 | my $format = "GT:FT:GL:GQ:PL"; 335 | my $info = ""; 336 | my @sKeys = sort { $a cmp $b } keys %info; 337 | for ( my $i = 0 ; $i <= $#sKeys ; $i++ ) { 338 | if ( $i > 0 ) { $info .= ";"; } 339 | if ( defined( $info{ $sKeys[$i] } ) ) { 340 | $info .= "$sKeys[$i]=$info{$sKeys[$i]}"; 341 | } 342 | else { 343 | $info .= "$sKeys[$i]"; 344 | } 345 | } 346 | if ( $opts{output_gl} ) { 347 | print foutname1 "$chrom\t$pos\t$id\t$ref\t$alt\t$qual\t$filter\t$info\t$format"; 348 | my @gls = 
(); 349 | my @pls = (); 350 | foreach my $geno ( 0 .. $opts{ploidy} ) { 351 | push @gls, sprintf( "%.2f", $$outfile_hash{$group}{gl}{$geno} ); 352 | push @pls, $$outfile_hash{$group}{pl}{$geno}; 353 | } 354 | my $gl = join( ",", @gls ); 355 | my $pl = join( ",", @pls ); 356 | print foutname1 "\t$$outfile_hash{$group}{gt}:$$outfile_hash{$group}{ft}:$gl:$$outfile_hash{$group}{gq}:$pl\n"; 357 | } 358 | else { 359 | print foutname1 "$chrom\t$pos\t$id\t$ref\t$alt\t$qual\t$filter\t$info\n"; 360 | } 361 | if ( $opts{output_support} ) { 362 | print support1 "$$outfile_hash{$group}{support}"; 363 | } 364 | $index++; 365 | } 366 | close(foutname1); 367 | if ( $opts{output_support} ) { close(support1); } 368 | 369 | print "exiting report()\n" if $opts{verbose}; 370 | } 371 | 372 | sub calcGl { 373 | my ( $m, $g, $k, $l, $e ) = @_; 374 | print "in calcGl():\n" if $opts{verbose}; 375 | print "\t$m\t$g\t$k\t$l\t$e\n" if $opts{verbose}; 376 | if ( 1 / $m**$k <= 0 ) { die "problem in calcGL 1, \t$m\t$g\t$k\t$l\t$e\n"; } 377 | my $gl = log10( 1 / $m**$k ); 378 | if ( ( ( $m - $g ) * $e ) + ( ( 1 - $e ) * $g ) <= 0 ) { die "problem in calcGL 2, \t$m\t$g\t$k\t$l\t$e\n"; } 379 | $gl += log10( ( ( $m - $g ) * $e ) + ( ( 1 - $e ) * $g ) ) for 1 .. $l; 380 | if ( ( $m - $g ) * ( 1 - $e ) + ( $g * $e ) <= 0 ) { die "problem in calcGL 3, \t$m\t$g\t$k\t$l\t$e\n"; } 381 | $gl += log10( ( $m - $g ) * ( 1 - $e ) + ( $g * $e ) ) for ( $l + 1 ) .. 
$k; 382 | return $gl; 383 | } 384 | 385 | sub log10 { 386 | my $n = shift; 387 | return log($n) / log(10); 388 | } 389 | 390 | sub getInput { 391 | my ( $infile_hash, $readgroup_hash, $mask_hash, $mt_hash ) = @_; 392 | my @input_lines = (); 393 | print "Reading input files...\n" if $opts{verbose}; 394 | 395 | #open input file 396 | if ( $opts{by_chr_dir} ) { 397 | if (defined( $opts{mt_names} ) ) { 398 | foreach my $mt_name (keys %{$mt_hash}) { 399 | push @input_lines, "samtools view -T $opts{reference} $opts{by_chr_dir}/$mt_name.cram |"; 400 | } 401 | } 402 | elsif ( $opts{ucsc} ) { 403 | push @input_lines, "samtools view -T $opts{reference} $opts{by_chr_dir}/chrM.*cram |"; 404 | } 405 | elsif ( $opts{ensembl} ) { 406 | push @input_lines, "samtools view -T $opts{reference} $opts{by_chr_dir}/chrMT.*cram |"; 407 | } 408 | else { 409 | push @input_lines, "samtools view -T $opts{reference} $opts{by_chr_dir}/MT*cram |"; 410 | } 411 | } 412 | else { 413 | if (defined( $opts{mt_names} ) ) { 414 | foreach my $mt_name (keys %{$mt_hash}) { 415 | push @input_lines, "samtools view -T $opts{reference} $opts{input_filename} $mt_name |"; 416 | } 417 | } 418 | elsif ( $opts{ucsc} ) { 419 | push @input_lines, "samtools view -T $opts{reference} $opts{input_filename} chrM |"; 420 | } 421 | elsif ( $opts{ensembl} ) { 422 | push @input_lines, "samtools view -T $opts{reference} $opts{input_filename} chrMT |"; 423 | } 424 | else { 425 | push @input_lines, "samtools view -T $opts{reference} $opts{input_filename} MT |"; 426 | } 427 | } 428 | 429 | #input mask coordinates 430 | if ( $opts{include_mask} ) { 431 | open( mask1, $opts{mask_filename} ) || die "Could not open $opts{mask_filename} for input, $!\n"; 432 | while (<mask1>) { 433 | chomp; 434 | my ( $chr, $start, $end, $id ) = split(/\t/); 435 | 436 | $$mask_hash{$chr}{$start} = $end; 437 | if ( $opts{include_mask} ) { 438 | if ( $opts{by_chr_dir} ) { 439 | if ( $opts{ucsc} || $opts{ensembl} ) { 440 | $chr = "chr" . 
$chr; 441 | } 442 | push @input_lines, "samtools view -T $opts{reference} $opts{by_chr_dir}/$chr*cram $chr:$start-$end |"; 443 | } 444 | else { 445 | if ( $opts{ucsc} || $opts{ensembl} ) { 446 | $chr = "chr" . $chr; 447 | } 448 | push @input_lines, "samtools view -T $opts{reference} $opts{input_filename} $chr:$start-$end |"; 449 | } 450 | } 451 | $$mask_hash{$chr}{$start} = $end; 452 | } 453 | close mask1; 454 | } 455 | 456 | foreach my $input_line (@input_lines) { 457 | print "command: $input_line\n" if $opts{verbose}; 458 | open( fname1, $input_line ) || die "error in opening file, $!\n"; 459 | while ( my $line1 = <fname1> ) { 460 | $seq_num = $i; 461 | chomp($line1); 462 | my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split( /\t/, $line1 ); 463 | my $pnextend = $pnext + length($seq); #use first read length as proxy for paired read length 464 | 465 | my ($read_group) = $line1 =~ /RG:Z:(\S+)/; 466 | 467 | if ( $opts{read_groups} && !defined($read_group) ) { next; } 468 | elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; } 469 | 470 | if ( $rnext eq '=' || $rnext eq '*' ) { next; } 471 | if (defined( $opts{mt_names} ) ) { 472 | if (!defined($$mt_hash{$rname}) && defined($$mt_hash{$rnext}) ) { next; } 473 | } 474 | else { 475 | if ( $rname !~ /M/ && $rnext =~ /M/ ) { next; } 476 | } 477 | 478 | my $dnext = 0; 479 | if ( $flag & 32 ) { 480 | $dnext = 1; 481 | } 482 | 483 | my $dir = 0; 484 | if ( $flag & 16 ) { 485 | $dir = 1; 486 | } 487 | 488 | #Compare to masked regions 489 | my $isMaskOverlap = 0; 490 | if ( $opts{include_mask} ) { 491 | foreach my $maskStart ( keys %{ $$mask_hash{$rnext} } ) { 492 | my $maskEnd = $$mask_hash{$rnext}{$maskStart}; 493 | if ( $pnext >= $maskStart && $pnext <= $maskEnd ) { 494 | $isMaskOverlap = 1; 495 | last; 496 | } 497 | elsif ($pnextend >= $maskStart && $pnextend <= $maskEnd) { 498 | $isMaskOverlap = 1; 499 | last; 500 | } 501 | elsif ($pnext <= $maskStart && 
$pnextend >= $maskEnd) { 502 | $isMaskOverlap = 1; 503 | last; 504 | } 505 | } 506 | } 507 | if ($isMaskOverlap) { next; } 508 | 509 | #get mate information 510 | if ( $opts{by_chr_dir} ) { 511 | open( SAM, "samtools view -T $opts{reference} $opts{by_chr_dir}/$rnext.*cram $rnext:$pnext-$pnext | " ) || die "Could not find MT cram file in $opts{by_chr_dir}, $!\n"; 512 | } 513 | else { 514 | open( SAM, "samtools view -T $opts{reference} $opts{input_filename} $rnext:$pnext-$pnext | " ) || die "Could not open $opts{input_filename}, $!\n"; 515 | } 516 | 517 | my $c_mapq = 0; 518 | my $cnext = 0; 519 | while (<SAM>) { 520 | chomp; 521 | my ( $m_qname, $m_flag, $m_rname, $m_pos, $m_mapq, $m_cigar, $m_rnext, $m_pnext, $m_tlen, $m_seq, $m_qual, $opt ) = split(/\t/); 522 | if ( $m_qname ne $qname ) { next; } 523 | $c_mapq = $m_mapq; 524 | $cnext = $m_cigar; 525 | #if ( $c_mapq < $opts{min_map_qual} ) { 526 | # print "MAPQ Filtering:\t$_\n" if $opts{verbose}; 527 | #} 528 | } 529 | close SAM; 530 | 531 | if ( $c_mapq < $opts{min_map_qual} ) { 532 | next; 533 | } 534 | 535 | $$infile_hash{$seq_num}{group} = 0; 536 | $$infile_hash{$seq_num}{seq} = $seq; 537 | 538 | $$infile_hash{$seq_num}{dir} = $dir; 539 | $$infile_hash{$seq_num}{qname} = $qname; 540 | $$infile_hash{$seq_num}{rname} = $rname; 541 | $$infile_hash{$seq_num}{pos} = $pos; 542 | $$infile_hash{$seq_num}{cigar} = $cigar; 543 | $$infile_hash{$seq_num}{cnext} = $cnext; 544 | $$infile_hash{$seq_num}{rnext} = $rnext; 545 | $$infile_hash{$seq_num}{pnext} = $pnext; 546 | $$infile_hash{$seq_num}{dnext} = $dnext; 547 | $$infile_hash{$seq_num}{tlen} = $tlen; 548 | $$infile_hash{$seq_num}{qual} = $qual; 549 | $$infile_hash{$seq_num}{line} = $line1; 550 | 551 | $i++; 552 | } 553 | close(fname1); 554 | } 555 | } 556 | 557 | sub findBreakpoint { 558 | my ( $outfile_hash, $readgroup_hash, $mask_hash ) = @_; 559 | print "entering findBreakpoint()\n" if $opts{verbose}; 560 | foreach my $group ( keys %$outfile_hash ) { 561 | print 
"\tAssessing group $group for breakpoints\n" if $opts{verbose}; 562 | my $l_pos = $$outfile_hash{$group}{l_pos}; 563 | my $r_pos = $$outfile_hash{$group}{r_pos}; 564 | my $chro = $$outfile_hash{$group}{chr}; 565 | $$outfile_hash{$group}{numAltSR} = 0; 566 | $$outfile_hash{$group}{numRefSR} = 0; 567 | $$outfile_hash{$group}{leftBkpt} = $l_pos; 568 | $$outfile_hash{$group}{rightBkpt} = $r_pos; 569 | 570 | #Compare to masked regions 571 | my $isMaskOverlap = 0; 572 | if ( $opts{include_mask} ) { 573 | foreach my $maskStart ( keys %{ $$mask_hash{$chro} } ) { 574 | my $maskEnd = $$mask_hash{$chro}{$maskStart}; 575 | if ( ( $l_pos >= $maskStart && $l_pos <= $maskEnd ) || ( $r_pos >= $maskStart && $r_pos <= $maskEnd ) || ( $l_pos <= $maskStart && $r_pos >= $maskEnd ) ) { 576 | $isMaskOverlap = 1; 577 | last; 578 | } 579 | } 580 | } 581 | if ($isMaskOverlap) { next; } 582 | my %clippedPos = (); 583 | 584 | #open input file 585 | if ( $opts{by_chr_dir} ) { 586 | open( SAM, "samtools view -T $opts{reference} $opts{by_chr_dir}/$chro.*cram $chro:$l_pos-$r_pos |" ) || die "Could not find MT cram file in $opts{by_chr_dir}, $!\n"; 587 | } 588 | else { 589 | open( SAM, "samtools view -T $opts{reference} $opts{input_filename} $chro:$l_pos-$r_pos |" ) || die "Could not open $opts{input_filename}, $!\n"; 590 | } 591 | 592 | my %cnt = (); 593 | while () { 594 | chomp; 595 | my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split(/\t/); 596 | my ($read_group) = $_ =~ /RG:Z:(\S+)/; 597 | 598 | if ( $opts{read_groups} && !defined($read_group) ) { next; } 599 | elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; } 600 | 601 | #Check for read positions outside max_read_cov 602 | my $break = 0; 603 | for ( my $p = 0 ; $p <= length($seq) ; $p++ ) { 604 | $cnt{ $pos + $p }++; 605 | if ( $cnt{ $pos + $p } > $opts{max_read_cov} ) { $break = 1; last; } 606 | } 607 | 608 | if ($break) { 609 | print "Read count has reached limit 
of $opts{max_read_cov}, removing group $group\n" if $opts{verbose}; 610 | delete( $$outfile_hash{$group} ); 611 | last; 612 | } 613 | 614 | #Mark Clipped Positions 615 | my ( $cPos, $clipside, $clipsize ) = getSoftClipInfo( $pos, $cigar, $qual ); 616 | if ( $cPos > -1 ) { 617 | $clippedPos{$cPos}++; 618 | } 619 | } 620 | close SAM; 621 | 622 | if ( !defined( $$outfile_hash{$group} ) ) { next; } 623 | my %bkpts = (); 624 | my $numBkpts = 0; 625 | foreach my $cPos ( sort keys %clippedPos ) { 626 | if ( $clippedPos{$cPos} > 1 ) { 627 | $bkpts{$cPos} = $clippedPos{$cPos}; 628 | $numBkpts++; 629 | } 630 | } 631 | 632 | my $num_bkpt_support = 0; 633 | my $num_ref_support = 0; 634 | my $leftBkpt = $l_pos; 635 | my $rightBkpt = $r_pos; 636 | 637 | if ( $numBkpts > 0 && scalar keys %clippedPos <= $opts{max_num_clipped} ) { 638 | 639 | #take two most prevalent breaks for now 640 | my @sorted = sort { $bkpts{$b} <=> $bkpts{$a} } keys %bkpts; 641 | if ( $numBkpts == 1 ) { 642 | $leftBkpt = $sorted[0]; 643 | $rightBkpt = $leftBkpt + 1; 644 | $num_bkpt_support = $bkpts{ $sorted[0] }; 645 | $num_ref_support = $cnt{ $sorted[0] } - $num_bkpt_support; 646 | } 647 | else { 648 | if ( $sorted[0] < $sorted[1] ) { 649 | $leftBkpt = $sorted[0]; 650 | $rightBkpt = $sorted[1]; 651 | } 652 | else { 653 | $leftBkpt = $sorted[1]; 654 | $rightBkpt = $sorted[0]; 655 | } 656 | $num_bkpt_support = $bkpts{ $sorted[0] } + $bkpts{ $sorted[1] }; 657 | $num_ref_support = $cnt{ $sorted[0] } + $cnt{ $sorted[1] } - $num_bkpt_support; 658 | } 659 | } 660 | $$outfile_hash{$group}{leftBkpt} = $leftBkpt; 661 | $$outfile_hash{$group}{rightBkpt} = $rightBkpt; 662 | $$outfile_hash{$group}{numAltSR} = $num_bkpt_support; 663 | $$outfile_hash{$group}{numRefSR} = $num_ref_support; 664 | } 665 | print "exiting findBreakpoint()\n" if $opts{verbose}; 666 | } 667 | 668 | sub seqCluster { 669 | my ($infile_hash) = @_; 670 | my $k = 0; 671 | my %d = (); 672 | 673 | $d{0}{k} = 0; 674 | $d{0}{pnext} = 0; 675 | $d{0}{last} 
= (); 676 | $d{0}{rnext} = 0; 677 | 678 | $d{1}{k} = 0; 679 | $d{1}{pnext} = 0; 680 | $d{1}{last} = (); 681 | $d{1}{rnext} = 0; 682 | 683 | my @sorted = sort { $$infile_hash{$a}->{rnext} cmp $$infile_hash{$b}->{rnext} || $$infile_hash{$a}->{pnext} <=> $$infile_hash{$b}->{pnext} } keys %{$infile_hash}; 684 | print scalar @sorted . " total reads to process for clustering\n" if $opts{verbose}; 685 | foreach my $c_seq_num (@sorted) { 686 | my $c_pnext = $$infile_hash{$c_seq_num}{pnext}; 687 | my $c_dnext = $$infile_hash{$c_seq_num}{dnext}; 688 | my $c_rnext = $$infile_hash{$c_seq_num}{rnext}; 689 | my $c_qname = $$infile_hash{$c_seq_num}{qname}; 690 | 691 | print "$c_dnext\n" if $opts{verbose}; 692 | if ( $c_pnext - $d{$c_dnext}{pnext} > $opts{len_cluster_include} || $d{$c_dnext}{k} == 0 || $c_rnext ne $d{$c_dnext}{rnext} ) { 693 | 694 | #print "c_pnext:$c_pnext \t d_pnext:$d{$c_dnext}{pnext} \t dir:$d{$c_dnext}{k}\n" if $opts{verbose}; 695 | if ( $d{$c_dnext}{k} > 0 && scalar @{ $d{$c_dnext}{last} } < $opts{min_reads_cluster} ) { 696 | foreach my $seq_num ( @{ $d{$c_dnext}{last} } ) { 697 | delete $$infile_hash{$seq_num}; 698 | } 699 | } 700 | $k++; 701 | $$infile_hash{$c_seq_num}{'group'} = $k; 702 | $d{$c_dnext}{k} = $k; 703 | $d{$c_dnext}{last} = (); 704 | } 705 | else { 706 | $$infile_hash{$c_seq_num}{'group'} = $d{$c_dnext}{k}; 707 | 708 | } 709 | print "$d{$c_dnext}{k}\t$c_qname\t$c_rnext\t$c_pnext" if $opts{verbose}; 710 | 711 | push @{ $d{$c_dnext}{last} }, $c_seq_num; 712 | $d{$c_dnext}{pnext} = $c_pnext; 713 | $d{$c_dnext}{rnext} = $c_rnext; 714 | 715 | #print "k:$k \t grp:$$infile_hash{$c_seq_num}{'group'} \t $d{$c_dnext}{k} \t pre_pos:$d{$c_dnext}{pnext}\n" if $opts{verbose}; 716 | #print "$$infile_hash{$c_seq_num}->{'group'}\n chr_num:$c_rnext\n" if $opts{verbose}; 717 | } 718 | 719 | } 720 | 721 | sub linkCluster { 722 | my ($infile_hash) = @_; 723 | 724 | #this can link multiple F's to a single leftmost R 725 | my @sorted = sort { 
$$infile_hash{$a}->{rnext} cmp $$infile_hash{$b}->{rnext} || $$infile_hash{$a}->{pnext} <=> $$infile_hash{$b}->{pnext} } keys %{$infile_hash}; 726 | print scalar @sorted . " total reads to process for linking clusters\n" if $opts{verbose}; 727 | 728 | for ( my $c = 0 ; $c <= $#sorted ; $c++ ) { 729 | my $c_seq_num = $sorted[$c]; 730 | $$infile_hash{$c_seq_num}{link} = 0; 731 | my $c_pnext = $$infile_hash{$c_seq_num}{pnext}; 732 | my $c_dnext = $$infile_hash{$c_seq_num}{dnext}; 733 | my $c_rnext = $$infile_hash{$c_seq_num}{rnext}; 734 | my $c_dir = $$infile_hash{$c_seq_num}{dir}; 735 | 736 | if ( $c_dnext == 1 ) { next; } 737 | 738 | for ( my $d = $c + 1 ; $d <= $#sorted ; $d++ ) { 739 | my $d_seq_num = $sorted[$d]; 740 | my $d_pnext = $$infile_hash{$d_seq_num}{pnext}; 741 | my $d_dnext = $$infile_hash{$d_seq_num}{dnext}; 742 | my $d_rnext = $$infile_hash{$d_seq_num}{rnext}; 743 | my $d_dir = $$infile_hash{$d_seq_num}{dir}; 744 | 745 | if ( $d_dnext == 0 ) { next; } 746 | 747 | my $delta = $d_pnext - $c_pnext; 748 | 749 | if ( 750 | $delta < $opts{len_cluster_link} 751 | && $c_rnext eq $d_rnext 752 | && ( ( $c_dir == 0 && $c_dnext == 1 && $d_dnext == 0 && $d_dir == 1 ) 753 | || ( $c_dir == 0 && $c_dnext == 0 && $d_dnext == 1 && $d_dir == 1 ) 754 | || ( $c_dir == 1 && $c_dnext == 0 && $d_dnext == 1 && $d_dir == 0 ) ) 755 | ) 756 | { 757 | 758 | #0.0.11 - must have consistent orientation between left and right sides of insertions 759 | $$infile_hash{$c_seq_num}{link} = $$infile_hash{$d_seq_num}{group}; 760 | $$infile_hash{$d_seq_num}{link} = $$infile_hash{$c_seq_num}{group}; 761 | } 762 | elsif ( !defined( $$infile_hash{$c_seq_num}{link} ) ) { 763 | $$infile_hash{$c_seq_num}{link} = 0; 764 | } 765 | } 766 | } 767 | } 768 | 769 | sub mapCluster { 770 | my ( $infile_hash, $outfile_hash, $readgroup_hash ) = @_; 771 | 772 | my %l_linked_groups; 773 | my %linked_group_pnext; 774 | 775 | foreach my $key ( sort { $infile_hash{$a}->{group} <=> $infile_hash{$b}->{group} } keys 
%infile_hash ) { 776 | if ( ( $infile_hash{$key}{'group'} > 0 ) && ( $infile_hash{$key}{'dnext'} == 0 ) && ( $infile_hash{$key}{'link'} > 0 ) ) { 777 | $l_linked_groups{ $infile_hash{$key}{'group'} } = $infile_hash{$key}{'link'}; 778 | } 779 | } 780 | 781 | my $i = 1; 782 | while ( my ( $group, $link ) = each(%l_linked_groups) ) { 783 | print "group = $group ;; link = $link \n" if $opts{verbose}; 784 | my @l_rnext = map { $infile_hash{$_}{'rnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 785 | my @l_rname = map { $infile_hash{$_}{'rname'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 786 | my @l_pos = map { $infile_hash{$_}{'pos'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 787 | my @l_dir = map { $infile_hash{$_}{'dir'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 788 | my @l_qname = map { $infile_hash{$_}{'qname'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 789 | my @l_pnext = map { $infile_hash{$_}{'pnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 790 | my @l_dnext = map { $infile_hash{$_}{'dnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 791 | my @l_cigar = map { $infile_hash{$_}{'cigar'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 792 | my @l_cnext = map { $infile_hash{$_}{'cnext'} } grep { $infile_hash{$_}{group} == $group } keys %infile_hash; 793 | my @r_rnext = map { $infile_hash{$_}{'rnext'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 794 | my @r_pos = map { $infile_hash{$_}{'pos'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 795 | my @r_dir = map { $infile_hash{$_}{'dir'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 796 | my @r_rname = map { $infile_hash{$_}{'rname'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 797 | my @r_pnext = map { $infile_hash{$_}{'pnext'} } grep { $infile_hash{$_}{group} == $link } keys 
%infile_hash; 798 | my @r_dnext = map { $infile_hash{$_}{'dnext'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 799 | my @r_qname = map { $infile_hash{$_}{'qname'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 800 | my @r_cigar = map { $infile_hash{$_}{'cigar'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 801 | my @r_cnext = map { $infile_hash{$_}{'cnext'} } grep { $infile_hash{$_}{group} == $link } keys %infile_hash; 802 | 803 | my @rc_pnext = (); 804 | my @lc_pnext = (); 805 | my $chr = $l_rnext[0]; 806 | 807 | #update right coordinates based on cigar length 808 | for ( my $c = 0 ; $c <= $#r_cnext ; $c++ ) { 809 | my $cigar = $r_cnext[$c]; 810 | $rc_pnext[$c] = $r_pnext[$c]; 811 | 812 | while ( $cigar =~ /(\d+)M/g ) { 813 | $rc_pnext[$c] += $1; 814 | } 815 | while ( $cigar =~ /(\d+)N/g ) { 816 | $rc_pnext[$c] += $1; 817 | } 818 | while ( $cigar =~ /(\d+)D/g ) { 819 | $rc_pnext[$c] += $1; 820 | } 821 | } 822 | 823 | #update left coordinates based on cigar length 824 | for ( my $c = 0 ; $c <= $#l_cnext ; $c++ ) { 825 | my $cigar = $l_cnext[$c]; 826 | $lc_pnext[$c] = $l_pnext[$c]; 827 | 828 | while ( $cigar =~ /(\d+)M/g ) { 829 | $lc_pnext[$c] += $1; 830 | } 831 | while ( $cigar =~ /(\d+)N/g ) { 832 | $lc_pnext[$c] += $1; 833 | } 834 | while ( $cigar =~ /(\d+)D/g ) { 835 | $lc_pnext[$c] += $1; 836 | } 837 | } 838 | 839 | my @s_l_pnext = sort { $a <=> $b } @l_pnext; 840 | my @s_r_pnext = sort { $a <=> $b } @r_pnext; 841 | 842 | my @s_lc_pnext = sort { $a <=> $b } @lc_pnext; 843 | my @s_rc_pnext = sort { $a <=> $b } @rc_pnext; 844 | 845 | my $l_brk_point = $s_l_pnext[$#s_l_pnext]; 846 | my $r_brk_point = $s_rc_pnext[0]; 847 | 848 | my $win_l_s = $s_l_pnext[0]; 849 | my $win_l_e = $s_lc_pnext[$#s_lc_pnext]; 850 | my $win_r_s = $s_r_pnext[0]; 851 | my $win_r_e = $s_rc_pnext[$#s_rc_pnext]; 852 | 853 | $linked_group_pnext{$i}{l_group} = $group; 854 | $linked_group_pnext{$i}{r_group} = $link; 855 | 856 | #Check for 
crossing clusters; occurs in regions of high coverage, but not ascertained 857 | #until next step so stopgap here 858 | if ( $l_brk_point > $r_brk_point ) { 859 | my $temp = $l_brk_point; 860 | $l_brk_point = $r_brk_point; 861 | $r_brk_point = $temp; 862 | } 863 | 864 | my $commandLeft = ""; 865 | my $commandRight = ""; 866 | if ( $opts{by_chr_dir} ) { 867 | $commandLeft = "samtools view -T $opts{reference} $opts{by_chr_dir}/$chr.*cram $chr:$win_l_s-$win_l_e |"; 868 | $commandRight = "samtools view -T $opts{reference} $opts{by_chr_dir}/$chr.*cram $chr:$win_r_s-$win_r_e |"; 869 | } 870 | else { 871 | $commandLeft = "samtools view -T $opts{reference} $opts{input_filename} $chr:$win_l_s-$win_l_e |"; 872 | $commandRight = "samtools view -T $opts{reference} $opts{input_filename} $chr:$win_r_s-$win_r_e |"; 873 | } 874 | 875 | my $numRefRP = 0; 876 | my $numAltRP = scalar @l_pnext + scalar @r_pnext; 877 | my $sumE = 0; 878 | 879 | #left 880 | open( SAM, $commandLeft ) || die "Could not open sam file for input, $!\n"; 881 | while (<SAM>) { 882 | chomp; 883 | my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split(/\t/); 884 | 885 | my $mapE = 10**( -1 * $mapq / 10 ); 886 | my ($read_group) = $_ =~ /RG:Z:(\S+)/; 887 | 888 | if ( $opts{read_groups} && !defined($read_group) ) { next; } 889 | elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; } 890 | 891 | if ( $mapq < $opts{min_map_qual} ) { next; } 892 | my $dir = 0; #F 893 | if ( $flag & 16 ) { $dir = 1; } #R 894 | 895 | if ( $pos >= $win_l_s && $pos <= $win_l_e && $dir == 0 ) { 896 | $numRefRP++; 897 | $sumE += $mapE; 898 | } 899 | } 900 | close SAM; 901 | 902 | #right 903 | open( SAM, $commandRight ) || die "Could not open sam file for input, $!\n"; 904 | while (<SAM>) { 905 | chomp; 906 | my ( $qname, $flag, $rname, $pos, $mapq, $cigar, $rnext, $pnext, $tlen, $seq, $qual ) = split(/\t/); 907 | 908 | my $mapE = 10**( -1 * $mapq / 10 ); 909 | my ($read_group) = $_
=~ /RG:Z:(\S+)/; 910 | 911 | if ( $opts{read_groups} && !defined($read_group) ) { next; } 912 | elsif ( $opts{read_groups} && !defined( $$readgroup_hash{$read_group} ) ) { next; } 913 | 914 | if ( $mapq < $opts{min_map_qual} ) { next; } 915 | my $dir = 0; #F 916 | if ( $flag & 16 ) { $dir = 1; } #R 917 | 918 | if ( $pos >= $win_r_s && $pos <= $win_r_e && $dir == 1 ) { 919 | $numRefRP++; 920 | $sumE += $mapE; 921 | } 922 | } 923 | close SAM; 924 | 925 | my $avgQ = $sumE / $numRefRP; #calculate average over all reads, ref and alt 926 | $numRefRP -= $numAltRP; #correct for alt reads 927 | 928 | #estimate mitochondrial coordinates from mated sequence alignments 929 | my $l_m_min = 1e10; 930 | my $l_m_max = 0; 931 | my $l_m_min_i = -1; 932 | my $l_m_max_i = -1; 933 | my $l_n_dir = -1; 934 | my $l_m_dir = -1; 935 | 936 | for ( my $i = 0 ; $i <= $#l_qname ; $i++ ) { 937 | if ( $l_rname[$i] !~ /M/ ) { next; } #don't include nuclear homologous regions 938 | if ( $l_pos[$i] < $l_m_min ) { 939 | $l_m_min = $l_pos[$i]; 940 | $l_m_min_i = $i; 941 | } 942 | if ( $l_pos[$i] > $l_m_max ) { 943 | $l_m_max = $l_pos[$i]; 944 | $l_m_max_i = $i; 945 | } 946 | $l_n_dir = $l_dnext[$i]; 947 | $l_m_dir = $l_dir[$i]; 948 | } 949 | my $r_m_min = 1e10; 950 | my $r_m_max = 0; 951 | my $r_m_min_i = -1; 952 | my $r_m_max_i = -1; 953 | my $r_n_dir = -1; 954 | my $r_m_dir = -1; 955 | 956 | for ( my $i = 0 ; $i <= $#r_qname ; $i++ ) { 957 | if ( $r_rname[$i] !~ /M/ ) { next; } #don't include nuclear homologous regions 958 | if ( $r_pos[$i] < $r_m_min ) { 959 | $r_m_min = $r_pos[$i]; 960 | $r_m_min_i = $i; 961 | } 962 | if ( $r_pos[$i] > $r_m_max ) { 963 | $r_m_max = $r_pos[$i]; 964 | $r_m_max_i = $i; 965 | } 966 | $r_n_dir = $r_dnext[$i]; 967 | $r_m_dir = $r_dir[$i]; 968 | } 969 | $$outfile_hash{$group}{l_m_pos} = "NA"; 970 | $$outfile_hash{$group}{r_m_pos} = "NA"; 971 | $$outfile_hash{$group}{m_len} = "NA"; 972 | 973 | if ( $l_m_dir > -1 && $r_m_dir > -1 ) { #have mitochondrial mappings 974 | if ( 
$l_n_dir == 0 && $l_m_dir == 1 && $r_m_dir == 0 && $r_n_dir == 1 ) { 975 | my $cigar = $r_cigar[$r_m_max_i]; 976 | while ( $cigar =~ /(\d+)M/g ) { 977 | $r_m_max += $1; 978 | } 979 | while ( $cigar =~ /(\d+)N/g ) { 980 | $r_m_max += $1; 981 | } 982 | while ( $cigar =~ /(\d+)D/g ) { 983 | $r_m_max += $1; 984 | } 985 | } 986 | elsif ( $l_n_dir == 0 && $l_m_dir == 0 && $r_m_dir == 1 && $r_n_dir == 1 ) { 987 | my $cigar = $l_cigar[$l_m_max_i]; 988 | while ( $cigar =~ /(\d+)M/g ) { 989 | $l_m_max += $1; 990 | } 991 | while ( $cigar =~ /(\d+)N/g ) { 992 | $l_m_max += $1; 993 | } 994 | while ( $cigar =~ /(\d+)D/g ) { 995 | $l_m_max += $1; 996 | } 997 | } 998 | 999 | $$outfile_hash{$group}{l_m_pos} = $l_m_min; 1000 | $$outfile_hash{$group}{r_m_pos} = $r_m_max; 1001 | if ( $r_m_max > $l_m_min ) { 1002 | $$outfile_hash{$group}{l_m_pos} = $l_m_min; 1003 | $$outfile_hash{$group}{r_m_pos} = $r_m_max; 1004 | } 1005 | else { 1006 | $$outfile_hash{$group}{l_m_pos} = $r_m_max; 1007 | $$outfile_hash{$group}{r_m_pos} = $l_m_max; 1008 | } 1009 | 1010 | #currently assumes smallest sequence possible due to circular nature of mt dna 1011 | $$outfile_hash{$group}{m_len} = $$outfile_hash{$group}{r_m_pos} - $$outfile_hash{$group}{l_m_pos} + 1; 1012 | my $lenAlt = $opts{len_mt} - $$outfile_hash{$group}{r_m_pos} + $$outfile_hash{$group}{l_m_pos} + 1; 1013 | if ( $lenAlt < $$outfile_hash{$group}{m_len} ) { $$outfile_hash{$group}{m_len} = $lenAlt; } 1014 | } 1015 | 1016 | $$outfile_hash{$group}{avgQ} = $avgQ; 1017 | $$outfile_hash{$group}{l_pos} = $l_brk_point; 1018 | $$outfile_hash{$group}{r_pos} = $r_brk_point; 1019 | $$outfile_hash{$group}{chr} = $chr; 1020 | $$outfile_hash{$group}{numRefRP} = $numRefRP; 1021 | $$outfile_hash{$group}{numAltRP} = $numAltRP; 1022 | 1023 | if ( $opts{output_support} ) { 1024 | foreach my $key ( grep { $infile_hash{$_}{group} == $group } keys %infile_hash ) { 1025 | $$outfile_hash{$group}{support} .= "$infile_hash{$key}{line}\n"; 1026 | } 1027 | foreach my $key 
( grep { $infile_hash{$_}{group} == $link } keys %infile_hash ) { 1028 | $$outfile_hash{$group}{support} .= "$infile_hash{$key}{line}\n"; 1029 | } 1030 | } 1031 | $i++; 1032 | 1033 | if ( $opts{verbose} ) { 1034 | print "\t$chr\t$$outfile_hash{$group}{l_pos}\t$$outfile_hash{$group}{r_pos}\t$$outfile_hash{$group}{numRefRP}\t$$outfile_hash{$group}{numAltRP}\t$$outfile_hash{$group}{avgQ}\t$$outfile_hash{$group}{l_m_pos}\t$$outfile_hash{$group}{r_m_pos}\t$$outfile_hash{$group}{m_len}\n"; 1035 | } 1036 | } 1037 | } 1038 | 1039 | sub getSoftClipInfo { 1040 | my ( $pos, $cigar, $qual ) = @_; 1041 | my $clipside = ""; 1042 | my $clipsize = 0; 1043 | my $cPos = -1; 1044 | my $avgQual = -1; 1045 | 1046 | if ( $cigar =~ /^(\d+)S.*M.*?(\d+)S$/ ) { 1047 | if ( $1 > $2 ) { 1048 | $cPos = $pos; 1049 | $clipside = "l"; 1050 | $clipsize = $1; 1051 | 1052 | #consider breakpoint after leftmost soft clipped fragment 1053 | } 1054 | else { 1055 | $cPos = $pos - 1; 1056 | $clipside = "r"; 1057 | $clipsize = $2; 1058 | while ( $cigar =~ /(\d+)M/g ) { #have to take into account that a CIGAR may contain multiple M's 1059 | $cPos += $1; 1060 | } 1061 | while ( $cigar =~ /(\d+)I/g ) { 1062 | $cPos -= $1; 1063 | } 1064 | while ( $cigar =~ /(\d+)D/g ) { 1065 | $cPos += $1; 1066 | } 1067 | } 1068 | } 1069 | 1070 | #upstream soft clip only 1071 | elsif ( $cigar =~ /^(\d+)S.*M/ ) { 1072 | $cPos = $pos; 1073 | $clipside = "l"; 1074 | $clipsize = $1; 1075 | } 1076 | 1077 | #downstream soft clip only 1078 | elsif ( $cigar =~ /M.*?(\d+)S/ ) { 1079 | $cPos = $pos - 1; 1080 | $clipside = "r"; 1081 | $clipsize = $1; 1082 | while ( $cigar =~ /(\d+)M/g ) { #have to take into account that a CIGAR may contain multiple M's 1083 | $cPos += $1; 1084 | } 1085 | while ( $cigar =~ /(\d+)I/g ) { 1086 | $cPos -= $1; 1087 | } 1088 | while ( $cigar =~ /(\d+)D/g ) { 1089 | $cPos += $1; 1090 | } 1091 | } 1092 | 1093 | #Check quality of clipped sequence and alignment to reference 1094 | if ( $cPos > -1 ) { 1095 | my 
$clippedQuals = ""; 1096 | 1097 | if ( $clipside eq "r" ) { 1098 | $clippedQuals = substr( $qual, length($qual) - $clipsize, $clipsize ); 1099 | } 1100 | else { 1101 | $clippedQuals = substr( $qual, 0, $clipsize ); 1102 | } 1103 | 1104 | my $avgQualSum = 0; 1105 | my $avgQualNum = 0; 1106 | foreach my $qual ( split( //, $clippedQuals ) ) { 1107 | $avgQualSum += ord($qual) - 33; 1108 | $avgQualNum++; 1109 | } 1110 | $avgQual = $avgQualSum / $avgQualNum; 1111 | } 1112 | if ( $avgQual < 10 ) { $cPos = -1; } 1113 | if ( $clipsize < $opts{min_clipped_seq} ) { $cPos = -1; } 1114 | return ( $cPos, $clipside, $clipsize ); 1115 | } 1116 | 1117 | sub usage { 1118 | my $version = shift; 1119 | printf("\n"); 1120 | printf( "%-9s %s\n", "Program:", "dinumt.pl" ); 1121 | printf( "%-9s %s\n", "Version:", "$version" ); 1122 | printf("\n"); 1123 | printf( "%-9s %s\n", "Usage:", "dinumt.pl [options]" ); 1124 | printf("\n"); 1125 | printf( "%-9s %-35s %s\n", "Options:", "--input_filename=[filename]", "Input alignment file in BAM format" ); 1126 | printf( "%-9s %-35s %s\n", "", "--output_filename=[filename]", "Output file (default stdout)" ); 1127 | printf( "%-9s %-35s %s\n", "", "--mask_filename=[filename]", "Mask file for reference numts in BED format (optional)" ); 1128 | printf( "%-9s %-35s %s\n", "", "--include_mask", "Include aberrant reads mapped to mask regions in clustering" ); 1129 | printf( "%-9s %-35s %s\n", "", "--len_cluster_include=[integer]", "Maximum distance to be included in cluster (default 600)" ); 1130 | printf( "%-9s %-35s %s\n", "", "--len_cluster_link=[integer]", "Maximum distance to link clusters (default 800)" ); 1131 | printf( "%-9s %-35s %s\n", "", "--min_reads_cluster=[integer]", "Minimum number of reads to link a cluster (default 1)" ); 1132 | printf( "%-9s %-35s %s\n", "", "--min_evidence=[integer]", "Minimum evidence to consider an insertion event (default 4)" ); 1133 | printf( "%-9s %-35s %s\n", "", "--min_map_qual=[integer]", "Minimum mapping
quality for read consideration (default 10)" ); 1134 | printf( "%-9s %-35s %s\n", "", "--max_read_cov=[integer]", "Maximum read coverage allowed for breakpoint searching (default 200)" ); 1135 | printf( "%-9s %-35s %s\n", "", "--min_clipped_seq=[integer]", "Minimum clipped sequence required to consider as putative breakpoint (default 5)" ); 1136 | printf( "%-9s %-35s %s\n", "", "--max_num_clipped=[integer]", "Maximum number of clipped sequences observed before removing from evidence consideration (default 5)" ); 1137 | printf( "%-9s %-35s %s\n", "", "--read_groups=[read_group1],...", "Limit analysis to specified read group(s)" ); 1138 | printf( "%-9s %-35s %s\n", "", "--mt_names=[mt_name1],...", "Limit analysis to specified mitochondrial sequence names" ); 1139 | printf( "%-9s %-35s %s\n", "", "--by_chr_dir=[directory]", "If set, expects to find chr specific CRAM files in indicated directory" ); 1140 | printf( "%-9s %-35s %s\n", "", "--prefix=[string]", "Prepend label in report output" ); 1141 | printf( "%-9s %-35s %s\n", "", "--ucsc", "Use UCSC genome formatting (e.g. chrM)" ); 1142 | printf( "%-9s %-35s %s\n", "", "--ensembl", "Use Ensembl genome formatting (e.g. 
chrMT)" ); 1143 | printf( "%-9s %-35s %s\n", "", "--output_gl", "Output genotype likelihood information" ); 1144 | printf("\n"); 1145 | } 1146 | 1147 | sub checkOptions { 1148 | my $optResult = shift; 1149 | my $opts = shift; 1150 | my $version = shift; 1151 | 1152 | if ( !$optResult || $$opts{help} ) { 1153 | usage($version); 1154 | exit; 1155 | } 1156 | 1157 | if ( !defined( $$opts{input_filename} ) && !defined( $$opts{by_chr_dir} ) ) { 1158 | print "\n***ERROR***\t--input_filename or --by_chr_dir is required\n"; 1159 | usage($version); 1160 | exit; 1161 | } 1162 | elsif ( !defined( $$opts{by_chr_dir} ) && !-e $$opts{input_filename} ) { 1163 | print "\n***ERROR***\t--input_filename does not exist\n"; 1164 | usage($version); 1165 | exit; 1166 | } 1167 | elsif ( defined( $$opts{by_chr_dir} ) && !-d $$opts{by_chr_dir} ) { 1168 | print "\n***ERROR***\t--by_chr_dir does not exist\n"; 1169 | usage($version); 1170 | exit; 1171 | } 1172 | if ( defined( $$opts{mask_filename} ) && !-e ( $$opts{mask_filename} ) ) { 1173 | print "\n***ERROR***\t--mask_filename does not exist\n"; 1174 | usage($version); 1175 | exit; 1176 | } 1177 | if ( !defined( $$opts{reference} ) ) { 1178 | print "\n***ERROR***\t--reference is required\n"; 1179 | usage($version); 1180 | exit; 1181 | } 1182 | if ( $$opts{include_mask} && !defined( $$opts{mask_filename} ) ) { 1183 | print "\n***ERROR***\t--mask_filename is necessary with --include_mask option\n"; 1184 | usage($version); 1185 | exit; 1186 | } 1187 | if ( $$opts{output_support} && !defined( $$opts{output_filename} ) ) { 1188 | print "\n***ERROR***\t--output_filename is necessary with --output_support option\n"; 1189 | usage($version); 1190 | exit; 1191 | } 1192 | if ( $$opts{ucsc} && $$opts{ensembl} ) { 1193 | print "\n***ERROR***\t--ucsc and --ensembl are mutually exclusive options\n"; 1194 | usage($version); 1195 | exit; 1196 | } 1197 | } 1198 | --------------------------------------------------------------------------------