├── .gitignore
├── LICENSE
├── README.md
├── media
    ├── GeneVar-flowchart.png
    ├── GeneVar-methods-overview.png
    ├── GeneVar-oct13-methods.pdf
    ├── genevar-app-final.png
    ├── genevar-app-prototype.png
    ├── genevar.png
    ├── genevar_logo_multicolor.jpg
    ├── genevar_logo_multicolor_v2.jpg
    └── logo_size.jpg
├── scripts
    ├── AF_extract.py
    ├── README.md
    ├── annotGeneSV.R
    ├── extract-chr21-genes-variants.ipynb
    ├── gnomad-variants-af.ipynb
    ├── intersection_gencode_with_variants_commands.txt
    ├── linkGeneWithSV.R
    ├── make-misc-tsvs.ipynb
    ├── parse_gencode.pl
    ├── reformat_clinvar_results.pl
    ├── reformat_tsv_variant_file.pl
    └── variant-gene-overlap.ipynb
└── shinyapp
    ├── README.md
    ├── run-app.R
    ├── server.R
    ├── testdata
        ├── README.md
        ├── af.tsv
        ├── all_variants.tsv
        ├── clinsnv_variants.tsv
        ├── clinsv_variants.tsv
        ├── ext_urls.tsv
        ├── gene_variants.tsv
        └── make-test-data.R
    └── ui.R

/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 collaborativebioinformatics
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # GeneVar
2 |
3 | ![](media/genevar_logo_multicolor_v2.jpg)
4 |
5 | ## Demo
6 |
7 | Please use this link to access the [live demo](https://jmonlong.shinyapps.io/GeneVar/).
8 |
9 | ## Goal
10 |
11 | Develop a tool to facilitate a **gene-centered view of human structural variants**, which takes as input a gene name or ID and produces a report, file, and/or genome browser session that informs the user of all structural variants overlapping the gene, including any non-coding regulatory elements affecting expression of the gene.
12 |
13 | The tool is **intended to have a clinical focus**, informing the interpretation of structural variants pertaining to the gene provided by the user.
14 |
15 |
16 | ## Draft flowchart
17 |
18 | ![](media/GeneVar-flowchart.png)
19 |
20 | ## Results
21 |
22 | At the end of the biocodathon, we had extracted the information and built an app for chr21.
23 | It integrates dbVar, GENCODE, ClinVar, and gnomAD-SV.
24 | More modules will be added in the future.
25 |
26 | ## Methods
27 |
28 | ![](media/GeneVar-methods-overview.png)
29 |
30 | Scripts to prepare the data are available in the [scripts](scripts) folder.
31 | Code to run the Shiny app is in the [shinyapp](shinyapp) folder.
32 |
33 | ## How it works
34 |
35 | GeneVar is a web application.
36 | After entering the gene name (HGNC, Ensembl gene (ENSG), or transcript (ENST) identifier) in the search box on the homepage, you will be directed to the gene-specific page containing:
37 | 1. Gene-level summary with the number of SVs and the number of clinical SVs or SVs overlapping clinical SNVs.
38 | 2. Links to the gene's page on OMIM, GTEx, gnomAD.
39 | 3. A dynamic table with the annotated variants overlapping the gene.
40 | 4. A graph with the distribution of the allele frequency for variants matched with gnomAD-SV (50% reciprocal overlap).
41 |
42 | The profile of the SVs to consider, such as type and size range, can be specified in the sidebar.
43 | Each column in the dynamic table can be searched or reordered dynamically.
44 | All data used by the app will be available for download as tab-delimited files.
45 | By default, allele frequency is reported based on gnomAD genomes and exomes.
46 |
47 | ## Software
48 |
49 | GeneVar is available on GitHub (https://github.com/collaborativebioinformatics/GeneVar).
50 | The repository provides detailed instructions for tool usage and installation.
51 | A bash script for automated installation of the required dependencies is also provided, as well as Docker.
52 | For now, the webpage runs on a single-core server with 1 GB of RAM and needs less than 1 GB of storage.
53 |
54 | ![](media/genevar-app-final.png)
55 |
56 | ## Future features
57 |
58 | For now we link to these gene-centered resources.
59 | In the future we could directly include some of their data:
60 | - Averaged depth of coverage for sequencing experiments, e.g. from the gnomAD dataset
61 | - Gene expression profiles extracted from GTEx.
62 |
63 | The gene impact annotation could be improved, for example with amino acid change prediction, by integrating the following tools:
64 | - [SnpEff](https://pcingola.github.io/SnpEff/)
65 | - [AnnotSV](https://lbgi.fr/AnnotSV/)
66 | - [OpenCRAVAT](https://opencravat.org/)
67 |
68 | A different data exchange strategy will be necessary to scale up to the full genome and integrate more and more annotations.
69 | The TSV files quickly become extremely large.
70 | We are considering two options:
71 | 1. Tabix-indexed variants and on-the-fly comparison in the Shiny app.
72 |    - Integrate all variant-level annotation into one BED-like file, indexed for fast access, where each variant is present only once. For example, variant coordinates, allele frequencies, clinical flags.
73 |    - The comparison with the gene annotation would be done on-the-fly in the app.
74 | 2. Switch to databases as suggested [here](https://shiny.rstudio.com/articles/overview.html).
75 |
76 | ### Link dbVar SVs to genes
77 |
78 | - Input:
79 |   - All dbVar SVs (BED file including a *variant id* column)
80 |   - Gene annotation: GENCODE or Ensembl
81 | - Output:
82 |   - TSV with two columns: `variant_id`, `gene_id`
83 |   - Variant ids and gene ids may repeat.
84 |
85 | Methods:
86 | Either extract this information from an annotation tool like SnpEff or AnnotSV, or use custom scripts using bedtools, R, or Python to perform the overlap.
87 | The latter should be much faster and give exactly the information we want.
88 |
89 | ### Annotate gene impact
90 |
91 | - Input:
92 |   - All dbVar SVs or subset of SVs for one gene (using SV<->gene link computed above).
93 |   - Gene annotation: GENCODE or Ensembl
94 | - Output:
95 |   - TSV with at least three columns: `variant_id`, `elt_type` (e.g. *UTR*, *exon*), `elt_info` (e.g. exon number)
96 |
97 | Methods:
98 | Similar to the above.
99 | It could maybe be done all in one module: overlap SVs and genes, extract variant-gene pairs and also variant-element pairs.
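The variant-gene linking step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used in the [scripts](scripts) folder: the `Interval` tuple, the function name `link_variants_to_genes` and the toy coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from collections import namedtuple

# Hypothetical record type: one genomic interval with an identifier.
Interval = namedtuple("Interval", ["chrom", "start", "end", "name"])


def link_variants_to_genes(variants, genes):
    """Return sorted (variant_id, gene_id) pairs for every SV overlapping a gene.

    Two intervals overlap when they share at least one base on the same
    chromosome (1-based, inclusive coordinates are assumed).
    """
    # Group genes by chromosome so each SV is only compared to relevant genes.
    genes_by_chrom = {}
    for gene in genes:
        genes_by_chrom.setdefault(gene.chrom, []).append(gene)
    pairs = set()
    for sv in variants:
        for gene in genes_by_chrom.get(sv.chrom, []):
            if sv.start <= gene.end and gene.start <= sv.end:
                pairs.add((sv.name, gene.name))
    return sorted(pairs)


# Toy data, made up for the example.
variants = [
    Interval("chr21", 100, 500, "sv1"),
    Interval("chr21", 450, 900, "sv2"),
    Interval("chr21", 5000, 6000, "sv3"),
]
genes = [
    Interval("chr21", 400, 480, "GENE_A"),
    Interval("chr21", 850, 1200, "GENE_B"),
]

# Print the two-column TSV: variant_id, gene_id (ids may repeat).
for variant_id, gene_id in link_variants_to_genes(variants, genes):
    print(variant_id, gene_id, sep="\t")
```

The nested loop is quadratic per chromosome, which is fine for a toy example; for the full dbVar set an interval-based tool (bedtools, GenomicRanges) would be the more realistic choice.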
100 |
101 | ### Annotate allele frequency
102 |
103 | - Input:
104 |   - All dbVar SVs or subset of SVs for one gene (using SV<->gene link computed above).
105 |   - gnomAD-SV VCF or BED file with allele frequency information
106 | - Output:
107 |   - TSV with at least two columns: `variant_id`, `af`.
108 |   - Only for variants that were matched with the gnomAD-SV data.
109 |   - Going further: extract frequency in super-populations in column: `af_AFR`, etc.
110 |   - Going even further: match SVs from other studies with gnomAD-SV and annotate their frequency
111 |
112 | Methods:
113 | We might be able to match the dbVar and gnomAD-SV variants by variant ID.
114 | Otherwise very stringent overlapping of the two should be able to match the variants.
115 | We might need to use the hg19 version to match the original gnomAD-SV data to dbVar variants, and then make the connection to GRCh38 variants from dbVar (by variant ID?).
116 |
117 | ### Annotate overlap with clinically-relevant SVs
118 |
119 | - Input:
120 |   - All dbVar SVs or subset of SVs for one gene (using SV<->gene link computed above).
121 |   - ClinGen or pathogenic SVs from ClinVar, etc
122 | - Output:
123 |   - TSV with `variant_id` and TRUE/FALSE columns about their overlap. E.g. `pathogenic_clinvar_sv`.
124 |
125 | Methods:
126 | Use either a simple overlap (any base overlapping) or reciprocal overlap (typically 50%).
127 |
128 | ### Annotate overlap with clinically-relevant SNVs/indels
129 |
130 | - Input:
131 |   - All dbVar SVs or subset of SVs for one gene (using SV<->gene link computed above).
132 |   - ClinGen or pathogenic SNV/indels from ClinVar, etc
133 | - Output:
134 |   - TSV with `variant_id` and TRUE/FALSE columns about their overlap. E.g. `pathogenic_clinvar_sv`.
135 |
136 | Methods:
137 | Use either a simple overlap (any base overlapping) or reciprocal overlap (typically 50%).
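To make the difference between a simple overlap and a reciprocal overlap concrete, here is a minimal Python sketch. It assumes 1-based inclusive coordinates; the function names `reciprocal_overlap` and `is_match` and the tuple layout `(chrom, start, end, sv_type)` are hypothetical and chosen only for this example. Requiring the same SV type mirrors the "per SV type" matching mentioned in the scripts README.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Fraction of the larger interval covered by the intersection.

    Equivalently: the smaller of (shared / length_a) and (shared / length_b).
    Coordinates are assumed 1-based and inclusive.
    """
    shared = min(end_a, end_b) - max(start_a, start_b) + 1
    if shared <= 0:
        return 0.0
    len_a = end_a - start_a + 1
    len_b = end_b - start_b + 1
    return min(shared / len_a, shared / len_b)


def is_match(sv_a, sv_b, min_overlap=0.5):
    """True if two SVs share chromosome and type and reciprocally overlap enough.

    Each SV is a (chrom, start, end, sv_type) tuple (hypothetical layout).
    """
    chrom_a, start_a, end_a, type_a = sv_a
    chrom_b, start_b, end_b, type_b = sv_b
    if chrom_a != chrom_b or type_a != type_b:
        return False
    return reciprocal_overlap(start_a, end_a, start_b, end_b) >= min_overlap


# Two 100 bp deletions sharing 50 bp: exactly 50% reciprocal overlap.
print(is_match(("chr21", 100, 199, "DEL"), ("chr21", 150, 249, "DEL")))
# A 100 bp deletion inside a 1000 bp deletion: simple overlap, but only 10% reciprocal.
print(is_match(("chr21", 1, 1000, "DEL"), ("chr21", 1, 100, "DEL")))
```

The second case shows why the reciprocal criterion is stricter: a small variant nested in a large one overlaps it, but the two are unlikely to be the same event.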
138 |
139 | ### Gene-level summary
140 |
141 | This could be done at the level of the report (below), or pre-computed in its own module.
142 |
143 | - Input:
144 |   - All dbVar SVs
145 |   - TSV with variant-gene pairs.
146 |   - TSV with variant allele frequencies
147 |   - TSV with gene impact annotation
148 |   - TSV from other modules
149 | - Output:
150 |   - A TSV with `gene_id` and one column per summary statistic. E.g. `common_sv_nb`.
151 |
152 | ### External resources for genes
153 |
154 | We could pre-compute links to relevant resources for each gene.
155 | Some resources might require matching gene names.
156 |
157 | - Input:
158 |   - Gene IDs as used in the variant-gene pairs
159 |   - Info from other resources: OMIM, gnomAD, ...
160 | - Output:
161 |   - A TSV with `gene_id` and one column per external resource containing its URL. E.g. `omim_url`.
162 |
163 | ### Report/browser
164 |
165 | - Input:
166 |   - All dbVar SVs
167 |   - TSV with variant-gene pairs.
168 |   - TSV with variant allele frequencies
169 |   - TSV with gene impact annotation
170 |   - TSV from other modules
171 | - Output:
172 |   - ShinyApp to visualize data for one selected gene.
173 |
174 | See [shinyapp](shinyapp) folder for the code and commands.
175 |
176 | Methods:
177 | An application is implemented in R+Shiny where the user can select a gene and some filtering criteria for SVs (size and type).
178 | A page is loaded with a summary of the SVs overlapping the gene, a table listing all annotated SVs, and a graph showing the distribution of allele frequencies.
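The selection and summary logic described above (pick one gene, filter SVs by type and size, then summarize) is independent of Shiny and can be sketched in plain Python. This is not the app's code, which is written in R; the dictionary keys, the function names `select_svs` and `summarize`, the `sv_nb` column and the 1% threshold for calling an SV "common" are all assumptions made for the example. Only the `common_sv_nb` column name comes from the text above.

```python
def select_svs(svs, gene_id, sv_types=None, min_size=0, max_size=float("inf")):
    """Keep the SVs linked to one gene that pass the type and size filters."""
    selected = []
    for sv in svs:
        size = sv["end"] - sv["start"] + 1
        if sv["gene_id"] != gene_id:
            continue
        if sv_types is not None and sv["type"] not in sv_types:
            continue
        if not (min_size <= size <= max_size):
            continue
        selected.append(sv)
    return selected


def summarize(svs, common_af=0.01):
    """Gene-level counts; `common_af` is an assumed cut-off, not a project setting."""
    common = sum(1 for sv in svs if sv["af"] is not None and sv["af"] >= common_af)
    return {"sv_nb": len(svs), "common_sv_nb": common}


# Toy records, made up for the example; af is None when no gnomAD-SV match exists.
svs = [
    {"variant_id": "sv1", "gene_id": "GENE_A", "type": "DEL", "start": 100, "end": 1099, "af": 0.2},
    {"variant_id": "sv2", "gene_id": "GENE_A", "type": "DUP", "start": 100, "end": 149, "af": None},
    {"variant_id": "sv3", "gene_id": "GENE_A", "type": "DEL", "start": 100, "end": 100099, "af": 0.001},
    {"variant_id": "sv4", "gene_id": "GENE_B", "type": "DEL", "start": 1, "end": 500, "af": 0.5},
]

deletions = select_svs(svs, "GENE_A", sv_types={"DEL"}, min_size=50, max_size=10000)
print([sv["variant_id"] for sv in deletions])
print(summarize(select_svs(svs, "GENE_A")))
```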
179 |
180 | ## Data
181 |
182 | - GRCh38
183 |   - dbVar GRCh38 from: https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_assembly/GRCh38/vcf/GRCh38.variant_call.all.vcf.gz
184 |   - Clinical SVs: https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/tsv/nstd102.GRCh38.variant_call.tsv.gz
185 |   - ClinVar https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
186 |   - GENCODE v35: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gff3.gz
187 | - GRCh37
188 |   - dbVar GRCh37 from: https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_assembly/GRCh37/vcf/GRCh37.variant_call.all.vcf.gz
189 |   - gnomAD https://gnomad.broadinstitute.org
190 | - Gene
191 |   - OMIM https://www.ncbi.nlm.nih.gov/omim
192 |
193 |
194 |
--------------------------------------------------------------------------------
/media/GeneVar-flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/GeneVar-flowchart.png
--------------------------------------------------------------------------------
/media/GeneVar-methods-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/GeneVar-methods-overview.png
--------------------------------------------------------------------------------
/media/GeneVar-oct13-methods.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/GeneVar-oct13-methods.pdf
--------------------------------------------------------------------------------
/media/genevar-app-final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/genevar-app-final.png
--------------------------------------------------------------------------------
/media/genevar-app-prototype.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/genevar-app-prototype.png
--------------------------------------------------------------------------------
/media/genevar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/genevar.png
--------------------------------------------------------------------------------
/media/genevar_logo_multicolor.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/genevar_logo_multicolor.jpg
--------------------------------------------------------------------------------
/media/genevar_logo_multicolor_v2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/genevar_logo_multicolor_v2.jpg
--------------------------------------------------------------------------------
/media/logo_size.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/collaborativebioinformatics/GeneVar/a854af590a01fd814a33a125531d9b30e2b67a3a/media/logo_size.jpg
--------------------------------------------------------------------------------
/scripts/AF_extract.py:
--------------------------------------------------------------------------------
1 | import os
2 | import gzip
3 | import argparse
4 | import pandas as pd
5 | import re
6 |
7 | parser = argparse.ArgumentParser()
8 | parser.add_argument('-vcf','--vcf', help='Path to the gnomAD_SV vcf file', required=True)
9 | parser.add_argument('-out','--out', help='Path to the output tsv file', required=True)
10 | args = vars(parser.parse_args())
11 | # by default, read the input file as given
12 | infile_ = args['vcf']
13 | if "vcf.gz.vcf" in infile_:
14 |     # double extension: copy the file to a name without the trailing ".vcf"
15 |     infile = infile_
16 |     infile_ = os.path.splitext(infile)[0]
17 |     os.system("cp "+infile+" "+infile_)
18 | if args['out'].endswith(".tsv"):
19 |     outpath = args['out']
20 | else:
21 |     outpath = args['out']+".tsv"
22 | print(infile_)
23 | chroms = []
24 | pos = []
25 | variant_ids = []
26 | AFs = []
27 | types = []
28 | ends = []
29 | with gzip.open(infile_, 'rt') as f:
30 |     for line in f:
31 |         l = line.strip()
32 |         # skip empty lines, meta-information lines and the #CHROM header
33 |         if not l or l.startswith("#"):
34 |             continue
35 |         # split the tab-separated columns and the ';'-separated INFO entries
36 |         a = re.split(';|\t', l)
37 |         chroms.append("chr"+str(a[0]))
38 |         pos.append(str(a[1]))
39 |         variant_ids.append(str(a[2]))
40 |         types.append(str(a[4]))
41 |         # append exactly one END and one AF per variant so all columns keep the same length;
42 |         # startswith() avoids matching other entries such as CIEND= or population-specific *_AF=
43 |         # fall back to the start position when END is missing, and to "NA" when AF is missing
44 |         ends.append(next((i.split('=')[-1] for i in a if i.startswith("END=")), str(a[1])))
45 |         AFs.append(next((i.split('=')[-1] for i in a if i.startswith("AF=")), "NA"))
46 | dict_ = {'chr':chroms, "start":pos, "variant_id":variant_ids, "AF":AFs, "end":ends, "type":types}
47 | df = pd.DataFrame(dict_)
48 | df = df.reindex(columns=['chr', 'start', 'end','type', 'variant_id', 'AF'])
49 | # write a headerless TSV, then a coordinate-sorted copy
50 | df.to_csv(outpath, index=False, sep="\t", header=False)
51 | os.system("sort -V -k1,1 -k2,2 "+outpath+" > "+os.path.splitext(outpath)[0]+".sorted.tsv")
52 |
--------------------------------------------------------------------------------
/scripts/README.md:
--------------------------------------------------------------------------------
1 | ## Subset dbVar variants to one chromosome
2 |
3 | We used a subset of the data to develop the app: variants and genes in chr21.
4 | The different dbVar versions (GRCh37 and GRCh38) and genes were subset as shown in the [extract-chr21-genes-variants.ipynb](extract-chr21-genes-variants.ipynb) notebook.
5 |
6 | ## Allele frequency
7 |
8 | ### Extract allele frequency from gnomAD-SV
9 |
10 | The [AF_extract.py](AF_extract.py) script reads a gnomAD-SV VCF file, extracts the allele frequency of each variant, and writes the result to a TSV file.
11 |
12 | Usage:
13 |
14 | `python AF_extract.py -vcf GNOMAD_SV_VCF -out OUTPUT_TSV`
15 |
16 | ### Match allele frequencies with dbVar SVs
17 |
18 | The [gnomad-variants-af.ipynb](gnomad-variants-af.ipynb) notebook shows how we matched variants in dbVar (GRCh37) and gnomAD-SV.
19 | Briefly: 50% reciprocal overlap per SV type.
20 |
21 | R/Bioconductor packages used include:
22 | - GenomicRanges
23 | - dplyr
24 |
25 | ## Variants overlapping genes and gene impact
26 |
27 | ### Script to extract information for one gene
28 |
29 | We used [annotGeneSV.R](annotGeneSV.R) (at first [linkGeneWithSV.R](linkGeneWithSV.R)) to read variants from dbVar and the GENCODE annotation and extract relevant information.
30 |
31 | ### Overlapping all genes with all variants in chr21
32 |
33 | The [variant-gene-overlap.ipynb](variant-gene-overlap.ipynb) notebook shows how we overlapped variants in chr21 with GENCODE to extract gene impact information.
34 |
35 | R/Bioconductor packages used include:
36 | - rtracklayer
37 | - GenomicRanges
38 | - dplyr
39 |
40 | ## Overlap dbVar SVs with SNVs of known clinical significance
41 |
42 | We overlapped SVs with ClinVar using bedtools.
43 | The overlaps were then summarized in a TSV file using code in the [make-misc-tsvs.ipynb](make-misc-tsvs.ipynb) notebook.
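For readers who want to see what this overlap computes, the sketch below flags each SV that contains at least one clinically significant SNV. It is a plain-Python stand-in for illustration only, not the bedtools commands or the notebook code we ran; the function name `flag_svs_with_snvs`, the tuple layout and the toy positions are hypothetical, and coordinates are assumed 1-based and inclusive.

```python
from bisect import bisect_left


def flag_svs_with_snvs(svs, snv_positions):
    """Map each variant_id to True if an SNV position falls inside the SV.

    svs: iterable of (variant_id, chrom, start, end) tuples.
    snv_positions: dict mapping chromosome name to a list of SNV positions.
    """
    # Sort positions once per chromosome so each SV needs one binary search.
    sorted_pos = {chrom: sorted(p) for chrom, p in snv_positions.items()}
    flags = {}
    for variant_id, chrom, start, end in svs:
        positions = sorted_pos.get(chrom, [])
        # Index of the first SNV located at or after the SV start.
        i = bisect_left(positions, start)
        flags[variant_id] = i < len(positions) and positions[i] <= end
    return flags


# Toy data, made up for the example.
svs = [
    ("sv1", "chr21", 100, 200),
    ("sv2", "chr21", 300, 400),
    ("sv3", "chr22", 100, 200),
]
snvs = {"chr21": [250, 150]}

for variant_id, has_snv in flag_svs_with_snvs(svs, snvs).items():
    print(variant_id, str(has_snv).upper(), sep="\t")
```

The output has the same shape as the TRUE/FALSE annotation columns described in the main README: one row per `variant_id` with a boolean flag.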
44 |
45 | ## Annotate dbVar SVs with clinical significance
46 |
47 | We used study nstd102, which contains clinical SVs.
48 | The code to make the TSV for this annotation is part of the [make-misc-tsvs.ipynb](make-misc-tsvs.ipynb) notebook.
49 |
--------------------------------------------------------------------------------
/scripts/annotGeneSV.R:
--------------------------------------------------------------------------------
1 | options(stringsAsFactors = F)
2 | library(stringr)
3 | args = commandArgs(TRUE)
4 | geneName = args[1]
5 | geneChr = args[2]
6 | geneStart = as.numeric(args[3])
7 | geneEnd = as.numeric(args[4])
8 | SVFile = args[5]
9 | annotationFile = args[6]
10 |
11 | getSVAndAnnotation <- function(geneName,geneChr,geneStart,geneEnd,SVFile,annotationFile) {
12 |   cat(geneName,'\n')
13 |   geneChr = paste('chr',geneChr,sep='')
14 |   SVFileName = paste(SVFile,geneChr,'POS','tsv','gz',sep = '.')
15 |   gzFile = gzfile(SVFileName,'rt')
16 |   SVTable = read.table(file = gzFile, header = F, sep = '\t', quote = "")
17 |   close(gzFile)
18 |   SVTable[,2] = as.numeric(SVTable[,2])
19 |   SVTable[,3] = as.numeric(SVTable[,3])
20 |   SVTable = SVTable[!is.na(SVTable[,2]+SVTable[,3]),]
21 |   minPos = sapply(1:nrow(SVTable), function(x) min(SVTable[x,2:3]))
22 |   maxPos = sapply(1:nrow(SVTable), function(x) max(SVTable[x,2:3]))
23 |   SVTable[,1] = paste('chr',SVTable[,1],sep='')
24 |   SVTable[,2] = minPos; SVTable[,3] = maxPos;
25 |   gzFile = gzfile(annotationFile,'rt')
26 |   annotationTable = read.table(file = gzFile, header = F, sep = '\t', quote = "")
27 |   close(gzFile)
28 |
29 |   SVChrTable = SVTable[SVTable[,1]==geneChr,]
30 |   SVgeneTable = SVChrTable[(((SVChrTable[,2]>=geneEnd)|(SVChrTable[,3]<=geneStart))==0),]
31 |
32 |   geneAnnoTab = annotationTable[which(str_detect(annotationTable[,9], geneName)==T),]
33 |
34 |   eltType = rep('',nrow(SVgeneTable))
35 |   for (i in seq_along(eltType)) { # seq_along() skips the loop when no SV overlaps the gene
36 |     startTmp = SVgeneTable[i,2];endTmp = SVgeneTable[i,3]
37 |     geneAnnoTabForSV =
geneAnnoTab[(((geneAnnoTab[,4]>=endTmp)|(geneAnnoTab[,5]<=startTmp))==0),] 38 | if (nrow(geneAnnoTabForSV)>0) {eltType[i] = paste(unique(geneAnnoTabForSV[,3]),collapse = ';')} 39 | } 40 | 41 | if (nrow(SVgeneTable)!=0) { 42 | outputTSV = data.frame(variant_id = SVgeneTable[,5], gene_id = rep(geneName,nrow(SVgeneTable)), elt_type = eltType) 43 | write.table(outputTSV, file=paste(SVFile,geneName,'tsv',sep = '.'), quote=FALSE, sep='\t', col.names = T, row.names = F) 44 | } 45 | } 46 | 47 | getSVAndAnnotation(geneName,geneChr,geneStart,geneEnd,SVFile,annotationFile) 48 | -------------------------------------------------------------------------------- /scripts/extract-chr21-genes-variants.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import gzip" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Extract variants in chr21 into TSV\n", 17 | "Read VCF and extract coordinates, SV type and variant ID\n", 18 | "\n", 19 | "## GCRh38" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 3, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "%%bash\n", 29 | "dx download dbVar_byAssembly_hg38/GRCh38.variant_call.all.vcf.gz*" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 15, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "outf = open('all_variants_chr21.tsv', 'w')\n", 39 | "# header\n", 40 | "outf.write('chr\\tstart\\tend\\ttype\\tvariant_id\\n')\n", 41 | "# read VCF with all variants\n", 42 | "svtypes = {}\n", 43 | "for line in gzip.open('GRCh38.variant_call.all.vcf.gz', 'rb'):\n", 44 | " line = line.decode('ascii')\n", 45 | " # skip headers\n", 46 | " if line[0] == '#':\n", 47 | " continue\n", 48 | " line = line.rstrip().split('\\t')\n", 49 | " # skip if not chr 21\n", 50 | " if 
line[0] != '21':\n", 51 | " continue\n", 52 | " # parse INFO field\n", 53 | " infos = {}\n", 54 | " for info in line[7].split(';'):\n", 55 | " info_pair = info.split('=')\n", 56 | " if len(info_pair) == 1:\n", 57 | " infos[info] = True\n", 58 | " else:\n", 59 | " infos[info_pair[0]] = info_pair[1]\n", 60 | " # prepare info for TSV\n", 61 | " out_line = [line[0]]\n", 62 | " # start pos\n", 63 | " pos_s = line[1]\n", 64 | " out_line.append(pos_s)\n", 65 | " # end pos\n", 66 | " pos_e = pos_s\n", 67 | " if 'END' in infos:\n", 68 | " pos_e = infos['END']\n", 69 | " out_line.append(pos_e)\n", 70 | " # SV type\n", 71 | " svtype = infos['SVTYPE']\n", 72 | " out_line.append(svtype)\n", 73 | " svtypes[svtype] = True\n", 74 | " # skip if BND type\n", 75 | " if svtype == 'BND':\n", 76 | " continue\n", 77 | " if int(pos_e) < int(pos_s):\n", 78 | " print(svtype)\n", 79 | " # variant id\n", 80 | " out_line.append(line[2])\n", 81 | " # write line in tsv\n", 82 | " outf.write('\\t'.join(out_line) + '\\n')\n", 83 | "outf.close()" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 16, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "name": "stdout", 93 | "output_type": "stream", 94 | "text": [ 95 | "dict_keys(['INS', 'DEL', 'DUP', 'CNV', 'INV', 'BND'])\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "print(svtypes.keys())" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 17, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "name": "stdout", 110 | "output_type": "stream", 111 | "text": [ 112 | "ID file-Fy33VG00Z5b26bpZJ3Zk8f8k\n", 113 | "Class file\n", 114 | "Project project-Fy1b7V80Z5b4jXb224P1fY4b\n", 115 | "Folder /chr-subset-genes-variants\n", 116 | "Name all_variants_chr21.tsv\n", 117 | "State closing\n", 118 | "Visibility visible\n", 119 | "Types -\n", 120 | "Properties -\n", 121 | "Tags -\n", 122 | "Outgoing links -\n", 123 | "Created Tue Oct 13 23:30:56 2020\n", 124 | "Created by jmonlong\n", 125 | " via the job 
job-Fy332xj0Z5bKZz0F8v9gKVfJ\n", 126 | "Last modified Tue Oct 13 23:30:57 2020\n", 127 | "Media type \n", 128 | "archivalState \"live\"\n", 129 | "cloudAccount \"cloudaccount-dnanexus\"\n" 130 | ] 131 | } 132 | ], 133 | "source": [ 134 | "%%bash\n", 135 | "dx upload all_variants_chr21.tsv --path chr-subset-genes-variants/" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "## GRCh37\n", 143 | "\n", 144 | "To match gnomAD-SV frequencies" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 10, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "name": "stderr", 154 | "output_type": "stream", 155 | "text": [ 156 | "Error: path \"/opt/notebooks/GRCh37.variant_call.all.vcf.gz\" already exists but\n", 157 | "-f/--overwrite was not set\n" 158 | ] 159 | }, 160 | { 161 | "ename": "CalledProcessError", 162 | "evalue": "Command 'b'dx download dbVar_byAssembly_GRCh37/GRCh37.variant_call.all.vcf.gz\\n'' returned non-zero exit status 1.", 163 | "output_type": "error", 164 | "traceback": [ 165 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 166 | "\u001b[0;31mCalledProcessError\u001b[0m Traceback (most recent call last)", 167 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_cell_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'bash'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'dx download dbVar_byAssembly_GRCh37/GRCh37.variant_call.all.vcf.gz\\n'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 168 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py\u001b[0m in \u001b[0;36mrun_cell_magic\u001b[0;34m(self, magic_name, line, cell)\u001b[0m\n\u001b[1;32m 2369\u001b[0m \u001b[0;32mwith\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuiltin_trap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2370\u001b[0m \u001b[0margs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mmagic_arg_s\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcell\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2371\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2372\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2373\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 169 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/IPython/core/magics/script.py\u001b[0m in \u001b[0;36mnamed_script_magic\u001b[0;34m(line, cell)\u001b[0m\n\u001b[1;32m 140\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 141\u001b[0m \u001b[0mline\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mscript\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 142\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshebang\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mline\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcell\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 143\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 144\u001b[0m \u001b[0;31m# write a basic docstring:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 170 | "\u001b[0;32m\u001b[0m in \u001b[0;36mshebang\u001b[0;34m(self, line, cell)\u001b[0m\n", 171 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;31m# but it's overkill for just that one 
bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 172 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/IPython/core/magics/script.py\u001b[0m in \u001b[0;36mshebang\u001b[0;34m(self, line, cell)\u001b[0m\n\u001b[1;32m 243\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstderr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mflush\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 244\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_error\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreturncode\u001b[0m\u001b[0;34m!=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 245\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mCalledProcessError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreturncode\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcell\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0moutput\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mout\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstderr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0merr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 246\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 247\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_run_script\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcell\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mto_close\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 173 | "\u001b[0;31mCalledProcessError\u001b[0m: Command 'b'dx download dbVar_byAssembly_GRCh37/GRCh37.variant_call.all.vcf.gz\\n'' returned non-zero exit status 1." 174 | ] 175 | } 176 | ], 177 | "source": [ 178 | "%%bash\n", 179 | "dx download dbVar_byAssembly_GRCh37/GRCh37.variant_call.all.vcf.gz" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 11, 185 | "metadata": {}, 186 | "outputs": [], 187 | "source": [ 188 | "outf = open('all_variants_chr21_grch37.tsv', 'w')\n", 189 | "# header\n", 190 | "outf.write('chr\\tstart\\tend\\ttype\\tvariant_id\\n')\n", 191 | "# read VCF with all variants\n", 192 | "svtypes = {}\n", 193 | "for line in gzip.open('GRCh37.variant_call.all.vcf.gz', 'rb'):\n", 194 | " line = line.decode('ascii')\n", 195 | " # skip headers\n", 196 | " if line[0] == '#':\n", 197 | " continue\n", 198 | " line = line.rstrip().split('\\t')\n", 199 | " # skip if not chr 21\n", 200 | " if line[0] != '21':\n", 201 | " continue\n", 202 | " # parse INFO field\n", 203 | " infos = {}\n", 204 | " for info in line[7].split(';'):\n", 205 | " info_pair = info.split('=')\n", 206 | " if len(info_pair) == 1:\n", 207 | " infos[info] = True\n", 208 | " else:\n", 209 | " infos[info_pair[0]] = info_pair[1]\n", 210 | " # prepare info for TSV\n", 211 | " out_line = [line[0]]\n", 212 | " # start pos\n", 213 | " pos_s = line[1]\n", 214 | " 
out_line.append(pos_s)\n", 215 | " # end pos\n", 216 | " pos_e = pos_s\n", 217 | " if 'END' in infos:\n", 218 | " pos_e = infos['END']\n", 219 | " out_line.append(pos_e)\n", 220 | " # SV type\n", 221 | " svtype = infos['SVTYPE']\n", 222 | " out_line.append(svtype)\n", 223 | " svtypes[svtype] = True\n", 224 | " # skip if BND type\n", 225 | " if svtype == 'BND':\n", 226 | " continue\n", 227 | " if int(pos_e) < int(pos_s):\n", 228 | " print(svtype)\n", 229 | " # variant id\n", 230 | " out_line.append(line[2])\n", 231 | " # write line in tsv\n", 232 | " outf.write('\\t'.join(out_line) + '\\n')\n", 233 | "outf.close()" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 12, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "name": "stdout", 243 | "output_type": "stream", 244 | "text": [ 245 | "dict_keys(['DEL', 'DUP', 'CNV', 'INS', 'BND', 'INV'])\n" 246 | ] 247 | } 248 | ], 249 | "source": [ 250 | "print(svtypes.keys())" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 13, 256 | "metadata": {}, 257 | "outputs": [ 258 | { 259 | "name": "stdout", 260 | "output_type": "stream", 261 | "text": [ 262 | "ID file-Fy33J480Z5b09bvV4jpxj09z\n", 263 | "Class file\n", 264 | "Project project-Fy1b7V80Z5b4jXb224P1fY4b\n", 265 | "Folder /chr-subset-genes-variants\n", 266 | "Name all_variants_chr21_grch37.tsv\n", 267 | "State closing\n", 268 | "Visibility visible\n", 269 | "Types -\n", 270 | "Properties -\n", 271 | "Tags -\n", 272 | "Outgoing links -\n", 273 | "Created Tue Oct 13 23:21:53 2020\n", 274 | "Created by jmonlong\n", 275 | " via the job job-Fy332xj0Z5bKZz0F8v9gKVfJ\n", 276 | "Last modified Tue Oct 13 23:21:53 2020\n", 277 | "Media type \n", 278 | "archivalState \"live\"\n", 279 | "cloudAccount \"cloudaccount-dnanexus\"\n" 280 | ] 281 | } 282 | ], 283 | "source": [ 284 | "%%bash\n", 285 | "dx upload all_variants_chr21_grch37.tsv --path chr-subset-genes-variants/" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 
290 | "metadata": {}, 291 | "source": [ 292 | "## Extract genes in chr 21\n", 293 | "Gene names as gene IDs" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "%%bash\n", 303 | "dx download Annotation/gene/" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 21, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "# read the GENCODE GFF3 annotation\n", 313 | "genes = {}\n", 314 | "for line in gzip.open('gencode.v19.annotation.gff3.gz', 'rb'):\n", 315 | " line = line.decode('ascii')\n", 316 | " # skip headers\n", 317 | " if line[0] == '#':\n", 318 | " continue\n", 319 | " line = line.rstrip().split('\\t')\n", 320 | " # skip if not chr 21 or not a \"gene\" feature\n", 321 | " if line[0] != 'chr21' or line[2] != 'gene':\n", 322 | " continue\n", 323 | " # parse the GFF3 attributes field (column 9)\n", 324 | " infos = {}\n", 325 | " for info in line[8].split(';'):\n", 326 | " info_pair = info.split('=')\n", 327 | " if len(info_pair) == 1:\n", 328 | " infos[info] = True\n", 329 | " else:\n", 330 | " infos[info_pair[0]] = info_pair[1]\n", 331 | " if infos['gene_type'] == 'protein_coding':\n", 332 | " genes[infos['gene_name']] = True" 333 | ] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "execution_count": 22, 338 | "metadata": {}, 339 | "outputs": [ 340 | { 341 | "data": { 342 | "text/plain": [ 343 | "241" 344 | ] 345 | }, 346 | "execution_count": 22, 347 | "metadata": {}, 348 | "output_type": "execute_result" 349 | } 350 | ], 351 | "source": [ 352 | "len(genes)" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": 23, 358 | "metadata": {}, 359 | "outputs": [], 360 | "source": [ 361 | "outf = open('genes_chr21.tsv', 'w')\n", 362 | "# header\n", 363 | "outf.write('gene_id\\n')\n", 364 | "for gene in genes.keys():\n", 365 | " outf.write(gene + '\\n')\n", 366 | "outf.close()" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | 
"execution_count": 24, 372 | "metadata": {}, 373 | "outputs": [ 374 | { 375 | "name": "stdout", 376 | "output_type": "stream", 377 | "text": [ 378 | "ID file-Fy31xB00Z5bFYXyG5Bf33yzf\n", 379 | "Class file\n", 380 | "Project project-Fy1b7V80Z5b4jXb224P1fY4b\n", 381 | "Folder /chr-subset-genes-variants\n", 382 | "Name genes_chr21.tsv\n", 383 | "State closing\n", 384 | "Visibility visible\n", 385 | "Types -\n", 386 | "Properties -\n", 387 | "Tags -\n", 388 | "Outgoing links -\n", 389 | "Created Tue Oct 13 21:39:52 2020\n", 390 | "Created by jmonlong\n", 391 | " via the job job-Fy30f5Q0Z5b2Pk7kByqz8fk6\n", 392 | "Last modified Tue Oct 13 21:39:52 2020\n", 393 | "Media type \n", 394 | "archivalState \"live\"\n", 395 | "cloudAccount \"cloudaccount-dnanexus\"\n" 396 | ] 397 | } 398 | ], 399 | "source": [ 400 | "%%bash\n", 401 | "dx upload genes_chr21.tsv --path chr-subset-genes-variants/" 402 | ] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "metadata": {}, 407 | "source": [ 408 | "## Save the notebook" 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | "execution_count": 26, 414 | "metadata": {}, 415 | "outputs": [ 416 | { 417 | "name": "stdout", 418 | "output_type": "stream", 419 | "text": [ 420 | "ID file-Fy31xV80Z5bFYXyG5Bf33z02\n", 421 | "Class file\n", 422 | "Project project-Fy1b7V80Z5b4jXb224P1fY4b\n", 423 | "Folder /chr-subset-genes-variants\n", 424 | "Name extract-chr21-genes-variants.ipynb\n", 425 | "State closing\n", 426 | "Visibility visible\n", 427 | "Types -\n", 428 | "Properties -\n", 429 | "Tags -\n", 430 | "Outgoing links -\n", 431 | "Created Tue Oct 13 21:40:21 2020\n", 432 | "Created by jmonlong\n", 433 | " via the job job-Fy30f5Q0Z5b2Pk7kByqz8fk6\n", 434 | "Last modified Tue Oct 13 21:40:22 2020\n", 435 | "Media type \n", 436 | "archivalState \"live\"\n", 437 | "cloudAccount \"cloudaccount-dnanexus\"\n" 438 | ] 439 | } 440 | ], 441 | "source": [ 442 | "%%bash\n", 443 | "dx upload extract-chr21-genes-variants.ipynb --path 
chr-subset-genes-variants/" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [] 452 | } 453 | ], 454 | "metadata": { 455 | "kernelspec": { 456 | "display_name": "Python 3", 457 | "language": "python", 458 | "name": "python3" 459 | }, 460 | "language_info": { 461 | "codemirror_mode": { 462 | "name": "ipython", 463 | "version": 3 464 | }, 465 | "file_extension": ".py", 466 | "mimetype": "text/x-python", 467 | "name": "python", 468 | "nbconvert_exporter": "python", 469 | "pygments_lexer": "ipython3", 470 | "version": "3.6.5" 471 | } 472 | }, 473 | "nbformat": 4, 474 | "nbformat_minor": 4 475 | } 476 | -------------------------------------------------------------------------------- /scripts/intersection_gencode_with_variants_commands.txt: -------------------------------------------------------------------------------- 1 | gunzip gencode.v35.annotation.gff3.gz 2 | perl parse_gencode.pl # will produce a number of *annotation.txt files 3 | perl reformat_tsv_variant_file.pl # if you haven't already run this 4 | ls *annotation.txt | while read FILE; do /hgsc_software/BEDTools/latest/bin/intersectBed -wao -a all_variants_chr21.bed -b $FILE | grep -v "\s\-1\s" > ${FILE}_info.txt; done 5 | ls *annotation.txt_info.txt | while read FILE; do cut -f4,7,13 $FILE | sed 's/.*dbVarID=//' | sed 's/\;transcript.*$//' | sed 's/ID.*gene_id=//' | sort -u >> dbvar_gencode.tsv;done 6 | -------------------------------------------------------------------------------- /scripts/linkGeneWithSV.R: -------------------------------------------------------------------------------- 1 | options(stringsAsFactors = F) 2 | args = commandArgs(TRUE) 3 | geneName = args[1] 4 | geneChr = args[2] 5 | geneStart = as.numeric(args[3]) 6 | geneEnd = as.numeric(args[4]) 7 | fileName = args[5] 8 | 9 | getSVs <- function(geneName,geneChr,geneStart,geneEnd,fileName) { 10 | gzFile = gzfile(fileName,'rt') 11 | SVTable = 
read.table(file = gzFile, header = F, sep = '\t', quote = "") 12 | 13 | SVChrTable = SVTable[SVTable[,1]==geneChr,] 14 | SVgeneTable = SVChrTable[(((SVChrTable[,2]>geneEnd)|(SVChrTable[,3]\n", 51 | "A data.frame: 6 × 5\n", 52 | "\n", 53 | "\tchrstartendtypevariant_id\n", 54 | "\t<int><int><int><chr><chr>\n", 55 | "\n", 56 | "\n", 57 | "\t12150515935051594INSnssv14017801\n", 58 | "\t22150640525066138DELnssv14300595\n", 59 | "\t32150640525066138DELnssv14301211\n", 60 | "\t42150640525066138DELnssv14301212\n", 61 | "\t52150640525066138DELnssv14301213\n", 62 | "\t62150640525066138DELnssv14432733\n", 63 | "\n", 64 | "\n" 65 | ], 66 | "text/latex": [ 67 | "A data.frame: 6 × 5\n", 68 | "\\begin{tabular}{r|lllll}\n", 69 | " & chr & start & end & type & variant\\_id\\\\\n", 70 | " & & & & & \\\\\n", 71 | "\\hline\n", 72 | "\t1 & 21 & 5051593 & 5051594 & INS & nssv14017801\\\\\n", 73 | "\t2 & 21 & 5064052 & 5066138 & DEL & nssv14300595\\\\\n", 74 | "\t3 & 21 & 5064052 & 5066138 & DEL & nssv14301211\\\\\n", 75 | "\t4 & 21 & 5064052 & 5066138 & DEL & nssv14301212\\\\\n", 76 | "\t5 & 21 & 5064052 & 5066138 & DEL & nssv14301213\\\\\n", 77 | "\t6 & 21 & 5064052 & 5066138 & DEL & nssv14432733\\\\\n", 78 | "\\end{tabular}\n" 79 | ], 80 | "text/markdown": [ 81 | "\n", 82 | "A data.frame: 6 × 5\n", 83 | "\n", 84 | "| | chr <int> | start <int> | end <int> | type <chr> | variant_id <chr> |\n", 85 | "|---|---|---|---|---|---|\n", 86 | "| 1 | 21 | 5051593 | 5051594 | INS | nssv14017801 |\n", 87 | "| 2 | 21 | 5064052 | 5066138 | DEL | nssv14300595 |\n", 88 | "| 3 | 21 | 5064052 | 5066138 | DEL | nssv14301211 |\n", 89 | "| 4 | 21 | 5064052 | 5066138 | DEL | nssv14301212 |\n", 90 | "| 5 | 21 | 5064052 | 5066138 | DEL | nssv14301213 |\n", 91 | "| 6 | 21 | 5064052 | 5066138 | DEL | nssv14432733 |\n", 92 | "\n" 93 | ], 94 | "text/plain": [ 95 | " chr start end type variant_id \n", 96 | "1 21 5051593 5051594 INS nssv14017801\n", 97 | "2 21 5064052 5066138 DEL nssv14300595\n", 98 | "3 21 5064052 
5066138 DEL nssv14301211\n", 99 | "4 21 5064052 5066138 DEL nssv14301212\n", 100 | "5 21 5064052 5066138 DEL nssv14301213\n", 101 | "6 21 5064052 5066138 DEL nssv14432733" 102 | ] 103 | }, 104 | "metadata": {}, 105 | "output_type": "display_data" 106 | } 107 | ], 108 | "source": [ 109 | "vars.df = read.table('all_variants_chr21.tsv', as.is=TRUE, header=TRUE)\n", 110 | "head(vars.df)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "# nstd102 inc. ClinGen CNVs\n", 118 | "```\n", 119 | "dx download Annotation/nstd102\\ \\(ClinVarSV\\ -\\ includes\\ ClinGen\\ CNVs\\)/nstd102.GRCh38.variant_call.tsv.gz\n", 120 | "```" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 3, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "name": "stderr", 130 | "output_type": "stream", 131 | "text": [ 132 | "Warning message in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :\n", 133 | "“EOF within quoted string”\n", 134 | "Warning message in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :\n", 135 | "“number of items read is not a multiple of the number of columns”\n" 136 | ] 137 | }, 138 | { 139 | "data": { 140 | "text/html": [ 141 | "\n", 142 | "\n", 143 | "\n", 144 | "\t\n", 145 | "\t\n", 146 | "\n", 147 | "\n", 148 | "\t\n", 149 | "\t\n", 150 | "\t\n", 151 | "\t\n", 152 | "\t\n", 153 | "\t\n", 154 | "\n", 155 | "
A data.frame: 6 × 38
| | X.variant_call_accession <chr> | variant_call_id <chr> | variant_call_type <chr> | experiment_id <int> | sample_id <lgl> | sampleset_id <int> | assembly <chr> | chr <chr> | contig <chr> | outer_start <int> | ⋯ | remap_alignment <chr> | remap_best_within_cluster <int> | remap_coverage <dbl> | remap_diff_chr <int> | remap_failure_code <chr> | external_links <chr> | evidence <lgl> | sequence <chr> | clinical_significance <chr> | clinical_source <chr> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | nssv8639265 | RCV000076578_96551 | deletion | 1 | NA | 1 | GRCh38 | 2 | | NA | ⋯ | | NA | NA | NA | | ClinGen:CA331581,ClinVar:RCV000076578.2,PubMed:11598466,PubMed:15604628,PubMed:20301390,PubMed:23408351,PubMed:23535968,PubMed:23788249,PubMed:24310308,PubMed:24493721,PubMed:25003300,PubMed:25070057,PubMed:25356965,PubMed:25452455,PubMed:25645574,PubMed:25711197,PubMed:27854360 | NA | | Pathogenic | ClinVar |
| 2 | nssv8639267 | RCV000203260_215728 | duplication | 1 | NA | 1 | GRCh38.p12 | 8 | | NA | ⋯ | First Pass | 0 | 1 | NA | | ClinVar:RCV000203260.1,PubMed:20301641 | NA | | Pathogenic | ClinVar |
| 3 | nssv8639268 | RCV000203462_216747 | deletion | 1 | NA | 1 | GRCh38.p12 | 2 | | NA | ⋯ | First Pass | 0 | 1 | NA | | ClinVar:RCV000203462.1,PubMed:20301339 | NA | | Pathogenic | ClinVar |
| 4 | nssv8639269 | RCV000201517_214150 | deletion | 1 | NA | 1 | GRCh38 | 2 | | NA | ⋯ | | NA | NA | NA | | ClinVar:RCV000201517.1 | NA | | Uncertain significance | ClinVar |
| 5 | nssv8639270 | RCV000203882_222922 | deletion | 1 | NA | 1 | GRCh38 | 7 | | NA | ⋯ | | NA | NA | NA | | ClinVar:RCV000203882.1 | NA | | Likely pathogenic | ClinVar |
| 6 | nssv8639272 | RCV000144263_166057 | deletion | 1 | NA | 1 | GRCh38 | 3 | | NA | ⋯ | | NA | NA | NA | | ClinGen:CA277981,ClinVar:RCV000144263.1,PubMed:20301627,dbSNP:rs1553721650 | NA | | Pathogenic | ClinVar |
\n" 156 | ], 157 | "text/latex": [ 158 | "A data.frame: 6 × 38\n", 159 | "\\begin{tabular}{r|lllllllllllllllllllll}\n", 160 | " & X.variant\\_call\\_accession & variant\\_call\\_id & variant\\_call\\_type & experiment\\_id & sample\\_id & sampleset\\_id & assembly & chr & contig & outer\\_start & ⋯ & remap\\_alignment & remap\\_best\\_within\\_cluster & remap\\_coverage & remap\\_diff\\_chr & remap\\_failure\\_code & external\\_links & evidence & sequence & clinical\\_significance & clinical\\_source\\\\\n", 161 | " & & & & & & & & & & & ⋯ & & & & & & & & & & \\\\\n", 162 | "\\hline\n", 163 | "\t1 & nssv8639265 & RCV000076578\\_96551 & deletion & 1 & NA & 1 & GRCh38 & 2 & & NA & ⋯ & & NA & NA & NA & & ClinGen:CA331581,ClinVar:RCV000076578.2,PubMed:11598466,PubMed:15604628,PubMed:20301390,PubMed:23408351,PubMed:23535968,PubMed:23788249,PubMed:24310308,PubMed:24493721,PubMed:25003300,PubMed:25070057,PubMed:25356965,PubMed:25452455,PubMed:25645574,PubMed:25711197,PubMed:27854360 & NA & & Pathogenic & ClinVar\\\\\n", 164 | "\t2 & nssv8639267 & RCV000203260\\_215728 & duplication & 1 & NA & 1 & GRCh38.p12 & 8 & & NA & ⋯ & First Pass & 0 & 1 & NA & & ClinVar:RCV000203260.1,PubMed:20301641 & NA & & Pathogenic & ClinVar\\\\\n", 165 | "\t3 & nssv8639268 & RCV000203462\\_216747 & deletion & 1 & NA & 1 & GRCh38.p12 & 2 & & NA & ⋯ & First Pass & 0 & 1 & NA & & ClinVar:RCV000203462.1,PubMed:20301339 & NA & & Pathogenic & ClinVar\\\\\n", 166 | "\t4 & nssv8639269 & RCV000201517\\_214150 & deletion & 1 & NA & 1 & GRCh38 & 2 & & NA & ⋯ & & NA & NA & NA & & ClinVar:RCV000201517.1 & NA & & Uncertain significance & ClinVar\\\\\n", 167 | "\t5 & nssv8639270 & RCV000203882\\_222922 & deletion & 1 & NA & 1 & GRCh38 & 7 & & NA & ⋯ & & NA & NA & NA & & ClinVar:RCV000203882.1 & NA & & Likely pathogenic & ClinVar\\\\\n", 168 | "\t6 & nssv8639272 & RCV000144263\\_166057 & deletion & 1 & NA & 1 & GRCh38 & 3 & & NA & ⋯ & & NA & NA & NA & & 
ClinGen:CA277981,ClinVar:RCV000144263.1,PubMed:20301627,dbSNP:rs1553721650 & NA & & Pathogenic & ClinVar\\\\\n", 169 | "\\end{tabular}\n" 170 | ], 171 | "text/markdown": [ 172 | "\n", 173 | "A data.frame: 6 × 38\n", 174 | "\n", 175 | "| | X.variant_call_accession <chr> | variant_call_id <chr> | variant_call_type <chr> | experiment_id <int> | sample_id <lgl> | sampleset_id <int> | assembly <chr> | chr <chr> | contig <chr> | outer_start <int> | ⋯ ⋯ | remap_alignment <chr> | remap_best_within_cluster <int> | remap_coverage <dbl> | remap_diff_chr <int> | remap_failure_code <chr> | external_links <chr> | evidence <lgl> | sequence <chr> | clinical_significance <chr> | clinical_source <chr> |\n", 176 | "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", 177 | "| 1 | nssv8639265 | RCV000076578_96551 | deletion | 1 | NA | 1 | GRCh38 | 2 | | NA | ⋯ | | NA | NA | NA | | ClinGen:CA331581,ClinVar:RCV000076578.2,PubMed:11598466,PubMed:15604628,PubMed:20301390,PubMed:23408351,PubMed:23535968,PubMed:23788249,PubMed:24310308,PubMed:24493721,PubMed:25003300,PubMed:25070057,PubMed:25356965,PubMed:25452455,PubMed:25645574,PubMed:25711197,PubMed:27854360 | NA | | Pathogenic | ClinVar |\n", 178 | "| 2 | nssv8639267 | RCV000203260_215728 | duplication | 1 | NA | 1 | GRCh38.p12 | 8 | | NA | ⋯ | First Pass | 0 | 1 | NA | | ClinVar:RCV000203260.1,PubMed:20301641 | NA | | Pathogenic | ClinVar |\n", 179 | "| 3 | nssv8639268 | RCV000203462_216747 | deletion | 1 | NA | 1 | GRCh38.p12 | 2 | | NA | ⋯ | First Pass | 0 | 1 | NA | | ClinVar:RCV000203462.1,PubMed:20301339 | NA | | Pathogenic | ClinVar |\n", 180 | "| 4 | nssv8639269 | RCV000201517_214150 | deletion | 1 | NA | 1 | GRCh38 | 2 | | NA | ⋯ | | NA | NA | NA | | ClinVar:RCV000201517.1 | NA | | Uncertain significance | ClinVar |\n", 181 | "| 5 | nssv8639270 | RCV000203882_222922 | deletion | 1 | NA | 1 | GRCh38 | 7 | | NA | ⋯ | | NA | NA | NA | | ClinVar:RCV000203882.1 | NA | | Likely pathogenic | 
ClinVar |\n", 182 | "| 6 | nssv8639272 | RCV000144263_166057 | deletion | 1 | NA | 1 | GRCh38 | 3 | | NA | ⋯ | | NA | NA | NA | | ClinGen:CA277981,ClinVar:RCV000144263.1,PubMed:20301627,dbSNP:rs1553721650 | NA | | Pathogenic | ClinVar |\n", 183 | "\n" 184 | ], 185 | "text/plain": [ 186 | " X.variant_call_accession variant_call_id variant_call_type experiment_id\n", 187 | "1 nssv8639265 RCV000076578_96551 deletion 1 \n", 188 | "2 nssv8639267 RCV000203260_215728 duplication 1 \n", 189 | "3 nssv8639268 RCV000203462_216747 deletion 1 \n", 190 | "4 nssv8639269 RCV000201517_214150 deletion 1 \n", 191 | "5 nssv8639270 RCV000203882_222922 deletion 1 \n", 192 | "6 nssv8639272 RCV000144263_166057 deletion 1 \n", 193 | " sample_id sampleset_id assembly chr contig outer_start ⋯ remap_alignment\n", 194 | "1 NA 1 GRCh38 2 NA ⋯ \n", 195 | "2 NA 1 GRCh38.p12 8 NA ⋯ First Pass \n", 196 | "3 NA 1 GRCh38.p12 2 NA ⋯ First Pass \n", 197 | "4 NA 1 GRCh38 2 NA ⋯ \n", 198 | "5 NA 1 GRCh38 7 NA ⋯ \n", 199 | "6 NA 1 GRCh38 3 NA ⋯ \n", 200 | " remap_best_within_cluster remap_coverage remap_diff_chr remap_failure_code\n", 201 | "1 NA NA NA \n", 202 | "2 0 1 NA \n", 203 | "3 0 1 NA \n", 204 | "4 NA NA NA \n", 205 | "5 NA NA NA \n", 206 | "6 NA NA NA \n", 207 | " external_links \n", 208 | "1 ClinGen:CA331581,ClinVar:RCV000076578.2,PubMed:11598466,PubMed:15604628,PubMed:20301390,PubMed:23408351,PubMed:23535968,PubMed:23788249,PubMed:24310308,PubMed:24493721,PubMed:25003300,PubMed:25070057,PubMed:25356965,PubMed:25452455,PubMed:25645574,PubMed:25711197,PubMed:27854360\n", 209 | "2 ClinVar:RCV000203260.1,PubMed:20301641 \n", 210 | "3 ClinVar:RCV000203462.1,PubMed:20301339 \n", 211 | "4 ClinVar:RCV000201517.1 \n", 212 | "5 ClinVar:RCV000203882.1 \n", 213 | "6 ClinGen:CA277981,ClinVar:RCV000144263.1,PubMed:20301627,dbSNP:rs1553721650 \n", 214 | " evidence sequence clinical_significance clinical_source\n", 215 | "1 NA Pathogenic ClinVar \n", 216 | "2 NA Pathogenic ClinVar \n", 217 | "3 NA Pathogenic 
ClinVar \n", 218 | "4 NA Uncertain significance ClinVar \n", 219 | "5 NA Likely pathogenic ClinVar \n", 220 | "6 NA Pathogenic ClinVar " 221 | ] 222 | }, 223 | "metadata": {}, 224 | "output_type": "display_data" 225 | } 226 | ], 227 | "source": [ 228 | "clin.df = read.table('nstd102.GRCh38.variant_call.tsv.gz', as.is=TRUE, skip=1, comment='', sep='\\t', header=TRUE)\n", 229 | "head(clin.df)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 10, 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "clin.df %>% mutate(variant_id=X.variant_call_accession, sv_clinical_significance=clinical_significance) %>% \n", 239 | " filter(variant_id %in% vars.df$variant_id) %>% select(variant_id, sv_clinical_significance) %>% \n", 240 | " write.table('sv.clinical.variants.chr21.nstd120.tsv', row.names=FALSE, sep='\\t', quote=FALSE)" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "```\n", 248 | "dx upload sv.clinical.variants.chr21.nstd120.tsv --path chr-subset-genes-variants/\n", 249 | "dx upload make-misc-tsvs.ipynb --path chr-subset-genes-variants/\n", 250 | "```" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "# Compact ClinVar TSV" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "```\n", 265 | "dx download chr-subset-genes-variants/final_clinvar_dbvar_results.txt\n", 266 | "```" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 13, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "data": { 276 | "text/html": [ 277 | "\n", 278 | "\n", 279 | "\n", 280 | "\t\n", 281 | "\t\n", 282 | "\n", 283 | "\n", 284 | "\t\n", 285 | "\t\n", 286 | "\t\n", 287 | "\t\n", 288 | "\t\n", 289 | "\t\n", 290 | "\n", 291 | "
A data.frame: 6 × 2
| | variant_id <chr> | effect <chr> |
|---|---|---|
| 1 | nssv15216117 | Pathogenic |
| 2 | nssv14267889 | Pathogenic |
| 3 | nssv14267890 | Pathogenic |
| 4 | nssv14407870 | Pathogenic |
| 5 | nssv15136031 | Likely_benign |
| 6 | nssv15136031 | Benign |
\n" 292 | ], 293 | "text/latex": [ 294 | "A data.frame: 6 × 2\n", 295 | "\\begin{tabular}{r|ll}\n", 296 | " & variant\\_id & effect\\\\\n", 297 | " & & \\\\\n", 298 | "\\hline\n", 299 | "\t1 & nssv15216117 & Pathogenic \\\\\n", 300 | "\t2 & nssv14267889 & Pathogenic \\\\\n", 301 | "\t3 & nssv14267890 & Pathogenic \\\\\n", 302 | "\t4 & nssv14407870 & Pathogenic \\\\\n", 303 | "\t5 & nssv15136031 & Likely\\_benign\\\\\n", 304 | "\t6 & nssv15136031 & Benign \\\\\n", 305 | "\\end{tabular}\n" 306 | ], 307 | "text/markdown": [ 308 | "\n", 309 | "A data.frame: 6 × 2\n", 310 | "\n", 311 | "| | variant_id <chr> | effect <chr> |\n", 312 | "|---|---|---|\n", 313 | "| 1 | nssv15216117 | Pathogenic |\n", 314 | "| 2 | nssv14267889 | Pathogenic |\n", 315 | "| 3 | nssv14267890 | Pathogenic |\n", 316 | "| 4 | nssv14407870 | Pathogenic |\n", 317 | "| 5 | nssv15136031 | Likely_benign |\n", 318 | "| 6 | nssv15136031 | Benign |\n", 319 | "\n" 320 | ], 321 | "text/plain": [ 322 | " variant_id effect \n", 323 | "1 nssv15216117 Pathogenic \n", 324 | "2 nssv14267889 Pathogenic \n", 325 | "3 nssv14267890 Pathogenic \n", 326 | "4 nssv14407870 Pathogenic \n", 327 | "5 nssv15136031 Likely_benign\n", 328 | "6 nssv15136031 Benign " 329 | ] 330 | }, 331 | "metadata": {}, 332 | "output_type": "display_data" 333 | } 334 | ], 335 | "source": [ 336 | "clinvar = read.table('final_clinvar_dbvar_results.txt', as.is=TRUE, sep='\\t')\n", 337 | "colnames(clinvar) = c('variant_id', 'effect')\n", 338 | "head(clinvar)" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": 15, 344 | "metadata": {}, 345 | "outputs": [ 346 | { 347 | "name": "stderr", 348 | "output_type": "stream", 349 | "text": [ 350 | "`summarise()` regrouping output by 'variant_id' (override with `.groups` argument)\n", 351 | "\n" 352 | ] 353 | }, 354 | { 355 | "data": { 356 | "text/html": [ 357 | "\n", 358 | "\n", 359 | "\n", 360 | "\t\n", 361 | "\t\n", 362 | "\n", 363 | "\n", 364 | "\t\n", 365 | "\t\n", 366 | "\t\n", 
367 | "\t\n", 368 | "\t\n", 369 | "\t\n", 370 | "\n", 371 | "
A grouped_df: 6 × 3
| variant_id <chr> | effect <chr> | n <int> |
|---|---|---|
| essv100139 | Pathogenic | 1 |
| essv100355 | Pathogenic | 1 |
| essv100680 | Benign | 26 |
| essv100680 | Benign/Likely_benign | 7 |
| essv100680 | Conflicting_interpretations_of_pathogenicity | 9 |
| essv100680 | Likely_benign | 74 |
\n" 372 | ], 373 | "text/latex": [ 374 | "A grouped\\_df: 6 × 3\n", 375 | "\\begin{tabular}{lll}\n", 376 | " variant\\_id & effect & n\\\\\n", 377 | " & & \\\\\n", 378 | "\\hline\n", 379 | "\t essv100139 & Pathogenic & 1\\\\\n", 380 | "\t essv100355 & Pathogenic & 1\\\\\n", 381 | "\t essv100680 & Benign & 26\\\\\n", 382 | "\t essv100680 & Benign/Likely\\_benign & 7\\\\\n", 383 | "\t essv100680 & Conflicting\\_interpretations\\_of\\_pathogenicity & 9\\\\\n", 384 | "\t essv100680 & Likely\\_benign & 74\\\\\n", 385 | "\\end{tabular}\n" 386 | ], 387 | "text/markdown": [ 388 | "\n", 389 | "A grouped_df: 6 × 3\n", 390 | "\n", 391 | "| variant_id <chr> | effect <chr> | n <int> |\n", 392 | "|---|---|---|\n", 393 | "| essv100139 | Pathogenic | 1 |\n", 394 | "| essv100355 | Pathogenic | 1 |\n", 395 | "| essv100680 | Benign | 26 |\n", 396 | "| essv100680 | Benign/Likely_benign | 7 |\n", 397 | "| essv100680 | Conflicting_interpretations_of_pathogenicity | 9 |\n", 398 | "| essv100680 | Likely_benign | 74 |\n", 399 | "\n" 400 | ], 401 | "text/plain": [ 402 | " variant_id effect n \n", 403 | "1 essv100139 Pathogenic 1\n", 404 | "2 essv100355 Pathogenic 1\n", 405 | "3 essv100680 Benign 26\n", 406 | "4 essv100680 Benign/Likely_benign 7\n", 407 | "5 essv100680 Conflicting_interpretations_of_pathogenicity 9\n", 408 | "6 essv100680 Likely_benign 74" 409 | ] 410 | }, 411 | "metadata": {}, 412 | "output_type": "display_data" 413 | } 414 | ], 415 | "source": [ 416 | "clinvar.sum = clinvar %>% group_by(variant_id, effect) %>% summarize(n=n())\n", 417 | "head(clinvar.sum)" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 16, 423 | "metadata": {}, 424 | "outputs": [], 425 | "source": [ 426 | "write.table(clinvar.sum, file='final_clinvar_dbvar_results_summary.txt', sep='\\t', row.names=FALSE, quote=FALSE)" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [] 435 | } 436 | 
], 437 | "metadata": { 438 | "kernelspec": { 439 | "display_name": "R", 440 | "language": "R", 441 | "name": "ir" 442 | }, 443 | "language_info": { 444 | "codemirror_mode": "r", 445 | "file_extension": ".r", 446 | "mimetype": "text/x-r-source", 447 | "name": "R", 448 | "pygments_lexer": "r", 449 | "version": "3.6.3" 450 | } 451 | }, 452 | "nbformat": 4, 453 | "nbformat_minor": 4 454 | } 455 | -------------------------------------------------------------------------------- /scripts/parse_gencode.pl: -------------------------------------------------------------------------------- 1 | #! /usr/bin/perl 2 | $file = "gencode.v35.annotation.gff3"; 3 | # split gencode annotation by type, so bedtools intersect will run faster 4 | open CDS, ">cds_annotation.txt"; 5 | open EXON, ">exon_annotation.txt"; 6 | open GENE, ">gene_annotation.txt"; 7 | open START, ">start_codon_annotation.txt"; 8 | open STOP, ">stop_codon_annotation.txt"; 9 | open TRANSCRIPT, ">transcript_annotation.txt"; 10 | open UTR5, ">5utr_annotation.txt"; 11 | open UTR3, ">3utr_annotation.txt"; 12 | open STOP2, ">stop_codon_selenocysteine_annotation.txt"; 13 | 14 | 15 | open READ, $file or die "Can't open $file\n"; 16 | while (<READ>) 17 | { 18 | chomp $_; 19 | #skip header 20 | if($_ =~ /^#/){next;} 21 | 22 | @data = split/\t/,$_; 23 | if ($_ =~ /^chr/){$_ =~ s/^chr//;} 24 | if($data[2] eq "CDS") 25 | { 26 | print CDS "$_\n"; 27 | } 28 | elsif($data[2] eq "exon") 29 | { 30 | print EXON "$_\n"; 31 | } 32 | elsif($data[2] eq "gene") 33 | { 34 | print GENE "$_\n"; 35 | } 36 | elsif($data[2] eq "stop_codon_redefined_as_selenocysteine") 37 | { 38 | print STOP2 "$_\n"; 39 | } 40 | elsif($data[2] eq "start_codon") 41 | { 42 | print START "$_\n"; 43 | } 44 | elsif($data[2] eq "stop_codon") 45 | { 46 | print STOP "$_\n"; 47 | } 48 | elsif($data[2] eq "transcript") 49 | { 50 | print TRANSCRIPT "$_\n"; 51 | } 52 | elsif($data[2] eq "five_prime_UTR") 53 | { 54 | print UTR5 "$_\n"; 55 | } 56 | elsif($data[2] eq 
"three_prime_UTR") 57 | { 58 | print UTR3 "$_\n"; 59 | } 60 | else{print "Error - type unknown $_\n";} 61 | } 62 | close READ; 63 | 64 | -------------------------------------------------------------------------------- /scripts/reformat_clinvar_results.pl: -------------------------------------------------------------------------------- 1 | #! /usr/bin/perl 2 | 3 | #temp line with hard coded input file since this is for testing chr21 4 | $file = "all_variants_chr21_clinvar.txt"; 5 | open READ, $file or die "Can't open $file\n"; 6 | open RESULTS, ">final_clinvar_dbvar_results.txt" or die "Can't open final_clinvar_dbvar_results.txt\n"; 7 | open TF, ">final_clinvar_dbvar_results_TF.txt" or die "Can't open final_clinvar_dbvar_results_TF.txt\n"; 8 | while (<READ>) 9 | { 10 | chomp $_; 11 | @data = split /\t/,$_; 12 | $data[3] =~ s/\;dbVarID\=(.*)//; $dbvar = $1; 13 | $data[11] =~ s/CLNSIG=(.*)\;CLNVC//; $clinvar = $1; $clinvar =~ s/\;CLNVC.*$//; $clinvar =~ s/\;CLNSIGCONF.*$//; 14 | if ($clinvar =~ /^ns/) 15 | { 16 | # these records have no CLNSIG, instead CLNSIGINCL 17 | $data[11] =~ s/.*CLNSIGINCL=(.*)//; $cli = $1; $cli =~ s/^.*://; $clinvar = $cli; 18 | } 19 | # print dbvar id and the clinvar significance 20 | print RESULTS "$dbvar\t$clinvar\n"; 21 | if ( ($clinvar eq "Pathogenic") || ($clinvar eq "Likely_pathogenic") || ($clinvar eq "Pathogenic/Likely_pathogenic") ) 22 | { 23 | $truth = "TRUE"; 24 | } 25 | else{$truth = "FALSE";} 26 | # print dbvar id and simple True or false if it is P/LP 27 | print TF "$dbvar\t$truth\n"; 28 | } 29 | close READ; 30 | -------------------------------------------------------------------------------- /scripts/reformat_tsv_variant_file.pl: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/perl 2 | 3 | # temp line, hard code input file since test is on chr21 only 4 | $file = "all_variants_chr21.tsv"; 5 | open READ, $file or die "Can't open $file\n"; 6 | open BED, ">all_variants_chr21.bed" or die "Can't open all_variants_chr21.bed\n"; 7 | while (<READ>) 8 | { 9 | chomp $_; 10 | if($_ =~ /^chr/){next;} 11 | @data = split /\t/,$_; 12 | # removing BND that was accidentally added to the all_variants_chr21.tsv 13 | if ($data[3] eq "BND"){next;} 14 | print BED "$data[0]\t$data[1]\t$data[2]\ttype=$data[3];dbVarID=$data[4]\n"; 15 | } 16 | close READ; 17 | -------------------------------------------------------------------------------- /scripts/variant-gene-overlap.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "```\n", 8 | "dx download Annotation/gene/gencode.v35.annotation.gff3.gz\n", 9 | "dx download chr-subset-genes-variants/all_variants_chr21.tsv\n", 10 | "```" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "library(rtracklayer)" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": null, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "genc = import('gencode.v35.annotation.gff3.gz')\n", 29 | "genc" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [ 37 | { 38 | "data": { 39 | "text/plain": [ 40 | "GRanges object with 2949356 ranges and 23 metadata columns:\n", 41 | " seqnames ranges strand | source type score\n", 42 | " | \n", 43 | " [1] chr1 11869-14409 + | HAVANA gene \n", 44 | " [2] chr1 11869-14409 + | HAVANA transcript \n", 45 | " [3] chr1 11869-12227 + | HAVANA exon \n", 46 | " [4] chr1 12613-12721 + | HAVANA exon \n", 47 | " [5] chr1 13221-14409 + | HAVANA exon \n", 48 | " ... ... ... ... . ... ... 
...\n", 49 | " [2949352] chrM 15888-15953 + | ENSEMBL transcript \n", 50 | " [2949353] chrM 15888-15953 + | ENSEMBL exon \n", 51 | " [2949354] chrM 15956-16023 - | ENSEMBL gene \n", 52 | " [2949355] chrM 15956-16023 - | ENSEMBL transcript \n", 53 | " [2949356] chrM 15956-16023 - | ENSEMBL exon \n", 54 | " phase ID gene_id\n", 55 | " \n", 56 | " [1] ENSG00000223972.5 ENSG00000223972.5\n", 57 | " [2] ENST00000456328.2 ENSG00000223972.5\n", 58 | " [3] exon:ENST00000456328.2:1 ENSG00000223972.5\n", 59 | " [4] exon:ENST00000456328.2:2 ENSG00000223972.5\n", 60 | " [5] exon:ENST00000456328.2:3 ENSG00000223972.5\n", 61 | " ... ... ... ...\n", 62 | " [2949352] ENST00000387460.2 ENSG00000210195.2\n", 63 | " [2949353] exon:ENST00000387460.2:1 ENSG00000210195.2\n", 64 | " [2949354] ENSG00000210196.2 ENSG00000210196.2\n", 65 | " [2949355] ENST00000387461.2 ENSG00000210196.2\n", 66 | " [2949356] exon:ENST00000387461.2:1 ENSG00000210196.2\n", 67 | " gene_type gene_name level\n", 68 | " \n", 69 | " [1] transcribed_unprocessed_pseudogene DDX11L1 2\n", 70 | " [2] transcribed_unprocessed_pseudogene DDX11L1 2\n", 71 | " [3] transcribed_unprocessed_pseudogene DDX11L1 2\n", 72 | " [4] transcribed_unprocessed_pseudogene DDX11L1 2\n", 73 | " [5] transcribed_unprocessed_pseudogene DDX11L1 2\n", 74 | " ... ... ... ...\n", 75 | " [2949352] Mt_tRNA MT-TT 3\n", 76 | " [2949353] Mt_tRNA MT-TT 3\n", 77 | " [2949354] Mt_tRNA MT-TP 3\n", 78 | " [2949355] Mt_tRNA MT-TP 3\n", 79 | " [2949356] Mt_tRNA MT-TP 3\n", 80 | " hgnc_id havana_gene Parent\n", 81 | " \n", 82 | " [1] HGNC:37102 OTTHUMG00000000961.2 \n", 83 | " [2] HGNC:37102 OTTHUMG00000000961.2 ENSG00000223972.5\n", 84 | " [3] HGNC:37102 OTTHUMG00000000961.2 ENST00000456328.2\n", 85 | " [4] HGNC:37102 OTTHUMG00000000961.2 ENST00000456328.2\n", 86 | " [5] HGNC:37102 OTTHUMG00000000961.2 ENST00000456328.2\n", 87 | " ... ... ... 
...\n", 88 | " [2949352] HGNC:7499 ENSG00000210195.2\n", 89 | " [2949353] HGNC:7499 ENST00000387460.2\n", 90 | " [2949354] HGNC:7494 \n", 91 | " [2949355] HGNC:7494 ENSG00000210196.2\n", 92 | " [2949356] HGNC:7494 ENST00000387461.2\n", 93 | " transcript_id transcript_type transcript_name\n", 94 | " \n", 95 | " [1] \n", 96 | " [2] ENST00000456328.2 processed_transcript DDX11L1-202\n", 97 | " [3] ENST00000456328.2 processed_transcript DDX11L1-202\n", 98 | " [4] ENST00000456328.2 processed_transcript DDX11L1-202\n", 99 | " [5] ENST00000456328.2 processed_transcript DDX11L1-202\n", 100 | " ... ... ... ...\n", 101 | " [2949352] ENST00000387460.2 Mt_tRNA MT-TT-201\n", 102 | " [2949353] ENST00000387460.2 Mt_tRNA MT-TT-201\n", 103 | " [2949354] \n", 104 | " [2949355] ENST00000387461.2 Mt_tRNA MT-TP-201\n", 105 | " [2949356] ENST00000387461.2 Mt_tRNA MT-TP-201\n", 106 | " transcript_support_level tag havana_transcript\n", 107 | " \n", 108 | " [1] \n", 109 | " [2] 1 basic OTTHUMT00000362751.1\n", 110 | " [3] 1 basic OTTHUMT00000362751.1\n", 111 | " [4] 1 basic OTTHUMT00000362751.1\n", 112 | " [5] 1 basic OTTHUMT00000362751.1\n", 113 | " ... ... ... ...\n", 114 | " [2949352] NA basic \n", 115 | " [2949353] NA basic \n", 116 | " [2949354] \n", 117 | " [2949355] NA basic \n", 118 | " [2949356] NA basic \n", 119 | " exon_number exon_id ont protein_id\n", 120 | " \n", 121 | " [1] \n", 122 | " [2] \n", 123 | " [3] 1 ENSE00002234944.1 \n", 124 | " [4] 2 ENSE00003582793.1 \n", 125 | " [5] 3 ENSE00002312635.1 \n", 126 | " ... ... ... ... ...\n", 127 | " [2949352] \n", 128 | " [2949353] 1 ENSE00001544475.2 \n", 129 | " [2949354] \n", 130 | " [2949355] \n", 131 | " [2949356] 1 ENSE00001544473.2 \n", 132 | " ccdsid\n", 133 | " \n", 134 | " [1] \n", 135 | " [2] \n", 136 | " [3] \n", 137 | " [4] \n", 138 | " [5] \n", 139 | " ... 
...\n", 140 | " [2949352] \n", 141 | " [2949353] \n", 142 | " [2949354] \n", 143 | " [2949355] \n", 144 | " [2949356] \n", 145 | " -------\n", 146 | " seqinfo: 25 sequences from an unspecified genome; no seqlengths" 147 | ] 148 | }, 149 | "metadata": {}, 150 | "output_type": "display_data" 151 | } 152 | ], 153 | "source": [ 154 | "genc" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 4, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/plain": [ 165 | "\n", 166 | " gene transcript \n", 167 | " 60656 229580 \n", 168 | " exon CDS \n", 169 | " 1398443 774993 \n", 170 | " start_codon stop_codon \n", 171 | " 88717 81122 \n", 172 | " five_prime_UTR three_prime_UTR \n", 173 | " 155436 160294 \n", 174 | "stop_codon_redefined_as_selenocysteine \n", 175 | " 115 " 176 | ] 177 | }, 178 | "metadata": {}, 179 | "output_type": "display_data" 180 | } 181 | ], 182 | "source": [ 183 | "table(genc$type)" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 5, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "genc = subset(genc, type %in% c('transcript', 'CDS', 'five_prime_UTR', 'three_prime_UTR'))" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 6, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "name": "stderr", 202 | "output_type": "stream", 203 | "text": [ 204 | "\n", 205 | "Attaching package: ‘dplyr’\n", 206 | "\n", 207 | "\n", 208 | "The following objects are masked from ‘package:GenomicRanges’:\n", 209 | "\n", 210 | " intersect, setdiff, union\n", 211 | "\n", 212 | "\n", 213 | "The following object is masked from ‘package:GenomeInfoDb’:\n", 214 | "\n", 215 | " intersect\n", 216 | "\n", 217 | "\n", 218 | "The following objects are masked from ‘package:IRanges’:\n", 219 | "\n", 220 | " collapse, desc, intersect, setdiff, slice, union\n", 221 | "\n", 222 | "\n", 223 | "The following objects are masked from ‘package:S4Vectors’:\n", 224 | "\n", 225 | 
" first, intersect, rename, setdiff, setequal, union\n", 226 | "\n", 227 | "\n", 228 | "The following objects are masked from ‘package:BiocGenerics’:\n", 229 | "\n", 230 | " combine, intersect, setdiff, union\n", 231 | "\n", 232 | "\n", 233 | "The following objects are masked from ‘package:stats’:\n", 234 | "\n", 235 | " filter, lag\n", 236 | "\n", 237 | "\n", 238 | "The following objects are masked from ‘package:base’:\n", 239 | "\n", 240 | " intersect, setdiff, setequal, union\n", 241 | "\n", 242 | "\n" 243 | ] 244 | } 245 | ], 246 | "source": [ 247 | "library(GenomicRanges)\n", 248 | "library(dplyr)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 7, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "GRanges object with 420225 ranges and 2 metadata columns:\n", 260 | " seqnames ranges strand | type variant_id\n", 261 | " | \n", 262 | " [1] chr21 5051593-5051594 * | INS nssv14017801\n", 263 | " [2] chr21 5064052-5066138 * | DEL nssv14300595\n", 264 | " [3] chr21 5064052-5066138 * | DEL nssv14301211\n", 265 | " [4] chr21 5064052-5066138 * | DEL nssv14301212\n", 266 | " [5] chr21 5064052-5066138 * | DEL nssv14301213\n", 267 | " ... ... ... ... . ... 
...\n", 268 | " [420221] chr21 46699865 * | INS nssv14662319\n", 269 | " [420222] chr21 46699867 * | INS nssv14667619\n", 270 | " [420223] chr21 46699891 * | INS nssv14670897\n", 271 | " [420224] chr21 46699928 * | INS nssv14657123\n", 272 | " [420225] chr21 46699957 * | INS nssv14663905\n", 273 | " -------\n", 274 | " seqinfo: 1 sequence from an unspecified genome; no seqlengths" 275 | ] 276 | }, 277 | "metadata": {}, 278 | "output_type": "display_data" 279 | } 280 | ], 281 | "source": [ 282 | "vars = read.table('all_variants_chr21.tsv', as.is=TRUE, sep='\\t', header=TRUE)\n", 283 | "vars.gr = vars %>% mutate(chr=paste0('chr', chr)) %>% makeGRangesFromDataFrame(keep.extra.columns = TRUE)\n", 284 | "vars.gr" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 9, 290 | "metadata": {}, 291 | "outputs": [ 292 | { 293 | "data": { 294 | "text/html": [ 295 | "\n", 296 | "\n", 297 | "\n", 298 | "\t\n", 299 | "\t\n", 300 | "\n", 301 | "\n", 302 | "\t\n", 303 | "\t\n", 304 | "\t\n", 305 | "\t\n", 306 | "\t\n", 307 | "\t\n", 308 | "\n", 309 | "
A data.frame: 6 × 6

| | variant_id <chr> | gene_id <chr> | gene_name <chr> | transcript_id <chr> | exon_number <chr> | type <fct> |
|---|---|---|---|---|---|---|
| 1 | nssv13693287 | ENSG00000280071.4 | GATD3B | ENST00000624810.3 | NA | transcript |
| 2 | nssv13693287 | ENSG00000280071.4 | GATD3B | ENST00000625036.3 | NA | transcript |
| 3 | nssv14395136 | ENSG00000280071.4 | GATD3B | ENST00000624810.3 | NA | transcript |
| 4 | nssv14395136 | ENSG00000280071.4 | GATD3B | ENST00000625036.3 | NA | transcript |
| 5 | nssv14031467 | ENSG00000280071.4 | GATD3B | ENST00000624810.3 | NA | transcript |
| 6 | nssv14031467 | ENSG00000280071.4 | GATD3B | ENST00000625036.3 | NA | transcript |
\n" 310 | ], 311 | "text/latex": [ 312 | "A data.frame: 6 × 6\n", 313 | "\\begin{tabular}{r|llllll}\n", 314 | " & variant\\_id & gene\\_id & gene\\_name & transcript\\_id & exon\\_number & type\\\\\n", 315 | " & & & & & & \\\\\n", 316 | "\\hline\n", 317 | "\t1 & nssv13693287 & ENSG00000280071.4 & GATD3B & ENST00000624810.3 & NA & transcript\\\\\n", 318 | "\t2 & nssv13693287 & ENSG00000280071.4 & GATD3B & ENST00000625036.3 & NA & transcript\\\\\n", 319 | "\t3 & nssv14395136 & ENSG00000280071.4 & GATD3B & ENST00000624810.3 & NA & transcript\\\\\n", 320 | "\t4 & nssv14395136 & ENSG00000280071.4 & GATD3B & ENST00000625036.3 & NA & transcript\\\\\n", 321 | "\t5 & nssv14031467 & ENSG00000280071.4 & GATD3B & ENST00000624810.3 & NA & transcript\\\\\n", 322 | "\t6 & nssv14031467 & ENSG00000280071.4 & GATD3B & ENST00000625036.3 & NA & transcript\\\\\n", 323 | "\\end{tabular}\n" 324 | ], 325 | "text/markdown": [ 326 | "\n", 327 | "A data.frame: 6 × 6\n", 328 | "\n", 329 | "| | variant_id <chr> | gene_id <chr> | gene_name <chr> | transcript_id <chr> | exon_number <chr> | type <fct> |\n", 330 | "|---|---|---|---|---|---|---|\n", 331 | "| 1 | nssv13693287 | ENSG00000280071.4 | GATD3B | ENST00000624810.3 | NA | transcript |\n", 332 | "| 2 | nssv13693287 | ENSG00000280071.4 | GATD3B | ENST00000625036.3 | NA | transcript |\n", 333 | "| 3 | nssv14395136 | ENSG00000280071.4 | GATD3B | ENST00000624810.3 | NA | transcript |\n", 334 | "| 4 | nssv14395136 | ENSG00000280071.4 | GATD3B | ENST00000625036.3 | NA | transcript |\n", 335 | "| 5 | nssv14031467 | ENSG00000280071.4 | GATD3B | ENST00000624810.3 | NA | transcript |\n", 336 | "| 6 | nssv14031467 | ENSG00000280071.4 | GATD3B | ENST00000625036.3 | NA | transcript |\n", 337 | "\n" 338 | ], 339 | "text/plain": [ 340 | " variant_id gene_id gene_name transcript_id exon_number\n", 341 | "1 nssv13693287 ENSG00000280071.4 GATD3B ENST00000624810.3 NA \n", 342 | "2 nssv13693287 ENSG00000280071.4 GATD3B ENST00000625036.3 NA \n", 343 | "3 
nssv14395136 ENSG00000280071.4 GATD3B ENST00000624810.3 NA \n", 344 | "4 nssv14395136 ENSG00000280071.4 GATD3B ENST00000625036.3 NA \n", 345 | "5 nssv14031467 ENSG00000280071.4 GATD3B ENST00000624810.3 NA \n", 346 | "6 nssv14031467 ENSG00000280071.4 GATD3B ENST00000625036.3 NA \n", 347 | " type \n", 348 | "1 transcript\n", 349 | "2 transcript\n", 350 | "3 transcript\n", 351 | "4 transcript\n", 352 | "5 transcript\n", 353 | "6 transcript" 354 | ] 355 | }, 356 | "metadata": {}, 357 | "output_type": "display_data" 358 | }, 359 | { 360 | "data": { 361 | "text/html": [ 362 | "9448086" 363 | ], 364 | "text/latex": [ 365 | "9448086" 366 | ], 367 | "text/markdown": [ 368 | "9448086" 369 | ], 370 | "text/plain": [ 371 | "[1] 9448086" 372 | ] 373 | }, 374 | "metadata": {}, 375 | "output_type": "display_data" 376 | } 377 | ], 378 | "source": [ 379 | "var.gene.df = findOverlaps(vars.gr, genc) %>% as.data.frame %>%\n", 380 | " filter(genc$gene_type[subjectHits] == 'protein_coding') %>%\n", 381 | " mutate(variant_id=vars.gr$variant_id[queryHits],\n", 382 | " gene_id=genc$gene_id[subjectHits],\n", 383 | " gene_name=genc$gene_name[subjectHits],\n", 384 | " transcript_id=genc$transcript_id[subjectHits],\n", 385 | " exon_number=genc$exon_number[subjectHits], type=genc$type[subjectHits]) %>% \n", 386 | " select(-queryHits, -subjectHits) %>% unique\n", 387 | "head(var.gene.df)\n", 388 | "nrow(var.gene.df)" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 10, 394 | "metadata": {}, 395 | "outputs": [ 396 | { 397 | "name": "stderr", 398 | "output_type": "stream", 399 | "text": [ 400 | "`summarise()` regrouping output by 'variant_id', 'gene_id', 'gene_name', 'transcript_id' (override with `.groups` argument)\n", 401 | "\n" 402 | ] 403 | }, 404 | { 405 | "data": { 406 | "text/html": [ 407 | "\n", 408 | "\n", 409 | "\n", 410 | "\t\n", 411 | "\t\n", 412 | "\n", 413 | "\n", 414 | "\t\n", 415 | "\t\n", 416 | "\t\n", 417 | "\t\n", 418 | "\t\n", 419 | "\t\n", 420 | 
"\n", 421 | "
A grouped_df: 6 × 6

| variant_id <chr> | gene_id <chr> | gene_name <chr> | transcript_id <chr> | type <fct> | exon_number <chr> |
|---|---|---|---|---|---|
| essv100139 | ENSG00000160209.19 | PDXK | ENST00000291565.9 | transcript | |
| essv100139 | ENSG00000160209.19 | PDXK | ENST00000291565.9 | CDS | 3 |
| essv100139 | ENSG00000160209.19 | PDXK | ENST00000327574.4 | transcript | |
| essv100139 | ENSG00000160209.19 | PDXK | ENST00000327574.4 | three_prime_UTR | 4 |
| essv100139 | ENSG00000160209.19 | PDXK | ENST00000343528.10 | transcript | |
| essv100139 | ENSG00000160209.19 | PDXK | ENST00000398078.7 | transcript | |
\n" 422 | ], 423 | "text/latex": [ 424 | "A grouped\\_df: 6 × 6\n", 425 | "\\begin{tabular}{llllll}\n", 426 | " variant\\_id & gene\\_id & gene\\_name & transcript\\_id & type & exon\\_number\\\\\n", 427 | " & & & & & \\\\\n", 428 | "\\hline\n", 429 | "\t essv100139 & ENSG00000160209.19 & PDXK & ENST00000291565.9 & transcript & \\\\\n", 430 | "\t essv100139 & ENSG00000160209.19 & PDXK & ENST00000291565.9 & CDS & 3\\\\\n", 431 | "\t essv100139 & ENSG00000160209.19 & PDXK & ENST00000327574.4 & transcript & \\\\\n", 432 | "\t essv100139 & ENSG00000160209.19 & PDXK & ENST00000327574.4 & three\\_prime\\_UTR & 4\\\\\n", 433 | "\t essv100139 & ENSG00000160209.19 & PDXK & ENST00000343528.10 & transcript & \\\\\n", 434 | "\t essv100139 & ENSG00000160209.19 & PDXK & ENST00000398078.7 & transcript & \\\\\n", 435 | "\\end{tabular}\n" 436 | ], 437 | "text/markdown": [ 438 | "\n", 439 | "A grouped_df: 6 × 6\n", 440 | "\n", 441 | "| variant_id <chr> | gene_id <chr> | gene_name <chr> | transcript_id <chr> | type <fct> | exon_number <chr> |\n", 442 | "|---|---|---|---|---|---|\n", 443 | "| essv100139 | ENSG00000160209.19 | PDXK | ENST00000291565.9 | transcript | |\n", 444 | "| essv100139 | ENSG00000160209.19 | PDXK | ENST00000291565.9 | CDS | 3 |\n", 445 | "| essv100139 | ENSG00000160209.19 | PDXK | ENST00000327574.4 | transcript | |\n", 446 | "| essv100139 | ENSG00000160209.19 | PDXK | ENST00000327574.4 | three_prime_UTR | 4 |\n", 447 | "| essv100139 | ENSG00000160209.19 | PDXK | ENST00000343528.10 | transcript | |\n", 448 | "| essv100139 | ENSG00000160209.19 | PDXK | ENST00000398078.7 | transcript | |\n", 449 | "\n" 450 | ], 451 | "text/plain": [ 452 | " variant_id gene_id gene_name transcript_id type \n", 453 | "1 essv100139 ENSG00000160209.19 PDXK ENST00000291565.9 transcript \n", 454 | "2 essv100139 ENSG00000160209.19 PDXK ENST00000291565.9 CDS \n", 455 | "3 essv100139 ENSG00000160209.19 PDXK ENST00000327574.4 transcript \n", 456 | "4 essv100139 ENSG00000160209.19 PDXK 
ENST00000327574.4 three_prime_UTR\n", 457 | "5 essv100139 ENSG00000160209.19 PDXK ENST00000343528.10 transcript \n", 458 | "6 essv100139 ENSG00000160209.19 PDXK ENST00000398078.7 transcript \n", 459 | " exon_number\n", 460 | "1 \n", 461 | "2 3 \n", 462 | "3 \n", 463 | "4 4 \n", 464 | "5 \n", 465 | "6 " 466 | ] 467 | }, 468 | "metadata": {}, 469 | "output_type": "display_data" 470 | }, 471 | { 472 | "data": { 473 | "text/html": [ 474 | "3582430" 475 | ], 476 | "text/latex": [ 477 | "3582430" 478 | ], 479 | "text/markdown": [ 480 | "3582430" 481 | ], 482 | "text/plain": [ 483 | "[1] 3582430" 484 | ] 485 | }, 486 | "metadata": {}, 487 | "output_type": "display_data" 488 | } 489 | ], 490 | "source": [ 491 | "var.gene.df = var.gene.df %>% group_by(variant_id, gene_id, gene_name, transcript_id, type) %>% summarize(exon_number=paste(sort(unique(exon_number)), collapse=';'))\n", 492 | "head(var.gene.df)\n", 493 | "nrow(var.gene.df)" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": 11, 499 | "metadata": {}, 500 | "outputs": [], 501 | "source": [ 502 | "write.table(var.gene.df, file='variants.genes.chr21.tsv', sep='\\t', quote=FALSE, row.names=FALSE)" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "```\n", 510 | "gzip variants.genes.chr21.tsv\n", 511 | "dx upload variants.genes.chr21.tsv.gz --path jmonlong-notebook-test/\n", 512 | "dx upload variant-gene-overlap.ipynb --path jmonlong-notebook-test/\n", 513 | "```" 514 | ] 515 | } 516 | ], 517 | "metadata": { 518 | "kernelspec": { 519 | "display_name": "R", 520 | "language": "R", 521 | "name": "ir" 522 | }, 523 | "language_info": { 524 | "codemirror_mode": "r", 525 | "file_extension": ".r", 526 | "mimetype": "text/x-r-source", 527 | "name": "R", 528 | "pygments_lexer": "r", 529 | "version": "3.6.3" 530 | } 531 | }, 532 | "nbformat": 4, 533 | "nbformat_minor": 4 534 | } 535 | 
-------------------------------------------------------------------------------- /shinyapp/README.md: -------------------------------------------------------------------------------- 1 | Run `shiny::runApp()` in this folder to launch the Shiny app locally. 2 | 3 | This requires R with the following packages: 4 | - shiny 5 | - dplyr 6 | - DT 7 | - ggplot2 8 | - shinydashboard 9 | 10 | A Docker image with these dependencies will be available soon. 11 | 12 | The [testdata](testdata) was used while preparing the real data. 13 | 14 | ## Live version 15 | 16 | The demo app is set up on https://www.shinyapps.io/ at https://jmonlong.shinyapps.io/GeneVar/ 17 | 18 | To set it up, run the following in this folder: 19 | 20 | ```r 21 | library(rsconnect) 22 | rsconnect::setAccountInfo(name='jmonlong', 23 | token='', 24 | secret='') 25 | deployApp(appName='GeneVar') 26 | ``` 27 | 28 | The `token` and `secret` values are provided on the shinyapps.io "Tokens" account tab. 29 | -------------------------------------------------------------------------------- /shinyapp/run-app.R: -------------------------------------------------------------------------------- 1 | library(shiny) 2 | 3 | #### launch app locally 4 | runApp() 5 | 6 | #### launch app on UCSC server 7 | runApp(port=3457, host='0.0.0.0') 8 | -------------------------------------------------------------------------------- /shinyapp/server.R: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | library(shiny) 3 | library(DT) 4 | library(ggplot2) 5 | library(shinydashboard) 6 | 7 | ## read real data 8 | data = list() 9 | data$all_variants = read.table('./all_variants_chr21.tsv', as.is=TRUE, header=TRUE, sep='\t') 10 | data$gene_variants = read.table('./variants.genes.chr21.tsv.gz', as.is=TRUE, header=TRUE, sep='\t') 11 | data$clinsv_variants = read.table('./sv.clinical.variants.chr21.nstd120.tsv', as.is=TRUE, header=TRUE, sep='\t') 12 | data$clinsnv_variants = 
read.table('./final_clinvar_dbvar_results_summary.txt', as.is=TRUE, header=TRUE, sep='\t') 13 | data$af = read.table('./gnomad-af-variants-chr21.tsv', as.is=TRUE, header=TRUE, sep='\t') 14 | 15 | ## save pathogenic variants and modify the main table to have a summary 16 | snvs.path = data$clinsnv_variants %>% filter(effect %in% c('Pathogenic', 'Likely pathogenic')) %>% .$variant_id 17 | data$clinsnv_variants = data$clinsnv_variants %>% 18 | mutate(effect=factor(effect, levels=c("Pathogenic", "Likely pathogenic", "Benign", "Benign/Likely benign", "Likely benign"))) %>% 19 | filter(!is.na(effect)) %>% 20 | arrange(effect) %>% group_by(variant_id) %>% summarize(clinical_snv=paste0(effect, '(', n, ')', collapse=';')) 21 | 22 | ## same for SVs 23 | svs.path = data$clinsv_variants %>% filter(sv_clinical_significance %in% c('Pathogenic', 'Likely pathogenic')) %>% .$variant_id 24 | data$clinsv_variants = data$clinsv_variants %>% 25 | mutate(effect=factor(sv_clinical_significance, 26 | levels=c("Pathogenic", "Likely pathogenic", "Benign", "Benign/Likely benign", "Likely benign"))) %>% 27 | filter(!is.na(effect)) %>% 28 | group_by(variant_id, effect) %>% summarize(n=n()) %>% 29 | arrange(effect) %>% group_by(variant_id) %>% summarize(clinical_sv=paste0(effect, '(', n, ')', collapse=';')) 30 | 31 | ## list all genes 32 | genes = unique(c(data$gene_variants$gene_id, data$gene_variants$gene_name, data$gene_variants$transcript_id)) 33 | genes = unique(c('ENST00000400454.6', genes)) 34 | 35 | ## merge general variant info 36 | vars.df = data$all_variants %>% 37 | merge(data$clinsv_variants, all.x=TRUE) %>% 38 | merge(data$clinsnv_variants, all.x=TRUE) %>% 39 | merge(data$af, all.x=TRUE) 40 | 41 | ## format columns for better DataTable experience 42 | vars.df = vars.df %>% mutate(type=factor(type), coord=paste0(chr, ':', start, '-', end), size=end-start) %>% 43 | select(-chr, -start, -end) %>% 44 | select(variant_id, coord, type, size, af, everything()) 45 | svtypes = 
sort(unique(vars.df$type)) 46 | svsize.max = max(vars.df$size) 47 | 48 | ## function to add links to the variant table 49 | dtify <- function(df){ 50 | df %>% mutate(variant_id=paste0('', variant_id, '')) 51 | } 52 | 53 | ## server side of the app 54 | server <- function(input, output) { 55 | ## reactive conductor to apply the filtering only once for all elements that need it 56 | selVars <- reactive({ 57 | message('Gene: ', input$gene_search) 58 | gene.var = data$gene_variants %>% 59 | filter(gene_id==input$gene_search | gene_name==input$gene_search | transcript_id==input$gene_search) %>% 60 | group_by(variant_id, type) %>% 61 | summarize(exon_number=ifelse(any(exon_number!=''), paste0(sort(unique(exon_number)), collapse='|'), '')) %>% 62 | mutate(elt_type=ifelse(exon_number!='', paste0(type, '(', exon_number, ')'), type)) %>% 63 | select(-type) 64 | vars.sel = vars.df %>% filter(variant_id %in% gene.var$variant_id, 65 | type %in% input$svtypes, 66 | size >= input$size.min, 67 | size <= input$size.max) 68 | vars.sel = merge(gene.var, vars.sel, all.y=TRUE) %>% 69 | group_by(variant_id, coord, type, size, af, clinical_sv, clinical_snv) %>% 70 | summarize(gene_impact=paste(sort(unique(elt_type)), collapse=';')) %>% ungroup 71 | vars.sel %>% arrange(desc(clinical_sv)) 72 | }) 73 | geneName <- reactive({ 74 | gene_name = input$gene_search 75 | if(all(gene_name != data$gene_variants$gene_name)){ 76 | gene.var = data$gene_variants %>% 77 | filter(gene_id==input$gene_search | gene_name==input$gene_search | transcript_id==input$gene_search) 78 | gene_name = gene.var$gene_name[1] 79 | } 80 | return(gene_name) 81 | }) 82 | ## Text 83 | output$title = renderText({ 84 | paste0('

', input$gene_search, '

') 85 | }) 86 | output$omim_url = renderText({ 87 | return(as.character(a('OMIM', href=paste0('https://www.genenames.org/tools/search/#!/genes?query=', geneName()), target='_blank'))) 88 | }) 89 | output$gtex_url = renderText({ 90 | return(as.character(a('GTEx', href=paste0('https://gtexportal.org/home/gene/', geneName()), target='_blank'))) 91 | }) 92 | output$gnomad_url = renderText({ 93 | return(as.character(a('gnomAD', href=paste0('https://gnomad.broadinstitute.org/gene/', geneName()), target='_blank'))) 94 | }) 95 | ## boxes 96 | output$sv_box <- renderInfoBox({ 97 | infoBox("SVs", nrow(selVars()), icon=icon("dna"), color="blue") 98 | }) 99 | output$path_sv_box <- renderInfoBox({ 100 | infoBox("Clinical SVs", sum(selVars()$variant_id %in% svs.path), 101 | icon=icon("stethoscope"), color="red") 102 | }) 103 | output$path_snv_box <- renderInfoBox({ 104 | infoBox("Overlap Clinical SNVs", sum(selVars()$variant_id %in% snvs.path), 105 | icon=icon("stethoscope"), color="red") 106 | }) 107 | ## dynamic tables 108 | output$vars_table <- renderDataTable( 109 | datatable(dtify(selVars()), 110 | filter='top', 111 | rownames=FALSE, 112 | escape=FALSE, 113 | options=list(pageLength=15, searching=FALSE))) 114 | ## Graph 115 | output$af_plot = renderPlot({ 116 | ggplot(selVars(), aes(x=af)) + geom_histogram() + theme_bw() + xlab('allele frequency') + xlim(-.1,1.1) 117 | }) 118 | } 119 | -------------------------------------------------------------------------------- /shinyapp/testdata/README.md: -------------------------------------------------------------------------------- 1 | A small fake dataset to help build the app. 2 | It also provides examples of the kind of files the modules should produce. 3 | Its main purpose is to make sure that the different TSV files can be merged using `variant_id` and/or `gene_id`. 
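
The merge on `variant_id` mentioned above can be sketched in R as follows. This is only a minimal illustration, assuming toy in-memory tables that mirror the headers of `all_variants.tsv` and `af.tsv`; the allele frequency values here are made up for the example and are not taken from the test files.

```r
## toy table with the same columns as all_variants.tsv
all_variants = data.frame(variant_id=c('sv1', 'sv2', 'sv3'),
                          type=c('duplication', 'duplication', 'deletion'),
                          chr=c('6', '12', '6'),
                          start=c(49906814, 41508694, 82876884),
                          end=c(50659190, 41640840, 83573788),
                          stringsAsFactors=FALSE)

## toy table with the same columns as af.tsv (hypothetical values)
af = data.frame(variant_id=c('sv1', 'sv3'),
                af=c(0.5, 0.1),
                stringsAsFactors=FALSE)

## left join on the shared key: every variant is kept,
## and variants without a frequency get NA in the `af` column
merged = merge(all_variants, af, by='variant_id', all.x=TRUE)

## sanity check: the merge must not drop or duplicate variants
stopifnot(nrow(merged) == nrow(all_variants))
merged
```

Using `all.x=TRUE` matches how `server.R` combines the variant table with the clinical and allele frequency tables, so a variant missing from an optional table stays in the output.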
4 | -------------------------------------------------------------------------------- /shinyapp/testdata/af.tsv: -------------------------------------------------------------------------------- 1 | variant_id af 2 | sv553 0.600703883217648 3 | sv995 0.3741288499441 4 | sv594 0.884334105765447 5 | sv708 0.36340217711404 6 | sv326 0.15102969086729 7 | sv650 0.268739695427939 8 | sv647 0.721494370372966 9 | sv662 0.628475405042991 10 | sv130 0.325971622020006 11 | sv761 0.54104381124489 12 | sv818 0.794067649636418 13 | sv932 0.0564311668276787 14 | sv988 0.398481790209189 15 | sv181 0.208260864950716 16 | sv501 0.776889194035903 17 | sv496 0.96893922239542 18 | sv810 0.0993965272791684 19 | sv201 0.991712222574279 20 | sv226 0.408853823086247 21 | sv790 0.0928564809728414 22 | sv19 0.362002719193697 23 | sv202 0.376928109675646 24 | sv343 0.31287913559936 25 | sv114 0.243158819852397 26 | sv802 0.887517127906904 27 | sv551 0.410844454308972 28 | sv613 0.242254863027483 29 | sv674 0.389557681046426 30 | sv282 0.241915872320533 31 | sv93 0.453306507086381 32 | sv44 0.112976297270507 33 | sv135 0.459528943989426 34 | sv99 0.999749885639176 35 | sv396 0.734552665380761 36 | sv304 0.680806903168559 37 | sv52 0.873308161273599 38 | sv231 0.129630846204236 39 | sv316 0.419654794270173 40 | sv273 0.912888741586357 41 | sv84 0.566137453308329 42 | sv72 0.767139810137451 43 | sv473 0.558116890955716 44 | sv241 0.741458358708769 45 | sv578 0.236601918004453 46 | sv513 0.962430338840932 47 | sv426 0.485312425531447 48 | sv293 0.794039134168997 49 | sv215 0.545181083725765 50 | sv340 0.455137012759224 51 | sv163 0.811997304437682 52 | sv53 0.798963122535497 53 | sv931 0.359195899451151 54 | sv279 0.348463270580396 55 | sv972 0.948402212932706 56 | sv848 0.135766382561997 57 | sv47 0.44010098814033 58 | sv373 0.0663917465135455 59 | sv441 0.932556875282899 60 | sv812 0.629754419438541 61 | sv967 0.319185766158625 62 | sv514 0.165459538111463 63 | sv139 0.526371557498351 64 | sv591 
0.600771666271612 65 | sv212 0.0811542200390249 66 | sv579 0.916144282091409 67 | sv784 0.0917899054475129 68 | sv35 0.265541113447398 69 | sv320 0.591671275207773 70 | sv1 0.592834850540385 71 | sv834 0.390384783735499 72 | sv611 0.386205631541088 73 | sv350 0.959359415573999 74 | sv143 0.128743738867342 75 | sv118 0.0774070029146969 76 | sv115 0.611932547064498 77 | sv46 0.716012598481029 78 | sv362 0.962346978019923 79 | sv454 0.375529820332304 80 | sv537 0.384311832953244 81 | sv417 0.556191431824118 82 | sv420 0.885634282836691 83 | sv754 0.0915178814902902 84 | sv881 0.842901285737753 85 | sv998 0.698495100950822 86 | sv843 0.713251024251804 87 | sv717 0.598746807314456 88 | sv749 0.428461107425392 89 | sv317 0.0458470911253244 90 | sv398 0.449776924680918 91 | sv768 0.478627261007205 92 | sv687 0.00810611760243773 93 | sv679 0.0104666396509856 94 | sv648 0.639427381334826 95 | sv205 0.928412004373968 96 | sv265 0.079973767278716 97 | sv64 0.403510325122625 98 | sv971 0.362117365002632 99 | sv289 0.986645619384944 100 | sv572 0.282508483389392 101 | sv526 0.954423139570281 102 | sv269 0.965824510669336 103 | sv976 0.945457382127643 104 | sv898 0.35094073228538 105 | sv984 0.568150240462273 106 | sv688 0.950021979399025 107 | sv255 0.939262846717611 108 | sv372 0.601760559249669 109 | sv138 0.443585301982239 110 | sv449 0.318062288919464 111 | sv521 0.00917417253367603 112 | sv332 0.810931926826015 113 | sv101 0.8548339526169 114 | sv886 0.225746642332524 115 | sv36 0.184191468171775 116 | sv43 0.750611244235188 117 | sv987 0.987134331604466 118 | sv122 0.959417118923739 119 | sv786 0.250950682442635 120 | sv531 0.62478887126781 121 | sv8 0.927052332786843 122 | sv894 0.823917095549405 123 | sv216 0.222978533944115 124 | sv525 0.928237359272316 125 | sv494 0.997499269200489 126 | sv12 0.515595662407577 127 | sv141 0.861830754205585 128 | sv88 0.203301028348505 129 | sv571 0.742515167221427 130 | sv930 0.325620234711096 131 | sv125 0.711108732502908 132 | sv604 
0.675020552240312 133 | sv964 0.187129970174283 134 | sv136 0.546507796272635 135 | sv883 0.238357576308772 136 | sv629 0.828390682116151 137 | sv559 0.872225551400334 138 | sv649 0.453313330654055 139 | sv408 0.0144920360762626 140 | sv779 0.0805632763076574 141 | sv493 0.105382101377472 142 | sv798 0.550692322431132 143 | sv963 0.122128429356962 144 | sv793 0.0976054782513529 145 | sv244 0.772121798014268 146 | sv427 0.838586684083566 147 | sv466 0.789712315192446 148 | sv274 0.96784144686535 149 | sv969 0.0573251964524388 150 | sv912 0.177767551271245 151 | sv191 0.76412209123373 152 | sv850 0.593299119267613 153 | sv733 0.202795258024707 154 | sv655 0.715326711069793 155 | sv665 0.766759514110163 156 | sv51 0.135302615584806 157 | sv20 0.910467514768243 158 | sv638 0.709155570017174 159 | sv597 0.935072623658925 160 | sv85 0.496621198020875 161 | sv726 0.388372149085626 162 | sv549 0.608276518294588 163 | sv820 0.618497068993747 164 | sv671 0.616842939984053 165 | sv652 0.76633801870048 166 | sv251 0.381649628980085 167 | sv194 0.671751098474488 168 | sv384 0.114026944385841 169 | sv461 0.220282239373773 170 | sv240 0.887632764410228 171 | sv989 0.800083543872461 172 | sv112 0.134651338215917 173 | sv729 0.86495490022935 174 | sv922 0.0897733448073268 175 | sv825 0.517579401377589 176 | sv523 0.875365419313312 177 | sv966 0.998058745404705 178 | sv25 0.353078181855381 179 | sv186 0.702912693843246 180 | sv368 0.74610083270818 181 | sv32 0.401171266101301 182 | sv831 0.637062677182257 183 | sv189 0.808700608788058 184 | sv745 0.415260840440169 185 | sv520 0.715161911211908 186 | sv965 0.69678235356696 187 | sv303 0.0939011431764811 188 | sv294 0.190061847213656 189 | sv676 0.81955934735015 190 | sv406 0.0139741175808012 191 | sv299 0.479713153792545 192 | sv433 0.193153505912051 193 | sv253 0.863690022611991 194 | sv230 0.719876889837906 195 | sv474 0.248021261533722 196 | sv158 0.710308040725067 197 | sv270 0.0539427516050637 198 | sv968 0.580643810564652 199 | 
sv822 0.82143762987107 200 | sv895 0.300984328379855 201 | sv557 0.529735786374658 202 | sv718 0.421307644573972 203 | sv658 0.92932236334309 204 | sv9 0.491323640337214 205 | sv975 0.0385750629939139 206 | sv950 0.45804316480644 207 | sv730 0.37934857327491 208 | sv539 0.528732902137563 209 | sv200 0.700764313573018 210 | sv455 0.258868078468367 211 | sv666 0.191058445023373 212 | sv188 0.335776532767341 213 | sv27 0.449855672894046 214 | sv863 0.518147368449718 215 | sv653 0.898328936193138 216 | sv184 0.343688751803711 217 | sv214 0.693484417395666 218 | sv947 0.31639081495814 219 | sv977 0.885400244733319 220 | sv365 0.422719731694087 221 | sv196 0.70000527985394 222 | sv573 0.381642999825999 223 | sv556 0.55641962448135 224 | sv329 0.555783048039302 225 | sv171 0.550472920527682 226 | sv445 0.984367134980857 227 | sv499 0.408734496682882 228 | sv210 0.92737561208196 229 | sv287 0.930968862958252 230 | sv15 0.400806233519688 231 | sv562 0.60002294019796 232 | sv569 0.725342108169571 233 | sv66 0.556045722216368 234 | sv160 0.279753247275949 235 | sv436 0.257502482738346 236 | sv77 0.784215251682326 237 | sv874 0.442221380537376 238 | sv388 0.344925788929686 239 | sv545 0.828138884855434 240 | sv104 0.356596668250859 241 | sv806 0.33290850603953 242 | sv453 0.305964774917811 243 | sv617 0.871800374938175 244 | sv561 0.867617438314483 245 | sv463 0.124310079030693 246 | sv720 0.61996511160396 247 | sv719 0.370460681850091 248 | sv799 0.16454346687533 249 | sv721 0.107681739609689 250 | sv429 0.137092304416001 251 | sv495 0.768411896890029 252 | sv644 0.979131530737504 253 | sv819 0.947815162595361 254 | sv634 0.472990680253133 255 | sv364 0.916966705815867 256 | sv225 0.888654977781698 257 | sv446 0.184464941499755 258 | sv393 0.0365355804096907 259 | sv97 0.651325695915148 260 | sv203 0.685466439463198 261 | sv235 0.415863495785743 262 | sv41 0.296754204900935 263 | sv288 0.573393184691668 264 | sv208 0.242869579000399 265 | sv518 0.762825961923227 266 | sv224 
0.518448659451678 267 | sv704 0.2710455970373 268 | sv599 0.252623705891892 269 | sv380 0.782519126310945 270 | sv319 0.10367454495281 271 | sv919 0.267932872287929 272 | sv166 0.990096618654206 273 | sv291 0.461557031376287 274 | sv974 0.430528422119096 275 | sv951 0.992244223831221 276 | sv575 0.66325232712552 277 | sv266 0.0390246342867613 278 | sv637 0.0551049143541604 279 | sv667 0.0962842528242618 280 | sv470 0.979550179792568 281 | sv854 0.716330386698246 282 | sv668 0.707229098770767 283 | sv766 0.429993776371703 284 | sv507 0.424158726120368 285 | sv221 0.136513738427311 286 | sv432 0.614145065192133 287 | sv407 0.838463364401832 288 | sv457 0.207481029210612 289 | sv98 0.291415379848331 290 | sv890 0.190966894384474 291 | sv992 0.0607101465575397 292 | sv252 0.500205185264349 293 | sv816 0.752042328007519 294 | sv308 0.723418095614761 295 | sv404 0.187955263536423 296 | sv481 0.88246966060251 297 | sv297 0.508171421708539 298 | sv862 0.555272932164371 299 | sv162 0.347760828677565 300 | sv887 0.760196412215009 301 | sv635 0.72293521463871 302 | -------------------------------------------------------------------------------- /shinyapp/testdata/all_variants.tsv: -------------------------------------------------------------------------------- 1 | variant_id type chr start end 2 | sv1 duplication 6 49906814 50659190 3 | sv2 duplication 12 41508694 41640840 4 | sv3 deletion 6 82876884 83573788 5 | sv4 deletion 1 26053581 26774196 6 | sv5 duplication 6 4239400 4921486 7 | sv6 duplication 8 12369500 12627944 8 | sv7 deletion 15 60074997 60583225 9 | sv8 duplication 12 39623655 40023941 10 | sv9 deletion 18 15526405 15954182 11 | sv10 deletion 21 36087708 36707993 12 | sv11 deletion 18 8653577 8673072 13 | sv12 deletion 9 5989326 6354305 14 | sv13 duplication 6 53524766 54265872 15 | sv14 duplication 13 93506771 93663837 16 | sv15 duplication 17 2708873 2857526 17 | sv16 duplication 12 40242163 40559179 18 | sv17 duplication 3 7249111 7798550 19 | sv18 deletion 9 
55276878 55347548 20 | sv19 deletion 5 2279732 2810095 21 | sv20 duplication 6 65610373 66069993 22 | sv21 deletion 1 34738909 35650182 23 | sv22 duplication 15 12177689 12549323 24 | sv23 deletion 19 27050186 27684322 25 | sv24 deletion 17 19044446 19082231 26 | sv25 deletion 20 29684623 30273580 27 | sv26 deletion 21 66540364 67221159 28 | sv27 deletion 14 31285509 31293485 29 | sv28 duplication 18 90739226 90936437 30 | sv29 duplication 21 59555970 59997822 31 | sv30 duplication 15 49453873 49969305 32 | sv31 duplication 17 24901969 25463551 33 | sv32 deletion 6 85725687 86562371 34 | sv33 deletion 5 76489030 76848674 35 | sv34 deletion 14 47305212 47336116 36 | sv35 deletion 18 98906108 99768249 37 | sv36 duplication 16 22748285 23010085 38 | sv37 duplication 7 88415398 88681644 39 | sv38 deletion 16 22523553 22926051 40 | sv39 deletion 21 34031007 34782570 41 | sv40 duplication 16 25131871 25683276 42 | sv41 deletion 22 44488100 45056713 43 | sv42 deletion 21 41117781 42092960 44 | sv43 deletion 7 48920549 48984539 45 | sv44 duplication 18 2867694 3583927 46 | sv45 deletion 18 61887615 62862382 47 | sv46 duplication 16 28067291 28990629 48 | sv47 deletion 4 49518516 49718789 49 | sv48 duplication 14 36063683 36683265 50 | sv49 duplication 20 69549861 69657345 51 | sv50 deletion 6 16137516 16145381 52 | sv51 duplication 7 89144742 89796620 53 | sv52 duplication 13 55167518 55175119 54 | sv53 deletion 10 98985247 99382536 55 | sv54 duplication 9 53929673 54800674 56 | sv55 deletion 22 15914126 16304004 57 | sv56 deletion 4 51033342 51855844 58 | sv57 deletion 10 92651622 93381565 59 | sv58 duplication 14 63245439 63475243 60 | sv59 duplication 14 97944020 98892023 61 | sv60 deletion 7 10188460 10738889 62 | sv61 deletion 7 47070387 47977739 63 | sv62 deletion 20 31638820 32309207 64 | sv63 deletion 22 75017853 75173418 65 | sv64 duplication 3 83902748 84235140 66 | sv65 deletion 19 61656636 61849594 67 | sv66 deletion 17 98389056 98772515 68 | sv67 deletion 13 
67451112 68234614 69 | sv68 deletion 7 83086965 83996522 70 | sv69 duplication 18 49861902 50364772 71 | sv70 deletion 19 68003980 68705864 72 | sv71 duplication 16 46075089 47029179 73 | sv72 deletion 9 93861408 94360042 74 | sv73 duplication 1 58007982 58277917 75 | sv74 duplication 13 56095197 56927724 76 | sv75 deletion 18 66553133 67060359 77 | sv76 duplication 4 55958066 56867426 78 | sv77 duplication 15 40851839 41834111 79 | sv78 duplication 1 6010598 6396728 80 | sv79 deletion 12 24223544 24632279 81 | sv80 deletion 4 38012359 38367363 82 | sv81 duplication 22 7187092 8016058 83 | sv82 deletion 17 41031104 41596266 84 | sv83 duplication 4 65576439 66424239 85 | sv84 deletion 3 77034016 77310402 86 | sv85 duplication 19 34107590 34712058 87 | sv86 deletion 21 16718236 17527420 88 | sv87 duplication 1 89498476 89666881 89 | sv88 deletion 14 35925672 36049841 90 | sv89 deletion 12 28678985 29438210 91 | sv90 deletion 14 24190245 25185633 92 | sv91 duplication 3 93975477 93977276 93 | sv92 duplication 15 53900346 54664368 94 | sv93 duplication 17 42582121 43406779 95 | sv94 deletion 4 5209023 5923883 96 | sv95 deletion 2 8483749 8773677 97 | sv96 duplication 7 79296694 79685035 98 | sv97 duplication 16 42098740 42937933 99 | sv98 deletion 12 57470539 57987886 100 | sv99 deletion 4 35627393 35982942 101 | sv100 duplication 5 99117075 99352671 102 | sv101 deletion 20 37339600 37649756 103 | sv102 deletion 1 20065339 20958720 104 | sv103 duplication 12 90327037 90933992 105 | sv104 duplication 10 86021456 86088903 106 | sv105 duplication 7 47519255 47834453 107 | sv106 duplication 7 5387325 5553462 108 | sv107 deletion 21 17274899 17346153 109 | sv108 duplication 11 18979509 19372176 110 | sv109 duplication 21 94596762 95296960 111 | sv110 duplication 11 77860279 78516450 112 | sv111 deletion 14 96623203 96757321 113 | sv112 deletion 8 91583460 91842571 114 | sv113 duplication 8 12202321 12647373 115 | sv114 deletion 15 93266330 93415295 116 | sv115 duplication 
22 93495595 93823658 117 | sv116 deletion 3 9476981 10034947 118 | sv117 duplication 17 8142545 8671597 119 | sv118 deletion 6 327787 1230543 120 | sv119 deletion 17 34349544 34879974 121 | sv120 deletion 2 71083633 71594220 122 | sv121 duplication 3 55246836 55709472 123 | sv122 deletion 2 7216232 7379784 124 | sv123 duplication 12 25072400 25311568 125 | sv124 duplication 11 24733204 25717672 126 | sv125 deletion 6 49200342 49621642 127 | sv126 duplication 5 62002276 62549310 128 | sv127 deletion 5 82173229 82570611 129 | sv128 duplication 9 55015936 55038365 130 | sv129 duplication 20 76135386 76752235 131 | sv130 deletion 19 30898815 31695962 132 | sv131 duplication 4 31772275 32007905 133 | sv132 deletion 20 78415975 78467855 134 | sv133 duplication 14 3023290 3966913 135 | sv134 duplication 11 42583951 42792507 136 | sv135 deletion 10 14781109 15671725 137 | sv136 duplication 17 63249238 63422119 138 | sv137 duplication 8 29985969 29988311 139 | sv138 deletion 20 98411219 98553638 140 | sv139 duplication 1 56497502 57141475 141 | sv140 duplication 11 82444905 83356884 142 | sv141 duplication 19 28609035 29406360 143 | sv142 deletion 18 66078267 66385745 144 | sv143 duplication 5 38874414 39573700 145 | sv144 deletion 5 28217331 28971728 146 | sv145 deletion 2 54716111 55244562 147 | sv146 duplication 19 74848052 75135180 148 | sv147 deletion 14 51182115 51531516 149 | sv148 deletion 18 92532920 93090965 150 | sv149 deletion 22 88789292 89764722 151 | sv150 deletion 8 53527401 54106739 152 | sv151 duplication 10 3490313 4391921 153 | sv152 duplication 22 86902367 87376346 154 | sv153 deletion 11 38273471 38431794 155 | sv154 duplication 8 53968452 54046286 156 | sv155 duplication 10 3351733 3990441 157 | sv156 duplication 18 50763467 51668510 158 | sv157 duplication 13 76647860 77415996 159 | sv158 deletion 20 5540960 6313863 160 | sv159 duplication 16 72748995 72982300 161 | sv160 duplication 16 65506973 66399562 162 | sv161 deletion 10 10837028 11503880 163 
| sv162 duplication 8 57990094 58271979 164 | sv163 duplication 8 58023333 58337074 165 | sv164 deletion 22 78713973 79712974 166 | sv165 deletion 21 40770113 40910884 167 | sv166 deletion 22 19838796 20205461 168 | sv167 deletion 12 96722637 96905666 169 | sv168 deletion 2 89370192 89627402 170 | sv169 deletion 10 42507452 42568632 171 | sv170 deletion 20 75725418 76290993 172 | sv171 deletion 2 49549635 50082160 173 | sv172 deletion 22 43374907 44191930 174 | sv173 deletion 4 16564592 16655692 175 | sv174 deletion 18 17045730 17675574 176 | sv175 deletion 15 32287133 33216993 177 | sv176 deletion 17 10915487 11294916 178 | sv177 duplication 5 19329165 19407828 179 | sv178 duplication 4 44974217 45324184 180 | sv179 deletion 17 20136373 20794106 181 | sv180 deletion 19 65402276 65788394 182 | sv181 duplication 16 80088918 80905887 183 | sv182 deletion 4 30791842 31094185 184 | sv183 deletion 19 73636591 74394886 185 | sv184 duplication 5 81295094 82118606 186 | sv185 deletion 3 88043869 88894369 187 | sv186 duplication 3 64370604 64604386 188 | sv187 deletion 2 49578198 49880890 189 | sv188 deletion 14 59024322 59183197 190 | sv189 deletion 13 66138053 66426704 191 | sv190 deletion 17 32677294 33019610 192 | sv191 deletion 6 18753407 18804344 193 | sv192 deletion 17 64722965 65278487 194 | sv193 duplication 21 21905118 22702494 195 | sv194 deletion 10 24184400 25014973 196 | sv195 deletion 8 4174606 4208373 197 | sv196 duplication 2 54672768 55549722 198 | sv197 duplication 7 22049206 22897710 199 | sv198 deletion 3 70455315 70892887 200 | sv199 deletion 19 89676378 90086057 201 | sv200 deletion 13 31102895 32059784 202 | sv201 deletion 16 73873348 74148406 203 | sv202 deletion 16 12585394 12787178 204 | sv203 deletion 8 81058268 81943020 205 | sv204 duplication 22 70113010 70760682 206 | sv205 deletion 13 33421029 34144455 207 | sv206 deletion 11 6885970 7224226 208 | sv207 deletion 10 70262499 71192874 209 | sv208 duplication 5 42159220 42774249 210 | sv209 
duplication 3 47832235 48493041 211 | sv210 deletion 15 18544077 18935204 212 | sv211 deletion 21 2033888 2877566 213 | sv212 duplication 22 92621095 92743275 214 | sv213 deletion 5 64922922 65796352 215 | sv214 deletion 13 64131009 64438197 216 | sv215 deletion 5 49999532 50939988 217 | sv216 deletion 7 15889356 16629140 218 | sv217 duplication 3 94962164 94971748 219 | sv218 deletion 19 41874373 42769798 220 | sv219 duplication 10 25220015 25565527 221 | sv220 deletion 18 42985488 43320458 222 | sv221 deletion 10 18234205 18568436 223 | sv222 deletion 9 15848141 15884317 224 | sv223 deletion 9 58152803 58901770 225 | sv224 duplication 21 59568279 60389500 226 | sv225 duplication 14 8205474 9175464 227 | sv226 duplication 8 19918550 20182464 228 | sv227 duplication 9 89422393 89841607 229 | sv228 deletion 2 58726630 59399859 230 | sv229 deletion 17 8307270 8767834 231 | sv230 deletion 15 17128396 17561345 232 | sv231 duplication 11 4899800 5791181 233 | sv232 deletion 14 27764075 27819672 234 | sv233 deletion 16 73513333 74290307 235 | sv234 deletion 5 57567371 57943839 236 | sv235 deletion 2 5905204 6637098 237 | sv236 deletion 7 72808354 73709569 238 | sv237 duplication 11 21385499 21853249 239 | sv238 duplication 1 10666225 11238580 240 | sv239 duplication 1 933644 1824007 241 | sv240 duplication 17 91465092 91808193 242 | sv241 duplication 20 62475078 63169078 243 | sv242 duplication 9 22501211 23408808 244 | sv243 duplication 3 17107480 17535976 245 | sv244 deletion 15 4092374 4811605 246 | sv245 deletion 13 84483445 85465989 247 | sv246 deletion 6 89387802 89958354 248 | sv247 duplication 16 84444497 84884492 249 | sv248 duplication 3 10680231 11535197 250 | sv249 deletion 15 41856897 42073276 251 | sv250 deletion 2 40399298 40932926 252 | sv251 deletion 21 64995397 65487868 253 | sv252 deletion 17 71303032 72103208 254 | sv253 deletion 18 92071960 92660715 255 | sv254 duplication 17 38127772 38191571 256 | sv255 duplication 14 64244844 64509772 257 | sv256 
deletion 8 92172478 93061896 258 | sv257 deletion 7 50853123 51030567 259 | sv258 deletion 13 5129826 5130455 260 | sv259 deletion 16 36260602 36705067 261 | sv260 deletion 18 43319720 43994945 262 | sv261 deletion 9 98566254 99146916 263 | sv262 duplication 16 11311793 11620307 264 | sv263 duplication 12 21403536 21896167 265 | sv264 deletion 1 77573928 77735647 266 | sv265 duplication 7 12078104 12515347 267 | sv266 deletion 7 16288978 16733451 268 | sv267 duplication 9 38111955 38784288 269 | sv268 duplication 18 51057605 51558212 270 | sv269 duplication 15 63214738 63449354 271 | sv270 duplication 22 5269744 5394388 272 | sv271 deletion 16 39889158 40538289 273 | sv272 duplication 9 74845460 75079806 274 | sv273 duplication 8 60359704 60514850 275 | sv274 deletion 1 85395536 85568457 276 | sv275 deletion 13 33032055 33376901 277 | sv276 deletion 15 38600479 39331834 278 | sv277 deletion 3 51484003 52269458 279 | sv278 deletion 22 13459593 14335473 280 | sv279 deletion 5 51564092 52449985 281 | sv280 duplication 3 99976783 100211267 282 | sv281 duplication 13 1475326 2018659 283 | sv282 deletion 1 90165385 90350730 284 | sv283 deletion 8 64990995 65183161 285 | sv284 deletion 3 12299748 13153950 286 | sv285 duplication 17 96763955 97052652 287 | sv286 deletion 8 5787697 6038594 288 | sv287 duplication 16 83940494 84374883 289 | sv288 deletion 21 84845470 84922435 290 | sv289 deletion 13 32287359 32663054 291 | sv290 duplication 4 73740655 74238615 292 | sv291 deletion 22 34197486 35124364 293 | sv292 duplication 19 74773064 75014642 294 | sv293 duplication 18 36627258 37068114 295 | sv294 deletion 1 56717813 56907629 296 | sv295 duplication 11 91360157 92241390 297 | sv296 deletion 16 97357542 97924231 298 | sv297 deletion 8 37018746 37804297 299 | sv298 deletion 16 98727199 99667881 300 | sv299 deletion 12 87704400 87883605 301 | sv300 duplication 5 28161996 28518757 302 | sv301 deletion 18 12608686 13144001 303 | sv302 deletion 10 44146006 44900707 304 | sv303 
duplication 4 62962414 63753774 305 | sv304 deletion 22 54214326 54282648 306 | sv305 deletion 15 18239273 18255250 307 | sv306 duplication 13 76624267 76846418 308 | sv307 duplication 15 26811534 26889241 309 | sv308 duplication 1 10393654 11253655 310 | sv309 duplication 8 24355303 25260693 311 | sv310 deletion 22 35802564 36081865 312 | sv311 duplication 12 44700562 45040036 313 | sv312 duplication 11 37731138 38175674 314 | sv313 deletion 16 95355366 96107379 315 | sv314 deletion 5 83016779 83702203 316 | sv315 duplication 4 60927512 61826736 317 | sv316 deletion 5 42228321 43079222 318 | sv317 duplication 6 82677739 83441239 319 | sv318 deletion 16 37813137 38068744 320 | sv319 deletion 22 35331301 35371386 321 | sv320 deletion 22 91795823 91933982 322 | sv321 deletion 20 69341238 69425427 323 | sv322 deletion 3 59721003 60518584 324 | sv323 deletion 6 9544376 9634811 325 | sv324 duplication 8 49676586 49799543 326 | sv325 deletion 5 73683276 74580749 327 | sv326 deletion 9 15005510 15861005 328 | sv327 duplication 14 76258348 77173879 329 | sv328 deletion 13 30280543 30414644 330 | sv329 duplication 16 99465972 99918400 331 | sv330 deletion 2 53574753 54518317 332 | sv331 duplication 10 62723814 62985110 333 | sv332 deletion 16 98974997 99902419 334 | sv333 duplication 14 40456444 40617321 335 | sv334 duplication 18 59339913 59749355 336 | sv335 duplication 3 4409173 5367242 337 | sv336 duplication 13 77559160 77979284 338 | sv337 deletion 6 88111613 88385725 339 | sv338 duplication 17 55088421 55899424 340 | sv339 deletion 18 84416345 84658944 341 | sv340 deletion 15 58591116 59033964 342 | sv341 duplication 13 82431079 82613592 343 | sv342 deletion 16 40403864 41003006 344 | sv343 deletion 16 40407063 41125873 345 | sv344 deletion 14 11392883 12274637 346 | sv345 duplication 16 78528385 79399184 347 | sv346 deletion 12 27087972 27486045 348 | sv347 deletion 18 53605931 54068007 349 | sv348 deletion 14 39915250 40340620 350 | sv349 deletion 16 51131889 
51305809 351 | sv350 duplication 16 51072514 51886268 352 | sv351 deletion 6 55877909 56513030 353 | sv352 duplication 18 641577 1509903 354 | sv353 duplication 17 23183599 23579853 355 | sv354 deletion 16 41993942 42769655 356 | sv355 deletion 7 38653145 38861199 357 | sv356 duplication 21 57656435 57847892 358 | sv357 deletion 12 11571213 12082624 359 | sv358 duplication 2 17363043 18317763 360 | sv359 deletion 7 7570602 8338264 361 | sv360 deletion 10 51096175 51406718 362 | sv361 duplication 19 62659894 63533528 363 | sv362 deletion 14 56210631 56593579 364 | sv363 duplication 18 14180311 14360242 365 | sv364 duplication 13 78700328 79327512 366 | sv365 deletion 15 42865955 43642834 367 | sv366 deletion 13 71516142 72492154 368 | sv367 duplication 3 94285191 94940895 369 | sv368 duplication 6 72225075 72569439 370 | sv369 deletion 21 4117308 4496943 371 | sv370 duplication 2 32511599 33151629 372 | sv371 deletion 15 13180400 13710066 373 | sv372 deletion 4 3704209 3765053 374 | sv373 duplication 18 49292626 49729837 375 | sv374 duplication 20 98695608 99549373 376 | sv375 duplication 13 15823001 16460342 377 | sv376 deletion 14 289347 1207094 378 | sv377 duplication 11 93532477 93899627 379 | sv378 duplication 12 42284938 43197709 380 | sv379 duplication 17 47734187 48620975 381 | sv380 deletion 7 40237864 40249434 382 | sv381 deletion 2 64581512 64902933 383 | sv382 duplication 13 63580655 63983692 384 | sv383 deletion 5 41331419 41904434 385 | sv384 duplication 13 92371310 92895658 386 | sv385 duplication 20 12983006 13303634 387 | sv386 deletion 21 4358727 4688155 388 | sv387 duplication 11 26311603 27294014 389 | sv388 deletion 12 16109060 16449411 390 | sv389 deletion 8 78479004 78912699 391 | sv390 deletion 14 97692799 98547741 392 | sv391 deletion 9 18147817 18828041 393 | sv392 duplication 17 36799386 37281831 394 | sv393 duplication 5 83414973 83530439 395 | sv394 deletion 8 47785571 48514433 396 | sv395 duplication 8 59486663 60292648 397 | sv396 
duplication 5 38265899 38712313 398 | sv397 duplication 3 83449783 84069733 399 | sv398 duplication 19 25467880 25897504 400 | sv399 duplication 8 8048813 8234471 401 | sv400 deletion 22 98375999 98890400 402 | sv401 duplication 12 94850717 95200138 403 | sv402 duplication 17 35861322 36406275 404 | sv403 deletion 19 40199329 40423628 405 | sv404 deletion 8 59818746 59926903 406 | sv405 duplication 16 36791635 36863766 407 | sv406 duplication 12 68195154 68699952 408 | sv407 duplication 21 4119929 4165591 409 | sv408 deletion 16 21273072 22249684 410 | sv409 duplication 21 4649230 5347420 411 | sv410 duplication 3 53903742 54045338 412 | sv411 deletion 1 82016056 82994115 413 | sv412 deletion 13 38821800 39465842 414 | sv413 duplication 15 20156542 20598643 415 | sv414 deletion 9 31383841 32141788 416 | sv415 deletion 17 84538943 85440470 417 | sv416 deletion 12 99310056 99727226 418 | sv417 duplication 6 68999025 69927551 419 | sv418 deletion 5 41518613 41834465 420 | sv419 duplication 9 6453310 7140513 421 | sv420 deletion 6 90499426 90567116 422 | sv421 duplication 3 13467200 14442415 423 | sv422 deletion 13 29107396 29532905 424 | sv423 deletion 1 94819829 95475727 425 | sv424 deletion 3 8000425 8086092 426 | sv425 deletion 10 33237961 33701544 427 | sv426 deletion 22 81821548 82005710 428 | sv427 deletion 5 9821101 10234353 429 | sv428 duplication 18 82914970 83220578 430 | sv429 deletion 12 88812003 89444044 431 | sv430 duplication 19 80039380 80795696 432 | sv431 deletion 14 27358613 28315191 433 | sv432 deletion 13 10939360 11769801 434 | sv433 deletion 16 72278759 72938912 435 | sv434 duplication 7 98397109 99088211 436 | sv435 deletion 9 28836534 29300644 437 | sv436 deletion 17 45021124 45481371 438 | sv437 deletion 21 93506199 94027890 439 | sv438 deletion 15 33847156 34237924 440 | sv439 duplication 5 36232442 36460377 441 | sv440 deletion 6 34471153 34760234 442 | sv441 deletion 9 61293344 61584720 443 | sv442 deletion 21 60860831 61556350 444 | sv443 
deletion 18 73071502 73477364 445 | sv444 duplication 20 56455662 57399780 446 | sv445 deletion 11 25621875 25837083 447 | sv446 deletion 15 96994338 97707636 448 | sv447 duplication 7 34296041 34369956 449 | sv448 duplication 5 56193974 57033879 450 | sv449 duplication 9 70424150 71323107 451 | sv450 deletion 2 92204763 92632044 452 | sv451 duplication 2 99472055 100209772 453 | sv452 duplication 6 32311521 33244036 454 | sv453 duplication 6 53814410 54402563 455 | sv454 deletion 12 76007883 76320960 456 | sv455 duplication 16 52652574 53077267 457 | sv456 deletion 9 55671019 55779232 458 | sv457 deletion 13 90433220 90878771 459 | sv458 duplication 4 66509104 67028011 460 | sv459 duplication 15 55941198 55946584 461 | sv460 deletion 15 26404931 26728142 462 | sv461 duplication 12 96872359 97230597 463 | sv462 duplication 9 34833182 34887751 464 | sv463 duplication 13 20908260 20964455 465 | sv464 duplication 5 98358762 98743750 466 | sv465 deletion 15 66229306 67130928 467 | sv466 deletion 22 73040685 73370996 468 | sv467 deletion 11 99564900 100388514 469 | sv468 deletion 8 21535325 21755250 470 | sv469 deletion 20 82276858 83269046 471 | sv470 deletion 18 94621051 95502991 472 | sv471 deletion 12 50685939 51270882 473 | sv472 deletion 11 84876430 85528847 474 | sv473 deletion 3 73102106 74085210 475 | sv474 duplication 2 98337171 99285584 476 | sv475 deletion 20 27730660 27895392 477 | sv476 deletion 2 82446889 82568813 478 | sv477 duplication 11 29472263 30323053 479 | sv478 deletion 12 69423502 69728950 480 | sv479 duplication 10 12140975 12444889 481 | sv480 deletion 8 58210320 58757810 482 | sv481 deletion 6 81487196 81564993 483 | sv482 deletion 6 38727106 38868856 484 | sv483 deletion 9 7338491 7853766 485 | sv484 duplication 22 36621100 37289757 486 | sv485 duplication 1 8632750 9092070 487 | sv486 duplication 1 14166737 14485348 488 | sv487 deletion 5 16671980 16948344 489 | sv488 duplication 4 15319037 16106760 490 | sv489 duplication 21 74529826 
74673420 491 | sv490 duplication 3 51453703 51668300 492 | sv491 duplication 3 2918423 3071840 493 | sv492 deletion 22 30675641 31204261 494 | sv493 deletion 9 56854753 57228565 495 | sv494 deletion 4 10306173 10796964 496 | sv495 duplication 2 56599819 57244726 497 | sv496 deletion 22 9372310 9376964 498 | sv497 duplication 22 2652023 3240539 499 | sv498 duplication 8 36024817 36080294 500 | sv499 deletion 17 85283842 85781375 501 | sv500 duplication 4 25987464 26475476 502 | sv501 deletion 15 57645931 57830580 503 | sv502 duplication 16 77181078 77872834 504 | sv503 deletion 9 20598302 21061185 505 | sv504 duplication 21 30821864 31082789 506 | sv505 duplication 11 80769201 81249685 507 | sv506 duplication 10 6274924 6448913 508 | sv507 duplication 17 12728432 13517890 509 | sv508 duplication 15 45299971 45856524 510 | sv509 deletion 16 86047018 86922976 511 | sv510 duplication 2 98178226 98396067 512 | sv511 deletion 4 69080454 69354549 513 | sv512 duplication 22 11425400 12004387 514 | sv513 duplication 3 36230043 36400405 515 | sv514 duplication 10 5332282 5964422 516 | sv515 duplication 9 1410287 2348006 517 | sv516 deletion 1 87405336 87645887 518 | sv517 duplication 8 20906523 21826248 519 | sv518 duplication 14 97515965 97746842 520 | sv519 deletion 4 8943344 9496840 521 | sv520 duplication 22 35270778 36081102 522 | sv521 duplication 19 73139784 74113189 523 | sv522 deletion 14 39668667 40064565 524 | sv523 deletion 21 9057859 9681959 525 | sv524 duplication 3 12586119 13195065 526 | sv525 deletion 7 73978201 74070526 527 | sv526 duplication 13 73656135 73798816 528 | sv527 deletion 14 32874892 33341039 529 | sv528 duplication 14 5793530 6111973 530 | sv529 duplication 4 19834928 19941774 531 | sv530 deletion 15 83254772 83871085 532 | sv531 deletion 18 26195418 26987563 533 | sv532 duplication 6 37357930 37583190 534 | sv533 duplication 11 20660750 21019404 535 | sv534 deletion 9 60847173 61010730 536 | sv535 deletion 3 70941673 71486170 537 | sv536 
deletion 16 45868355 46037858 538 | sv537 deletion 17 97268250 97981068 539 | sv538 duplication 12 28849512 29498698 540 | sv539 deletion 15 36507217 36898526 541 | sv540 deletion 10 57596293 58452442 542 | sv541 deletion 9 68690780 69373540 543 | sv542 duplication 5 58440194 59366926 544 | sv543 duplication 4 45886873 46309948 545 | sv544 deletion 5 70593348 70943120 546 | sv545 duplication 18 66604616 66660126 547 | sv546 deletion 2 23008389 23512669 548 | sv547 duplication 9 45067083 45853673 549 | sv548 duplication 8 59154833 59745093 550 | sv549 duplication 4 12479881 13119827 551 | sv550 duplication 12 66929049 67682118 552 | sv551 deletion 13 63616130 64316384 553 | sv552 duplication 18 90667938 91325166 554 | sv553 deletion 14 86863384 87102702 555 | sv554 deletion 1 16760877 17459398 556 | sv555 duplication 13 76132424 76835763 557 | sv556 duplication 2 8580394 9371168 558 | sv557 deletion 20 17170746 17729692 559 | sv558 deletion 1 81229660 82223885 560 | sv559 deletion 13 17454117 17821043 561 | sv560 duplication 4 58129581 58822825 562 | sv561 deletion 1 47957858 48057619 563 | sv562 deletion 16 88673553 88949838 564 | sv563 duplication 1 61724645 62575842 565 | sv564 deletion 13 68878332 69021099 566 | sv565 duplication 22 31528866 31573081 567 | sv566 deletion 7 69345099 70219522 568 | sv567 duplication 11 49559725 49713825 569 | sv568 deletion 19 78086067 78395916 570 | sv569 duplication 11 43942876 44552841 571 | sv570 deletion 18 52556370 52760765 572 | sv571 duplication 11 82196732 83125029 573 | sv572 deletion 15 81354470 81928907 574 | sv573 duplication 8 33124172 33306227 575 | sv574 duplication 22 13862103 14141882 576 | sv575 deletion 17 32248468 32916030 577 | sv576 duplication 20 92009822 92232886 578 | sv577 duplication 7 26019781 26185005 579 | sv578 duplication 2 10874458 11199256 580 | sv579 duplication 12 18210992 18965797 581 | sv580 duplication 4 32137298 33000608 582 | sv581 deletion 18 56690816 56727165 583 | sv582 deletion 19 
5796559 6066026 584 | sv583 deletion 2 15979828 16250038 585 | sv584 duplication 1 85851633 86349509 586 | sv585 duplication 8 95426014 96140160 587 | sv586 duplication 1 81950664 82590606 588 | sv587 deletion 2 23115264 23827491 589 | sv588 deletion 17 62793100 63145374 590 | sv589 duplication 10 9116993 10031007 591 | sv590 duplication 18 53217681 54212170 592 | sv591 deletion 15 99373670 99751900 593 | sv592 deletion 1 37034104 37784902 594 | sv593 duplication 15 66014490 66939417 595 | sv594 deletion 6 32607930 33211955 596 | sv595 deletion 5 33236446 33586924 597 | sv596 duplication 18 64545425 64808825 598 | sv597 duplication 4 10467727 11440688 599 | sv598 deletion 13 54737139 55597594 600 | sv599 duplication 17 57832284 58330554 601 | sv600 duplication 19 73689890 74233410 602 | sv601 duplication 15 68171184 68173118 603 | sv602 deletion 2 69607689 70180009 604 | sv603 deletion 10 49548461 49971658 605 | sv604 deletion 4 94695032 94886884 606 | sv605 duplication 15 65483402 66470213 607 | sv606 deletion 7 26828882 27286066 608 | sv607 duplication 15 1042463 1396495 609 | sv608 duplication 22 49523122 50411472 610 | sv609 deletion 4 67327672 67675858 611 | sv610 duplication 3 84587128 84824706 612 | sv611 deletion 11 74357377 74896965 613 | sv612 deletion 10 89814387 90543698 614 | sv613 deletion 14 95150956 95502191 615 | sv614 duplication 14 9483554 10337310 616 | sv615 duplication 15 21952986 22611556 617 | sv616 duplication 11 7057197 7755698 618 | sv617 duplication 14 94364679 95269805 619 | sv618 deletion 14 7254865 8190262 620 | sv619 deletion 7 93712917 94459623 621 | sv620 duplication 16 64603551 64674263 622 | sv621 deletion 1 37570769 38272429 623 | sv622 deletion 19 11569051 11791409 624 | sv623 duplication 11 46346483 46365590 625 | sv624 deletion 15 15584987 15970746 626 | sv625 duplication 8 39635469 40490603 627 | sv626 duplication 18 42242138 42390583 628 | sv627 deletion 16 99385004 100376411 629 | sv628 deletion 18 52778686 53403646 630 | 
sv629 deletion 16 48863082 48954674 631 | sv630 deletion 5 93848929 94829487 632 | sv631 duplication 7 22622211 22971055 633 | sv632 deletion 18 11659235 11727659 634 | sv633 deletion 9 52557075 53512932 635 | sv634 deletion 21 61509307 61581649 636 | sv635 deletion 21 31767207 32469520 637 | sv636 deletion 2 63834160 64525635 638 | sv637 duplication 21 50957329 51705810 639 | sv638 duplication 21 55260691 55455960 640 | sv639 deletion 5 14657002 14876253 641 | sv640 duplication 10 87074978 87523705 642 | sv641 duplication 16 77366174 78346594 643 | sv642 deletion 4 68966074 69844125 644 | sv643 deletion 1 33018506 33432201 645 | sv644 duplication 19 61281926 61299046 646 | sv645 deletion 17 92791695 93130095 647 | sv646 duplication 4 37360553 37820811 648 | sv647 deletion 4 94673758 95041787 649 | sv648 deletion 17 65693141 66595769 650 | sv649 duplication 7 15066336 15373382 651 | sv650 deletion 8 86189447 86236418 652 | sv651 duplication 19 58816557 59300269 653 | sv652 duplication 9 45281689 46094528 654 | sv653 deletion 18 26682171 27441196 655 | sv654 duplication 17 4383097 4680001 656 | sv655 deletion 11 75370736 75950893 657 | sv656 duplication 14 14085782 14319083 658 | sv657 duplication 12 32568715 33188381 659 | sv658 deletion 6 46222163 46936016 660 | sv659 duplication 10 11744953 12363696 661 | sv660 deletion 10 41468095 42300952 662 | sv661 deletion 2 80021625 80898948 663 | sv662 duplication 7 21306143 21561815 664 | sv663 duplication 2 59761065 60491356 665 | sv664 duplication 14 92790426 93042498 666 | sv665 deletion 17 48364834 48366796 667 | sv666 duplication 3 4450041 4933315 668 | sv667 duplication 9 49177960 49474014 669 | sv668 deletion 1 32401881 32415878 670 | sv669 deletion 12 87762077 87954267 671 | sv670 duplication 8 51391111 51736576 672 | sv671 duplication 12 50368284 51157894 673 | sv672 duplication 19 96936686 96987652 674 | sv673 deletion 10 90509667 91078272 675 | sv674 duplication 5 81686831 82182621 676 | sv675 duplication 16 
62904289 63004456 677 | sv676 duplication 16 45500191 45977749 678 | sv677 deletion 13 33501710 34339716 679 | sv678 deletion 17 67739999 68465834 680 | sv679 duplication 20 25945325 26269904 681 | sv680 duplication 20 76744891 77142273 682 | sv681 duplication 8 53078422 53778999 683 | sv682 duplication 14 63741633 64118465 684 | sv683 duplication 18 18190558 18865389 685 | sv684 duplication 17 32628449 33578781 686 | sv685 duplication 12 36065133 36492016 687 | sv686 deletion 22 4925841 5289506 688 | sv687 duplication 17 92629918 92989106 689 | sv688 deletion 7 3080058 3394908 690 | sv689 duplication 7 65056393 65527355 691 | sv690 duplication 22 5008634 5647778 692 | sv691 deletion 7 87156204 87859731 693 | sv692 duplication 9 81855956 82494484 694 | sv693 deletion 7 74081397 74463417 695 | sv694 deletion 6 81207512 81620380 696 | sv695 deletion 15 95031058 95040775 697 | sv696 duplication 5 27144302 28006788 698 | sv697 duplication 21 53681911 54278363 699 | sv698 deletion 14 34252337 34622402 700 | sv699 deletion 11 98834558 99633026 701 | sv700 deletion 5 85841033 85981889 702 | sv701 duplication 2 93864473 94789653 703 | sv702 duplication 15 36530419 36977889 704 | sv703 deletion 22 117437 907501 705 | sv704 deletion 21 70254027 71127176 706 | sv705 deletion 18 37425898 38094822 707 | sv706 deletion 2 77818443 77865643 708 | sv707 deletion 5 5599795 5850923 709 | sv708 duplication 2 87622275 87884814 710 | sv709 deletion 21 65419270 66009894 711 | sv710 duplication 19 88867254 89659100 712 | sv711 duplication 19 39773083 40441487 713 | sv712 duplication 4 94790008 95762782 714 | sv713 duplication 20 75168117 76065063 715 | sv714 deletion 5 34053037 35035475 716 | sv715 duplication 22 9211844 9263962 717 | sv716 duplication 2 58362042 59285486 718 | sv717 duplication 17 19252363 20011838 719 | sv718 duplication 12 11596599 12078593 720 | sv719 deletion 7 45243651 45421793 721 | sv720 deletion 17 28514797 28673065 722 | sv721 deletion 6 6423464 6649363 723 | 
sv722 duplication 13 36624144 37548831 724 | sv723 duplication 7 17707203 18239947 725 | sv724 deletion 1 63112436 63454485 726 | sv725 deletion 18 81640191 81867136 727 | sv726 deletion 12 83515198 84471294 728 | sv727 duplication 9 18398642 19222363 729 | sv728 deletion 2 83878626 84386469 730 | sv729 deletion 15 65980999 66156904 731 | sv730 duplication 1 20576157 21276115 732 | sv731 deletion 8 93049915 93365893 733 | sv732 deletion 20 17794822 18370620 734 | sv733 deletion 1 3771631 4318641 735 | sv734 deletion 20 32369985 32674209 736 | sv735 deletion 19 89942960 90244500 737 | sv736 deletion 6 76549545 77208995 738 | sv737 deletion 21 40807095 41607654 739 | sv738 deletion 16 55520898 56439479 740 | sv739 deletion 20 74991620 75074090 741 | sv740 duplication 14 5491998 6184177 742 | sv741 deletion 21 12076483 12634533 743 | sv742 deletion 22 19834409 20115194 744 | sv743 duplication 7 68776179 69730874 745 | sv744 deletion 5 61197424 61243778 746 | sv745 duplication 19 17034451 17236917 747 | sv746 deletion 5 26778055 27663557 748 | sv747 duplication 5 448903 701193 749 | sv748 duplication 3 33308558 33834414 750 | sv749 deletion 20 3986334 4590329 751 | sv750 deletion 7 44228607 44545976 752 | sv751 deletion 8 10769135 11503014 753 | sv752 deletion 9 82871220 83382908 754 | sv753 deletion 9 57394714 58008150 755 | sv754 duplication 10 6272403 7132495 756 | sv755 duplication 7 47906693 48204296 757 | sv756 duplication 11 36632210 37181000 758 | sv757 deletion 13 71501185 72232821 759 | sv758 deletion 12 27974896 28650448 760 | sv759 duplication 4 1311882 1649133 761 | sv760 deletion 14 29120354 29683067 762 | sv761 duplication 11 31742065 31884729 763 | sv762 duplication 11 22923067 22940192 764 | sv763 duplication 15 54541537 54830749 765 | sv764 deletion 4 94852290 95263871 766 | sv765 deletion 14 91177746 91600104 767 | sv766 deletion 8 53590039 53906602 768 | sv767 deletion 11 98443634 98936908 769 | sv768 duplication 6 91662438 91743497 770 | sv769 
duplication 15 79463773 80200766 771 | sv770 deletion 18 68306993 68854415 772 | sv771 deletion 18 93725436 94359360 773 | sv772 deletion 7 92674573 93540019 774 | sv773 duplication 10 26814669 27577917 775 | sv774 deletion 21 41591190 42321390 776 | sv775 deletion 6 624715 1020072 777 | sv776 deletion 10 4436249 5253613 778 | sv777 duplication 2 42963232 43635440 779 | sv778 deletion 21 3499293 3648274 780 | sv779 deletion 21 89349485 89902970 781 | sv780 duplication 8 32208244 33191855 782 | sv781 deletion 14 35717133 36118997 783 | sv782 duplication 1 16954955 17126186 784 | sv783 duplication 12 26269416 26921027 785 | sv784 duplication 19 40253751 41025762 786 | sv785 deletion 4 1764814 2576342 787 | sv786 duplication 5 27678581 28240653 788 | sv787 deletion 21 80615675 80733856 789 | sv788 duplication 10 37920188 38543837 790 | sv789 deletion 18 95938501 96483456 791 | sv790 deletion 20 93205218 94080779 792 | sv791 deletion 6 18784626 19614059 793 | sv792 duplication 2 76474428 76782172 794 | sv793 deletion 4 52674343 52914282 795 | sv794 deletion 1 58493105 59335324 796 | sv795 deletion 21 44395960 44514023 797 | sv796 deletion 5 65748660 66454259 798 | sv797 duplication 16 62504685 63346641 799 | sv798 duplication 16 36962622 37069893 800 | sv799 deletion 7 48070280 48742667 801 | sv800 duplication 1 64413120 64934118 802 | sv801 deletion 6 75332688 75362117 803 | sv802 duplication 15 77134737 77226043 804 | sv803 deletion 14 24532475 24776524 805 | sv804 deletion 5 42261343 43010077 806 | sv805 duplication 13 11954573 12290266 807 | sv806 deletion 22 97806944 98481297 808 | sv807 duplication 20 60346867 60391388 809 | sv808 duplication 11 31632368 32479996 810 | sv809 duplication 5 35781868 36241113 811 | sv810 duplication 14 39393161 39511509 812 | sv811 deletion 10 83298083 83352656 813 | sv812 duplication 22 2965148 2986661 814 | sv813 deletion 1 52017850 52862802 815 | sv814 deletion 12 1906738 2015116 816 | sv815 duplication 4 68779563 69038591 817 | 
sv816 duplication 4 35092468 35183930 818 | sv817 duplication 2 7104111 7921978 819 | sv818 duplication 7 46088405 47054256 820 | sv819 duplication 1 73065704 73964270 821 | sv820 deletion 16 78803324 78861038 822 | sv821 duplication 11 72338094 73076622 823 | sv822 deletion 11 12368937 12631363 824 | sv823 duplication 7 96417704 96803122 825 | sv824 duplication 17 44804746 45342492 826 | sv825 duplication 13 93059827 93320295 827 | sv826 duplication 7 44824829 45668901 828 | sv827 deletion 10 67845822 68143713 829 | sv828 deletion 13 98798650 99676358 830 | sv829 duplication 10 10809664 11689971 831 | sv830 deletion 21 71113861 72010447 832 | sv831 duplication 7 24186129 24367532 833 | sv832 duplication 13 37370357 38220504 834 | sv833 deletion 6 78537061 78700192 835 | sv834 duplication 21 97262122 98221808 836 | sv835 deletion 20 74033627 74284204 837 | sv836 deletion 15 51037993 51274318 838 | sv837 deletion 22 27959352 28345597 839 | sv838 duplication 18 46358949 46968579 840 | sv839 deletion 7 16499025 16586609 841 | sv840 deletion 6 8005721 8131738 842 | sv841 duplication 19 88241260 89041030 843 | sv842 duplication 15 35344000 35579245 844 | sv843 deletion 3 5682698 5839704 845 | sv844 duplication 6 28280348 29161428 846 | sv845 duplication 19 33684723 34581986 847 | sv846 deletion 17 30009096 30586604 848 | sv847 duplication 4 85579181 86339448 849 | sv848 deletion 3 1647036 2408043 850 | sv849 deletion 21 31623965 32173851 851 | sv850 deletion 15 92905729 92989713 852 | sv851 deletion 16 15805658 16581442 853 | sv852 duplication 12 53744421 54144557 854 | sv853 deletion 12 13316013 13351252 855 | sv854 deletion 16 67774059 68110454 856 | sv855 deletion 7 32050830 32984061 857 | sv856 duplication 4 59027797 59500056 858 | sv857 deletion 17 65656169 65742371 859 | sv858 duplication 22 57406469 57420860 860 | sv859 duplication 5 84021221 84519252 861 | sv860 deletion 14 20667566 21372594 862 | sv861 deletion 14 26431346 26449125 863 | sv862 duplication 3 
63271233 63618157 864 | sv863 duplication 19 19197076 19869233 865 | sv864 duplication 6 24772266 25311481 866 | sv865 deletion 8 91740849 91766483 867 | sv866 duplication 10 64282741 64839265 868 | sv867 deletion 18 79886772 79887737 869 | sv868 deletion 6 96233567 96961118 870 | sv869 duplication 20 36734924 36771639 871 | sv870 duplication 15 693751 1313322 872 | sv871 deletion 14 67447506 68343325 873 | sv872 duplication 10 18692070 19595073 874 | sv873 deletion 17 92269441 93004738 875 | sv874 deletion 1 91431354 91764794 876 | sv875 duplication 19 80068811 80906782 877 | sv876 duplication 6 38295799 38836758 878 | sv877 duplication 14 30264955 31075140 879 | sv878 duplication 7 69944075 70562068 880 | sv879 deletion 4 98289051 98486087 881 | sv880 duplication 14 36694113 36866226 882 | sv881 deletion 5 22739460 23172626 883 | sv882 deletion 17 72455078 72510991 884 | sv883 duplication 13 6330687 6737425 885 | sv884 duplication 16 59696666 59846911 886 | sv885 duplication 14 89490849 89884300 887 | sv886 duplication 20 45441215 46100445 888 | sv887 deletion 22 45378427 45939893 889 | sv888 duplication 5 20974410 21148807 890 | sv889 duplication 9 52409909 52789868 891 | sv890 duplication 11 51786010 51921781 892 | sv891 duplication 21 85555156 86426886 893 | sv892 deletion 21 467151 1122521 894 | sv893 deletion 2 44506506 44807333 895 | sv894 duplication 2 82090943 82809638 896 | sv895 duplication 15 73321250 74193439 897 | sv896 deletion 15 87687407 88536781 898 | sv897 duplication 22 30805543 31182733 899 | sv898 deletion 20 28731418 28736405 900 | sv899 deletion 2 54889453 55583268 901 | sv900 deletion 21 92218322 92973479 902 | sv901 duplication 1 50145281 50528799 903 | sv902 duplication 8 92860380 93572673 904 | sv903 duplication 19 61768875 62438171 905 | sv904 deletion 15 39576940 40198351 906 | sv905 deletion 12 18854567 19357105 907 | sv906 duplication 13 33641851 33704114 908 | sv907 duplication 2 31947316 32395105 909 | sv908 deletion 13 5249367 
5771286 910 | sv909 duplication 1 92298 817256 911 | sv910 duplication 11 13135115 13276645 912 | sv911 deletion 19 15092301 15255011 913 | sv912 duplication 3 45005675 45550862 914 | sv913 duplication 7 74376199 74962699 915 | sv914 duplication 20 69869863 70395522 916 | sv915 deletion 17 13527648 14227178 917 | sv916 deletion 21 71652298 72387472 918 | sv917 duplication 21 76565022 76743765 919 | sv918 duplication 7 69498440 69799772 920 | sv919 deletion 4 52373283 53083485 921 | sv920 duplication 10 92935164 93430708 922 | sv921 deletion 8 65447799 65878905 923 | sv922 deletion 20 76669788 76974815 924 | sv923 duplication 13 71098806 71786436 925 | sv924 deletion 4 21126630 21307466 926 | sv925 duplication 22 79478065 80334087 927 | sv926 duplication 17 42624787 43225375 928 | sv927 deletion 14 73100244 73161946 929 | sv928 duplication 13 22133159 22346517 930 | sv929 deletion 10 66877833 67646123 931 | sv930 duplication 21 18157601 18682156 932 | sv931 duplication 13 39343123 39529809 933 | sv932 deletion 20 30729647 31378936 934 | sv933 deletion 14 64427915 65401813 935 | sv934 duplication 19 12037432 12668633 936 | sv935 duplication 14 26769485 27614244 937 | sv936 deletion 18 25116215 25499683 938 | sv937 duplication 15 95586764 96048711 939 | sv938 deletion 10 4105905 4870471 940 | sv939 deletion 22 98455090 98881363 941 | sv940 duplication 5 43076957 43719123 942 | sv941 duplication 3 4854127 5571575 943 | sv942 duplication 5 39946486 40084108 944 | sv943 duplication 16 93054226 93556472 945 | sv944 duplication 8 43301628 44169692 946 | sv945 duplication 21 41435199 41819901 947 | sv946 duplication 5 12005213 12471996 948 | sv947 duplication 9 94696601 94714138 949 | sv948 deletion 19 8588770 8758813 950 | sv949 duplication 5 52223883 52319680 951 | sv950 duplication 13 82221136 83004051 952 | sv951 duplication 19 61367852 61591422 953 | sv952 duplication 22 3459561 3828663 954 | sv953 duplication 21 34397037 35356206 955 | sv954 duplication 10 18784014 
18949440 956 | sv955 deletion 22 56826531 56892008 957 | sv956 deletion 20 48837277 49233842 958 | sv957 duplication 6 94679647 94712857 959 | sv958 deletion 3 18597105 19423051 960 | sv959 duplication 17 2178559 2892673 961 | sv960 duplication 7 11007348 11149384 962 | sv961 deletion 14 91966720 92139549 963 | sv962 deletion 12 21138659 21466886 964 | sv963 deletion 9 43659812 44054412 965 | sv964 deletion 8 31365128 31971986 966 | sv965 duplication 9 10382736 10821093 967 | sv966 deletion 12 43443784 44234357 968 | sv967 deletion 13 48935461 49124976 969 | sv968 deletion 18 99322928 99858323 970 | sv969 duplication 18 82209388 82994240 971 | sv970 deletion 4 56525316 56652218 972 | sv971 duplication 9 99617231 100224115 973 | sv972 deletion 8 86326879 86691565 974 | sv973 duplication 6 22336479 22793529 975 | sv974 duplication 13 58796803 59656036 976 | sv975 duplication 6 67727621 68323577 977 | sv976 duplication 2 42301086 43036769 978 | sv977 duplication 7 36149725 36969742 979 | sv978 duplication 17 67028485 67870874 980 | sv979 deletion 5 58912071 59857309 981 | sv980 duplication 13 95459506 95889201 982 | sv981 duplication 14 91952136 91987766 983 | sv982 duplication 13 11688842 12624364 984 | sv983 deletion 20 22021634 22087943 985 | sv984 deletion 14 63961660 64156026 986 | sv985 duplication 22 45682911 45754774 987 | sv986 deletion 3 51347991 52235021 988 | sv987 deletion 17 23371010 24296917 989 | sv988 duplication 8 52444171 52933506 990 | sv989 deletion 20 29499045 29770613 991 | sv990 deletion 13 90915298 91223108 992 | sv991 deletion 10 94337441 94445812 993 | sv992 duplication 4 38459096 39298691 994 | sv993 deletion 3 4453955 4537677 995 | sv994 duplication 17 27277712 27421177 996 | sv995 duplication 5 7925090 8656550 997 | sv996 duplication 15 57039051 57332185 998 | sv997 duplication 14 36160531 36488170 999 | sv998 duplication 11 45263817 45622255 1000 | sv999 duplication 1 84593371 85546658 1001 | sv1000 deletion 18 79500130 79919301 1002 | 
-------------------------------------------------------------------------------- /shinyapp/testdata/clinsnv_variants.tsv: -------------------------------------------------------------------------------- 1 | variant_id pathogenic_clinvar_snv_indel 2 | sv605 TRUE 3 | sv24 TRUE 4 | sv270 TRUE 5 | sv892 TRUE 6 | sv540 TRUE 7 | sv19 TRUE 8 | sv50 TRUE 9 | sv334 TRUE 10 | sv876 TRUE 11 | sv706 TRUE 12 | sv253 TRUE 13 | sv323 TRUE 14 | sv701 TRUE 15 | sv87 TRUE 16 | sv287 TRUE 17 | sv970 TRUE 18 | sv457 TRUE 19 | sv939 TRUE 20 | sv399 TRUE 21 | sv811 TRUE 22 | sv341 TRUE 23 | sv830 TRUE 24 | sv46 TRUE 25 | sv447 TRUE 26 | sv285 TRUE 27 | sv171 TRUE 28 | sv972 TRUE 29 | sv482 TRUE 30 | sv566 TRUE 31 | sv175 TRUE 32 | sv617 TRUE 33 | sv310 TRUE 34 | sv266 TRUE 35 | sv448 TRUE 36 | sv821 TRUE 37 | sv269 TRUE 38 | sv68 TRUE 39 | sv912 TRUE 40 | sv302 TRUE 41 | sv226 TRUE 42 | sv867 TRUE 43 | sv445 TRUE 44 | sv219 TRUE 45 | sv855 TRUE 46 | sv484 TRUE 47 | sv443 TRUE 48 | sv423 TRUE 49 | sv433 TRUE 50 | sv49 TRUE 51 | sv128 TRUE 52 | sv475 TRUE 53 | sv398 TRUE 54 | sv148 TRUE 55 | sv975 TRUE 56 | sv967 TRUE 57 | sv562 TRUE 58 | sv259 TRUE 59 | sv181 TRUE 60 | sv258 TRUE 61 | sv641 TRUE 62 | sv415 TRUE 63 | sv394 TRUE 64 | sv460 TRUE 65 | sv251 TRUE 66 | sv707 TRUE 67 | sv757 TRUE 68 | sv694 TRUE 69 | sv41 TRUE 70 | sv199 TRUE 71 | sv748 TRUE 72 | sv992 TRUE 73 | sv818 TRUE 74 | sv850 TRUE 75 | sv25 TRUE 76 | sv11 TRUE 77 | sv954 TRUE 78 | sv565 TRUE 79 | sv264 TRUE 80 | sv469 TRUE 81 | sv942 TRUE 82 | sv316 TRUE 83 | sv794 TRUE 84 | sv679 TRUE 85 | sv392 TRUE 86 | sv154 TRUE 87 | sv504 TRUE 88 | sv595 TRUE 89 | sv527 TRUE 90 | sv380 TRUE 91 | sv979 TRUE 92 | sv802 TRUE 93 | sv858 TRUE 94 | sv784 TRUE 95 | sv278 TRUE 96 | sv179 TRUE 97 | sv662 TRUE 98 | sv419 TRUE 99 | sv416 TRUE 100 | sv5 TRUE 101 | sv952 TRUE 102 | sv134 TRUE 103 | sv532 TRUE 104 | sv186 TRUE 105 | sv911 TRUE 106 | sv165 TRUE 107 | sv377 TRUE 108 | sv609 TRUE 109 | sv485 TRUE 110 | sv592 TRUE 111 | sv31 TRUE 
112 | sv357 TRUE 113 | sv146 TRUE 114 | sv185 TRUE 115 | sv556 TRUE 116 | sv745 TRUE 117 | sv487 TRUE 118 | sv117 TRUE 119 | sv483 TRUE 120 | sv910 TRUE 121 | sv929 TRUE 122 | sv96 TRUE 123 | sv274 TRUE 124 | sv561 TRUE 125 | sv215 TRUE 126 | sv568 TRUE 127 | sv634 TRUE 128 | sv206 TRUE 129 | sv828 TRUE 130 | sv775 TRUE 131 | sv101 TRUE 132 | sv42 TRUE 133 | sv602 TRUE 134 | sv444 TRUE 135 | sv412 TRUE 136 | sv698 TRUE 137 | sv37 TRUE 138 | sv918 TRUE 139 | sv375 TRUE 140 | sv973 TRUE 141 | sv248 TRUE 142 | sv755 TRUE 143 | sv996 TRUE 144 | sv723 TRUE 145 | sv77 TRUE 146 | sv213 TRUE 147 | sv586 TRUE 148 | sv366 TRUE 149 | sv860 TRUE 150 | sv774 TRUE 151 | sv781 TRUE 152 | sv614 TRUE 153 | sv238 TRUE 154 | sv18 TRUE 155 | sv430 TRUE 156 | sv156 TRUE 157 | sv951 TRUE 158 | sv397 TRUE 159 | sv728 TRUE 160 | sv921 TRUE 161 | sv313 TRUE 162 | sv844 TRUE 163 | sv383 TRUE 164 | sv854 TRUE 165 | sv645 TRUE 166 | sv86 TRUE 167 | sv113 TRUE 168 | sv780 TRUE 169 | sv889 TRUE 170 | sv141 TRUE 171 | sv816 TRUE 172 | sv10 TRUE 173 | sv325 TRUE 174 | sv109 TRUE 175 | sv476 TRUE 176 | sv241 TRUE 177 | sv647 TRUE 178 | sv756 TRUE 179 | sv878 TRUE 180 | sv838 TRUE 181 | sv924 TRUE 182 | sv671 TRUE 183 | sv93 TRUE 184 | sv489 TRUE 185 | sv417 TRUE 186 | sv288 TRUE 187 | sv648 TRUE 188 | sv393 TRUE 189 | sv976 TRUE 190 | sv958 TRUE 191 | sv295 TRUE 192 | sv862 TRUE 193 | sv845 TRUE 194 | sv606 TRUE 195 | sv111 TRUE 196 | sv841 TRUE 197 | sv389 TRUE 198 | sv857 TRUE 199 | sv531 TRUE 200 | sv553 TRUE 201 | sv239 TRUE 202 | -------------------------------------------------------------------------------- /shinyapp/testdata/clinsv_variants.tsv: -------------------------------------------------------------------------------- 1 | variant_id pathogenic_clinvar_sv 2 | sv738 TRUE 3 | sv496 TRUE 4 | sv728 TRUE 5 | sv166 TRUE 6 | sv31 TRUE 7 | sv55 TRUE 8 | sv335 TRUE 9 | sv144 TRUE 10 | sv628 TRUE 11 | sv723 TRUE 12 | sv881 TRUE 13 | sv428 TRUE 14 | sv47 TRUE 15 | sv282 TRUE 16 | sv682 TRUE 17 
| sv118 TRUE 18 | sv435 TRUE 19 | sv285 TRUE 20 | sv804 TRUE 21 | sv141 TRUE 22 | sv95 TRUE 23 | sv726 TRUE 24 | sv646 TRUE 25 | sv223 TRUE 26 | sv968 TRUE 27 | sv304 TRUE 28 | sv799 TRUE 29 | sv661 TRUE 30 | sv295 TRUE 31 | sv954 TRUE 32 | sv362 TRUE 33 | sv611 TRUE 34 | sv619 TRUE 35 | sv260 TRUE 36 | sv192 TRUE 37 | sv897 TRUE 38 | sv542 TRUE 39 | sv715 TRUE 40 | sv183 TRUE 41 | sv943 TRUE 42 | sv768 TRUE 43 | sv23 TRUE 44 | sv875 TRUE 45 | sv252 TRUE 46 | sv698 TRUE 47 | sv346 TRUE 48 | sv136 TRUE 49 | sv744 TRUE 50 | sv61 TRUE 51 | sv514 TRUE 52 | sv725 TRUE 53 | sv18 TRUE 54 | sv233 TRUE 55 | sv669 TRUE 56 | sv201 TRUE 57 | sv297 TRUE 58 | sv77 TRUE 59 | sv983 TRUE 60 | sv363 TRUE 61 | sv164 TRUE 62 | sv659 TRUE 63 | sv329 TRUE 64 | sv275 TRUE 65 | sv904 TRUE 66 | sv513 TRUE 67 | sv624 TRUE 68 | sv974 TRUE 69 | sv24 TRUE 70 | sv251 TRUE 71 | sv739 TRUE 72 | sv503 TRUE 73 | sv338 TRUE 74 | sv361 TRUE 75 | sv790 TRUE 76 | sv587 TRUE 77 | sv960 TRUE 78 | sv96 TRUE 79 | sv855 TRUE 80 | sv823 TRUE 81 | sv316 TRUE 82 | sv865 TRUE 83 | sv984 TRUE 84 | sv245 TRUE 85 | sv376 TRUE 86 | sv654 TRUE 87 | sv582 TRUE 88 | sv577 TRUE 89 | sv627 TRUE 90 | sv899 TRUE 91 | sv995 TRUE 92 | sv366 TRUE 93 | sv181 TRUE 94 | sv694 TRUE 95 | sv532 TRUE 96 | sv635 TRUE 97 | sv568 TRUE 98 | sv109 TRUE 99 | sv391 TRUE 100 | sv13 TRUE 101 | sv516 TRUE 102 | -------------------------------------------------------------------------------- /shinyapp/testdata/ext_urls.tsv: -------------------------------------------------------------------------------- 1 | gene_id omim_url 2 | gene1 https://www.youtube.com/watch?v=dQw4w9WgXcQ 3 | gene2 https://www.youtube.com/watch?v=dQw4w9WgXcQ 4 | gene3 https://www.youtube.com/watch?v=dQw4w9WgXcQ 5 | gene4 https://www.youtube.com/watch?v=dQw4w9WgXcQ 6 | gene5 https://www.youtube.com/watch?v=dQw4w9WgXcQ 7 | gene6 https://www.youtube.com/watch?v=dQw4w9WgXcQ 8 | gene7 https://www.youtube.com/watch?v=dQw4w9WgXcQ 9 | gene8 
https://www.youtube.com/watch?v=dQw4w9WgXcQ 10 | gene9 https://www.youtube.com/watch?v=dQw4w9WgXcQ 11 | gene10 https://www.youtube.com/watch?v=dQw4w9WgXcQ 12 | gene11 https://www.youtube.com/watch?v=dQw4w9WgXcQ 13 | gene12 https://www.youtube.com/watch?v=dQw4w9WgXcQ 14 | gene13 https://www.youtube.com/watch?v=dQw4w9WgXcQ 15 | gene14 https://www.youtube.com/watch?v=dQw4w9WgXcQ 16 | gene15 https://www.youtube.com/watch?v=dQw4w9WgXcQ 17 | gene16 https://www.youtube.com/watch?v=dQw4w9WgXcQ 18 | gene17 https://www.youtube.com/watch?v=dQw4w9WgXcQ 19 | gene18 https://www.youtube.com/watch?v=dQw4w9WgXcQ 20 | gene19 https://www.youtube.com/watch?v=dQw4w9WgXcQ 21 | gene20 https://www.youtube.com/watch?v=dQw4w9WgXcQ 22 | gene21 https://www.youtube.com/watch?v=dQw4w9WgXcQ 23 | gene22 https://www.youtube.com/watch?v=dQw4w9WgXcQ 24 | gene23 https://www.youtube.com/watch?v=dQw4w9WgXcQ 25 | gene24 https://www.youtube.com/watch?v=dQw4w9WgXcQ 26 | gene25 https://www.youtube.com/watch?v=dQw4w9WgXcQ 27 | gene26 https://www.youtube.com/watch?v=dQw4w9WgXcQ 28 | gene27 https://www.youtube.com/watch?v=dQw4w9WgXcQ 29 | gene28 https://www.youtube.com/watch?v=dQw4w9WgXcQ 30 | gene29 https://www.youtube.com/watch?v=dQw4w9WgXcQ 31 | gene30 https://www.youtube.com/watch?v=dQw4w9WgXcQ 32 | gene31 https://www.youtube.com/watch?v=dQw4w9WgXcQ 33 | gene32 https://www.youtube.com/watch?v=dQw4w9WgXcQ 34 | gene33 https://www.youtube.com/watch?v=dQw4w9WgXcQ 35 | gene34 https://www.youtube.com/watch?v=dQw4w9WgXcQ 36 | gene35 https://www.youtube.com/watch?v=dQw4w9WgXcQ 37 | gene36 https://www.youtube.com/watch?v=dQw4w9WgXcQ 38 | gene37 https://www.youtube.com/watch?v=dQw4w9WgXcQ 39 | gene38 https://www.youtube.com/watch?v=dQw4w9WgXcQ 40 | gene39 https://www.youtube.com/watch?v=dQw4w9WgXcQ 41 | gene40 https://www.youtube.com/watch?v=dQw4w9WgXcQ 42 | gene41 https://www.youtube.com/watch?v=dQw4w9WgXcQ 43 | gene42 https://www.youtube.com/watch?v=dQw4w9WgXcQ 44 | gene43 
https://www.youtube.com/watch?v=dQw4w9WgXcQ 45 | gene44 https://www.youtube.com/watch?v=dQw4w9WgXcQ 46 | gene45 https://www.youtube.com/watch?v=dQw4w9WgXcQ 47 | gene46 https://www.youtube.com/watch?v=dQw4w9WgXcQ 48 | gene47 https://www.youtube.com/watch?v=dQw4w9WgXcQ 49 | gene48 https://www.youtube.com/watch?v=dQw4w9WgXcQ 50 | gene49 https://www.youtube.com/watch?v=dQw4w9WgXcQ 51 | gene50 https://www.youtube.com/watch?v=dQw4w9WgXcQ 52 | gene51 https://www.youtube.com/watch?v=dQw4w9WgXcQ 53 | gene52 https://www.youtube.com/watch?v=dQw4w9WgXcQ 54 | gene53 https://www.youtube.com/watch?v=dQw4w9WgXcQ 55 | gene54 https://www.youtube.com/watch?v=dQw4w9WgXcQ 56 | gene55 https://www.youtube.com/watch?v=dQw4w9WgXcQ 57 | gene56 https://www.youtube.com/watch?v=dQw4w9WgXcQ 58 | gene57 https://www.youtube.com/watch?v=dQw4w9WgXcQ 59 | gene58 https://www.youtube.com/watch?v=dQw4w9WgXcQ 60 | gene59 https://www.youtube.com/watch?v=dQw4w9WgXcQ 61 | gene60 https://www.youtube.com/watch?v=dQw4w9WgXcQ 62 | gene61 https://www.youtube.com/watch?v=dQw4w9WgXcQ 63 | gene62 https://www.youtube.com/watch?v=dQw4w9WgXcQ 64 | gene63 https://www.youtube.com/watch?v=dQw4w9WgXcQ 65 | gene64 https://www.youtube.com/watch?v=dQw4w9WgXcQ 66 | gene65 https://www.youtube.com/watch?v=dQw4w9WgXcQ 67 | gene66 https://www.youtube.com/watch?v=dQw4w9WgXcQ 68 | gene67 https://www.youtube.com/watch?v=dQw4w9WgXcQ 69 | gene68 https://www.youtube.com/watch?v=dQw4w9WgXcQ 70 | gene69 https://www.youtube.com/watch?v=dQw4w9WgXcQ 71 | gene70 https://www.youtube.com/watch?v=dQw4w9WgXcQ 72 | gene71 https://www.youtube.com/watch?v=dQw4w9WgXcQ 73 | gene72 https://www.youtube.com/watch?v=dQw4w9WgXcQ 74 | gene73 https://www.youtube.com/watch?v=dQw4w9WgXcQ 75 | gene74 https://www.youtube.com/watch?v=dQw4w9WgXcQ 76 | gene75 https://www.youtube.com/watch?v=dQw4w9WgXcQ 77 | gene76 https://www.youtube.com/watch?v=dQw4w9WgXcQ 78 | gene77 https://www.youtube.com/watch?v=dQw4w9WgXcQ 79 | gene78 
https://www.youtube.com/watch?v=dQw4w9WgXcQ 80 | gene79 https://www.youtube.com/watch?v=dQw4w9WgXcQ 81 | gene80 https://www.youtube.com/watch?v=dQw4w9WgXcQ 82 | gene81 https://www.youtube.com/watch?v=dQw4w9WgXcQ 83 | gene82 https://www.youtube.com/watch?v=dQw4w9WgXcQ 84 | gene83 https://www.youtube.com/watch?v=dQw4w9WgXcQ 85 | gene84 https://www.youtube.com/watch?v=dQw4w9WgXcQ 86 | gene85 https://www.youtube.com/watch?v=dQw4w9WgXcQ 87 | gene86 https://www.youtube.com/watch?v=dQw4w9WgXcQ 88 | gene87 https://www.youtube.com/watch?v=dQw4w9WgXcQ 89 | gene88 https://www.youtube.com/watch?v=dQw4w9WgXcQ 90 | gene89 https://www.youtube.com/watch?v=dQw4w9WgXcQ 91 | gene90 https://www.youtube.com/watch?v=dQw4w9WgXcQ 92 | gene91 https://www.youtube.com/watch?v=dQw4w9WgXcQ 93 | gene92 https://www.youtube.com/watch?v=dQw4w9WgXcQ 94 | gene93 https://www.youtube.com/watch?v=dQw4w9WgXcQ 95 | gene94 https://www.youtube.com/watch?v=dQw4w9WgXcQ 96 | gene95 https://www.youtube.com/watch?v=dQw4w9WgXcQ 97 | gene96 https://www.youtube.com/watch?v=dQw4w9WgXcQ 98 | gene97 https://www.youtube.com/watch?v=dQw4w9WgXcQ 99 | gene98 https://www.youtube.com/watch?v=dQw4w9WgXcQ 100 | gene99 https://www.youtube.com/watch?v=dQw4w9WgXcQ 101 | gene100 https://www.youtube.com/watch?v=dQw4w9WgXcQ 102 | -------------------------------------------------------------------------------- /shinyapp/testdata/gene_variants.tsv: -------------------------------------------------------------------------------- 1 | variant_id gene_id elt_type elt_info 2 | sv708 gene52 UTR 3p UTR 3 | sv126 gene71 UTR 3p UTR 4 | sv209 gene80 exon exon 6 5 | sv382 gene60 UTR 5p UTR 6 | sv167 gene10 exon exon 14 7 | sv194 gene72 UTR 5p UTR 8 | sv355 gene43 exon exon 18 9 | sv629 gene54 UTR 5p UTR 10 | sv980 gene20 UTR 3p UTR 11 | sv318 gene33 exon exon 6 12 | sv74 gene20 UTR 5p UTR 13 | sv676 gene38 UTR 5p UTR 14 | sv151 gene64 exon exon 14 15 | sv708 gene89 exon exon 8 16 | sv143 gene6 exon exon 10 17 | sv501 gene15 UTR 3p UTR 18 | 
sv733 gene91 exon exon 7 19 | sv599 gene18 UTR 5p UTR 20 | sv733 gene40 exon exon 13 21 | sv158 gene21 UTR 5p UTR 22 | sv723 gene58 exon exon 13 23 | sv734 gene58 UTR 5p UTR 24 | sv673 gene50 exon exon 6 25 | sv944 gene62 UTR 3p UTR 26 | sv346 gene9 exon exon 11 27 | sv345 gene88 exon exon 3 28 | sv665 gene1 UTR 5p UTR 29 | sv319 gene78 exon exon 6 30 | sv900 gene38 exon exon 2 31 | sv284 gene22 UTR 3p UTR 32 | sv513 gene41 UTR 3p UTR 33 | sv104 gene89 UTR 5p UTR 34 | sv680 gene77 exon exon 1 35 | sv945 gene97 exon exon 11 36 | sv228 gene42 exon exon 8 37 | sv816 gene74 exon exon 18 38 | sv809 gene88 exon exon 12 39 | sv384 gene45 exon exon 3 40 | sv515 gene31 UTR 3p UTR 41 | sv139 gene34 UTR 3p UTR 42 | sv429 gene72 exon exon 19 43 | sv436 gene45 exon exon 9 44 | sv920 gene4 exon exon 15 45 | sv139 gene86 UTR 5p UTR 46 | sv761 gene92 UTR 5p UTR 47 | sv432 gene23 exon exon 12 48 | sv93 gene44 exon exon 13 49 | sv815 gene90 exon exon 12 50 | sv572 gene80 UTR 3p UTR 51 | sv464 gene66 exon exon 12 52 | sv755 gene43 exon exon 19 53 | sv184 gene20 exon exon 17 54 | sv125 gene96 UTR 3p UTR 55 | sv291 gene43 exon exon 19 56 | sv662 gene13 exon exon 19 57 | sv762 gene63 exon exon 14 58 | sv367 gene41 exon exon 6 59 | sv808 gene5 exon exon 9 60 | sv141 gene61 UTR 5p UTR 61 | sv874 gene56 UTR 3p UTR 62 | sv325 gene19 UTR 3p UTR 63 | sv844 gene22 UTR 3p UTR 64 | sv794 gene3 UTR 3p UTR 65 | sv268 gene92 exon exon 19 66 | sv701 gene67 exon exon 19 67 | sv59 gene96 exon exon 9 68 | sv246 gene48 exon exon 6 69 | sv942 gene66 UTR 3p UTR 70 | sv306 gene93 exon exon 6 71 | sv436 gene20 exon exon 10 72 | sv552 gene17 exon exon 20 73 | sv600 gene71 exon exon 15 74 | sv826 gene77 exon exon 4 75 | sv678 gene9 UTR 3p UTR 76 | sv233 gene28 UTR 3p UTR 77 | sv601 gene2 exon exon 4 78 | sv443 gene55 exon exon 3 79 | sv531 gene15 exon exon 1 80 | sv625 gene90 UTR 5p UTR 81 | sv980 gene57 UTR 3p UTR 82 | sv904 gene42 UTR 5p UTR 83 | sv410 gene20 exon exon 11 84 | sv265 gene42 UTR 5p UTR 85 | 
sv713 gene60 exon exon 1 86 | sv42 gene4 UTR 3p UTR 87 | sv437 gene69 UTR 3p UTR 88 | sv714 gene79 exon exon 17 89 | sv674 gene50 exon exon 9 90 | sv406 gene32 UTR 3p UTR 91 | sv667 gene99 exon exon 13 92 | sv771 gene85 exon exon 15 93 | sv126 gene50 UTR 5p UTR 94 | sv858 gene70 UTR 3p UTR 95 | sv395 gene10 exon exon 13 96 | sv466 gene30 exon exon 20 97 | sv831 gene100 exon exon 18 98 | sv374 gene46 UTR 5p UTR 99 | sv866 gene61 UTR 3p UTR 100 | sv240 gene94 exon exon 2 101 | sv301 gene96 UTR 3p UTR 102 | sv108 gene49 UTR 5p UTR 103 | sv210 gene7 exon exon 7 104 | sv65 gene98 exon exon 20 105 | sv961 gene52 exon exon 7 106 | sv347 gene97 UTR 3p UTR 107 | sv460 gene56 UTR 3p UTR 108 | sv642 gene94 UTR 3p UTR 109 | sv722 gene4 exon exon 15 110 | sv251 gene14 UTR 5p UTR 111 | sv646 gene55 exon exon 6 112 | sv340 gene88 exon exon 9 113 | sv738 gene17 UTR 3p UTR 114 | sv231 gene11 UTR 5p UTR 115 | sv423 gene97 UTR 3p UTR 116 | sv857 gene27 exon exon 3 117 | sv244 gene56 exon exon 19 118 | sv461 gene54 exon exon 11 119 | sv20 gene39 exon exon 2 120 | sv838 gene68 exon exon 3 121 | sv919 gene29 exon exon 4 122 | sv734 gene30 exon exon 4 123 | sv851 gene13 exon exon 9 124 | sv470 gene71 UTR 3p UTR 125 | sv325 gene73 UTR 3p UTR 126 | sv755 gene68 exon exon 5 127 | sv469 gene15 UTR 3p UTR 128 | sv184 gene67 exon exon 19 129 | sv169 gene27 UTR 3p UTR 130 | sv512 gene68 UTR 5p UTR 131 | sv490 gene83 exon exon 5 132 | sv370 gene25 exon exon 7 133 | sv27 gene69 UTR 3p UTR 134 | sv995 gene9 UTR 3p UTR 135 | sv883 gene51 UTR 3p UTR 136 | sv501 gene97 UTR 3p UTR 137 | sv5 gene14 exon exon 3 138 | sv674 gene16 UTR 3p UTR 139 | sv626 gene60 UTR 3p UTR 140 | sv773 gene95 exon exon 19 141 | sv246 gene90 UTR 5p UTR 142 | sv731 gene5 UTR 3p UTR 143 | sv163 gene8 exon exon 14 144 | sv293 gene3 exon exon 8 145 | sv110 gene1 exon exon 2 146 | sv733 gene35 exon exon 8 147 | sv565 gene54 exon exon 18 148 | sv152 gene89 exon exon 2 149 | sv765 gene27 exon exon 10 150 | sv384 gene23 UTR 5p UTR 
151 | sv115 gene95 exon exon 7 152 | sv806 gene63 exon exon 1 153 | sv815 gene59 exon exon 15 154 | sv282 gene85 UTR 3p UTR 155 | sv312 gene55 UTR 5p UTR 156 | sv714 gene18 UTR 3p UTR 157 | sv513 gene76 UTR 3p UTR 158 | sv872 gene77 UTR 5p UTR 159 | sv391 gene100 UTR 5p UTR 160 | sv615 gene100 UTR 3p UTR 161 | sv240 gene16 UTR 5p UTR 162 | sv933 gene47 exon exon 19 163 | sv894 gene61 exon exon 1 164 | sv87 gene29 UTR 3p UTR 165 | sv866 gene23 UTR 3p UTR 166 | sv324 gene4 exon exon 10 167 | sv907 gene91 UTR 3p UTR 168 | sv380 gene45 UTR 5p UTR 169 | sv174 gene18 UTR 5p UTR 170 | sv764 gene22 UTR 3p UTR 171 | sv723 gene35 UTR 3p UTR 172 | sv619 gene17 UTR 3p UTR 173 | sv928 gene87 UTR 3p UTR 174 | sv758 gene24 exon exon 9 175 | sv170 gene51 UTR 5p UTR 176 | sv161 gene30 exon exon 8 177 | sv70 gene42 exon exon 15 178 | sv514 gene85 exon exon 8 179 | sv499 gene32 exon exon 19 180 | sv838 gene69 UTR 5p UTR 181 | sv71 gene96 exon exon 20 182 | sv95 gene89 UTR 3p UTR 183 | sv98 gene2 exon exon 12 184 | sv503 gene13 exon exon 3 185 | sv32 gene3 UTR 3p UTR 186 | sv355 gene2 UTR 3p UTR 187 | sv134 gene13 exon exon 19 188 | sv648 gene64 exon exon 19 189 | sv485 gene35 UTR 3p UTR 190 | sv461 gene56 exon exon 7 191 | sv684 gene30 exon exon 14 192 | sv112 gene61 UTR 5p UTR 193 | sv185 gene30 exon exon 7 194 | sv715 gene16 UTR 3p UTR 195 | sv266 gene45 UTR 5p UTR 196 | sv482 gene80 UTR 5p UTR 197 | sv453 gene64 exon exon 7 198 | sv786 gene16 exon exon 18 199 | sv553 gene49 exon exon 3 200 | sv141 gene44 exon exon 18 201 | sv531 gene90 UTR 3p UTR 202 | sv349 gene85 exon exon 3 203 | sv876 gene55 UTR 5p UTR 204 | sv46 gene70 exon exon 15 205 | sv113 gene43 exon exon 6 206 | sv127 gene30 exon exon 1 207 | sv674 gene77 exon exon 7 208 | sv595 gene56 UTR 3p UTR 209 | sv597 gene18 exon exon 19 210 | sv724 gene43 UTR 3p UTR 211 | sv871 gene51 UTR 3p UTR 212 | sv443 gene91 UTR 5p UTR 213 | sv505 gene26 exon exon 20 214 | sv552 gene80 UTR 3p UTR 215 | sv236 gene75 exon exon 1 216 | sv362 
gene87 exon exon 20 217 | sv305 gene75 exon exon 16 218 | sv505 gene61 exon exon 19 219 | sv760 gene48 exon exon 2 220 | sv265 gene72 exon exon 6 221 | sv808 gene58 UTR 3p UTR 222 | sv955 gene49 UTR 5p UTR 223 | sv737 gene76 exon exon 11 224 | sv544 gene6 UTR 5p UTR 225 | sv505 gene19 exon exon 8 226 | sv978 gene30 exon exon 2 227 | sv280 gene28 UTR 5p UTR 228 | sv567 gene4 exon exon 13 229 | sv823 gene28 UTR 3p UTR 230 | sv695 gene42 UTR 3p UTR 231 | sv164 gene47 exon exon 8 232 | sv879 gene32 UTR 5p UTR 233 | sv226 gene93 UTR 5p UTR 234 | sv240 gene65 exon exon 13 235 | sv26 gene7 UTR 5p UTR 236 | sv185 gene25 exon exon 7 237 | sv490 gene93 exon exon 15 238 | sv111 gene63 exon exon 17 239 | sv137 gene33 exon exon 7 240 | sv467 gene45 exon exon 14 241 | sv273 gene96 exon exon 15 242 | sv87 gene29 exon exon 18 243 | sv767 gene78 exon exon 6 244 | sv524 gene99 exon exon 8 245 | sv278 gene49 UTR 5p UTR 246 | sv457 gene1 UTR 5p UTR 247 | sv370 gene61 exon exon 3 248 | sv581 gene65 exon exon 1 249 | sv933 gene45 UTR 3p UTR 250 | sv127 gene97 exon exon 16 251 | sv6 gene84 UTR 3p UTR 252 | sv145 gene23 UTR 5p UTR 253 | sv337 gene80 exon exon 20 254 | sv916 gene35 UTR 5p UTR 255 | sv368 gene66 UTR 3p UTR 256 | sv450 gene7 UTR 5p UTR 257 | sv889 gene12 exon exon 2 258 | sv277 gene35 exon exon 14 259 | sv495 gene41 exon exon 18 260 | sv963 gene24 UTR 3p UTR 261 | sv166 gene81 exon exon 2 262 | sv935 gene40 UTR 5p UTR 263 | sv847 gene32 UTR 5p UTR 264 | sv348 gene5 UTR 5p UTR 265 | sv716 gene59 exon exon 7 266 | sv993 gene94 UTR 5p UTR 267 | sv626 gene99 exon exon 3 268 | sv815 gene32 exon exon 9 269 | sv933 gene54 UTR 3p UTR 270 | sv987 gene32 UTR 5p UTR 271 | sv919 gene24 exon exon 11 272 | sv393 gene60 exon exon 15 273 | sv606 gene30 UTR 5p UTR 274 | sv54 gene28 exon exon 13 275 | sv72 gene36 UTR 5p UTR 276 | sv99 gene79 exon exon 2 277 | sv628 gene33 exon exon 8 278 | sv716 gene42 UTR 5p UTR 279 | sv645 gene47 UTR 5p UTR 280 | sv29 gene99 UTR 5p UTR 281 | sv245 gene18 
UTR 3p UTR 282 | sv780 gene10 UTR 5p UTR 283 | sv660 gene52 UTR 5p UTR 284 | sv787 gene43 exon exon 14 285 | sv484 gene47 UTR 5p UTR 286 | sv378 gene35 exon exon 19 287 | sv621 gene4 exon exon 9 288 | sv224 gene56 UTR 3p UTR 289 | sv400 gene93 exon exon 11 290 | sv520 gene5 UTR 3p UTR 291 | sv657 gene46 UTR 5p UTR 292 | sv333 gene68 exon exon 20 293 | sv972 gene31 exon exon 10 294 | sv486 gene27 UTR 5p UTR 295 | sv755 gene56 exon exon 6 296 | sv504 gene54 UTR 5p UTR 297 | sv672 gene67 UTR 3p UTR 298 | sv522 gene39 UTR 3p UTR 299 | sv236 gene37 exon exon 16 300 | sv535 gene71 exon exon 5 301 | sv910 gene25 UTR 5p UTR 302 | sv150 gene40 UTR 3p UTR 303 | sv474 gene17 exon exon 6 304 | sv78 gene45 exon exon 4 305 | sv417 gene21 exon exon 6 306 | sv271 gene47 exon exon 5 307 | sv318 gene9 exon exon 3 308 | sv33 gene38 exon exon 7 309 | sv476 gene22 UTR 3p UTR 310 | sv688 gene41 UTR 5p UTR 311 | sv240 gene16 exon exon 8 312 | sv612 gene62 exon exon 14 313 | sv51 gene71 exon exon 13 314 | sv425 gene7 UTR 3p UTR 315 | sv343 gene30 exon exon 10 316 | sv218 gene89 UTR 3p UTR 317 | sv813 gene20 exon exon 20 318 | sv559 gene81 UTR 3p UTR 319 | sv406 gene76 UTR 5p UTR 320 | sv113 gene91 exon exon 9 321 | sv327 gene62 exon exon 8 322 | sv600 gene97 UTR 3p UTR 323 | sv907 gene39 UTR 3p UTR 324 | sv749 gene86 exon exon 8 325 | sv306 gene28 exon exon 3 326 | sv676 gene97 exon exon 15 327 | sv868 gene80 exon exon 8 328 | sv36 gene42 exon exon 20 329 | sv397 gene7 UTR 5p UTR 330 | sv680 gene46 exon exon 20 331 | sv929 gene14 UTR 3p UTR 332 | sv266 gene30 exon exon 16 333 | sv240 gene83 exon exon 19 334 | sv923 gene4 UTR 5p UTR 335 | sv482 gene99 UTR 3p UTR 336 | sv338 gene44 UTR 3p UTR 337 | sv354 gene2 exon exon 3 338 | sv679 gene9 UTR 5p UTR 339 | sv581 gene67 exon exon 18 340 | sv53 gene35 UTR 3p UTR 341 | sv686 gene66 exon exon 4 342 | sv704 gene15 UTR 5p UTR 343 | sv334 gene39 exon exon 4 344 | sv14 gene38 UTR 3p UTR 345 | sv424 gene83 exon exon 13 346 | sv811 gene90 exon exon 2 
347 | sv847 gene62 UTR 3p UTR 348 | sv19 gene20 exon exon 1 349 | sv748 gene35 UTR 5p UTR 350 | sv827 gene48 UTR 3p UTR 351 | sv33 gene26 exon exon 2 352 | sv725 gene86 exon exon 12 353 | sv101 gene13 exon exon 17 354 | sv708 gene73 UTR 5p UTR 355 | sv956 gene49 exon exon 4 356 | sv551 gene4 exon exon 4 357 | sv385 gene81 exon exon 10 358 | sv266 gene43 UTR 3p UTR 359 | sv737 gene5 exon exon 19 360 | sv305 gene47 UTR 3p UTR 361 | sv55 gene7 UTR 5p UTR 362 | sv147 gene70 UTR 5p UTR 363 | sv475 gene16 UTR 3p UTR 364 | sv775 gene48 exon exon 15 365 | sv839 gene41 UTR 5p UTR 366 | sv334 gene67 UTR 3p UTR 367 | sv982 gene53 exon exon 17 368 | sv222 gene22 UTR 5p UTR 369 | sv776 gene61 exon exon 17 370 | sv114 gene57 exon exon 1 371 | sv491 gene64 UTR 3p UTR 372 | sv349 gene38 exon exon 15 373 | sv8 gene31 UTR 3p UTR 374 | sv274 gene87 exon exon 7 375 | sv379 gene44 UTR 5p UTR 376 | sv280 gene5 UTR 5p UTR 377 | sv710 gene16 UTR 5p UTR 378 | sv655 gene7 exon exon 3 379 | sv719 gene19 exon exon 18 380 | sv680 gene31 exon exon 3 381 | sv840 gene65 exon exon 13 382 | sv808 gene61 UTR 5p UTR 383 | sv656 gene62 UTR 3p UTR 384 | sv643 gene26 UTR 5p UTR 385 | sv779 gene89 UTR 3p UTR 386 | sv247 gene84 exon exon 7 387 | sv834 gene93 exon exon 20 388 | sv208 gene82 UTR 5p UTR 389 | sv937 gene98 exon exon 11 390 | sv22 gene35 UTR 5p UTR 391 | sv622 gene65 UTR 3p UTR 392 | sv10 gene13 exon exon 4 393 | sv827 gene56 exon exon 18 394 | sv323 gene60 exon exon 17 395 | sv412 gene54 UTR 3p UTR 396 | sv701 gene27 exon exon 11 397 | sv677 gene30 exon exon 15 398 | sv188 gene60 exon exon 15 399 | sv381 gene19 UTR 5p UTR 400 | sv750 gene20 exon exon 13 401 | sv404 gene28 UTR 3p UTR 402 | sv142 gene57 UTR 5p UTR 403 | sv103 gene43 UTR 5p UTR 404 | sv729 gene84 UTR 5p UTR 405 | sv933 gene46 UTR 5p UTR 406 | sv377 gene38 exon exon 10 407 | sv924 gene65 UTR 3p UTR 408 | sv32 gene55 exon exon 11 409 | sv817 gene19 exon exon 18 410 | sv336 gene65 exon exon 7 411 | sv480 gene59 exon exon 14 412 | 
sv511	gene72	UTR	3p UTR
413 | sv981	gene85	exon	exon 20
414 | sv996	gene44	UTR	3p UTR
415 | sv855	gene1	exon	exon 19
416 | sv869	gene71	exon	exon 16
417 | sv209	gene51	exon	exon 15
418 | sv601	gene43	UTR	3p UTR
419 | sv644	gene31	UTR	5p UTR
420 | sv893	gene75	UTR	3p UTR
421 | sv531	gene50	exon	exon 16
422 | sv990	gene19	exon	exon 2
423 | sv906	gene44	UTR	3p UTR
424 | sv988	gene66	UTR	5p UTR
425 | sv53	gene83	UTR	3p UTR
426 | sv386	gene54	UTR	5p UTR
427 | sv114	gene3	UTR	3p UTR
428 | sv472	gene50	UTR	3p UTR
429 | sv451	gene47	exon	exon 16
430 | sv574	gene37	exon	exon 18
431 | sv565	gene81	UTR	3p UTR
432 | sv448	gene44	UTR	5p UTR
433 | sv205	gene46	UTR	5p UTR
434 | sv432	gene22	UTR	3p UTR
435 | sv863	gene86	exon	exon 4
436 | sv258	gene53	exon	exon 9
437 | sv52	gene14	UTR	3p UTR
438 | sv230	gene98	exon	exon 1
439 | sv906	gene2	UTR	5p UTR
440 | sv383	gene39	UTR	5p UTR
441 | sv405	gene65	exon	exon 11
442 | sv536	gene69	UTR	3p UTR
443 | sv816	gene91	UTR	3p UTR
444 | sv590	gene55	exon	exon 4
445 | sv512	gene42	exon	exon 4
446 | sv778	gene75	UTR	3p UTR
447 | sv647	gene8	exon	exon 20
448 | sv171	gene62	UTR	3p UTR
449 | sv889	gene43	UTR	3p UTR
450 | sv88	gene37	UTR	3p UTR
451 | sv213	gene61	UTR	3p UTR
452 | sv697	gene13	UTR	3p UTR
453 | sv733	gene44	UTR	5p UTR
454 | sv907	gene44	exon	exon 1
455 | sv292	gene78	UTR	5p UTR
456 | sv536	gene52	UTR	5p UTR
457 | sv890	gene47	UTR	5p UTR
458 | sv570	gene27	UTR	5p UTR
459 | sv351	gene60	UTR	3p UTR
460 | sv712	gene77	exon	exon 20
461 | sv994	gene1	exon	exon 16
462 | sv621	gene86	UTR	3p UTR
463 | sv232	gene17	exon	exon 3
464 | sv556	gene60	UTR	3p UTR
465 | sv786	gene27	exon	exon 6
466 | sv958	gene37	UTR	3p UTR
467 | sv899	gene35	UTR	3p UTR
468 | sv2	gene47	exon	exon 18
469 | sv642	gene88	UTR	3p UTR
470 | sv645	gene36	exon	exon 10
471 | sv138	gene62	UTR	5p UTR
472 | sv201	gene20	UTR	3p UTR
473 | sv139	gene63	UTR	3p UTR
474 | sv18	gene7	exon	exon 6
475 | sv537	gene71	exon	exon 14
476 | sv984	gene79	UTR	5p UTR
477 | sv819	gene52	exon	exon 19
478 | sv776	gene21	exon	exon 12
479 | sv940	gene18	exon	exon 16
480 | sv747	gene8	exon	exon 15
481 | sv774	gene5	UTR	3p UTR
482 | sv302	gene49	UTR	3p UTR
483 | sv6	gene94	UTR	5p UTR
484 | sv774	gene61	exon	exon 11
485 | sv898	gene64	exon	exon 4
486 | sv409	gene70	exon	exon 1
487 | sv388	gene32	exon	exon 2
488 | sv287	gene37	exon	exon 16
489 | sv444	gene80	exon	exon 20
490 | sv909	gene32	UTR	3p UTR
491 | sv538	gene26	UTR	3p UTR
492 | sv357	gene63	exon	exon 12
493 | sv154	gene71	exon	exon 18
494 | sv360	gene50	UTR	5p UTR
495 | sv171	gene65	exon	exon 20
496 | sv839	gene50	UTR	3p UTR
497 | sv393	gene41	UTR	5p UTR
498 | sv220	gene56	exon	exon 17
499 | sv23	gene11	UTR	3p UTR
500 | sv802	gene72	UTR	3p UTR
501 | sv79	gene17	exon	exon 16
502 |
--------------------------------------------------------------------------------
/shinyapp/testdata/make-test-data.R:
--------------------------------------------------------------------------------
1 | library(dplyr)
2 |
3 | ## genes
4 | genes = paste0('gene', 1:100)
5 |
6 | ## variants
7 | N = 1000
8 | vars = tibble(variant_id=paste0('sv', 1:N),
9 |               type=sample(c('deletion', 'duplication'), N, TRUE),
10 |               chr=sample(1:22, N, TRUE),
11 |               start=round(runif(N, 1, 1e8))) %>%
12 |   mutate(end=start + round(runif(N, 50, 1e6)))
13 | head(vars)
14 | write.table(vars, file='all_variants.tsv', quote=FALSE, row.names=FALSE, sep='\t')
15 |
16 | ## variant-gene pairs, including the affected gene element
17 | gene.var = tibble(variant_id=sample(vars$variant_id, 500, TRUE),
18 |                   gene_id=sample(genes, 500, TRUE),
19 |                   elt_type=sample(c('exon', 'UTR'), 500, TRUE)) %>%
20 |   mutate(elt_info=ifelse(elt_type=='exon',
21 |                          paste('exon', sample(1:20, n(), TRUE)),
22 |                          sample(c("5p UTR", "3p UTR"), n(), TRUE)))
23 | head(gene.var)
24 | write.table(gene.var, file='gene_variants.tsv', quote=FALSE, row.names=FALSE, sep='\t')
25 |
26 | ## variants overlapping clinically important SVs
27 | clinsv = tibble(variant_id=sample(vars$variant_id, 100),
28 |                 pathogenic_clinvar_sv=TRUE)
29 | write.table(clinsv, file='clinsv_variants.tsv', quote=FALSE, row.names=FALSE, sep='\t')
30 |
31 | ## variants overlapping clinically important SNVs/indels
32 | clinsnv = tibble(variant_id=sample(vars$variant_id, 200),
33 |                  pathogenic_clinvar_snv_indel=TRUE)
34 | write.table(clinsnv, file='clinsnv_variants.tsv', quote=FALSE, row.names=FALSE, sep='\t')
35 |
36 | ## allele frequency
37 | af = tibble(variant_id=sample(vars$variant_id, 300), af=runif(300))
38 | write.table(af, file='af.tsv', quote=FALSE, row.names=FALSE, sep='\t')
39 |
40 | ## external resources
41 | ext.urls = tibble(gene_id=genes, omim_url='https://www.youtube.com/watch?v=dQw4w9WgXcQ')
42 | write.table(ext.urls, file='ext_urls.tsv', quote=FALSE, row.names=FALSE, sep='\t')
43 |
--------------------------------------------------------------------------------
/shinyapp/ui.R:
--------------------------------------------------------------------------------
1 | library(shiny)
2 | library(DT)
3 | library(shinydashboard)
4 |
5 | data = list()
6 | data$all_variants = read.table('./all_variants_chr21.tsv', as.is=TRUE, header=TRUE, sep='\t')
7 | data$gene_variants = read.table('./variants.genes.chr21.tsv.gz', as.is=TRUE, header=TRUE, sep='\t')
8 |
9 | ## list all searchable identifiers: gene IDs, gene names, and transcript IDs
10 | genes = unique(c(data$gene_variants$gene_id, data$gene_variants$gene_name, data$gene_variants$transcript_id))
11 | genes = unique(c('ENST00000400454.6', genes))
12 |
13 | ## SV types and maximum SV size, used as defaults for the filters
14 | svtypes = sort(unique(data$all_variants$type))
15 | svsize.max = max(data$all_variants$end - data$all_variants$start)
16 |
17 | ui <- dashboardPage(
18 |   dashboardHeader(title='GeneVar'),
19 |   dashboardSidebar(
20 |     selectizeInput('gene_search', 'Gene', genes),
21 |     div(p(' Search by gene name, gene ID (ENSG...), or transcript ID (ENST...)'),
22 |         p(' Examples:'),
23 |         p(' DSCAM'),
24 |         p(' ENSG00000171587.15'),
25 |         p(' ENST00000400454.6')),
26 |     checkboxGroupInput('svtypes', "SV type", svtypes, svtypes),
27 |     numericInput('size.min', 'Minimum SV size (bp)', 0, 0),
28 |     numericInput('size.max', 'Maximum SV size (bp)', svsize.max, min=0, max=svsize.max)
29 |   ),
30 |   dashboardBody(
31 |     htmlOutput('title'),
32 |     fluidRow(
33 |       ## summary info boxes, rendered by the server
34 |       infoBoxOutput("sv_box"),
35 |       infoBoxOutput("path_sv_box"),
36 |       infoBoxOutput("path_snv_box")
37 |     ),
38 |     shiny::htmlOutput('omim_url', class='btn btn-default action-button shiny-bound-input'),
39 |     shiny::htmlOutput('gtex_url', class='btn btn-default action-button shiny-bound-input'),
40 |     shiny::htmlOutput('gnomad_url', class='btn btn-default action-button shiny-bound-input'),
41 |     hr(),
42 |     dataTableOutput('vars_table'),
43 |     hr(),
44 |     h2('Allele frequency distribution'),
45 |     plotOutput('af_plot')
46 |   )
47 | )
48 |
--------------------------------------------------------------------------------