├── README.md
├── add-genes-to-characteristic.pl
├── assemble.r
├── cluster.r
├── exam-eta.pl
├── example
│   ├── KIRC.txt
│   ├── UVM-nohup.out
│   ├── UVM-summary.txt
│   └── UVM.txt
├── example_specific_training.txt
├── gene-characteristic-unix.txt
├── genome-characteristic.txt
├── hg19_refGene.exp
├── hg38_refGene.exp
├── monte_carlo_sim.cpp
├── pre-drgap.pl
├── prior
│   └── PAN
├── reduction.pl
├── run.driverml.sh
├── sta.r
├── training.tar.bz2
└── write_eta.r

/README.md:
--------------------------------------------------------------------------------
# DriverML
DriverML integrates Rao's score test and supervised machine learning to identify cancer driver genes. Rigorous and unbiased benchmark analysis and comparisons of DriverML with other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA) show that DriverML is robust and powerful across datasets and outperforms the other tools with a better balance of precision and recall.

## Access
DriverML is free for non-commercial use only.

## Installation
    unzip DriverML-master.zip
    cd DriverML-master
    tar jxvf training.tar.bz2
    chmod +x *.pl *.r *.sh

## Running
Requirements: Perl and R must be installed in the user's Linux environment.

Usage: path/run.driverml.sh -w -i -f -r [options]

Required arguments:

| Alias | Description |
|:---------------|:-----------------------------------------------|
| -w/--path | The absolute path to the DriverML_v1.0.0. |
| -i/--input | The list of tumor mutations to be analyzed. It should be put in the DriverML-Master/ directory. |
| -f/--reference_genome | The reference genome for input (-i) data. |
| -r/--reference_training | The human reference genome file for training data. Default: hg19 reference file |

Options:

| Alias | Description |
|:---------------|:-----------------------------------------------|
| -g/--mutation_table | The predefined gene mutation table, which can be found in the package of this application. It can be either hg19_refGene.exp or hg38_refGene.exp, according to the -i input file. Default: hg38_refGene.exp |
| -y/--tumor_type | Training mutation data. Default: Pan-cancer |
| -m/--multicore | Set the number of parallel instances to be run concurrently. Default: 4 |
| -t/--simulation_time | Set the number for Monte Carlo Simulation. Default: 2500 |
| -o/--output | Set the prefix of the output file. Default: summary. |
| -c/--cluster_number | Set the upper limit of cluster number for computing BMR. Default: 1 |
| -p/--prior | Set the prior information. Default: Non-TCGA genes from DriverDB and IntOGen databases. |
| -n/--interpolation_number | Set the upper limit of interpolation number for making gene clusters. Default: 100 |
| -d/--indel_ratio | The ratio of point mutation to indel in the background. Default: 0.05 |
| -h/--help | Help information. |
| -v/--version | Software version. |

## Notes
The mutation file must be in MAF format. Eight columns named Hugo_Symbol, Chromosome, Start_Position, Variant_Classification, Variant_Type, Reference_Allele, Tumor_Seq_Allele2 and Tumor_Sample_Barcode are required. The first row of the mutation file must be a header containing the column names above (case-sensitive).
The eight required columns are described below:
* Tumor sample barcode is the ID of the sample in which this mutation occurs.
* Hugo symbol requires HUGO symbols of genes.
* Chromosome is the chromosome that contains the gene.
* Start position requires the lowest numeric position of the reported variant on the genomic reference sequence (1-based coordinate system).
* Reference allele is the plus-strand reference allele at this position. Include the deleted sequence for a deletion, or "-" for an insertion.
* Tumor allele2 is tumor sequencing (discovery) allele 2.
* Variant classification describes the translational effect of a variant allele. It is one of Silent, Missense_Mutation, Nonsense_Mutation, Splice_Site, Frame_Shift_Del, Frame_Shift_Ins, In_Frame_Del, and In_Frame_Ins. Other mutation types will not be analyzed.
* Variant type is the type of mutation, such as SNP, DNP, TNP, INS, or DEL.

Detailed information about the MAF format can be found at https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification. An example file can be found in /yourpath/DriverML_v1.0.0/example_data/UVM.txt.
* If the tumor allele2 is not mutated, it should be replaced by the tumor allele1.

DriverML was developed on a high-performance computing cluster with nearly 2 TB of memory. The pan-cancer mutation dataset (default) requires much more memory than cancer-specific datasets in the training process. The actual memory usage also depends on the size of the input dataset. Most runtime errors are caused by insufficient memory.
## ExInAtor results in our manuscript
**ExInAtor was designed for the detection of driver genes (principally lncRNAs, but also protein-coding genes) using whole-genome sequencing (WGS) mutation data. According to a recent discussion with the developer of ExInAtor, it is worth noting that the result of ExInAtor in the DriverML paper does not represent its optimal performance, because whole-exome sequencing (WES) mutation data were used. Anyone who wants to know the optimal performance of ExInAtor should refer to the ExInAtor manuscript.**

## Output
The output of DriverML is a summary of putative driver genes, including the number of mutations of each type, the value of the statistic, the p-value, and the FDR-adjusted p-value. Genes with negative LRT values (the statistic values) should be ignored.
## Interpretation of the Output
There are 11 columns in the output file. The first column is the gene symbol. The second to eighth columns are the numbers of mutations. The ninth column (LRT) is the value of the statistic. The tenth and eleventh columns are the P-values and the adjusted P-values (Benjamini-Hochberg procedure). The larger the statistic, the more likely the gene is a driver. Use the adjusted P-values to decide significance: find the largest p-adj that is smaller than 0.05 (or another threshold). All genes listed above it (those with smaller P-values) are considered significant. Genes with negative LRT values are unlikely to be drivers and can be ignored.
## Example
    nohup /AbsolutePath/DriverML-master/run.driverml.sh -w /AbsolutePath/DriverML-master -i example/UVM.txt -f /AbsolutePath/GRCh38.fa -r /AbsolutePath/hg19.fa -m 10 -o UVM-summary.txt > UVM-nohup.out

An example on [Google Colab online](https://colab.research.google.com/drive/1ChSiLMtmWzC-4NAqWHCDP9oP2rL6GY1x)

## Citation
Yi Han, Juze Yang, Xinyi Qian, Wei-Chung Cheng, Shu-Hsuan Liu, Xing Hua, Liyuan Zhou, Yaning Yang, Qingbiao Wu, Pengyuan Liu, Yan Lu, DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies, Nucleic Acids Research, Volume 47, Issue 8, 07 May 2019, Page e45, https://doi.org/10.1093/nar/gkz096
## Contact
If you have any questions, please do not hesitate to contact us.
* The email of the developer is yihan@zju.edu.cn & 250147506@qq.com.

## Last update
Monday February 2, 2021

--------------------------------------------------------------------------------
/add-genes-to-characteristic.pl:
--------------------------------------------------------------------------------
#!/usr/bin/perl
use autodie;
use strict;
use Getopt::Long;
our%chara;
our$input_file;
our$date;
our$out_name;
my$optionOK=GetOptions(
    'i|input_file=s' =>\$input_file,
    'd|date=s' =>\$date,
);
$out_name=$date."_gene-characteristic.tmp";

open CHAR,'<',"gene-characteristic-unix.txt";

open PLUS,'>',"$out_name";
# Skip the header, then copy every known gene to the output and remember its name.
my$header=<CHAR>;
while(<CHAR>){
    chomp;
    my@qwe=split(/\t/);
    $chara{$qwe[0]}=1;
    print PLUS "$_\n";
}
close CHAR;
our%notinchara;
open INPUTFILE,'<',"$input_file";
#$header=<INPUTFILE>;
# Collect chromosome and position for input genes missing from the characteristic file.
while(<INPUTFILE>){
    chomp;
    my@qwe=split(/\t/);
    if(!$chara{$qwe[1]}){
        if($qwe[2]=~/X|Y/){
            $qwe[2]=23;
        }
        $notinchara{$qwe[1]}[0]=$qwe[2];
        $notinchara{$qwe[1]}[1]=$qwe[3];
    }
}
close INPUTFILE;
our%genome;
open GENOME,'<',"genome-characteristic.txt";
$header=<GENOME>;
while(<GENOME>){
    chomp;
    my@qwe=split(/\t/);
    $genome{$qwe[0]}{$qwe[1]}=$qwe[4];
}
close GENOME;
# Assign each missing gene the value of the 100 kb genome window containing its position.
foreach my$qwe(keys %notinchara){
    if(!$genome{$notinchara{$qwe}[0]}){
        print "chr:$notinchara{$qwe}[0] which is in the input file is not in the genome-characteristic file!\n";
        next;
    }
    foreach my$asd(keys %{$genome{$notinchara{$qwe}[0]}}){
        if($notinchara{$qwe}[1] >= $asd && $notinchara{$qwe}[1] <= $asd+100000){
            $notinchara{$qwe}[2]=$genome{$notinchara{$qwe}[0]}{$asd};
        }
    }
}
foreach my$qwe(keys%notinchara){
    if($notinchara{$qwe}[2]){
        print PLUS "$qwe\t$notinchara{$qwe}[0]\t1\t1\t$notinchara{$qwe}[2]\tNaN\t1\tNaN\tNaN\t1\t1\t1\t1\t1\t1\t1\n";
    }
}
close PLUS;

--------------------------------------------------------------------------------
/assemble.r:
--------------------------------------------------------------------------------
#!/usr/bin/env Rscript
rm(list = ls())
args<-commandArgs(TRUE)
number<-as.numeric(args[1])
geneRatio<-as.numeric(args[2])
name<-as.character(args[3])
date<-as.character(args[4])

out_file_1<-paste(date,'_out_file_1.tmp',sep='')
outSummary_all<-read.table(out_file_1,header = TRUE,sep = '\t')

cat("read 1")
p_file_name<-paste(date,'_1_p.tmp',sep='')

pvalues<-read.table(p_file_name,header=F,sep='\t',col.names=c('p','p_adj'))

cat("read 2")
out_with_p<-data.frame(outSummary_all,pvalues)
out_with_p<-out_with_p[1,]
if(number==1){
  for(f in 1:number){
    out_fi <- paste("_out_file_",f,sep = "")
    out_file <- paste(out_fi,'.tmp',sep="")
    out_file_date<-paste(date,out_file,sep='')
    file<-read.table(out_file_date,header = TRUE,sep = '\t')

    cat(f)
    p_file<-paste(date,f,sep='_')
    p_file_name<-paste(p_file,'_p.tmp',sep='')

    pfile<-read.table(p_file_name,header=F,sep='\t',col.names=c('p','p_adj'))


    out_with_p_loop<-data.frame(file,pfile)
    out_with_p<-rbind(out_with_p,out_with_p_loop)
  }

  out_with_p<-out_with_p[-1,]

  write.table(out_with_p[order(out_with_p$p),],name,row.names=F,quote=F,sep='\t')

}
if(number>1){

  for(f in 1:number){
    out_fi <- paste("_out_file_",f,sep = "")
    out_file <- paste(out_fi,'.tmp',sep="")
    out_file_date<-paste(date,out_file,sep='')
    file<-read.table(out_file_date,header = TRUE,sep = '\t')

    cat(f)
    p_file<-paste(date,f,sep='_')
    p_file_name<-paste(p_file,'_p.tmp',sep='')

    pfile<-read.table(p_file_name,header=F,sep='\t',col.names=c('p','p_adj'))

    out_with_p_loop<-data.frame(file,pfile)
    out_with_p_every<-rbind(out_with_p,out_with_p_loop)
    out_with_p_every<-out_with_p_every[-1,]
    name<-as.character(args[3])
    name<-paste(f,name,sep="_")
    write.table(out_with_p_every[order(out_with_p_every$p),],name,row.names=F,quote=F,sep='\t')

  }

}

--------------------------------------------------------------------------------
/cluster.r:
--------------------------------------------------------------------------------
#!/usr/bin/env Rscript
rm(list = ls())
args<-commandArgs(TRUE)
input_name<-paste(as.character(args[3]),"_gene-characteristic.tmp",sep="")
gene_chara<-read.table(input_name,sep = "\t",header = FALSE)
distance<-dist(gene_chara[,5],method="euclidean")
mode<-hclust(distance,method="ward.D")
f<-function(x){
  result<-cutree(mode,k=x)
  gene_chara_temp<-cbind(gene_chara,result)
  esti_matrix<-matrix(0,nrow=x,ncol=9)
  for (i in 1:nrow(gene_chara_temp)){
    if (!is.na(gene_chara_temp[i,5])){
      esti_matrix[gene_chara_temp[i,17],1]<-esti_matrix[gene_chara_temp[i,17],1]+gene_chara_temp[i,5]
      esti_matrix[gene_chara_temp[i,17],6]<-esti_matrix[gene_chara_temp[i,17],6]+1
    }
    if (!is.na(gene_chara_temp[i,6])){
      esti_matrix[gene_chara_temp[i,17],2]<-esti_matrix[gene_chara_temp[i,17],2]+gene_chara_temp[i,6]
      esti_matrix[gene_chara_temp[i,17],7]<-esti_matrix[gene_chara_temp[i,17],7]+1
    }
    if (!is.na(gene_chara_temp[i,8])){
      esti_matrix[gene_chara_temp[i,17],4]<-esti_matrix[gene_chara_temp[i,17],4]+gene_chara_temp[i,8]
      esti_matrix[gene_chara_temp[i,17],8]<-esti_matrix[gene_chara_temp[i,17],8]+1
    }
    if(!is.na(gene_chara_temp[i,9])){
      esti_matrix[gene_chara_temp[i,17],5]<-esti_matrix[gene_chara_temp[i,17],5]+gene_chara_temp[i,9]
      esti_matrix[gene_chara_temp[i,17],9]<-esti_matrix[gene_chara_temp[i,17],9]+1
    }
  }
  for (i in 1:nrow(esti_matrix)){
    esti_matrix[i,1]<-esti_matrix[i,1]/esti_matrix[i,6]
    esti_matrix[i,2]<-esti_matrix[i,2]/esti_matrix[i,7]
    esti_matrix[i,4]<-esti_matrix[i,4]/esti_matrix[i,8]
    esti_matrix[i,5]<-esti_matrix[i,5]/esti_matrix[i,9]
  }
  na_pos<-which(is.na(gene_chara),arr.ind=TRUE)
  for (i in 1:nrow(na_pos)){
    gene_chara[na_pos[i,1],na_pos[i,2]]<-esti_matrix[gene_chara_temp[na_pos[i,1],17],na_pos[i,2]-4]
  }
  return(sum(is.na(gene_chara)))
}

left<-1
right<-as.numeric(args[1])
if(left==right){
  cluster_number<-left
  middle<-0
}
while(right-left>1){
  middle<-round((left+right)/2)
  if(f(middle)==0){
    left<-middle
  }
  else{
    right<-middle
  }
}
if(left==middle && f(right)==0){
  cluster_number<-right
} else if(left==middle && f(right)!=0){
  cluster_number<-left
} else {
  cluster_number<-left
}
cat("interpolation number is",cluster_number,"\n")
#---------------------------------------------------------------------------------------#
result<-cutree(mode,k=cluster_number)
gene_chara_temp<-cbind(gene_chara,result)
esti_matrix<-matrix(0,nrow=cluster_number,ncol=9)
for (i in 1:nrow(gene_chara_temp)){
  if (!is.na(gene_chara_temp[i,5])){
    esti_matrix[gene_chara_temp[i,17],1]<-esti_matrix[gene_chara_temp[i,17],1]+gene_chara_temp[i,5]
    esti_matrix[gene_chara_temp[i,17],6]<-esti_matrix[gene_chara_temp[i,17],6]+1
  }
  if (!is.na(gene_chara_temp[i,6])){
    esti_matrix[gene_chara_temp[i,17],2]<-esti_matrix[gene_chara_temp[i,17],2]+gene_chara_temp[i,6]
    esti_matrix[gene_chara_temp[i,17],7]<-esti_matrix[gene_chara_temp[i,17],7]+1
  }
  if (!is.na(gene_chara_temp[i,8])){
    esti_matrix[gene_chara_temp[i,17],4]<-esti_matrix[gene_chara_temp[i,17],4]+gene_chara_temp[i,8]
    esti_matrix[gene_chara_temp[i,17],8]<-esti_matrix[gene_chara_temp[i,17],8]+1
  }
  if(!is.na(gene_chara_temp[i,9])){
    esti_matrix[gene_chara_temp[i,17],5]<-esti_matrix[gene_chara_temp[i,17],5]+gene_chara_temp[i,9]
    esti_matrix[gene_chara_temp[i,17],9]<-esti_matrix[gene_chara_temp[i,17],9]+1
  }
}
for (i in 1:nrow(esti_matrix)){
  esti_matrix[i,1]<-esti_matrix[i,1]/esti_matrix[i,6]
  esti_matrix[i,2]<-esti_matrix[i,2]/esti_matrix[i,7]
  esti_matrix[i,4]<-esti_matrix[i,4]/esti_matrix[i,8]
  esti_matrix[i,5]<-esti_matrix[i,5]/esti_matrix[i,9]
}
na_pos<-which(is.na(gene_chara),arr.ind=TRUE)
for (i in 1:nrow(na_pos)){
  gene_chara[na_pos[i,1],na_pos[i,2]]<-esti_matrix[gene_chara_temp[na_pos[i,1],17],na_pos[i,2]-4]
}
#----------------------------------------------------------------------------------------#
#-------------------------cluster number used to estimate BMR----------------------------#
#----------------------------------------------------------------------------------------#
args2<-as.numeric(args[2])
cluster_number4<-args2
distance4<-dist(gene_chara[,c(5,6,8,9)],method="euclidean")
mode4<-hclust(distance4,method="ward.D")
result4<-cutree(mode4,k=cluster_number4)
gene_class_tmp<-cbind(gene_chara,result4)
gene_class<-gene_class_tmp[,c(1,17)]

out_name<-paste(as.character(args[3]),"_gene-class.tmp",sep="")


sink(out_name)
gene_class

--------------------------------------------------------------------------------
/exam-eta.pl:
--------------------------------------------------------------------------------
#!/usr/bin/perl
use strict;
use autodie;
use Getopt::Long;
our$file_number;
our$date;
my$options=GetOptions(
    'f|file=s' => \$file_number,
    'd|date=s' => \$date,
);
open ETA,'<',"${date}_subclass-$file_number-eta.tmp";
#open ETA,'<',"subclass-$file_number.tmpeta.tmp";
# Print 1 for every non-positive eta value, then print 0 at the end.
while(<ETA>){
    chomp;
    if($_<=0){
        #exit(1);
        print(1);
    }
}
close ETA;
#exit(0)
print(0);

--------------------------------------------------------------------------------
/example/UVM-nohup.out:
--------------------------------------------------------------------------------
Final cluster number is 1
interpolation number is 21
invalid format of gene class in line 17712
invalid format of gene class in line 17739
invalid format of gene class in line 17858
The number of genes which are in the input file but not in the characteristic file is 17
interpolation number is 9
invalid format of gene class in line 17691
invalid format of gene class in line 17696
invalid format of gene class in line 17698
invalid format of gene class in line 17765
invalid format of gene class in line 17833
invalid format of gene class in line 17847
invalid format of gene class in line 17853
invalid format of gene class in line 17868
invalid format of gene class in line 17902
invalid format of gene class in line 17916
invalid format of gene class in line 17943
invalid format of gene class in line 17946
invalid format of gene class in line 17948
invalid format of gene class in line 17961
invalid format of gene class in line 17972
invalid format of gene class in line 18014
invalid format of gene class in line 18019
invalid format of gene class in line 18026
invalid format of gene class in line 18032
invalid format of gene class in line 18039
invalid format of gene class in line 18074
invalid format of gene class in line 18099
invalid format of gene class in line 18175
invalid format of gene class in line 18179
invalid format of gene class in line 18197
invalid format of gene class in line 18198
invalid format of gene class in line 18205
invalid format of gene class in line 18251
invalid format of gene class in line 18262
invalid format of gene class in line 18302
invalid format of gene class in line 18315
invalid format of gene class in line 18330
invalid format of gene class in line 18339
invalid format of gene class in line 18353
invalid format of gene class in line 18432
invalid format of gene class in line 18489
invalid format of gene class in line 18511
invalid format of gene class in line 18560
invalid format of gene class in line 18563
invalid format of gene class in line 18574
invalid format of gene class in line 18579
invalid format of gene class in line 18633
invalid format of gene class in line 18636
invalid format of gene class in line 18638
invalid format of gene class in line 18651
invalid format of gene class in line 18693
invalid format of gene class in line 18730
invalid format of gene class in line 18765
invalid format of gene class in line 18775
invalid format of gene class in line 18780
invalid format of gene class in line 18795
invalid format of gene class in line 18802
invalid format of gene class in line 18809
invalid format of gene class in line 18810
invalid format of gene class in line 18821
invalid format of gene class in line 18822
invalid format of gene class in line 18838
invalid format of gene class in line 18842
invalid format of gene class in line 18861
invalid format of gene class in line 18877
invalid format of gene class in line 18896
invalid format of gene class in line 18909
invalid format of gene class in line 18910
invalid format of gene class in line 18967
invalid format of gene class in line 19004
invalid format of gene class in line 19065
invalid format of gene class in line 19073
invalid format of gene class in line 19091
invalid format of gene class in line 19106
invalid format of gene class in line 19109
invalid format of gene class in line 19123
invalid format of gene class in line 19134
invalid format of gene class in line 19153
invalid format of gene class in line 19165
invalid format of gene class in line 19169
invalid format of gene class in line 19188
invalid format of gene class in line 19189
invalid format of gene class in line 19199
invalid format of gene class in line 19204
invalid format of gene class in line 19246
invalid format of gene class in line 19251
invalid format of gene class in line 19260
invalid format of gene class in line 19270
invalid format of gene class in line 19328
invalid format of gene class in line 19342
invalid format of gene class in line 19380
invalid format of gene class in line 19383
invalid format of gene class in line 19389
invalid format of gene class in line 19391
invalid format of gene class in line 19414
invalid format of gene class in line 19422
invalid format of gene class in line 19458
invalid format of gene class in line 19484
invalid format of gene class in line 19488
invalid format of gene class in line 19511
invalid format of gene class in line 19513
invalid format of gene class in line 19548
invalid format of gene class in line 19601
invalid format of gene class in line 19652
invalid format of gene class in line 19653
invalid format of gene class in line 19662
invalid format of gene class in line 19675
invalid format of gene class in line 19683
invalid format of gene class in line 19753
invalid format of gene class in line 19768
invalid format of gene class in line 19796
invalid format of gene class in line 19822
invalid format of gene class in line 19826
invalid format of gene class in line 19841
invalid format of gene class in line 19880
invalid format of gene class in line 19883
invalid format of gene class in line 19884
invalid format of gene class in line 19885
invalid format of gene class in line 19915
invalid format of gene class in line 19918
invalid format of gene class in line 19928
invalid format of gene class in line 19933
invalid format of gene class in line 19948
invalid format of gene class in line 19955
invalid format of gene class in line 19967
invalid format of gene class in line 19995
invalid format of gene class in line 20003
invalid format of gene class in line 20011
invalid format of gene class in line 20031
invalid format of gene class in line 20081
invalid format of gene class in line 20089
invalid format of gene class in line 20128
invalid format of gene class in line 20134
invalid format of gene class in line 20135
invalid format of gene class in line 20155
invalid format of gene class in line 20160
invalid format of gene class in line 20181
invalid format of gene class in line 20188
invalid format of gene class in line 20191
The number of genes which are in the input file but not in the characteristic file is 298
Use of uninitialized value within %mtype_mbase in hash element at /data/itmll/yhan/DriverML_v1.0.5/reduction.pl line 266.
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     94878    5.1     350000    18.7     235657    12.6
Vcells    203185    1.6     786432     6.0     690025     5.3
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    117992    6.4     592000    31.7     592000    31.7
Vcells 726339965 5541.6 3240482023 24723.0 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    180641    9.7     592000    31.7     592000    31.7
Vcells 727211348 5548.2 2592385618 19778.4 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    182637    9.8     592000    31.7     592000    31.7
Vcells 727236129 5548.4 2073908494 15822.7 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    182822    9.8     592000    31.7     592000    31.7
Vcells 727281587 5548.8 2073908494 15822.7 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    182874    9.8     592000    31.7     592000    31.7
Vcells 727298593 5548.9 2073908494 15822.7 3649161952 27840.9
 [1] 1.1119029 0.9715070 1.0511081 1.1830234 1.0959873 1.0603961 1.0831951
 [8] 1.0697963 1.0387563 0.9572351 0.9863031 1.0366762 1.0487524 1.0273827
[15] 0.9911376 1.0461516 1.0571540 0.9665039 0.9426695 1.0079739 1.0691697
[22] 1.0181281 1.0541144 1.0274202 1.0260951 1.0617493 0.9409554 0.7729538
[29] 0.3541738
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    186336   10.0     592000    31.7     592000    31.7
Vcells 727780517 5552.6 2073908494 15822.7 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    186436   10.0     592000    31.7     592000    31.7
Vcells 730307569 5571.9 2073908494 15822.7 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    186430   10.0     592000    31.7     592000    31.7
Vcells   4278693   32.7 1659126795 12658.2 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    184255    9.9     592000    31.7     592000    31.7
Vcells   4209585   32.2 1327301436 10126.6 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    184253    9.9     592000    31.7     592000    31.7
Vcells   4211167   32.2 1061841148  8101.3 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    185810   10.0     592000    31.7     592000    31.7
Vcells   4336019   33.1  849472918  6481.0 3649161952 27840.9
[1] 0.9140785
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    185818   10.0     592000    31.7     592000    31.7
Vcells   4336021   33.1  679578334  5184.8 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    185822   10.0     592000    31.7     592000    31.7
Vcells   4301910   32.9  543662667  4147.9 3649161952 27840.9
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    191149   10.3     592000    31.7     592000    31.7
Vcells   4257357   32.5  434930133  3318.3 3649161952 27840.9
read 1read 2101_18_02_42_02_343490708

--------------------------------------------------------------------------------
/example_specific_training.txt:
--------------------------------------------------------------------------------
T183	AASS	7	121721588	G	T	Missense_Mutation
K32	ABCA1	9	107556793	T	A	Splice_Site
RC-T	ABCA6	17	67102339	C	A	Silent
K29	ABCB8	7	150741205	CCTGGA	-	In_Frame_Indel
001	ABHD11	7	73151579	G	A	Missense_Mutation
T-c	ABHD5	3	43743774	G	C	Missense_Mutation
T-a	ABHD5	3	43743914	G	T	Missense_Mutation
K44	ABTB2	11	34181526	A	C	Missense_Mutation
PD2147a	ACBD4	17	43214437	T	-	Frame_Shift_Indel
PD2147a	ACOT12	5	80643677	A	T	Missense_Mutation
K20	ACSM5	16	20442562	C	T	Silent
T163	ACVR1C	2	158395186	ATA	-	In_Frame_Indel
PD2144a	ACVR2A	2	148680563	G	A	Missense_Mutation
002	ADAL	15	43632530	C	T	Missense_Mutation
K38	ADAM11	17	42850280	T	C	Missense_Mutation
002	ADAM12	10	127737976	C	T	Missense_Mutation
PD2144a	ADAM17	2	9683393	G	C	Nonsense_Mutation
001	ADAMTS10	19	8668659	C	T	Missense_Mutation
RC-T	ADAMTS10	19	8668659	C	T	Missense_Mutation
001	ADAMTSL4	1	150532552	AGGAC	-	Frame_Shift_Indel
K20	ADCY9	16	4164729	G	A	Missense_Mutation
K1	ADPRM	17	10608832	GA	-	Frame_Shift_Indel
T164	AFF2	X	148055040	G	A	Missense_Mutation
PD2127a	AFF3	2	100625295	A	G	Silent
PD2147a	AFF4	5	132232053	A	T	Missense_Mutation
T164	AFM	4	74357644	A	G	Missense_Mutation
002	AGER	6	32150947	G	T	Missense_Mutation
RC-T	AGR3	7	16900147	T	G	Missense_Mutation
K27	AGXT	2	241812428	C	T	Missense_Mutation
T183	AHCYL1	1	110561085	A	C	Missense_Mutation
K27	AHNAK	11	62296799	T	A	Missense_Mutation
T183	AIM1	6	106992792	A	-	Frame_Shift_Indel
K27	AKAP4	X	49958409	G	T	Missense_Mutation
001	AKAP8	19	15471720	G	A	Missense_Mutation
K1	AKAP9	7	91643642	G	T	Missense_Mutation
001	AKAP9	7	91690733	G	C	Missense_Mutation
K44	AKR1D1	7	137773499	C	A	Silent
T127	ALDH2	12	112230482	G	A	Missense_Mutation
T142	ALG2	9	101980550	T	A	Missense_Mutation
001	ALKBH8	11	107424632	T	A	Missense_Mutation
001	ALS2CR12	2	202216087	G	T	Missense_Mutation
T164	AMOTL2	3	134085210	-	G	Frame_Shift_Indel
K38	AMPD2	1	110169834	G	A	Silent
T144	ANGPTL2	9	129851349	-	G	Frame_Shift_Indel
001	ANKRD26	10	27326806	C	G	Missense_Mutation
T144	ANKRD35	1	145562188	A	T	Missense_Mutation
T183	ANKS3	16	4747097	G	T	Missense_Mutation
T127	ANLN	7	36435987	T	A	Missense_Mutation
001	ANO5	11	22294457	T	-	Frame_Shift_Indel
PD2126a	ANO6	12	45803231	A	G	Missense_Mutation
T164	AP3M2	8	42025195	G	A	Nonsense_Mutation
K44	APC	5	112174742	G	C	Missense_Mutation
PD2125a	APOC2	19	45451775	C	T	Silent
PD2127a	AREL1	14	75142441	C	T	Silent
T-i	ARHGEF3	3	56787540	C	A	Missense_Mutation
PD2125a	ARHGEF7	13	111944635	A	G	Missense_Mutation
T164	ARID1A	1	27057801	-	AAGGCCCCAGCGGGTATGGTCAACAGGGC	Frame_Shift_Indel
PD2126a	ARID1A	1	27094351	G	A	Missense_Mutation
PD2127a	ARID1A	1	27106655	T	C	Missense_Mutation
T166	ARID2	12	46246626	G	T	Missense_Mutation
PD2127a	ARID5B	10	63850639	A	T	Nonsense_Mutation
K44	ARNT	1	150786561	G	T	Missense_Mutation
K44	ASB15	7	123276929	CT	-	Frame_Shift_Indel
PD2147a	ASB16	17	42249629	A	C	Missense_Mutation
T163	ASTN1	1	176903346	C	A	Missense_Mutation
PD2127a	ASTN2	9	119976966	G	A	Missense_Mutation
T127	ATF7IP	12	14577562	A	G	Missense_Mutation
PD2127a	ATG2B	14	96794874	T	A	Splice_Site
T-e	ATG7	3	11383650	G	T	Missense_Mutation
T183	ATM	11	108142070	CAA	-	In_Frame_Indel
T163	ATP10A	15	25926168	G	T	Missense_Mutation
002	ATP11A	13	113481146	G	A	Missense_Mutation
K20	ATP13A2	1	17322541	AG	-	Frame_Shift_Indel
K38	ATP2B4	1	203669410	A	G	Silent
K44	ATP6V0B	1	44441484	C	A	Missense_Mutation
T144	ATP7B	13	52518409	C	G	Missense_Mutation
K1	ATP8B1	18	55315940	T	C	Missense_Mutation
K38	ATRNL1	10	116889189	G	A	Missense_Mutation
T163	ATRNL1	10	117309044	G	T	Nonsense_Mutation
001	ATXN1	6	16306994	G	A	Missense_Mutation
PD2126a	ATXN10	22	46085613	C	A	Silent
002	ATXN3	14	92548802	T	A	Missense_Mutation
T-b	ATXN7	3	63985157	C	A	Missense_Mutation
002	B4GAT1	11	66114447	G	-	Frame_Shift_Indel
K27	B4GALNT1	12	58025149	T	A	Splice_Site
K38	BAG3	10	121436485	A	G	Silent
K27	BAI3	6	70049278	T	C	Missense_Mutation
K44	BAP1	3	52437803	TT	-	Frame_Shift_Indel
T-b	BAP1	3	52439842	G	-	Frame_Shift_Indel
T166	BAP1	3	52442066	C	G	Missense_Mutation
T22	BARHL2	1	91182719	C	A	Nonsense_Mutation
001	BCAS2	1	115118221	C	T	Missense_Mutation
001	BCL11A	2	60688070	C	A	Missense_Mutation
T144	BCL3	19	45260646	G	C	Missense_Mutation
002	BCL7A	12	122468662	C	A	Missense_Mutation
T166	BCL9L	11	118772682	C	G	Missense_Mutation
T144	BDKRB2	14	96707683	C	G	Missense_Mutation
K3	BIRC6	2	32692607	G	T	Missense_Mutation
T183	BMP2	20	6758975	T	G	Missense_Mutation
T166	BMP6	6	7727525	AGC	-	In_Frame_Indel
K44	BMX	X	15560290	T	G	Missense_Mutation
T166	BPHL	6	3129277	A	T	Splice_Site
K27	BRD4	19	15365041	T	A	Nonsense_Mutation
T-a	BSN	3	49691434	C	T	Missense_Mutation
PD2127a	BTBD9	6	38142761	T	A	Nonstop_Mutation
K48	BTNL2	6	32370970	G	-	Frame_Shift_Indel
T144	BZRAP1	17	56389871	C	T	Missense_Mutation
PD2125a	C10orf120	10	124457732	C	G	Silent
K31	EDRF1	10	127441424	G	C	Missense_Mutation
001	C11orf68	11	65685051	C	T	Missense_Mutation
K38	LRRC74A	14	77304288	C	A	Missense_Mutation
K38	KDF1	1	27278663	T	C	Missense_Mutation
T163	C1orf177	1	55273243	C	G	Missense_Mutation
RC-T	C22orf31	22	29454821	G	A	Missense_Mutation
001	C3	19	6713210	G	C	Missense_Mutation
001	C3orf20	3	14803014	G	C	Missense_Mutation
K44	C6	5	41154051	T	C	Silent
T-h	CACNA2D3	3	55107571	C	T	Silent
K29	CAD	2	27445851	G	A	Missense_Mutation
T-d	CADM2	3	85935413	C	T	Silent
K20	CALML6	1	1847128	C	T	Missense_Mutation
T164	CAMLG	5	134076999	C	G	Missense_Mutation
002	CAPN14	2	31424845	C	A	Missense_Mutation
T-i	CAPN7	3	15259048	A	G	Missense_Mutation
K44	CARD11	7	2946429	C	A	Missense_Mutation
K44	CARD11	7	2963978	G	C	Missense_Mutation
K32	CARD6	5	40853250	C	T	Nonsense_Mutation
K20	CARM1	19	11027411	C	G	Silent
T127	CASKIN1	16	2235347	G	T	Missense_Mutation
001	CASP2	7	143001797	G	T	Missense_Mutation
K3	CASP5	11	104878000	A	T	Missense_Mutation
PD2126a	CAST	5	96077268	A	C	Missense_Mutation
T-h	CAV3	3	8787347	C	G	Missense_Mutation
T144	CBS	21	44486486	C	A	Missense_Mutation
K44	ACKR2	3	42906536	T	A	Missense_Mutation
PD2125a	CCDC103	17	42979966	G	A	Silent
RC-T	CCDC109B	4	110606404	A	T	Splice_Site
K38	PRIMPOL	4	185587192	T	A	Missense_Mutation
T144	CCDC124	19	18053576	A	T	Missense_Mutation
T163	CCDC155	19	49899051	G	A	Missense_Mutation
PD2126a	CCDC36	3	49293588	ATAGAAA	-	Frame_Shift_Indel
T-h	CCDC36	3	49293819	T	G	Missense_Mutation
K44	CCDC60	12	119960773	G	T	Missense_Mutation
| T-g CCDC66 3 56653900 A G Missense_Mutation 145 | PD2125a CCL28 5 43388453 C - Splice_Site 146 | PD2147a CCNDBP1 15 43483790 G A Silent 147 | 001 CCR6 6 167550439 T A Missense_Mutation 148 | 002 CCT3 1 156304539 C - Frame_Shift_Indel 149 | K20 CD4 12 6926395 C T Missense_Mutation 150 | 001 CD44 11 35231528 G C Missense_Mutation 151 | T166 CDC37 19 10505818 T G Missense_Mutation 152 | K38 CDC42BPB 14 103410391 C A Silent 153 | K27 CDCA7 2 174224113 A T Missense_Mutation 154 | T-c CDCP1 3 45130627 G C Missense_Mutation 155 | 001 CDH19 18 64235921 G T Missense_Mutation 156 | RC-T CDH5 16 66420923 T A Missense_Mutation 157 | T166 CDH8 16 61687757 C A Missense_Mutation 158 | PD2126a CDK17 12 96692721 G A Missense_Mutation 159 | 001 CDKN1B 12 12871131 G - Frame_Shift_Indel 160 | T164 CEP350 1 180022917 T G Missense_Mutation 161 | T144 CEP68 2 65299963 A T Missense_Mutation 162 | T183 CEP68 2 65301495 G A Missense_Mutation 163 | T183 CEP68 2 65301497 T G Missense_Mutation 164 | K44 CEP85 1 26582054 C - Frame_Shift_Indel 165 | K31 CEP97 3 101447680 G T Missense_Mutation 166 | T142 CEP97 3 101450699 G A Missense_Mutation 167 | T142 CFH 1 196694326 A T Missense_Mutation 168 | T142 CFHR5 1 196963349 C A Nonsense_Mutation 169 | K44 CHD1 5 98223884 C T Missense_Mutation 170 | T144 CHRM2 7 136700938 TGCACTT - Frame_Shift_Indel 171 | K1 CHRNB1 17 7358628 A G Missense_Mutation 172 | 001 CIB2 15 78416057 C - Frame_Shift_Indel 173 | 001 CLCN2 3 184072350 C - Frame_Shift_Indel 174 | 002 CLEC3B 3 45077050 C A Nonsense_Mutation 175 | T142 CLIP4 2 29358480 G T Missense_Mutation 176 | K27 CLMN 14 95690192 A G Missense_Mutation 177 | T166 CLPTM1L 5 1341916 A T Missense_Mutation 178 | T183 CLPX 15 65449241 T C Missense_Mutation 179 | K44 CLYBL 13 100517139 G C Missense_Mutation 180 | T-g CMTM8 3 32398997 A G Missense_Mutation 181 | T142 CNDP1 18 72223592 - TGC In_Frame_Indel 182 | K20 CNGB3 8 87616420 A G Missense_Mutation 183 | PD2125a CNKSR1 1 26515956 C T Missense_Mutation 184 | 002 
CNTN2 1 205041237 G A Missense_Mutation 185 | T-b CNTN3 3 74413659 T A Missense_Mutation 186 | T-g CNTN4 3 3081933 G A Silent 187 | T-i CNTN6 3 1337371 T C Silent 188 | K44 COG4 16 70557415 G T Missense_Mutation 189 | T144 COL11A1 1 103491861 C A Missense_Mutation 190 | T142 COL17A1 10 105809175 C T Missense_Mutation 191 | T166 COL17A1 10 105813716 G C Missense_Mutation 192 | T183 COL21A1 6 56031778 C A Missense_Mutation 193 | RC-T COL22A1 8 139715576 A C Silent 194 | T166 COL2A1 12 48381420 C A Missense_Mutation 195 | 002 COL2A1 12 48370920 C A Nonsense_Mutation 196 | T163 COL4A3BP 5 74722195 C T Splice_Site 197 | PD2127a COL5A1 9 137686950 C T Missense_Mutation 198 | T164 COL5A2 2 189929291 - ACTTGTGTCAAATAAGGGCCAATGCTTTGTATCC Frame_Shift_Indel 199 | RC-T COL5A3 19 10116301 G A Missense_Mutation 200 | T163 COL7A1 3 48608117 T A Missense_Mutation 201 | K44 COLGALT1 19 17683335 G C Missense_Mutation 202 | K20 COPB1 11 14515799 C T Missense_Mutation 203 | T164 COPG2 7 130337763 TTACC - Frame_Shift_Indel 204 | T-j CPNE9 3 9756618 A G Missense_Mutation 205 | 002 CPVL 7 29103731 T G Missense_Mutation 206 | K44 CPXM1 20 2775185 G T Missense_Mutation 207 | K27 CR2 1 207639869 A T Splice_Site 208 | T164 CREBBP 16 3801726 C - Splice_Site 209 | T22 CRISPLD1 8 75926309 G A Missense_Mutation 210 | K1 CRLF3 17 29120510 A T Missense_Mutation 211 | T166 CRY2 11 45892062 C A Missense_Mutation 212 | K44 CSMD3 8 113668551 A - Frame_Shift_Indel 213 | PD2127a CSMD3 8 113358398 T A Missense_Mutation 214 | PD2125a CSN3 4 71114990 C A Silent 215 | T22 CST8 20 23473650 G A Missense_Mutation 216 | T142 CTCF 16 67660619 G T Splice_Site 217 | T183 CTNNA2 2 80782890 G A Missense_Mutation 218 | T-b CTNNB1 3 41266697 A G Missense_Mutation 219 | T142 CUL3 2 225422456 A G Missense_Mutation 220 | K48 CUL7 6 43010858 A G Missense_Mutation 221 | T183 CUL9 6 43155650 C T Missense_Mutation 222 | RC-T CUZD1 10 124594540 C T Missense_Mutation 223 | 001 RTP5 2 242814354 T G Missense_Mutation 224 | T183 
CYLD 16 50816266 A - Frame_Shift_Indel 225 | K38 CYP2E1 10 135346283 G A Missense_Mutation 226 | 001 CYP4F3 19 15769207 CCCAAAG - Frame_Shift_Indel 227 | PD2144a CYTIP 2 158272284 G A Missense_Mutation 228 | PD2126a DAB2 5 39376176 G T Missense_Mutation 229 | 001 DACH2 X 86071070 A G Missense_Mutation 230 | 001 DAPK1 9 90317948 G C Missense_Mutation 231 | T164 DAW1 2 228783522 A C Missense_Mutation 232 | T127 DBR1 3 137886099 ATATATCA - Frame_Shift_Indel 233 | K44 DCAF4 14 73421137 C A Missense_Mutation 234 | RC-T DCAF4L1 4 41984794 G A Missense_Mutation 235 | 002 DCLRE1A 10 115607083 T G Missense_Mutation 236 | T-g DCP1A 3 53378932 C T Missense_Mutation 237 | K38 DCP1B 12 2062324 - GCA In_Frame_Indel 238 | 001 DCSTAMP 8 105361447 G A Missense_Mutation 239 | 002 DCX X 110653364 G A Missense_Mutation 240 | T163 DDX20 1 112303375 ATC - In_Frame_Indel 241 | T166 DDX4 5 55112292 G A Missense_Mutation 242 | 001 DDX52 17 36002244 C G Missense_Mutation 243 | PD2125a DDX53 X 23018596 A C Missense_Mutation 244 | 001 DDX58 9 32487509 A T Missense_Mutation 245 | K44 DENND1A 9 126520094 C T Missense_Mutation 246 | T-b DENND6A 3 57614059 C G Missense_Mutation 247 | T-f DHFRL1 3 93780283 T A Missense_Mutation 248 | T163 DHPS 19 12790278 G A Missense_Mutation 249 | K38 DHX37 12 125438490 G A Silent 250 | 001 DIO1 1 54371787 AAC - In_Frame_Indel 251 | 001 DIRAS3 1 68512683 C T Missense_Mutation 252 | K20 DIS3 13 73350157 G T Missense_Mutation 253 | 001 DIXDC1 11 111888597 T - Frame_Shift_Indel 254 | T-b DLEC1 3 38164073 ATG - In_Frame_Indel 255 | T-i DLEC1 3 38163900 C T Silent 256 | RC-T DLGAP5 14 55649136 A T Silent 257 | T164 DLST 14 75357775 C T Missense_Mutation 258 | T-b DNAH1 3 52386615 C T Silent 259 | T-b DNAH1 3 52404178 A G Missense_Mutation 260 | T-d DNAH1 3 52420315 C T Missense_Mutation 261 | K44 DNAH2 17 7720977 C G Silent 262 | 002 DNAH2 17 7681656 C T Missense_Mutation 263 | K3 DNAH5 5 13770929 G A Silent 264 | K3 DNAH5 5 13891154 G A Silent 265 | 002 DNAJC27 2 
25190090 C A Nonsense_Mutation 266 | 002 DNHD1 11 6568627 G C Missense_Mutation 267 | T144 DNMT3A 2 25475066 T A Missense_Mutation 268 | 001 DNMT3A 2 25523017 C A Missense_Mutation 269 | T-b DOCK3 3 51399347 G A Silent 270 | K31 DOCK5 8 25189857 T C Missense_Mutation 271 | 001 DOPEY1 6 83848337 A C Missense_Mutation 272 | 002 DPYSL4 10 134013888 C A Silent 273 | PD2127a DST 6 56471673 T C Missense_Mutation 274 | 001 DUSP12 1 161719926 T G Missense_Mutation 275 | K27 DYM 18 46570457 C A Missense_Mutation 276 | K20 DYNC1H1 14 102510223 G C Missense_Mutation 277 | PD2127a DYNC1I1 7 95657584 A T Missense_Mutation 278 | T144 DYNC2H1 11 103128394 T A Missense_Mutation 279 | T166 EFCAB13 17 45517800 AAAG TTT Frame_Shift_Indel 280 | K38 EGFL7 9 139565408 A G Missense_Mutation 281 | T164 EHHADH 3 184911199 G T Missense_Mutation 282 | PD2147a EIF4B 12 53416352 C G Missense_Mutation 283 | T163 EIF4ENIF1 22 31854568 C T Missense_Mutation 284 | 001 EIF4G2 11 10825746 - T Frame_Shift_Indel 285 | K38 ELN 7 73466081 T A Silent 286 | T163 ELP3 8 27989919 A - Frame_Shift_Indel 287 | T-f EMC3 3 10016159 C T Silent 288 | 002 EMP1 12 13366418 G A Nonsense_Mutation 289 | T144 ENAH 1 225742758 C A Missense_Mutation 290 | 002 ENDOU 12 48110160 G A Missense_Mutation 291 | 002 ENPP5 6 46135891 C T Missense_Mutation 292 | T-e ENTPD3 3 40442421 A G Missense_Mutation 293 | K44 EPHA7 6 94066736 CATCAACCAA - Frame_Shift_Indel 294 | T-h EPM2AIP1 3 37033820 A G Missense_Mutation 295 | K48 EPSTI1 13 43462434 T G Missense_Mutation 296 | K31 ERCC5 13 103514525 A G Silent 297 | 001 ERCC5 13 103514868 C A Missense_Mutation 298 | 002 ERCC6L2 9 98684586 G C Splice_Site 299 | K20 ERICH1 8 623600 G A Missense_Mutation 300 | T144 ERP27 12 15067675 ACTCTGA - Frame_Shift_Indel 301 | PD2147a ESYT2 7 158529750 C T Silent 302 | 001 EXT1 8 118816994 C G Missense_Mutation 303 | RC-T F2RL2 5 75914104 C A Missense_Mutation 304 | K31 FAAH 1 46879189 G T Missense_Mutation 305 | K29 FAM111B 11 58877138 G A 
Missense_Mutation 306 | T166 FAM126B 2 201846181 T C Missense_Mutation 307 | 001 FAM129B 9 130271358 C A Missense_Mutation 308 | T163 FAM13C 10 61029715 T A Missense_Mutation 309 | PD2127a FAM173B 5 10239283 A G Silent 310 | K1 FAM200A 7 99145282 G C Missense_Mutation 311 | T142 FAM84A 2 14774146 G A Missense_Mutation 312 | 002 EEF2KMT 16 5143514 G T Silent 313 | K44 FAM92B 16 85135911 G T Missense_Mutation 314 | T127 FANCM 14 45645649 C T Missense_Mutation 315 | T183 FAR1 11 13729529 C T Missense_Mutation 316 | 002 FARP1 13 99045913 C A Missense_Mutation 317 | K38 FARP1 13 99076833 C T Missense_Mutation 318 | T-g FBLN2 3 13672866 C T Silent 319 | PD2127a FBN2 5 127627328 T A Missense_Mutation 320 | T144 FBXL16 16 746872 G T Missense_Mutation 321 | T144 FBXL20 17 37499495 C T Splice_Site 322 | PD2147a FBXO28 1 224345375 C A Missense_Mutation 323 | 001 FBXO4 5 41934121 C G Silent 324 | PD2127a FER 5 108294968 C T Nonsense_Mutation 325 | T164 FFAR4 10 95347067 TCT - In_Frame_Indel 326 | T-g FGD5 3 14862732 C T Silent 327 | K44 FGR 1 27939534 G A Missense_Mutation 328 | K20 FIP1L1 4 54319249 AG - Frame_Shift_Indel 329 | 002 FLG 1 152286668 G T Missense_Mutation 330 | PD2125a FLRT3 20 14306799 T C Missense_Mutation 331 | T142 FMN1 15 33256306 A G Splice_Site 332 | K1 FMN2 1 240370503 G C Missense_Mutation 333 | PD2126a FOCAD 9 20990265 T A Silent 334 | 002 FOXK2 17 80545003 G T Missense_Mutation 335 | T144 FRA10AC1 10 95447204 A C Missense_Mutation 336 | T22 FRMD1 6 168465592 C T Missense_Mutation 337 | K38 FRMPD2 10 49414802 T C Missense_Mutation 338 | T166 FTO 16 53860105 C G Missense_Mutation 339 | 002 FUBP3 9 133488388 C A Missense_Mutation 340 | K38 FUT3 19 5844669 G A Missense_Mutation 341 | T166 FYB 5 39134413 C T Missense_Mutation 342 | T166 FYB 5 39134990 C G Missense_Mutation 343 | T-e FYCO1 3 46008983 G A Missense_Mutation 344 | T166 FZD2 17 42636527 G C Missense_Mutation 345 | PD2127a FZR1 19 3532607 C T Nonsense_Mutation 346 | PD2147a G6PC 17 41053120 A G 
Missense_Mutation 347 | T144 GAL3ST4 7 99764193 G T Missense_Mutation 348 | 001 GALNT11 7 151805164 G C Missense_Mutation 349 | K48 GALNTL6 4 173232810 A G Missense_Mutation 350 | T183 GBA2 9 35737276 G C Missense_Mutation 351 | T163 GBGT1 9 136029647 A C Missense_Mutation 352 | RC-T GBP6 1 89844012 G A Silent 353 | RC-T GCNT2 6 10586542 T C Missense_Mutation 354 | K48 GDI1 X 153670744 A T Missense_Mutation 355 | PD2125a GEN1 2 17962775 T C Missense_Mutation 356 | PD2127a GLI1 12 57865610 C T Silent 357 | T144 GLI3 7 42005868 C A Missense_Mutation 358 | K31 GLIS1 1 53972434 G A Missense_Mutation 359 | K1 GLYR1 16 4871547 C T Splice_Site 360 | PD2125a GPANK1 6 31630173 G T Missense_Mutation 361 | PD2144a GPATCH8 17 42475605 G T Silent 362 | PD2127a GPI 19 34859538 C T Silent 363 | 002 GPR126 6 142725060 A C Missense_Mutation 364 | 002 GPR149 3 154146593 G T Missense_Mutation 365 | RC-T GPR158 10 25887635 A G Missense_Mutation 366 | T166 GPR4 19 46094740 G T Missense_Mutation 367 | T166 GPR52 1 174417931 A T Nonsense_Mutation 368 | 002 GPR98 5 90073831 G A Missense_Mutation 369 | 002 GRHL1 2 10139114 C A Missense_Mutation 370 | 002 GRIK5 19 42546817 G A Silent 371 | T166 GRIN2A 16 9934641 G T Missense_Mutation 372 | T144 GRK1 13 114438187 C A Missense_Mutation 373 | T183 GRK4 4 3021416 C G Missense_Mutation 374 | T-b GRM2 3 51743411 G A Missense_Mutation 375 | T164 GRM4 6 34100955 C A Missense_Mutation 376 | T164 GSAP 7 76959659 - TATTTTATTTTATTTTATT Frame_Shift_Indel 377 | K44 POMGNT2 3 43122502 C T Missense_Mutation 378 | T183 GTF2F2 13 45725846 C A Missense_Mutation 379 | K38 GTF2F2 13 45841404 C T Nonsense_Mutation 380 | T142 GTF3C3 2 197653989 TTAA - Frame_Shift_Indel 381 | K48 GTPBP10 7 89984511 A G Missense_Mutation 382 | T163 GTPBP4 10 1045035 G T Missense_Mutation 383 | K48 GTSE1 22 46704412 A G Missense_Mutation 384 | K48 GUCY1A3 4 156651215 C T Silent 385 | PD2144a GYG2 X 2773196 G A Missense_Mutation 386 | PD2144a GZF1 20 23345310 C T Missense_Mutation 
387 | 002 HAAO 2 42996948 C A Nonsense_Mutation 388 | T-d HACL1 3 15609395 G A Silent 389 | 001 HDAC6 X 48675014 C A Missense_Mutation 390 | PD2127a HDAC9 7 18914186 C A Missense_Mutation 391 | K1 HEATR4 14 73963342 T G Missense_Mutation 392 | K38 HEATR4 14 73985777 A G Missense_Mutation 393 | T164 HECTD1 14 31598054 A C Missense_Mutation 394 | K31 HECTD2 10 93247507 - T Frame_Shift_Indel 395 | PD2147a HELLS 10 96352065 A G Missense_Mutation 396 | K20 HERC2 15 28389088 T A Silent 397 | T142 HEXB 5 74001087 T A Missense_Mutation 398 | T144 HGSNAT 8 43024320 G T Missense_Mutation 399 | T166 HIF1A 14 62193507 A C Missense_Mutation 400 | T142 HIPK2 7 139299073 G - Frame_Shift_Indel 401 | K20 HIPK3 11 33370139 A C Missense_Mutation 402 | RC-T HLA-DPA1 6 33037639 G A Missense_Mutation 403 | 001 HMG20A 15 77770822 A C Missense_Mutation 404 | PD2127a HMGXB3 5 149386186 A C Missense_Mutation 405 | T142 HNRNPF 10 43882575 T C Missense_Mutation 406 | K48 HNRNPUL1 19 41782071 C G Missense_Mutation 407 | K38 HOXD1 2 177054719 A G Missense_Mutation 408 | 001 HPS5 11 18306944 G - Frame_Shift_Indel 409 | T166 HSDL1 16 84163953 C A Nonsense_Mutation 410 | K44 HSP90AB1 6 44218246 G T Missense_Mutation 411 | 002 HSPA14 10 14891811 G A Splice_Site 412 | T144 HSPBP1 19 55785947 G C Nonsense_Mutation 413 | K20 HYAL3 3 50332534 T C Missense_Mutation 414 | 001 IFI16 1 158984570 A T Nonsense_Mutation 415 | 001 IFNAR1 21 34717554 G A Missense_Mutation 416 | PD2126a IFNB1 9 21077581 A G Silent 417 | T-i IFRD2 3 50327685 C T Missense_Mutation 418 | RC-T IFT81 12 110643239 A G Splice_Site 419 | T144 IGF1R 15 99251192 A G Missense_Mutation 420 | PD2126a IGSF22 11 18745765 G A Missense_Mutation 421 | PD2125a IKZF3 17 37922744 C G Missense_Mutation 422 | 001 IL12RB2 1 67795338 T C Missense_Mutation 423 | T142 IL13RA1 X 117892138 A T Missense_Mutation 424 | T-h IL17RE 3 9948480 A G Missense_Mutation 425 | K48 IL27 16 28511200 GGA - In_Frame_Indel 426 | T164 IMMT 2 86371807 G C Missense_Mutation 
427 | T166 INADL 1 62550241 G C Missense_Mutation 428 | T166 INPP5J 22 31523958 C A Missense_Mutation 429 | 001 INTS1 7 1512818 T - Frame_Shift_Indel 430 | K27 IPO13 1 44415087 A T Splice_Site 431 | 002 IQCA1 2 237253239 T A Missense_Mutation 432 | K31 IRX6 16 55362929 G T Nonsense_Mutation 433 | K44 ISYNA1 19 18546649 AGG - In_Frame_Indel 434 | K20 ISYNA1 19 18547022 G A Silent 435 | K38 ITGA10 1 145537475 A G Missense_Mutation 436 | K44 ITGA5 12 54795980 A C Missense_Mutation 437 | 002 ITGA6 2 173333974 G A Missense_Mutation 438 | 001 ITGB3 17 45380104 G T Missense_Mutation 439 | T-b ITIH1 3 52817014 A G Silent 440 | 001 ITIH5 10 7621865 C A Missense_Mutation 441 | T127 ITPR3 6 33659443 G T Missense_Mutation 442 | T166 JAKMIP1 4 6087156 C T Missense_Mutation 443 | 001 KANSL1 17 44116456 G C Missense_Mutation 444 | T-c KAT2B 3 20189472 C A Missense_Mutation 445 | K48 KCNA3 1 111216161 G A Missense_Mutation 446 | K20 KCNA5 12 5154372 C G Silent 447 | T127 KCNH5 14 63246448 G A Missense_Mutation 448 | T163 KCNK5 6 39196709 TCCTA - Frame_Shift_Indel 449 | T127 KCNN2 5 113698701 ACAACAACTCCA - In_Frame_Indel 450 | T164 KCNQ5 6 73904624 G T Missense_Mutation 451 | K44 KCNT2 1 196309673 C T Silent 452 | 001 KDM5C X 53222717 C - Frame_Shift_Indel 453 | 001 KDM5C X 53222723 C G Missense_Mutation 454 | K29 KDM5C X 53253930 G A Missense_Mutation 455 | K20 KDM5C X 53222666 G A Nonsense_Mutation 456 | 001 KDM5C X 53228342 T A Splice_Site 457 | 002 KDM5D Y 21877290 C A Missense_Mutation 458 | PD2147a KDM6A X 44969480 TG - Frame_Shift_Indel 459 | 002 SPIDR 8 48309021 C G Missense_Mutation 460 | 001 KIAA0355 19 34843646 C A Missense_Mutation 461 | K48 RIC1 9 5774177 GAG - In_Frame_Indel 462 | 001 KIAA1524 3 108285373 A - Frame_Shift_Indel 463 | T166 KIAA2022 X 73960664 C T Missense_Mutation 464 | K3 KIF16B 20 16486752 G C Silent 465 | T142 KIF1A 2 241725902 C T Missense_Mutation 466 | T163 KIF1B 1 10386341 G T Nonsense_Mutation 467 | RC-T KIF6 6 39325092 T A Missense_Mutation 
468 | PD2147a KIR3DL1 19 55333270 C T Silent 469 | 001 KL 13 33635697 C - Frame_Shift_Indel 470 | T163 KLC1 14 104139370 C T Missense_Mutation 471 | K27 KLC1 14 104139526 T C Splice_Site 472 | PD2127a KLHL13 X 117043963 A T Missense_Mutation 473 | 001 KLHL18 3 47385303 G A Missense_Mutation 474 | T163 KLHL22 22 20812232 C T Missense_Mutation 475 | K31 KLK13 19 51559858 T A Nonsense_Mutation 476 | 001 KLK4 19 51412045 C T Missense_Mutation 477 | K27 KMT2D 12 49426759 AGCAAC - In_Frame_Indel 478 | T166 KNTC1 12 123057718 G A Missense_Mutation 479 | T166 KRT10 17 38976322 T G Missense_Mutation 480 | 001 KRT4 12 53202570 C T Missense_Mutation 481 | 001 LAMA3 18 21533020 T C Missense_Mutation 482 | T166 LAMB1 7 107616265 G A Missense_Mutation 483 | T166 LAMB3 1 209790823 C T Missense_Mutation 484 | 001 LATS2 13 21562148 C A Nonsense_Mutation 485 | 002 LCT 2 136594625 - T Frame_Shift_Indel 486 | K31 LGR5 12 71977673 G C Missense_Mutation 487 | 001 LIAS 4 39466770 A - Frame_Shift_Indel 488 | K27 LILRB2 19 54783669 T C Missense_Mutation 489 | RC-T LIPI 21 15537643 A T Missense_Mutation 490 | T144 LMO1 11 8251962 C T Missense_Mutation 491 | T-b LMOD3 3 69171499 AGA - In_Frame_Indel 492 | PD2127a LOC81691 16 20855285 A C Silent 493 | 002 LPHN2 1 82409438 C G Missense_Mutation 494 | T144 LPIN1 2 11913810 TGGCTG - In_Frame_Indel 495 | T163 LPIN1 2 11943158 - T Frame_Shift_Indel 496 | RC-T LRBA 4 151520231 G A Missense_Mutation 497 | K44 LRCH1 13 47224417 CTGTATCACAA - Frame_Shift_Indel 498 | T142 LRP1 12 57537561 C T Missense_Mutation 499 | PD2125a LRP1B 2 141986909 A G Silent 500 | T164 LRP1B 2 141202132 C A Missense_Mutation 501 | T144 LRP2 2 169985191 T - Frame_Shift_Indel 502 | PD2125a LRP2 2 170034344 G A Silent 503 | PD2147a LRP2 2 170175275 A G Missense_Mutation 504 | T163 LRRC16A 6 25450217 C G Missense_Mutation 505 | 002 LRRC31 3 169557957 C T Missense_Mutation 506 | RC-T LRRK2 12 40699689 A G Missense_Mutation 507 | T183 LRRK2 12 40740662 A G Missense_Mutation 508 | 
T144 LRRK2 12 40760823 T C Missense_Mutation 509 | T-f LTF 3 46496910 G A Silent 510 | PD2144a LTN1 21 30342888 C T Silent 511 | 001 MAGEB16 X 35820821 TGATG - Frame_Shift_Indel 512 | K31 MAGEC1 X 140996397 C T Silent 513 | T-e MAGI1 3 65456135 T C Missense_Mutation 514 | K38 MAPK13 6 36106721 C A Missense_Mutation 515 | T164 MARCH6 5 10402517 T C Missense_Mutation 516 | T164 MATR3 5 138658633 G T Missense_Mutation 517 | T164 MATR3 5 138658634 C T Missense_Mutation 518 | 002 MAU2 19 19465166 G T Missense_Mutation 519 | K44 MCOLN2 1 85392404 C A Missense_Mutation 520 | K44 MED30 8 118552125 C T Missense_Mutation 521 | T22 MEGF8 19 42853844 G C Missense_Mutation 522 | T144 MEIS1 2 66670063 C - Frame_Shift_Indel 523 | 002 MERTK 2 112786314 C T Missense_Mutation 524 | T166 MET 7 116422117 T G Missense_Mutation 525 | PD2126a METTL13 1 171751154 A G Missense_Mutation 526 | 001 METTL3 14 21966415 G C Missense_Mutation 527 | PD2144a METTL4 18 2567091 G A Missense_Mutation 528 | K20 MFSD6L 17 8701350 C A Silent 529 | 002 MGA 15 42058932 G A Silent 530 | K27 MGAT5 2 135028017 A T Missense_Mutation 531 | T142 MIP 12 56848348 A G Missense_Mutation 532 | T183 MKL1 22 40831511 G A Missense_Mutation 533 | PD2147a MKL2 16 14339507 A G Missense_Mutation 534 | 002 MLLT4 6 168352522 G T Silent 535 | 001 MMAB 12 109998874 G T Nonsense_Mutation 536 | T22 MPDU1 17 7487218 T C Missense_Mutation 537 | 002 MPHOSPH8 13 20245344 A T Splice_Site 538 | T142 MPP4 2 202545715 G A Missense_Mutation 539 | K44 MPRIP 17 17039564 CAG - In_Frame_Indel 540 | 001 MR1 1 181021644 A T Missense_Mutation 541 | K20 MRPL37 1 54665950 C A Missense_Mutation 542 | 001 MRPL51 12 6602287 TT - Frame_Shift_Indel 543 | K27 MSH2 2 47705421 A T Nonsense_Mutation 544 | T166 MSL1 17 38285755 C G Nonsense_Mutation 545 | 001 MSRB2 10 23399206 A - Frame_Shift_Indel 546 | T144 MSTN 2 190924834 C G Missense_Mutation 547 | K20 MTCH2 11 47660308 T A Missense_Mutation 548 | T166 MTOR 1 11168338 C G Missense_Mutation 549 | 001 
MTOR 1 11174383 A G Missense_Mutation 550 | K31 MTOR 1 11174426 C T Missense_Mutation 551 | PD2127a MTOR 1 11184559 G A Missense_Mutation 552 | K27 MUC15 11 26582625 C T Missense_Mutation 553 | T144 MUC16 19 9074040 G C Missense_Mutation 554 | K44 MUC7 4 71346529 G T Missense_Mutation 555 | K48 MVK 12 110012694 G A Missense_Mutation 556 | 002 MYBPC1 12 102055018 C T Silent 557 | 002 MYC 8 128752925 C T Silent 558 | T164 MYEF2 15 48461016 TTCTT - Frame_Shift_Indel 559 | PD2144a MYH1 17 10399827 G T Missense_Mutation 560 | 001 MYH8 17 10299657 T G Missense_Mutation 561 | T166 MYO1H 12 109853389 C T Missense_Mutation 562 | K20 MYO3A 10 26357733 A T Nonsense_Mutation 563 | PD2126a MYO6 6 76589831 T A Silent 564 | T166 MYO6 6 76623831 T G Missense_Mutation 565 | T164 MYO7B 2 128321742 G A Nonsense_Mutation 566 | T166 MYOCD 17 12655830 G A Missense_Mutation 567 | K48 MYT1 20 62839374 GATGAA - In_Frame_Indel 568 | T164 MYT1L 2 1795698 G A Missense_Mutation 569 | RC-T N4BP2 4 40119504 G T Silent 570 | 002 NADSYN1 11 71174508 C A Missense_Mutation 571 | 001 NAP1L3 X 92928275 G A Missense_Mutation 572 | 002 NAP1L5 4 89618486 GGA - In_Frame_Indel 573 | K27 NAV2 11 19905899 A T Missense_Mutation 574 | T166 NBEAL1 2 204009832 A - Frame_Shift_Indel 575 | T166 NBEAL2 3 47043451 G C Missense_Mutation 576 | RC-T NCAM2 21 22656592 A T Missense_Mutation 577 | K27 NCAPG2 7 158472747 C T Missense_Mutation 578 | T163 NCBP1 9 100407984 G A Missense_Mutation 579 | K31 NCCRP1 19 39691390 G T Silent 580 | T142 NDFIP2 13 80107515 C T Missense_Mutation 581 | T183 NDST4 4 115997643 G C Missense_Mutation 582 | K1 NDUFS2 1 161180392 T G Missense_Mutation 583 | T163 NEDD4L 18 56056338 G T Missense_Mutation 584 | T-b NEK10 3 27338698 T C Missense_Mutation 585 | K20 NF1 17 29576000 A G Splice_Site 586 | K48 NF2 22 30000018 T - Frame_Shift_Indel 587 | K44 NFASC 1 204957772 A C Missense_Mutation 588 | T144 NFE2L1 17 46128959 - C Frame_Shift_Indel 589 | T163 NFE2L1 17 46133961 G T Splice_Site 590 | 
T164 NGDN 14 23946416 T A Missense_Mutation 591 | 001 NGEF 2 233791888 G T Missense_Mutation 592 | T22 NHLRC1 6 18122146 A T Missense_Mutation 593 | K44 NIPBL 5 37000582 GAA - In_Frame_Indel 594 | T-c NISCH 3 52504895 G A Silent 595 | T22 NLGN2 17 7318068 A C Missense_Mutation 596 | 001 NLRP7 19 55451315 A C Missense_Mutation 597 | K44 NLRP9 19 56244181 G A Missense_Mutation 598 | 002 NOL11 17 65718738 T A Silent 599 | PD2147a NPAP1 15 24922328 C A Silent 600 | K1 NPC1 18 21119843 G A Silent 601 | T144 NPEPL1 20 57273777 C T Missense_Mutation 602 | 002 NPHP4 1 6038367 G A Missense_Mutation 603 | 001 NPHS1 19 36322595 G A Missense_Mutation 604 | T-e NR1D2 3 24003660 A G Missense_Mutation 605 | T164 NR1H4 12 100904818 G C Missense_Mutation 606 | K44 NR3C1 5 142779565 T G Missense_Mutation 607 | 001 NRAP 10 115412717 T A Nonsense_Mutation 608 | T127 NSL1 1 212965098 C A Missense_Mutation 609 | K44 NTHL1 16 2096342 C T Silent 610 | K44 NTHL1 16 2096341 T C Missense_Mutation 611 | K3 NTRK1 1 156844372 G T Missense_Mutation 612 | PD2126a NUDT7 16 77759415 C T Silent 613 | RC-T NUP155 5 37305246 A G Silent 614 | K48 NUP188 9 131768946 A T Missense_Mutation 615 | T164 NUP205 7 135303289 TGTCCCCAGGACC - Frame_Shift_Indel 616 | T-b NUP210 3 13383322 C T Missense_Mutation 617 | K48 NUP54 4 77065320 - ACA In_Frame_Indel 618 | 002 NUP62 19 50412772 T C Missense_Mutation 619 | 001 NUSAP1 15 41667966 A - Frame_Shift_Indel 620 | 002 OBP2A 9 138439776 T A Missense_Mutation 621 | T22 OCA2 15 28267661 G A Missense_Mutation 622 | PD2127a OGT X 70779127 T C Missense_Mutation 623 | 002 OR10H3 19 15852408 T G Missense_Mutation 624 | T127 OR11L1 1 248004369 TCTACACTGT - Frame_Shift_Indel 625 | T183 OR11L1 1 248004978 G T Missense_Mutation 626 | K27 OR13G1 1 247835802 A T Nonsense_Mutation 627 | PD2144a OR1I1 19 15198065 C G Silent 628 | RC-T OR2B11 1 247614553 C T Silent 629 | T144 OR2B11 1 247614531 C T Missense_Mutation 630 | PD2127a OR2M5 1 248309179 C A Missense_Mutation 631 | PD2127a 
OR3A1 17 3195190 C T Silent 632 | PD2127a OR3A1 17 3195189 G A Nonsense_Mutation 633 | PD2127a OR5AP2 11 56409040 A - Frame_Shift_Indel 634 | PD2147a OR6K3 1 158687308 C T Missense_Mutation 635 | PD2126a OR6Q1 11 57798938 T A Missense_Mutation 636 | K31 OR9K2 12 55523686 G A Missense_Mutation 637 | T183 OTOP3 17 72943377 C T Missense_Mutation 638 | T144 OVOL1 11 65562040 GACCT - Frame_Shift_Indel 639 | T-j OXNAD1 3 16312495 G A Silent 640 | T164 P4HA1 10 74813330 A G Missense_Mutation 641 | T-g P4HTM 3 49039984 A G Missense_Mutation 642 | RC-T PABPC1 8 101727766 T C Silent 643 | RC-T PABPC1 8 101727750 T C Missense_Mutation 644 | 002 PACS1 11 66003390 G A Silent 645 | T183 PAK1 11 77034376 G C Silent 646 | T183 PANX3 11 124489683 G C Missense_Mutation 647 | T183 PAOX 10 135195140 A G Missense_Mutation 648 | T163 PAPOLB 7 4900588 G - Frame_Shift_Indel 649 | T144 PAPPA 9 119065182 C A Missense_Mutation 650 | T183 PAPSS1 4 108575925 A C Missense_Mutation 651 | PD2147a PARD6B 20 49366431 T C Silent 652 | 002 PARP8 5 50090830 C A Nonsense_Mutation 653 | T-e PBRM1 3 52643954 A - Frame_Shift_Indel 654 | T-g PBRM1 3 52613205 T - Frame_Shift_Indel 655 | T-j PBRM1 3 52696231 T - Frame_Shift_Indel 656 | T-d PBRM1 3 52598182 - A Frame_Shift_Indel 657 | PD2126a PBRM1 3 52649442 - T Frame_Shift_Indel 658 | T-f PBRM1 3 52651510 - T Frame_Shift_Indel 659 | 002 PBRM1 3 52668752 - T Frame_Shift_Indel 660 | T-h PBRM1 3 52682449 A G Missense_Mutation 661 | T142 PBRM1 3 52595948 C A Nonsense_Mutation 662 | K20 PBRM1 3 52623177 G C Nonsense_Mutation 663 | T-a PBRM1 3 52620705 C T Splice_Site 664 | T183 PBRM1 3 52621368 C T Splice_Site 665 | 002 PCDH15 10 55782710 G - Frame_Shift_Indel 666 | T142 PCDH15 10 55583062 T A Missense_Mutation 667 | K31 PCDHAC1 5 140307318 C T Nonsense_Mutation 668 | T22 PCDHB1 5 140432524 C T Missense_Mutation 669 | 002 PCDHB4 5 140502541 A G Missense_Mutation 670 | K48 PCK2 14 24568323 C T Missense_Mutation 671 | 002 PCLO 7 82763696 G C Nonsense_Mutation 672 
| T166 PCSK6 15 101866590 C G Missense_Mutation 673 | T-b PDE12 3 57545301 G A Missense_Mutation 674 | K31 PDE6C 10 95405773 T A Missense_Mutation 675 | PD2126a PDE6C 10 95422894 T G Missense_Mutation 676 | K38 PEMT 17 17415867 A G Missense_Mutation 677 | 002 PGK1 X 77372833 G C Missense_Mutation 678 | T183 PHF1 6 33382526 AAG - In_Frame_Indel 679 | 001 PHF21B 22 45312250 G T Missense_Mutation 680 | 001 PIAS3 1 145578718 C G Missense_Mutation 681 | T166 PIK3C2A 11 17143877 C G Missense_Mutation 682 | T166 PIK3CA 3 178922301 G T Missense_Mutation 683 | RC-T PIP4K2A 10 22880652 C T Missense_Mutation 684 | T183 PITPNC1 17 65628311 - TT Frame_Shift_Indel 685 | K27 PKP2 12 33031455 T A Missense_Mutation 686 | PD2127a PLAU 10 75675107 A G Missense_Mutation 687 | 001 PLB1 2 28801019 T A Missense_Mutation 688 | 001 PLCL1 2 198948857 A - Frame_Shift_Indel 689 | T166 PLCZ1 12 18852774 A C Missense_Mutation 690 | K44 PLEKHG6 12 6436897 GAA - In_Frame_Indel 691 | T144 PLEKHO1 1 150131715 G A Missense_Mutation 692 | K38 PLOD1 1 12030752 A G Missense_Mutation 693 | T142 PLOD3 7 100852139 G A Missense_Mutation 694 | 001 PLRG1 4 155467288 - T Frame_Shift_Indel 695 | PD2144a PLXDC1 17 37263687 G A Silent 696 | K48 PLXNA4 7 132169649 G T Missense_Mutation 697 | 002 POFUT2 21 46703346 G A Missense_Mutation 698 | PD2127a POLD3 11 74329770 G T Missense_Mutation 699 | PD2127a POLD3 11 74329772 A T Missense_Mutation 700 | 002 POLG 15 89861811 C T Missense_Mutation 701 | 002 POLG2 17 62481852 T G Missense_Mutation 702 | 002 POLI 18 51820113 G A Missense_Mutation 703 | T144 POR 7 75613127 T C Missense_Mutation 704 | K44 POSTN 13 38158197 G A Silent 705 | T164 POU4F2 4 147561328 C A Missense_Mutation 706 | T22 POU4F2 4 147561565 G T Missense_Mutation 707 | K20 PPARGC1B 5 149216471 A G Missense_Mutation 708 | T127 PPDPF 20 62152664 G T Missense_Mutation 709 | K48 PPFIA1 11 70201822 C T Missense_Mutation 710 | 001 PPFIA1 11 70224166 T A Missense_Mutation 711 | T127 PPFIA4 1 203022969 A G 
Missense_Mutation 712 | K44 PPM1G 2 27604483 AAG - In_Frame_Indel 713 | 002 PPP1R1B 17 37791905 A T Missense_Mutation 714 | 001 PPP6R2 22 50878183 C G Missense_Mutation 715 | RC-T PRAMEF7 1 12980116 G A Silent 716 | RC-T PREX2 8 68942885 G T Missense_Mutation 717 | T-d PRKAR2A 3 48810437 G A Missense_Mutation 718 | PD2127a PRKCE 2 46313388 C - Frame_Shift_Indel 719 | K44 PROM2 2 95941835 G A Missense_Mutation 720 | T-e PROS1 3 93629525 C T Missense_Mutation 721 | T142 PROSER1 13 39596511 C T Missense_Mutation 722 | PD2125a PRPF40B 12 50027318 G A Missense_Mutation 723 | K44 PRPF6 20 62658470 AGA - In_Frame_Indel 724 | T144 PRPF8 17 1576715 G A Missense_Mutation 725 | RC-T PSKH2 8 87060914 C T Missense_Mutation 726 | T183 PSMA5 1 109957979 A T Missense_Mutation 727 | K48 PSMC3 11 47446696 C T Silent 728 | 001 PSMD7 16 74335551 G A Splice_Site 729 | K48 PSORS1C1 6 31106502 - C Frame_Shift_Indel 730 | T163 PTCH2 1 45294272 C T Missense_Mutation 731 | 002 PTCHD3 10 27702413 G A Missense_Mutation 732 | 002 PTEN 10 89685315 G - Splice_Site 733 | K38 PTK7 6 43126585 A G Missense_Mutation 734 | K48 PTPN12 7 77265144 G A Missense_Mutation 735 | T144 PTPRB 12 70988486 - A Frame_Shift_Indel 736 | T183 PTPRE 10 129846083 G C Missense_Mutation 737 | T-j PTPRG 3 62118314 C T Silent 738 | K1 PTPRG 3 62258677 T G Missense_Mutation 739 | T142 PTPRM 18 8380423 C T Nonsense_Mutation 740 | 001 PTPRZ1 7 121653122 A G Missense_Mutation 741 | T144 PUF60 8 144899842 G A Missense_Mutation 742 | K48 PVRL1 11 119535680 AGGG - Frame_Shift_Indel 743 | 002 PVRL4 1 161059030 GCT - In_Frame_Indel 744 | K38 QSOX1 1 180155207 G A Missense_Mutation 745 | T127 RAB11FIP5 2 73339562 C T Missense_Mutation 746 | 001 RAB27A 15 55497741 T A Missense_Mutation 747 | T22 RAB44 6 36690462 C T Missense_Mutation 748 | T166 RABEP1 17 5264783 C T Missense_Mutation 749 | T-e RAF1 3 12626130 T C Silent 750 | T-i RAF1 3 12650270 T C Silent 751 | K44 RAI14 5 34821885 TCT - In_Frame_Indel 752 | 002 RAI2 X 17819422 AT - 
Frame_Shift_Indel 753 | 001 RALGDS 9 135984240 T C Missense_Mutation 754 | T163 RALGPS1 9 129958783 T A Missense_Mutation 755 | K38 RANBP10 16 67763587 G A Silent 756 | T127 RASAL2 1 178411859 C - Frame_Shift_Indel 757 | 002 RBBP6 16 24580308 - T Frame_Shift_Indel 758 | 001 RBFOX2 22 36157335 C T Missense_Mutation 759 | T183 RCN1 11 32112952 G C Missense_Mutation 760 | 002 REG3G 2 79254935 C A Silent 761 | K44 RIF1 2 152321672 AA - Frame_Shift_Indel 762 | 001 RIMBP2 12 130927064 C T Missense_Mutation 763 | K20 RLBP1 15 89753602 C A Missense_Mutation 764 | 001 RLF 1 40703301 G A Missense_Mutation 765 | T166 RNF139 8 125498452 - T Frame_Shift_Indel 766 | RC-T RNF213 17 78321468 C A Missense_Mutation 767 | 002 RNF8 6 37336689 T A Missense_Mutation 768 | T127 RNF8 6 37336831 T G Missense_Mutation 769 | PD2147a RPL3 22 39710171 G T Missense_Mutation 770 | K38 RPL32 3 12880966 A G Silent 771 | K27 RPL3L 16 2004105 G A Silent 772 | RC-T RPS12 6 133138146 T C Silent 773 | T163 RPS6KA3 X 20185787 G T Missense_Mutation 774 | 001 RPS8 1 45241793 G T Missense_Mutation 775 | RC-T RPS9 19 54705126 T A Missense_Mutation 776 | K38 RTBDN 19 12939547 G A Missense_Mutation 777 | K44 RTF1 15 41766880 AACA - Frame_Shift_Indel 778 | T142 RTN4 2 55252524 G C Missense_Mutation 779 | K44 RUNX2 6 45512966 C A Missense_Mutation 780 | K44 RYR1 19 38976356 C T Silent 781 | T163 RYR1 19 38990633 G A Missense_Mutation 782 | K48 RYR1 19 39018309 G A Missense_Mutation 783 | RC-T RYR1 19 38954385 A T Splice_Site 784 | T-c SACM1L 3 45751040 G A Silent 785 | K44 SACS 13 23909949 T - Frame_Shift_Indel 786 | K44 SAMD9L 7 92761597 G A Nonsense_Mutation 787 | K31 SAP30BP 17 73695938 C T Missense_Mutation 788 | 001 SATL1 X 84362594 T C Missense_Mutation 789 | 001 SBF1 22 50900272 G T Missense_Mutation 790 | T142 SCARB1 12 125302181 C T Missense_Mutation 791 | PD2127a SCN10A 3 38768438 G A Missense_Mutation 792 | T-e SCN10A 3 38770198 T G Missense_Mutation 793 | K20 SCN1A 2 166901733 A T Silent 794 | 002 
SCNN1G 16 23223474 T A Splice_Site 795 | RC-T SCYL2 12 100731284 G T Missense_Mutation 796 | 002 SEC24A 5 134002515 C G Missense_Mutation 797 | PD2125a SEMA3A 7 83634829 G A Missense_Mutation 798 | K27 SEMA5B 3 122642539 CAA - In_Frame_Indel 799 | RC-T SEMA5B 3 122680141 A G Silent 800 | RC-T SEPP1 5 42801426 T C Missense_Mutation 801 | RC-T SERPINA10 14 94756741 C T Missense_Mutation 802 | K20 SERPINA3 14 95081211 G A Missense_Mutation 803 | 001 SESN2 1 28599163 C G Missense_Mutation 804 | 002 SETD2 3 47142998 - GAAG Frame_Shift_Indel 805 | T-g SETD2 3 47162229 G A Silent 806 | T164 SETD2 3 47144892 C T Missense_Mutation 807 | 002 SETD2 3 47158204 A G Missense_Mutation 808 | 001 SETD2 3 47161717 G A Missense_Mutation 809 | PD2126a SETD2 3 47164325 T A Nonsense_Mutation 810 | 001 SETD2 3 47059231 T A Splice_Site 811 | T127 SETDB1 1 150933541 T G Missense_Mutation 812 | K48 SETDB2 13 50050635 T A Missense_Mutation 813 | K44 SFTPD 10 81700484 G C Missense_Mutation 814 | 002 SFXN5 2 73298799 GGC - In_Frame_Indel 815 | RC-T SGIP1 1 67206387 C A Missense_Mutation 816 | 001 SGOL1 3 20215857 A G Missense_Mutation 817 | RC-T SH3GL1 19 4363791 G A Missense_Mutation 818 | T163 SH3PXD2A 10 105361663 CCCTAATGGCTGG - Frame_Shift_Indel 819 | K38 SH3PXD2B 5 171766365 G A Missense_Mutation 820 | K27 SHANK1 19 51207773 T A Splice_Site 821 | T142 SIKE1 1 115321794 G A Missense_Mutation 822 | T166 SIN3A 15 75693171 A G Missense_Mutation 823 | RC-T SLC11A2 12 51388333 G T Nonsense_Mutation 824 | K1 SLC12A1 15 48551460 C T Silent 825 | T127 SLC16A11 17 6945095 G A Missense_Mutation 826 | K1 SLC16A11 17 6945713 A T Missense_Mutation 827 | K38 SLC22A11 11 64335097 G C Missense_Mutation 828 | T-h SLC22A14 3 38349150 TTTG - Frame_Shift_Indel 829 | 002 SLC22A7 6 43266294 G T Missense_Mutation 830 | K20 SLC25A13 7 95818962 T A Missense_Mutation 831 | K44 SLC25A19 17 73282807 CACCAA - In_Frame_Indel 832 | K38 SLC25A22 11 792451 G A Missense_Mutation 833 | T163 SLC25A45 11 65147602 C A 
Splice_Site 834 | 001 SLC2A12 6 134328018 A G Missense_Mutation 835 | RC-T SLC2A5 1 9097665 G T Missense_Mutation 836 | PD2147a SLC34A2 4 25677963 G C Silent 837 | RC-T SLC36A2 5 150704907 C A Missense_Mutation 838 | T183 SLC39A6 18 33706499 G A Missense_Mutation 839 | T144 SLC3A2 11 62650382 A - Splice_Site 840 | 002 SLC40A1 2 190440038 C - Frame_Shift_Indel 841 | T127 SLC41A2 12 105303531 C - Frame_Shift_Indel 842 | K31 SLC41A3 3 125725361 AGC - In_Frame_Indel 843 | PD2125a SLC44A3 1 95330381 G T Missense_Mutation 844 | RC-T SLC47A1 17 19476142 A T Missense_Mutation 845 | PD2127a SLC4A4 4 72420907 A G Silent 846 | T164 SLC4A9 5 139751070 C G Missense_Mutation 847 | PD2127a SLC6A16 19 49813078 C T Missense_Mutation 848 | K44 SLC6A2 16 55690729 C T Silent 849 | K27 SLC6A3 5 1414891 G A Silent 850 | PD2127a SLC7A3 X 70147450 A T Missense_Mutation 851 | PD2125a SLC8A1 2 40656752 A T Silent 852 | 002 SLFN5 17 33592101 A - Frame_Shift_Indel 853 | K38 SLIT3 5 168110973 G A Silent 854 | K20 SLITRK5 13 88330212 G A Missense_Mutation 855 | K27 SMARCA4 19 11141447 T G Missense_Mutation 856 | T-b SMARCC1 3 47712210 A G Splice_Site 857 | T164 SMC4 3 160146620 T G Missense_Mutation 858 | T142 SMURF2 17 62552046 ATGG - Frame_Shift_Indel 859 | PD2127a SMYD4 17 1690187 C G Missense_Mutation 860 | K1 MTCL1 18 8720362 G A Silent 861 | T164 SOS1 2 39222358 AACACCGTT - In_Frame_Indel 862 | T164 SOX17 8 55372305 A G Missense_Mutation 863 | K44 SOX9 17 70118926 AAG - In_Frame_Indel 864 | 001 SOX9 17 70118864 C A Missense_Mutation 865 | 001 SPATA21 1 16727227 G A Nonsense_Mutation 866 | T144 SPEF2 5 35654709 G A Missense_Mutation 867 | T166 SPEF2 5 35659271 C T Nonsense_Mutation 868 | T144 SPHK1 17 74383398 C T Missense_Mutation 869 | 002 SPNS1 16 28994190 A G Missense_Mutation 870 | T166 SPNS3 17 4348335 A T Missense_Mutation 871 | 002 SPTAN1 9 131395564 C A Missense_Mutation 872 | K20 SPTB 14 65239577 G A Silent 873 | K1 SPTBN4 19 40978599 G A Missense_Mutation 874 | T144 SPTLC1 9 
94812274 C T Missense_Mutation 875 | RC-T SRBD1 2 45789811 T A Missense_Mutation 876 | T166 SRCAP 16 30736293 A T Missense_Mutation 877 | T-b SRGAP3 3 9034594 C T Missense_Mutation 878 | RC-T SRGAP3 3 9066948 T A Nonsense_Mutation 879 | RC-T SRPK2 7 104773346 C T Splice_Site 880 | 001 SSNA1 9 140084290 T - Frame_Shift_Indel 881 | 001 SSR3 3 156272853 T C Missense_Mutation 882 | T144 SSTR4 20 23016316 G A Missense_Mutation 883 | 002 ST8SIA1 12 22440159 C T Missense_Mutation 884 | T166 STARD13 13 33692292 C A Missense_Mutation 885 | K27 STAT6 12 57500101 C T Missense_Mutation 886 | T22 STK40 1 36820978 C A Missense_Mutation 887 | 002 SUN2 22 39134593 A T Missense_Mutation 888 | T-e SUSD5 3 33195202 T C Missense_Mutation 889 | K20 SYNE2 14 64580195 A G Missense_Mutation 890 | T164 SYNPO 5 150027515 G A Missense_Mutation 891 | T166 TAAR5 6 132910755 C A Missense_Mutation 892 | K27 TAB2 6 149699806 A T Missense_Mutation 893 | T166 TAF1L 9 32633813 C T Missense_Mutation 894 | T144 TAS1R2 1 19175898 G C Nonsense_Mutation 895 | K44 TBC1D2 9 100962620 AAC - In_Frame_Indel 896 | K44 TCEAL8 X 102508863 C G Missense_Mutation 897 | 001 TCF12 15 57544686 G A Missense_Mutation 898 | PD2144a TCHH 1 152081494 C G Missense_Mutation 899 | K31 TECTA 11 121038831 C A Silent 900 | K31 TECTA 11 121008666 C T Nonsense_Mutation 901 | PD2127a TEPP 16 58011759 C T Silent 902 | T166 TET1 10 70404686 G - Frame_Shift_Indel 903 | K44 TET2 4 106156666 G T Nonsense_Mutation 904 | 001 TH 11 2185608 C T Missense_Mutation 905 | K44 THBS3 1 155167989 TG - Frame_Shift_Indel 906 | 002 THSD1 13 52971608 C T Silent 907 | T164 TLX2 2 74741964 C T Missense_Mutation 908 | T142 TM9SF3 10 98325097 A T Missense_Mutation 909 | T144 TMEFF1 9 103261070 G A Missense_Mutation 910 | K20 TMEM130 7 98457927 G T Missense_Mutation 911 | T22 TMEM151A 11 66062067 G T Missense_Mutation 912 | K29 TMEM171 5 72424230 A - Frame_Shift_Indel 913 | T166 TMEM2 9 74309514 AC - Frame_Shift_Indel 914 | T22 TMEM44 3 194344008 C T 
Missense_Mutation 915 | T166 TMEM54 1 33363776 G C Missense_Mutation 916 | T142 TMPRSS11B 4 69093666 G A Missense_Mutation 917 | PD2125a TMPRSS7 3 111782388 T A Silent 918 | K20 TNC 9 117848599 G A Missense_Mutation 919 | PD2127a TNFRSF13B 17 16855793 G T Missense_Mutation 920 | 001 TNIK 3 170856133 T - Frame_Shift_Indel 921 | T163 TNIK 3 170908509 G T Missense_Mutation 922 | 002 TNIK 3 171177819 C A Nonsense_Mutation 923 | K44 TNKS 8 9562226 CTT - In_Frame_Indel 924 | 002 TNKS1BP1 11 57076542 C T Missense_Mutation 925 | 002 TNR 1 175355422 G A Missense_Mutation 926 | 001 TOM1 22 35723300 G - Frame_Shift_Indel 927 | T-h TOPAZ1 3 44328924 A C Silent 928 | PD2127a TOX 8 59750765 C T Missense_Mutation 929 | K44 TP53 17 7578277 GAG - In_Frame_Indel 930 | 002 TP53 17 7577539 G A Missense_Mutation 931 | K1 TPH2 12 72425339 A T Missense_Mutation 932 | T144 TPO 2 1437306 G C Missense_Mutation 933 | T-g TRAIP 3 49881339 C T Silent 934 | T-g TRAIP 3 49885642 A C Splice_Site 935 | T-f TRAK1 3 42251580 GGA - In_Frame_Indel 936 | 002 TRAPPC12 2 3428416 G T Missense_Mutation 937 | T22 TREH 11 118530079 C A Missense_Mutation 938 | K20 TRIM24 7 138269610 A C Missense_Mutation 939 | T166 TRIM25 17 54991297 T A Missense_Mutation 940 | 002 TRIM71 3 32932868 C G Missense_Mutation 941 | 002 TRIO 5 14359589 C G Missense_Mutation 942 | 002 TRIP11 14 92465743 T C Missense_Mutation 943 | K48 TRPC6 11 101353695 T A Missense_Mutation 944 | 002 TRRAP 7 98581774 G A Missense_Mutation 945 | T22 TSC1 9 135801128 T A Splice_Site 946 | PD2147a TSHR 14 81606140 T G Silent 947 | K20 TSHZ3 19 31769561 C A Missense_Mutation 948 | K38 TSNARE1 8 143425403 G A Silent 949 | K38 TTC13 1 231044713 A T Missense_Mutation 950 | PD2127a TTN 2 179449098 C A Missense_Mutation 951 | PD2125a TULP2 19 49398324 C T Missense_Mutation 952 | 002 TXLNB 6 139581547 GA - Frame_Shift_Indel 953 | T164 UBA6 4 68497620 G A Nonsense_Mutation 954 | RC-T UBA7 3 49848498 C A Silent 955 | K48 UBAC1 9 138836944 T - Frame_Shift_Indel 
956 | K20 UBE3B 12 109923824 A C Missense_Mutation 957 | RC-T UBE3C 7 157041244 T A Silent 958 | RC-T UBE4A 11 118255625 A C Missense_Mutation 959 | K20 UBQLN1 9 86281297 T C Missense_Mutation 960 | T144 UBR4 1 19513735 G C Missense_Mutation 961 | K3 UBR5 8 103289349 T - Frame_Shift_Indel 962 | 001 UGT2A1 4 70460968 C T Splice_Site 963 | T144 UGT2B7 4 69973881 - TC Frame_Shift_Indel 964 | T142 UNC13B 9 35397188 C G Missense_Mutation 965 | K44 UNC79 14 94088169 G T Missense_Mutation 966 | 001 UNC80 2 210658553 C A Missense_Mutation 967 | PD2147a UPF3B X 118979158 T - Splice_Site 968 | K3 UROC1 3 126202295 C T Missense_Mutation 969 | T163 USP13 3 179439299 G C Missense_Mutation 970 | K48 USP21 1 161130795 T C Missense_Mutation 971 | PD2147a USP24 1 55557770 G T Nonsense_Mutation 972 | 001 USP51 X 55514714 C G Missense_Mutation 973 | K44 UTS2R 17 80332434 C A Silent 974 | RC-T VAC14 16 70729468 T A Missense_Mutation 975 | 002 VARS 6 31760805 A G Silent 976 | T166 VEPH1 3 156979066 C A Missense_Mutation 977 | PD2126a VHL 3 10183772 GCAGTC - In_Frame_Indel 978 | T-g VHL 3 10188261 AATT - Frame_Shift_Indel 979 | T127 VHL 3 10183764 AA - Frame_Shift_Indel 980 | T-b VHL 3 10191499 AG - Frame_Shift_Indel 981 | T183 VHL 3 10188278 ATCTCTCA - Frame_Shift_Indel 982 | 002 VHL 3 10191513 T - Frame_Shift_Indel 983 | PD2144a VHL 3 10191534 G - Frame_Shift_Indel 984 | T-j VHL 3 10188320 G - Frame_Shift_Indel 985 | T166 VHL 3 10183757 TCT - In_Frame_Indel 986 | T142 VHL 3 10191540 CAGGAGACT - Nonsense_Mutation 987 | T-e VHL 3 10188203 - C Frame_Shift_Indel 988 | T163 VHL 3 10188200 C A Missense_Mutation 989 | T-a VHL 3 10188206 T G Missense_Mutation 990 | K44 VHL 3 10188218 G T Missense_Mutation 991 | T22 VHL 3 10191479 C G Missense_Mutation 992 | T144 VHL 3 10191513 T C Missense_Mutation 993 | K31 VHL 3 10191485 G T Nonsense_Mutation 994 | T166 VIPR1 3 42572355 T A Missense_Mutation 995 | T166 VIPR1 3 42572356 A T Missense_Mutation 996 | 002 VPS13A 9 79954536 C T Silent 997 | T144 
VPS13C 15 62299529 T G Missense_Mutation 998 | T144 VPS18 15 41195512 C G Nonsense_Mutation 999 | PD2125a VTN 17 26696780 - G Frame_Shift_Indel 1000 | 001 WDR24 16 735684 G A Missense_Mutation 1001 | T144 WDR3 1 118496133 C A Missense_Mutation 1002 | 001 WDR62 19 36594487 T C Missense_Mutation 1003 | 001 WDR7 18 54444054 CT - Frame_Shift_Indel 1004 | 001 WHSC1 4 1902962 G T Missense_Mutation 1005 | T166 WRNIP1 6 2785295 G A Missense_Mutation 1006 | 001 WSCD2 12 108603968 G A Missense_Mutation 1007 | T144 WWC1 5 167868745 G A Nonsense_Mutation 1008 | 002 WWC2 4 184201998 AAG - In_Frame_Indel 1009 | T166 WWP1 8 87450865 ATTTA - Frame_Shift_Indel 1010 | T166 XAB2 19 7687448 C T Missense_Mutation 1011 | T-i XCR1 3 46062889 A G Missense_Mutation 1012 | T183 XPO5 6 43499223 T A Missense_Mutation 1013 | T-d XYLB 3 38416748 G A Silent 1014 | T-d XYLB 3 38401831 A G Missense_Mutation 1015 | PD2127a YWHAB 20 43530469 G A Missense_Mutation 1016 | K1 YWHAB 20 43533712 C G Missense_Mutation 1017 | 002 ZBTB46 20 62421407 G A Missense_Mutation 1018 | K31 ZC3H14 14 89061267 A G Missense_Mutation 1019 | 001 ZC3H18 16 88694377 T - Frame_Shift_Indel 1020 | T142 ZC3H4 19 47570581 C T Missense_Mutation 1021 | 001 ZC3HC1 7 129666125 G T Missense_Mutation 1022 | K1 ZDHHC5 11 57461331 C T Nonsense_Mutation 1023 | 002 ZEB1 10 31799718 T G Missense_Mutation 1024 | T183 ZFAT 8 135490816 C - Frame_Shift_Indel 1025 | RC-T ZFAT 8 135614406 A G Missense_Mutation 1026 | K1 ZFHX3 16 72831103 C - Frame_Shift_Indel 1027 | 002 ZFHX4 8 77765426 C T Missense_Mutation 1028 | K20 ZFP36L2 2 43452311 C T Missense_Mutation 1029 | K48 ZNF133 20 18296174 A T Missense_Mutation 1030 | 002 ZNF14 19 19823124 A - Frame_Shift_Indel 1031 | K27 ZNF16 8 146157127 T A Missense_Mutation 1032 | T-f ZNF197 3 44683964 ATCC - Frame_Shift_Indel 1033 | T144 ZNF213 16 3187364 G T Missense_Mutation 1034 | T144 ZNF214 11 7022159 C T Missense_Mutation 1035 | PD2147a ZNF22 10 45498936 C T Silent 1036 | T166 ZNF281 1 200377996 C T 
Missense_Mutation 1037 | K31 ZNF285 19 44892153 G C Missense_Mutation 1038 | T-a ZNF385D 3 21462942 G T Splice_Site 1039 | K20 ZNF408 11 46727367 G A Missense_Mutation 1040 | K38 ZNF439 19 11979104 A G Missense_Mutation 1041 | PD2127a ZNF442 19 12460630 T A Missense_Mutation 1042 | T-b ZNF445 3 44488293 T C Missense_Mutation 1043 | RC-T ZNF462 9 109688470 G T Missense_Mutation 1044 | K44 ZNF471 19 57037314 T G Silent 1045 | 001 ZNF493 19 21606235 T G Missense_Mutation 1046 | K27 ZNF510 9 99525872 C G Missense_Mutation 1047 | 001 ZNF519 18 14106244 T C Missense_Mutation 1048 | 001 ZNF521 18 22805039 T A Missense_Mutation 1049 | T164 ZNF598 16 2049883 - GGA In_Frame_Indel 1050 | RC-T ZNF605 12 133503641 T A Missense_Mutation 1051 | K44 ZNF672 1 249142626 C A Missense_Mutation 1052 | RC-T ZNF776 19 58265519 G C Missense_Mutation 1053 | 001 ZNF780A 19 40581873 C T Missense_Mutation 1054 | K27 ZNF800 7 127013853 T A Missense_Mutation 1055 | K29 ZNF804A 2 185463702 A G Missense_Mutation 1056 | T144 ZZEF1 17 3920811 C A Missense_Mutation 1057 | -------------------------------------------------------------------------------- /monte_carlo_sim.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | 13 | std::vector<size_t> rankSort(const std::vector<double>& v_temp) { 14 | std::vector<std::pair<double, size_t> > v_sort(v_temp.size()); 15 | 16 | for (size_t i = 0U; i < v_sort.size(); ++i) { 17 | v_sort[i] = std::make_pair(v_temp[i], i); 18 | } 19 | 20 | std::sort(v_sort.begin(), v_sort.end()); 21 | 22 | std::pair<double, size_t> rank; 23 | std::vector<size_t> result(v_temp.size()); 24 | 25 | for (size_t i = 0U; i < v_sort.size(); ++i) { 26 | if (v_sort[i].first != rank.first) { 27 | rank = std::make_pair(v_sort[i].first, i); 28 | } 29 | result[v_sort[i].second] = rank.second + 1; 30 | } 31 | return result; 32 | } 33 | 34 | 35 | 36 | int main(int argc, char **argv) 37 | {
38 | int thread_n, sim_n; 39 | std::string eta_file, N_file, LRT_file, omega_file, date, file_n; 40 | 41 | std::istringstream argv_thread(argv[1]); 42 | std::istringstream argv_sim(argv[2]); 43 | std::istringstream argv_eta(argv[3]); 44 | std::istringstream argv_N(argv[4]); 45 | std::istringstream argv_LRT(argv[5]); 46 | std::istringstream argv_omega(argv[6]); 47 | std::istringstream argv_date(argv[7]); 48 | std::istringstream argv_file(argv[8]); 49 | 50 | if (!(argv_thread >> thread_n)) 51 | { 52 | std::cerr << "Invalid number for multicore: " << argv[1] << std::endl; 53 | exit(1); 54 | } 55 | if (!(argv_sim >> sim_n)) 56 | { 57 | std::cerr << "Invalid number for simulation time: " << argv[2] << std::endl; 58 | exit(1); 59 | } 60 | if (!(argv_eta >> eta_file)) 61 | { 62 | std::cerr << "Invalid file name for eta: " << argv[3] << std::endl; 63 | exit(1); 64 | } 65 | if (!(argv_N >> N_file)) 66 | { 67 | std::cerr << "Invalid file name for N: " << argv[4] << std::endl; 68 | exit(1); 69 | } 70 | if (!(argv_LRT >> LRT_file)) 71 | { 72 | std::cerr << "Invalid file name for LRT: " << argv[5] << std::endl; 73 | exit(1); 74 | } 75 | if (!(argv_omega >> omega_file)) 76 | { 77 | std::cerr << "Invalid file name for omega: " << argv[6] << std::endl; 78 | exit(1); 79 | } 80 | if (!(argv_date >> date)) 81 | { 82 | std::cerr << "Invalid file name for date: " << argv[7] << std::endl; 83 | exit(1); 84 | } 85 | if (!(argv_file >> file_n)) 86 | { 87 | std::cerr << "Invalid file number: " << argv[8] << std::endl; 88 | exit(1); 89 | } 90 | 91 | std::vector<std::vector<double>> eta; 92 | std::vector<std::vector<int>> N; 93 | std::vector<std::vector<double>> omega; 94 | std::vector<std::vector<double>> LRT; 95 | 96 | 97 | std::ifstream IN_ETA(eta_file); 98 | if (IN_ETA.is_open()) 99 | { 100 | std::string str; 101 | while (std::getline(IN_ETA, str)) 102 | { 103 | std::vector<double> vec_tmp; 104 | const char *loc = str.c_str(); 105 | vec_tmp.push_back(std::atof(loc)); 106 | loc = std::strstr(loc, "\t"); 107 | while (loc != NULL) 108 | { 109 | vec_tmp.push_back(atof(loc +
1)); 110 | loc = std::strstr(loc + 1, "\t"); 111 | } 112 | eta.push_back(vec_tmp); 113 | } 114 | IN_ETA.close(); 115 | } 116 | else 117 | { 118 | fprintf(stderr, "There was an error opening the eta file.\n"); 119 | exit(1); 120 | } 121 | 122 | std::ifstream IN_N(N_file); 123 | if (IN_N.is_open()) 124 | { 125 | std::string str; 126 | while (std::getline(IN_N, str)) 127 | { 128 | std::vector<int> vec_tmp; 129 | const char *loc = str.c_str(); 130 | vec_tmp.push_back(std::atoi(loc)); 131 | loc = std::strstr(loc, "\t"); 132 | while (loc != NULL) 133 | { 134 | vec_tmp.push_back(atoi(loc + 1)); 135 | loc = std::strstr(loc + 1, "\t"); 136 | } 137 | N.push_back(vec_tmp); 138 | } 139 | IN_N.close(); 140 | } 141 | else 142 | { 143 | fprintf(stderr, "There was an error opening the N file.\n"); 144 | exit(1); 145 | } 146 | 147 | std::ifstream IN_OMEGA(omega_file); 148 | if (IN_OMEGA.is_open()) 149 | { 150 | std::string str; 151 | while (std::getline(IN_OMEGA, str)) 152 | { 153 | std::vector<double> vec_tmp; 154 | const char *loc = str.c_str(); 155 | vec_tmp.push_back(std::atof(loc)); 156 | loc = std::strstr(loc, "\t"); 157 | while (loc != NULL) 158 | { 159 | vec_tmp.push_back(atof(loc + 1)); 160 | loc = std::strstr(loc + 1, "\t"); 161 | } 162 | omega.push_back(vec_tmp); 163 | } 164 | IN_OMEGA.close(); 165 | } 166 | else 167 | { 168 | fprintf(stderr, "There was an error opening the omega file.\n"); 169 | exit(1); 170 | } 171 | 172 | std::ifstream IN_LRT(LRT_file); 173 | if (IN_LRT.is_open()) 174 | { 175 | std::string str; 176 | while (std::getline(IN_LRT, str)) 177 | { 178 | std::vector<double> vec_tmp; 179 | const char *loc = str.c_str(); 180 | vec_tmp.push_back(std::atof(loc)); 181 | loc = std::strstr(loc, "\t"); 182 | while (loc != NULL) 183 | { 184 | vec_tmp.push_back(atof(loc + 1)); 185 | loc = std::strstr(loc + 1, "\t"); 186 | } 187 | LRT.push_back(vec_tmp); 188 | } 189 | IN_LRT.close(); 190 | } 191 | else 192 | { 193 | fprintf(stderr, "There was an error opening the LRT file.\n"); 194 |
exit(1); 195 | } 196 | 197 | if (!(N.size() == omega.size() && N.size() == LRT.size() && N[0].size() == eta.size() && eta.size() == omega[0].size())) 198 | { 199 | std::cerr << "dimensions were not compatible: " << N.size() << " " << omega.size() << " " << LRT.size() << " " << N[0].size() << " " << eta.size() << " " << omega[0].size() << std::endl; 200 | } 201 | int gene_n = N.size(); 202 | int sample_n = eta[0].size(); 203 | int type_n = eta.size(); 204 | std::vector<double> abs_LRT(gene_n); 205 | 206 | for (size_t k = 0; k < gene_n; k++) 207 | { 208 | abs_LRT[k] = std::abs(LRT[k][0]); 209 | } 210 | 211 | std::vector<double> p2sides_noN(gene_n); 212 | 213 | std::vector<double> p2sides_noN_adj(gene_n); 214 | 215 | omp_set_num_threads(thread_n); 216 | 217 | #pragma omp declare reduction(vec_plus : std::vector<double> : \ 218 | std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<double>())) \ 219 | initializer(omp_priv = omp_orig) 220 | 221 | #pragma omp parallel for reduction(vec_plus:p2sides_noN) 222 | for (int k = 0; k < gene_n; k++) 223 | { 224 | std::default_random_engine generator(k); 225 | std::poisson_distribution<> rpois; 226 | for (int t = 0; t < sim_n; t++) 227 | { 228 | 229 | double numerator = 0; 230 | 231 | double denominator_noN = 0; 232 | for (int j = 0; j < type_n; j++) 233 | { 234 | double numerator_help = 0; 235 | 236 | double denominator_noN_help = 0; 237 | for (int i = 0; i < sample_n; i++) 238 | { 239 | rpois = std::poisson_distribution<>(N[k][j] * eta[j][i]); 240 | numerator_help += rpois(generator) / eta[j][i]; 241 | 242 | denominator_noN_help += 1.0 / eta[j][i]; 243 | } 244 | numerator_help -= sample_n * N[k][j]; 245 | numerator += omega[k][j] * numerator_help; 246 | 247 | denominator_noN += omega[k][j] * omega[k][j] * denominator_noN_help; 248 | } 249 | 250 | 251 | double monte_carlo_T_noN = numerator / std::sqrt(denominator_noN); 252 | 253 | double abs_monte_carlo_T_noN = std::abs(monte_carlo_T_noN); 254 | 255 | for (int pk = 0; pk < gene_n;
pk++) 256 | { 257 | 258 | if (abs_monte_carlo_T_noN >= abs_LRT[pk]) 259 | { 260 | p2sides_noN[pk]++; 261 | } 262 | } 263 | } 264 | 265 | 266 | } 267 | 268 | 269 | std::transform(p2sides_noN.begin(), p2sides_noN.end(), p2sides_noN.begin(), [&sim_n, &gene_n](double value) {return value / (sim_n*gene_n); }); 270 | std::vector<size_t> p2sides_noN_rank = rankSort(p2sides_noN); 271 | std::transform(p2sides_noN.cbegin(), p2sides_noN.cend(), p2sides_noN_rank.cbegin(), p2sides_noN_adj.begin(), [&gene_n](const double &p, const size_t &rank) {return p * gene_n / rank; }); 272 | 273 | std::ofstream ofile_p(date + "_" + file_n + "_p.tmp"); 274 | 275 | if (ofile_p.is_open()) 276 | { 277 | for (size_t k = 0; k < gene_n; k++) 278 | { 279 | ofile_p << p2sides_noN[k] << "\t" << p2sides_noN_adj[k] << std::endl; 280 | } 281 | 282 | ofile_p.close(); 283 | } 284 | else 285 | { 286 | std::cout << "opening error for the output file for p" << std::endl; 287 | } 288 | 289 | return 0; 290 | } 291 | 292 | -------------------------------------------------------------------------------- /pre-drgap.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use autodie; 4 | use FileHandle; 5 | use Getopt::Long; 6 | our($input_file,$class_number); 7 | our$date; 8 | #our$fix_name; 9 | my $optionOK = GetOptions( 10 | 'i|input_file=s' =>\$input_file, 11 | 'n|class_number=s' => \$class_number, 12 | 'd|date=s' => \$date, 13 | ); 14 | #------------------------------------------------------------ 15 | our%gene_class; 16 | open CLASS,'<',"${date}_gene-class.tmp"; 17 | my$header=<CLASS>; 18 | our$line=0; 19 | while(<CLASS>){ 20 | $line+=1; 21 | chomp; 22 | if(/^\d+\s+(\w+\-?\.?\w*)\s+(\d+)/){ 23 | $gene_class{$1}=$2; 24 | } 25 | else{ 26 | print "invalid format of gene class in line $line\n"; 27 | } 28 | } 29 | close CLASS; 30 | my%fh; 31 | foreach my$i(1..$class_number){ 32 | open $fh{$i},">${date}_subclass-$i.tmp"; 33 | } 34 | open INPUT,'<',"$input_file"; 35 |
open TMP,'>',"${date}_genes-which-are-in-input-file-but-not-in-the-chara-file.tmp"; 36 | our$n=0;#the number of genes which are in input file but not in the characteristic file------# 37 | while(<INPUT>){ 38 | chomp; 39 | my@element=split(/\t/);#print "$element[1]\n"; 40 | my$class=$gene_class{$element[1]}; 41 | if(exists($gene_class{$element[1]})){ 42 | $fh{$class}->print("$_\n"); 43 | } 44 | else{ 45 | print TMP "$element[0]\t$element[1]\t$element[2]\n"; 46 | $n+=1; 47 | } 48 | } 49 | print("The number of genes which are in the input file but not in the characteristic file is $n\n"); 50 | close INPUT; 51 | foreach my$i(1..$class_number){ 52 | close $fh{$i}; 53 | } 54 | close TMP; 55 | -------------------------------------------------------------------------------- /prior/PAN: -------------------------------------------------------------------------------- 1 | ACO1 2 | ACSL6 3 | ACTB 4 | ACTG1 5 | ACVR1B 6 | ACVR2A 7 | ADAM10 8 | AFF4 9 | AHNAK 10 | AHR 11 | AKAP9 12 | ALK 13 | ANK3 14 | APC 15 | ARFGEF2 16 | ARHGAP26 17 | ARHGAP35 18 | ARID1A 19 | ARID1B 20 | ARID2 21 | ARNTL 22 | ASH1L 23 | ASPM 24 | ATM 25 | ATR 26 | ATRX 27 | AXIN1 28 | BAP1 29 | BAZ2B 30 | BCLAF1 31 | BCOR 32 | BLM 33 | BMPR2 34 | BNC2 35 | BPTF 36 | BRAF 37 | BRWD1 38 | CACNA1B 39 | CAPN7 40 | CASP8 41 | CCAR1 42 | CCT5 43 | CDH1 44 | CDK12 45 | CDKN1B 46 | CDKN2A 47 | CEP290 48 | CHD1L 49 | CHD4 50 | CHD9 51 | CHEK2 52 | CIC 53 | CLSPN 54 | CLTC 55 | CNOT1 56 | CNOT3 57 | COL1A1 58 | COQ10A 59 | CREBBP 60 | CSNK1G3 61 | CTNNB1 62 | CUL3 63 | DDX3X 64 | DDX5 65 | DICER1 66 | DLG1 67 | EEF1A1 68 | EGFR 69 | EIF2AK3 70 | EIF2C3 71 | EIF4G1 72 | ELF3 73 | EP300 74 | EPC1 75 | ERBB2IP 76 | ERCC2 77 | EZH2 78 | FAM123B 79 | FAM46C 80 | FAT1 81 | FBXO11 82 | FBXO38 83 | FBXW7 84 | FGFR1 85 | FGFR2 86 | FGFR3 87 | FLT3 88 | FMN2 89 | FMR1 90 | FN1 91 | FXR1 92 | GATA3 93 | GNAI2 94 | GNAS 95 | HGF 96 | HIST2H3D 97 | HLA-A 98 | HLA-B 99 | HLA-DQB1 100 | HNRPDL 101 | HRAS 102 | HSP90AB1 103 | IDH1 104 |
IREB2 105 | IRF2 106 | IRF8 107 | IRS2 108 | ITSN1 109 | KALRN 110 | KDM6A 111 | KEAP1 112 | KLF4 113 | KRAS 114 | KRTAP10-1 115 | LRP6 116 | MACF1 117 | MAP2K4 118 | MAP3K4 119 | MECOM 120 | MED12 121 | MET 122 | MGA 123 | MLL 124 | MLL2 125 | MLL3 126 | MNDA 127 | MSR1 128 | MST1 129 | MTOR 130 | MUC21 131 | MUC4 132 | MUC6 133 | MYB 134 | MYC 135 | MYCN 136 | MYD88 137 | MYH10 138 | NCKAP1 139 | NCOR1 140 | NF1 141 | NFATC4 142 | NFE2L2 143 | NOTCH1 144 | NOTCH2 145 | NR4A2 146 | NRAS 147 | NSD1 148 | NUP107 149 | NUP98 150 | OTOP1 151 | PAX5 152 | PBRM1 153 | PCDH18 154 | PCSK6 155 | PIK3CA 156 | PIK3CB 157 | PLCB1 158 | PPT2 159 | PRKAR1A 160 | PTCH1 161 | PTEN 162 | PTPN11 163 | PTPRU 164 | RAD21 165 | RB1 166 | RBM10 167 | RHOA 168 | RPL5 169 | RPS6KA3 170 | RTN4 171 | SEC24D 172 | SETD2 173 | SETDB1 174 | SF3B1 175 | SH2B3 176 | SHMT1 177 | SIN3A 178 | SMAD4 179 | SMARCA4 180 | SMC1A 181 | SMO 182 | SOS1 183 | SOS2 184 | SOX9 185 | SPTAN1 186 | SRGAP3 187 | STAG1 188 | STAG2 189 | SYNCRIP 190 | TAF1 191 | TAOK1 192 | TAOK2 193 | TBL1XR1 194 | TBX3 195 | TCF4 196 | TCF7L2 197 | TEKT4 198 | TGFBR2 199 | TJP1 200 | TMEM63B 201 | TMTC1 202 | TNS3 203 | TP53 204 | TP53BP1 205 | TPSAB1 206 | TRIO 207 | TSC2 208 | WT1 209 | XPO1 210 | ZC3H11A 211 | ZFP36L2 212 | ZNF814 213 | -------------------------------------------------------------------------------- /reduction.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl -w 2 | 3 | use strict; 4 | use Carp; 5 | use Getopt::Long; 6 | use English; 7 | use Pod::Usage; 8 | use File::Spec; 9 | use File::Path; 10 | use File::Copy; 11 | use File::Basename; 12 | use Cwd; 13 | 14 | my ($input_mt,$exp_mt,$ref_genome); 15 | my ($out_dir,$simul_t, $prefix, $ratio, $ave_prop_CDS, $each_prop_CDS,$pathway); 16 | $simul_t = 0; 17 | $ratio = 0.05; 18 | $ave_prop_CDS = 1.0; 19 | my ($help,$man,$version,$usage); 20 | my $optionOK = GetOptions( 21 | 'i|input_mt=s' => \$input_mt, 22 | 
'g|exp_mt=s' => \$exp_mt, 23 | 'f|ref_genome=s' => \$ref_genome, 24 | ); 25 | pod2usage(-verbose=>2) if($man or $usage or $help or $version); 26 | if(!$input_mt) { 27 | pod2usage(1); 28 | croak "You need to specify the input mutation data file"; 29 | } 30 | $input_mt = File::Spec->rel2abs($input_mt); 31 | if(!$exp_mt) { 32 | pod2usage(1); 33 | croak "You need to specify the predefined gene mutation table"; 34 | } 35 | $exp_mt = File::Spec->rel2abs($exp_mt); 36 | 37 | if(!$ref_genome) { 38 | pod2usage(1); 39 | croak "You need to specify the reference genome"; 40 | } 41 | $ref_genome = File::Spec->rel2abs($ref_genome); 42 | 43 | croak "The file you specified does not exist" unless (-e $input_mt && -e $exp_mt && -e $ref_genome); 44 | 45 | croak "The file:$input_mt does not exist" unless (-e $input_mt); 46 | 47 | croak "The file:$exp_mt does not exist" unless (-e $exp_mt); 48 | 49 | croak "The file:$ref_genome does not exist" unless (-e $ref_genome); 50 | 51 | if($each_prop_CDS){ 52 | $each_prop_CDS = File::Spec->rel2abs($each_prop_CDS); 53 | croak "The file you specified does not exist" unless (-e $each_prop_CDS); 54 | } 55 | 56 | my $input_base = fileparse($input_mt); 57 | 58 | if(!$prefix) { 59 | $prefix = $input_base; 60 | } 61 | 62 | # check for the presence of R 63 | if( !`which R 2> err.log` ){ 64 | unlink("err.log"); 65 | die "R is required but was not found\n"; 66 | } 67 | unlink("err.log"); 68 | 69 | 70 | ##-------------------------------------------------------------------------------------------------------- 71 | my %mtype_mbase; 72 | %mtype_mbase = ( 73 | AG => "AT_GC", 74 | TC => "AT_GC", 75 | AC => "AT_CG", 76 | TG => "AT_CG", 77 | AT => "AT_TA", 78 | TA => "AT_TA", 79 | CT => "CG_TA", 80 | GA => "CG_TA", 81 | CA => "CG_AT", 82 | GT => "CG_AT", 83 | CG => "CG_GC", 84 | GC => "CG_GC" 85 | ); 86 | 87 | my @mbase = ("AT_GC", "AT_CG", "AT_TA", "CG_TA", "CG_AT", "CG_GC"); 88 | my @pointmt = ("silent", "missense", "nonsense", "splicing"); 89 | 90 | my
(%sample_mtype_pointmt); 91 | 92 | my (%chr_seq, @chr_name, $chr); 93 | 94 | my (@query, @query_match, $qn, $qc, $qm); 95 | my (@db_gene, $dgn, @db_gene_match, $dgm); 96 | my (%gene, @gene_id, $gn, @strand, %sample, @sample_id, $sn); 97 | my (%FSindel, %nFSindel); 98 | my (@mutation_obs); 99 | my ($flank1bp, $ref_var, $mutation); 100 | my (@pathway, $pid, $pname, $pgn); 101 | my (@exp_table, @obs_table, $etn, $otn, $gmt, $smt); 102 | 103 | my ($logfile, $gene_exp_file, $gene_obs_file, $pathway_exp_file, $pathway_obs_file); 104 | my ($out_gene_summary, $out_gene_detail, $out_pathway_summary, $out_pathway_detail); 105 | 106 | 107 | $gene_exp_file = $input_mt . "_exp.tmp"; 108 | $gene_obs_file = $input_mt . "_obs.tmp"; 109 | 110 | 111 | my ($row, @rowarray, $flag, $sum); 112 | my ($i, $j, $k, $h, $m, $s, $pn, $pt); 113 | 114 | 115 | $logfile = "reduction". "\.log.tmp"; 116 | open(LOG,">>$logfile") || die "Cannot create the file $logfile: $!"; 117 | 118 | open(IN, "$input_mt") || die "Cannot open the file $input_mt: $!"; 119 | @query = (); $qn = 0; $flag=0; 120 | 121 | while(<IN>){ 122 | chomp; 123 | @rowarray = split(/\s+/); 124 | $rowarray[2] =~ s/^chr//i; 125 | $rowarray[6] =~ tr/A-Z/a-z/; $rowarray[6] =~ s/splic\w+/splicing/i; $rowarray[6] =~ s/nonsynonymous/missense/i; $rowarray[6] =~ s/synonymous/silent/i; $rowarray[6] =~ s/inframeshift/nFS_indel/i; $rowarray[6] =~ s/nonframeshift/nFS_indel/i;$rowarray[6] =~ s/frameshift/FS_indel/i; $rowarray[6] =~ s/fs_indel/FS_indel/i;$rowarray[6] =~ s/nfs_indel/nFS_indel/i; 126 | if( (($rowarray[6] =~ /silent/ || $rowarray[6] =~ /missense/ || $rowarray[6] =~ /nonsense/ || $rowarray[6] =~ /splicing/) && $rowarray[4]=~ /[ACGT]/ && $rowarray[5]=~ /[ACGT]/) || $rowarray[6] =~ /FS_indel/){ 127 | push @query,[@rowarray]; 128 | $qn++; 129 | } 130 | else{ 131 | if($flag==0){ print LOG "\nGenes do not have any of the defined mutations: silent, missense, nonsense, splicing, FS_indel and nFS_indel\:\n"; } 132 | print LOG "@rowarray\n"; 133 |
$flag++; 134 | } 135 | } 136 | close(IN) || die "Cannot close the file $input_mt: $!"; 137 | $qc = scalar(@rowarray); 138 | @query = sort custom_c1 @query; 139 | 140 | open(IN, "$exp_mt") || die "Cannot open the file $exp_mt: $!"; 141 | $row = <IN>; 142 | @db_gene = (); $dgn = 0; 143 | while(<IN>){ 144 | chomp; 145 | @rowarray = split(/\s+/); 146 | push @db_gene,[@rowarray]; 147 | $dgn++; 148 | } 149 | close(IN) || die "Cannot close the file $exp_mt: $!"; 150 | @db_gene = sort custom_c0 @db_gene; 151 | 152 | 153 | @query_match = (); $qm = 0; %sample = (); %gene = (); 154 | $flag = 0; $j = 0; 155 | for($i=0;$i<$qn;$i++){ 156 | $k = 0; 157 | for(;$j<$dgn;$j++){ 158 | if($query[$i][1] eq $db_gene[$j][0]){ 159 | @rowarray = (); 160 | for($h=0;$h<$qc;$h++){ $rowarray[$h] = $query[$i][$h]; } 161 | push @query_match,[@rowarray]; 162 | $qm++; 163 | $k++; 164 | 165 | $sample{$rowarray[0]}=0; 166 | $gene{$rowarray[1]}=0; 167 | 168 | last; 169 | } 170 | elsif($query[$i][1] lt $db_gene[$j][0]){ last; } 171 | } 172 | if($k==0){ 173 | if($flag==0){ print LOG "\nQueried genes are not found in the database $exp_mt\:\n"; } 174 | print LOG "$query[$i][1]\n"; 175 | $flag++; 176 | } 177 | } 178 | 179 | @gene_id = sort keys (%gene); 180 | $gn = scalar(@gene_id); 181 | @sample_id = sort keys(%sample); 182 | $sn = scalar(@sample_id); 183 | 184 | @db_gene_match = (); $dgm = 0; 185 | 186 | open(TMP, ">$gene_exp_file") || die "Cannot open the file $gene_exp_file: $!"; 187 | 188 | @strand = (); 189 | $j = 0; 190 | for($i=0;$i<$gn;$i++){ 191 | for(;$j<$dgn;$j++){ 192 | if($gene_id[$i] eq $db_gene[$j][0]){ 193 | push(@strand,$db_gene[$j][4]); 194 | @rowarray = (); 195 | push(@rowarray,$db_gene[$j][0]); 196 | for($h=5;$h<42;$h++){ push(@rowarray,$db_gene[$j][$h]); } 197 | push @db_gene_match,[@rowarray]; 198 | $dgm++; 199 | 200 | print TMP "$db_gene[$j][0]"; 201 | for($h=5;$h<42;$h++){ print TMP "\t$db_gene[$j][$h]"; } 202 | print TMP "\n"; 203 | 204 | last; 205 | } 206 | elsif($gene_id[$i] lt
$db_gene[$j][0]){ last; } 207 | } 208 | } 209 | close(TMP) || die "Cannot close the file $gene_exp_file: $!"; 210 | 211 | # read reference sequence 212 | 213 | %chr_seq = (); @chr_name = (); 214 | 215 | open(IN, "$ref_genome") || die "Cannot open the file $ref_genome: $!"; 216 | while(<IN>){ 217 | if(/\>chr(\w+)\s+/ || /\>(\w+)\s+/) { $chr=$1; $chr_seq{$chr} = ""; next;} 218 | else { $chr_seq{$chr} .= $_; } 219 | } 220 | close(IN) || die "Cannot close the file $ref_genome: $!"; 221 | 222 | @chr_name = keys(%chr_seq); 223 | foreach $chr (@chr_name) { 224 | $chr_seq{$chr} =~ s/\s+//g; $chr_seq{$chr} =~ s/\d+//g; $chr_seq{$chr} =~ tr/acgt/ACGT/; 225 | } 226 | 227 | # print observed mutation table 228 | 229 | open(TMP, ">$gene_obs_file") || die "Cannot open the file $gene_obs_file: $!"; 230 | 231 | $j = 0; 232 | @mutation_obs = (); 233 | for($i=0;$i<$gn;$i++){####$gn: number of matched gene#### 234 | %sample_mtype_pointmt = (); %FSindel = (); %nFSindel = (); 235 | for($s=0;$s<$sn;$s++){####$sn: number of matched sample#### 236 | $FSindel{$sample_id[$s]} = 0; $nFSindel{$sample_id[$s]} = 0; 237 | $sample_mtype_pointmt{$sample_id[$s]}={ 238 | AT_GC => { silent => 0, missense => 0, nonsense => 0, splicing => 0 }, 239 | AT_CG => { silent => 0, missense => 0, nonsense => 0, splicing => 0 }, 240 | AT_TA => { silent => 0, missense => 0, nonsense => 0, splicing => 0 }, 241 | CG_TA => { silent => 0, missense => 0, nonsense => 0, splicing => 0, silent_CpG => 0, missense_CpG => 0, nonsense_CpG => 0, splicing_CpG => 0}, 242 | CG_AT => { silent => 0, missense => 0, nonsense => 0, splicing => 0, silent_CpG => 0, missense_CpG => 0, nonsense_CpG => 0, splicing_CpG => 0}, 243 | CG_GC => { silent => 0, missense => 0, nonsense => 0, splicing => 0, silent_CpG => 0, missense_CpG => 0, nonsense_CpG => 0, splicing_CpG => 0} 244 | } 245 | } 246 | for(;$j<$qm;$j++){ 247 | if($gene_id[$i] eq $query_match[$j][1]){ 248 | if($query_match[$j][6] =~ /^FS_indel/){ $FSindel{$query_match[$j][0]}++; } 249 |
            elsif($query_match[$j][6] =~ /nFS_indel/){ $nFSindel{$query_match[$j][0]}++; }
            else {

                $flank1bp = substr($chr_seq{$query_match[$j][2]}, $query_match[$j][3]-2, 3);
                if(length($query_match[$j][4])==1){
                    $ref_var = $query_match[$j][4] . $query_match[$j][5];
                }
                else{
                    $ref_var = substr($query_match[$j][4],0,1) . substr($query_match[$j][5],0,1);
                }
                if($strand[$i] eq "-"){
                    $ref_var =~ tr/ACGT/TGCA/;
                    $flank1bp = reverse($flank1bp); $flank1bp =~tr/ACGT/TGCA/;
                }
                $mutation = $query_match[$j][6];
                if($flank1bp =~/CG/){ $mutation .= "_CpG"; }

                $sample_mtype_pointmt{$query_match[$j][0]}{$mtype_mbase{$ref_var}}{$mutation}++;
            }
        }
        if($gene_id[$i] lt $query_match[$j][1]){ last; }
    }
    @rowarray = ();
    push (@rowarray,$gene_id[$i]);
    print TMP "$gene_id[$i]";
    for($s=0;$s<$sn;$s++){
        for($k=0;$k<4;$k++){
            for($h=0;$h<6;$h++){
                $pn = $sample_mtype_pointmt{$sample_id[$s]}{$mbase[$h]}{$pointmt[$k]};
                print TMP "\t$pn";
                push (@rowarray,$pn);
            }
            $pt = $pointmt[$k] .
"_CpG";
            for($h=3;$h<6;$h++){
                $pn = $sample_mtype_pointmt{$sample_id[$s]}{$mbase[$h]}{$pt};
                print TMP "\t$pn";
                push (@rowarray,$pn);
            }
        }
        print TMP "\t$FSindel{$sample_id[$s]}\t$nFSindel{$sample_id[$s]}";
        push (@rowarray,$FSindel{$sample_id[$s]});
        push (@rowarray,$nFSindel{$sample_id[$s]});
    }
    print TMP "\n";
    push @mutation_obs,[@rowarray];
}
close(TMP) || die "Cannot close the file $gene_obs_file: $!";

sub custom_c0 {
    $a->[0] cmp $b->[0];
}

sub custom_c1 {
    $a->[1] cmp $b->[1];
}

sub generate_random_string
{
    my ($length_of_randomstring)=shift;

    my @chars=('a'..'z','A'..'Z','0'..'9','_');
    my $random_string;
    foreach (1..$length_of_randomstring)
    {
        $random_string.=$chars[rand @chars];
    }
    return $random_string;
}

--------------------------------------------------------------------------------
/run.driverml.sh:
--------------------------------------------------------------------------------
#!/bin/bash

help_file="
\n
\n
\tThis is a free application used for identifying cancer driver genes.\n
\n
\tUSAGE:path/run.driverml.sh -w -i -f -r [options]\n
\n
\tRequired Arguments:\n
\n
\t-w/--pathway\t\t\tThe path to the program.\n
\n
\t-i/--input\t\t\tThe input mutation data file in Mutation Annotation Format (MAF). The MAF specification can\n\t\t\t\t\tbe found in the manual of this application or on the NCI Wiki.\n
\n
\t-f/--reference_genome\t\tThe human reference genome file for input data.\n
\n
\t-r/--reference_training\t\tThe human reference genome file for training data.\n
\n
\tOptions:\n
\n
\t-g/--mutation_table\t\tThe predefined gene mutation table. Default: hg38_refGene.exp.\n
\n
\t-y/--tumor_type\t\t\tTraining mutation data.
Default: Pan-cancer.\n
\n
\t-m/--multicore \t\tSet the number of parallel instances to be run concurrently. Default: 4.\n
\n
\t-o/--output \t\tThe prefix of the output file. Default: summary.\n
\n
\t-t/--simulation_time \tThe number of Monte Carlo simulations. Default: 2500.\n
\n
\t-n/--interpolation\t\tThe maximum interpolation number used to estimate unknown gene characteristics. Default: 100.\n
\n
\t-c/--cluster \t\tThe maximum cluster number used to estimate the background mutation rate. Default: 1.\n
\n
\t-p/--prior\t\t\tSet the prior information. Default: Non-TCGA genes from DriverDB and IntOGen databases.\n
\n
\t-d/--indelratio\t\t\tThe ratio of the indel rate to the point mutation rate in the background. Default: 0.05.\n
\n
\tOther:\n
\n
\t-h/--help\t\t\tDisplay the help file.\n
\n
\t-v/--version\t\t\tDisplay version information.\n
"

# initialization of variables
path=
input_file=
mutation_table_file=hg38_refGene.exp
reference_file=
reference_training=
prior=
multi_core=4
monte_carlo_times=2500
indel_ratio=0.05
interpolation_number=100
cluster_number=1
eps=1
tumortype=
output_prefix=summary
date=$(date +%m_%d_%H_%M_%S_%N)
date_pre=pre_${date}
# read the arguments
TEMP=`getopt -o w:i:g:f:p:y:m:o:t:r:d:n:c:e:hv --long pathway:,input:,mutation_table:,reference_genome:,prior:,tumor_type:,multicore:,output:,simulation_time:,reference_training:,indelratio:,interpolation:,cluster:,eps:,help,version -- "$@"`
eval set -- "$TEMP"

# extract arguments into variables.
while true ; do
    case "$1" in
        -w|--pathway)
            path=$2 ; shift 2 ;;
        -i|--input)
            input_file=$2 ; shift 2 ;;
        -g|--mutation_table)
            mutation_table_file=$2 ; shift 2 ;;
        -f|--reference_genome)
            reference_file=$2 ; shift 2 ;;
        -p|--prior)
            prior=$2 ; shift 2 ;;
        -y|--tumor_type)
            tumortype=$2 ; shift 2 ;;
        # getopt emits the canonical long name, so the pattern must be spelled --multicore
        -m|--multicore)
            multi_core=$2 ; shift 2 ;;
        -o|--output)
            output_prefix=$2 ; shift 2 ;;
        -t|--simulation_time)
            monte_carlo_times=$2 ; shift 2 ;;
        -r|--reference_training)
            reference_training=$2 ; shift 2 ;;
        -d|--indelratio)
            indel_ratio=$2 ; shift 2 ;;
        -n|--interpolation)
            interpolation_number=$2 ; shift 2 ;;
        -c|--cluster)
            cluster_number=$2 ; shift 2 ;;
        -e|--eps)
            eps=$2 ; shift 2 ;;
        -h|--help)
            echo -e $help_file; exit 1 ;;
        -v|--version)
            echo '1.0'; exit 0 ;;
        --) shift ; break ;;
        *) echo "Internal error!" ; exit 1 ;;
    esac
done

# warning for required parameters
if [ -z "$path" ]; then
    echo -e $help_file
    exit 0
elif [ -z "$input_file" ]; then
    echo -e $help_file
    exit 0
elif [ -z "$reference_training" ]; then
    echo -e $help_file
    exit 0
elif [ -z "$reference_file" ]; then
    echo -e $help_file
    exit 0
fi

# a user-supplied prior is resolved relative to the program path
if [ -z "$prior" ]; then
    prior=${path}/prior/PAN
else
    prior_t=${path}/$prior
    prior=$prior_t
fi

if [ -z "$tumortype" ];then
    training=${path}/training/PAN
else
    training=${path}/$tumortype
fi

#pre-processing
###################################
Hugo_Symbol=$(awk -F "\t" 'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Hugo_Symbol" -n | awk -F ":" '{print $1}')
Chromosome=$(awk -F "\t" 'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Chromosome" -n | awk -F ":" '{print $1}')
Start_Position=$(awk -F "\t" \
'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Start_Position" -n | awk -F ":" '{print $1}')
Variant_Classification=$(awk -F "\t" 'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Variant_Classification" -n | awk -F ":" '{print $1}')
Reference_Allele=$(awk -F "\t" 'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Reference_Allele" -n | awk -F ":" '{print $1}')
Tumor_Seq_Allele2=$(awk -F "\t" 'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Tumor_Seq_Allele2" -n | awk -F ":" '{print $1}')
Tumor_Sample_Barcode=$(awk -F "\t" 'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Tumor_Sample_Barcode" -n | awk -F ":" '{print $1}')
Variant_Type=$(awk -F "\t" 'NR==1{gsub(/\t/,"\n");print $0}' $input_file | grep "Variant_Type" -n | awk -F ":" '{print $1}')

if [ -n "$Hugo_Symbol" ] && [ -n "$Chromosome" ] && [ -n "$Start_Position" ] && [ -n "$Variant_Classification" ] && [ -n "$Reference_Allele" ] && [ -n "$Tumor_Seq_Allele2" ] && [ -n "$Tumor_Sample_Barcode" ] && [ -n "$Variant_Type" ];then

    awk -F "\t" -v Hugo="$Hugo_Symbol" -v Chromo="$Chromosome" -v Start="$Start_Position" -v Variant="$Variant_Classification" -v Reference="$Reference_Allele" -v Tumor_Seq="$Tumor_Seq_Allele2" -v Tumor_Sample="$Tumor_Sample_Barcode" -v Variant_T="$Variant_Type" '{print $Hugo"\t"$Chromo"\t"$Start"\t"$Variant"\t"$Reference"\t"$Tumor_Seq"\t"$Tumor_Sample"\t"$Variant_T}' $input_file > ${date}_input_1.tmp
else
    echo "The format of mutation file is not acceptable. MAF format is required."
    exit 0
fi

# reorder columns to: sample, gene, chromosome, position, ref, alt, classification, variant type
awk -F "\t" -v row=1 '
NR>=row && $4~/Frame_Shift_Del/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Frame_Shift_Del""\t"$8}
NR>=row && $4~/Frame_Shift_Ins/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Frame_Shift_Ins""\t"$8}
NR>=row && $4~/In_Frame_Del/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""In_Frame_Del""\t"$8}
NR>=row && $4~/In_Frame_Ins/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""In_Frame_Ins""\t"$8}
NR>=row && $4~/Missense_Mutation/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Missense_Mutation""\t"$8}
NR>=row && $4~/Nonsense_Mutation/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Nonsense_Mutation""\t"$8}
NR>=row && $4~/Nonstop_Mutation/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Nonstop_Mutation""\t"$8}
NR>=row && $4~/Silent/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Silent""\t"$8}
NR>=row && $4~/Splice_Site/ && $8~/SNP|DNP|TNP/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Splice_Site""\t"$8}
NR>=row && $4~/Translation_Start_Site/ && $8~/SNP|DNP|TNP/{print $7"\t"$1"\t"$2"\t"$3"\t"$5"\t"$6"\t""Translation_Start_Site""\t"$8}
' ${date}_input_1.tmp > ${date}_input_intermediate2_file.tmp

# strip the "chr" prefix; OFS must be a tab because modifying $3 makes awk rebuild $0 with OFS
awk -F "\t" -v OFS="\t" '{gsub(/chr/,"",$3);print $0}' ${date}_input_intermediate2_file.tmp > ${date}_input_intermediate_file.tmp

# collapse MAF classifications into the mutation types used downstream
awk -F "\t" -v row=1 '
NR>=row && $7~/Frame_Shift_Del|Frame_Shift_Ins/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""Fs_indel"}
NR>=row && $7~/In_Frame_Del|In_Frame_Ins/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""nFs_indel"}
NR>=row && $7~/Missense_Mutation/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""missense"}
NR>=row && $7~/Nonsense_Mutation/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""nonsense"}
NR>=row && $7~/Silent/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""silent"}
NR>=row && $7~/Splice_Site/ && $8~/SNP|DNP|TNP/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""splicing"}
' ${date}_input_intermediate_file.tmp > ${date}_input_intermediate_file_pre.tmp

# keep chromosomes 1-29, X and Y; drop ENSG* and LOC* gene symbols
awk -F "\t" '$3~/^[1-9]$|^[1-2][0-9]$|^X$|^Y$/ && $2!~/^ENSG/ &&
$2!~/^LOC/{print $0}' ${date}_input_intermediate_file_pre.tmp > ${date}_input_file.tmp

$path/add-genes-to-characteristic.pl -i ${date}_input_file.tmp -d $date

# same classification mapping for the training data
awk -F "\t" -v row=1 '
NR>=row && $7~/Frame_Shift_Del|Frame_Shift_Ins|Frame_Shift_Indel/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""Fs_indel"}
NR>=row && $7~/In_Frame_Del|In_Frame_Ins|In_Frame_Indel/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""nFs_indel"}
NR>=row && $7~/Missense_Mutation/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""missense"}
NR>=row && $7~/Nonsense_Mutation/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""nonsense"}
NR>=row && $7~/Silent/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""silent"}
NR>=row && $7~/Splice_Site/{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t""splicing"}
' $training > ${date_pre}_input_intermediate_file.tmp

awk -F "\t" '$3~/^[1-9]$|^[1-2][0-9]$|^X$|^Y$/ && $2!~/^ENSG/ && $2!~/^LOC/{print $0}' ${date_pre}_input_intermediate_file.tmp > ${date_pre}_input_file.tmp

$path/add-genes-to-characteristic.pl -i ${date_pre}_input_file.tmp -d $date_pre

# find the best parameter: bisect on the cluster number between a and b
a=1
b=$cluster_number
c=    # candidate cluster number; empty until the first iteration sets it
while [ 1 -gt 0 ]
do
    if [ $a -eq $b ];then
        cluster_final=$a
        break
    elif [ $(($b-$a)) -gt 1 ];then
        c=$((($a+$b)/2))
    elif [ "$b" = "$c" ];then
        # string comparison, so an empty c on the first pass is simply "not equal"
        cluster_final=$a
        break
    else
        c=$b
    fi
    $path/cluster.r $interpolation_number $c ${date}
    $path/pre-drgap.pl -n $c -i ${date}_input_file.tmp -d $date
    for t in $(seq 1 $c)
    do
        $path/reduction.pl -i ${date}_subclass-$t.tmp -g ${path}/$mutation_table_file -f $reference_file
        $path/write_eta.r ${date}_subclass-$t.tmp_exp.tmp ${date}_subclass-$t.tmp_obs.tmp
    done
    eta_zero=0
    for q in $(seq 1 $c)
    do
        eta_zero=$(($eta_zero+$(perl $path/exam-eta.pl -f $q -d $date)))
    done
    if [ $eta_zero -gt 0 ]; then
        b=$c
    else
        a=$c
    fi
done
echo \
"Final cluster number is $cluster_final"

# manipulate data format
$path/cluster.r $interpolation_number $cluster_final ${date}
$path/pre-drgap.pl -n $cluster_final -i ${date}_input_file.tmp -d $date

g++ -std=c++11 -fopenmp -o ${date}_monte_carlo.out.tmp $path/monte_carlo_sim.cpp

for p in $(seq 1 1)
do

    $path/cluster.r $interpolation_number 1 ${date_pre}

    $path/pre-drgap.pl -n 1 -i ${date_pre}_input_file.tmp -d $date_pre

    $path/reduction.pl -i ${date_pre}_subclass-$p.tmp -g ${path}/hg19_refGene.exp -f $reference_training

done

# statistical test

for p in $(seq 1 $cluster_final)
do
    $path/reduction.pl -i ${date}_subclass-$p.tmp -g ${path}/$mutation_table_file -f $reference_file

    $path/sta.r $p $multi_core $monte_carlo_times 1 $indel_ratio $prior $date $eps

    ./${date}_monte_carlo.out.tmp $multi_core $monte_carlo_times ${date}_${p}_eta.tmp ${date}_${p}_N.tmp ${date}_${p}_lrt.tmp ${date}_${p}_para.tmp $date $p

done

# aggregate results

$path/assemble.r $cluster_final 1 $output_prefix $date

# remove temporary files
echo $date
rm ${date}*.tmp
rm ${date_pre}*.tmp

--------------------------------------------------------------------------------
/sta.r:
--------------------------------------------------------------------------------
#!/usr/bin/env Rscript

rm(list = ls())
gc()
args<-commandArgs(TRUE)
sample_num<-as.numeric(args[1])
core_number<-as.numeric(args[2])
monte_t<-as.numeric(args[3])
geneRatio<-as.numeric(args[4])
indelRatio<-as.numeric(args[5])
prior<-as.character(args[6])
date<-as.character(args[7])
eps_test<-as.numeric(args[8])
if(is.na(core_number))core_number<-10
if(is.na(monte_t))monte_t<-10000
if(is.na(geneRatio))geneRatio<-1
if(is.na(indelRatio))indelRatio<-0.05
# training ("pre") pass: files carry the prefix pre_${date}, matching date_pre in run.driverml.sh
date<-paste("pre",date,sep="_")
sample_num<-1
date_pre <- paste(date,"_",sep = "")
ex_file <- paste("subclass-",sample_num,sep = "")
exp_file <- paste(ex_file,'.tmp_exp.tmp',sep = "")
exp_file_date<-paste(date_pre,exp_file,sep="")
ob_file <- paste("subclass-",sample_num,sep = "")
obs_file <- paste(ob_file,'.tmp_obs.tmp',sep = "")
obs_file_date<-paste(date_pre,obs_file,sep="")
geneTableExp <- read.table(exp_file_date, header = FALSE, row.names = 1)
geneTableObs <- read.table(obs_file_date, header = FALSE, row.names = 1)
if(nrow(geneTableExp) != nrow(geneTableObs) | any(rownames(geneTableExp) != rownames(geneTableObs))){
  stop('Invalid row number or row names of the 2 input files!')
}
nGene <- nrow(geneTableExp)
nType <- ncol(geneTableExp)
if(nType == 36){
  ifIndel <- F
}else if(nType == 37){
  nType <- 38
  ifIndel <- T
}else stop('Invalid column number of .exp file!')
if(ncol(geneTableObs) %% nType == 0){
  nPeople <- ncol(geneTableObs) / nType
}else stop('Invalid column number of .obs file!')

if(ifIndel){
  arrGeneTableExp <- cbind(as.matrix(geneTableExp), geneTableExp[, ncol(geneTableExp)])
  dimnames(arrGeneTableExp) <- list(geneName = rownames(geneTableExp), type = c(colnames(geneTableExp)[-ncol(geneTableExp)], 'Fs_indel', 'nFs_indel'))
}else{
  arrGeneTableExp <- as.matrix(geneTableExp)
}
fun_LRTnew_num <- function(n,N,eta)
{
  logL <- sum(-N + (n / eta))
  return(logL)
}
fun_LRTnew_den <- function(eta)
{
  logL <- sum(1 / eta)
  return(logL)
}
arrGeneTableObs <- array(as.matrix(geneTableObs), dim = c(nGene, nType, nPeople))
rm(geneTableObs)
gc()
dimnames(arrGeneTableObs) <- list(geneName = rownames(geneTableExp), type = colnames(arrGeneTableExp), peopleIdx = paste('p', 1:nPeople, sep = ''))
for(i in 1:nPeople){
  if(any(temp <- arrGeneTableObs[, , i] > arrGeneTableExp))arrGeneTableObs[, , i][temp] <- arrGeneTableExp[temp]
}
M <- arrGeneTableExp[, 1:9]
N <- arrGeneTableExp[, -(1:9)]
obsData <- list(m = arrGeneTableObs[, 1:9, ], n = arrGeneTableObs[, -(1:9), ])
rm(arrGeneTableObs)
gc()
etaEstimate <- apply(obsData$m, 2:3, sum) / apply(M, 2, sum)
meanEtaPrior <- apply(etaEstimate, 1, mean)
varEtaPrior <- apply(etaEstimate, 1, var)
alphaPrior <- (meanEtaPrior * (1 - meanEtaPrior) / varEtaPrior - 1) * meanEtaPrior
betaPrior <- (meanEtaPrior * (1 - meanEtaPrior) / varEtaPrior - 1) * (1 - meanEtaPrior)
etaEstimate <- (apply(obsData$m, 2:3, sum) + alphaPrior) / (apply(M, 2, sum) + alphaPrior + betaPrior) * geneRatio
rm(meanEtaPrior,varEtaPrior,alphaPrior,betaPrior)
gc()
if(sum(is.na(etaEstimate))>0){
  etaEstimate[is.na(etaEstimate)] <- min(etaEstimate[!is.na(etaEstimate)])
}
if(ifIndel){
  etaEstimateIndel <- apply(obsData$m, 3, sum) / sum(M)
  meanEtaPriorIndel <- mean(etaEstimateIndel)
  varEtaPriorIndel <- var(etaEstimateIndel)
  alphaPriorIndel <- (meanEtaPriorIndel * (1 - meanEtaPriorIndel ) / varEtaPriorIndel - 1) * meanEtaPriorIndel
  betaPriorIndel <- (meanEtaPriorIndel * (1 - meanEtaPriorIndel) / varEtaPriorIndel - 1) * (1 - meanEtaPriorIndel)
  etaEstimateIndel <- (apply(obsData$m, 3, sum) + alphaPriorIndel) / (sum(M) + alphaPriorIndel + betaPriorIndel)
  etaEstimateIndel <- etaEstimateIndel * indelRatio * geneRatio
  etaEstimate <- rbind(etaEstimate, etaEstimate, etaEstimate, etaEstimateIndel, etaEstimateIndel)
  rownames(etaEstimate) <- colnames(N)
}else{
  etaEstimate <- rbind(etaEstimate, etaEstimate, etaEstimate)
  rownames(etaEstimate) <- colnames(N)
}
rm(etaEstimateIndel,meanEtaPriorIndel,varEtaPriorIndel,alphaPriorIndel,betaPriorIndel)
gc()
nTypeTest <- nType - 9
gene_name <- row.names(geneTableExp)
gene_name <- as.character(gene_name)
census <- read.table(prior,header=F,sep='\t')
census <- census[,1]
census <- as.character(census)
label<-rep(0,times=length(gene_name))
for (i in 1:length(gene_name)) {
  for (j in 1:length(census)) {
    if (gene_name[i]==census[j]) {
      label[i] <- -1
    }
  }
}
pre_gene_name<-gene_name
rm(census,gene_name)
gc()
a<-rep(1,times=nTypeTest)
para_tmp<-array(NA,c(nGene,nTypeTest))
para_tmp2<-array(NA,c(1,nTypeTest))
para_all<-array(NA,c(nGene,nTypeTest))
for (k in which(label!=0)){
  f<-function(w){
    s<-c(0,0,0)
    for (j in 1:nTypeTest){
      s[1]<-s[1]+w[j]*fun_LRTnew_num(obsData$n[k, j, ],N[k,j],etaEstimate[j, ])
      s[2]<-s[2]+w[j]^2*fun_LRTnew_den(etaEstimate[j, ])
    }
    s[3]<-s[3]+s[1]/sqrt(s[2])
    return(-s[3])
  }
  para<-optim(a,f)
  para_tmp[k,]<-para$par
  para_tmp2<-rbind(para_tmp2,para$par)
}
para_tmp2<-para_tmp2[-1,]
para_mean<-colMeans(para_tmp2)
for(k in 1:nGene){
  if(k %in% which(label!=0)){
    para_all[k,]<-para_tmp[k,]
  }
  else{
    para_all[k,]<-para_mean
  }
}
pre_para_all<-para_all
row.names(pre_para_all)<-pre_gene_name
pre_para_mean<-para_mean
print(pre_para_mean)
rm(para,para_tmp,para_tmp2,para_mean,label,para_all)
gc()
date<-as.character(args[7])
sample_num<-as.numeric(args[1])
date_pre <- paste(date,"_",sep = "")
ex_file <- paste("subclass-",sample_num,sep = "")
exp_file <- paste(ex_file,'.tmp_exp.tmp',sep = "")
exp_file_date<-paste(date_pre,exp_file,sep="")
ob_file <- paste("subclass-",sample_num,sep = "")
obs_file <- paste(ob_file,'.tmp_obs.tmp',sep = "")
obs_file_date<-paste(date_pre,obs_file,sep="")
geneTableExp <- read.table(exp_file_date, header = FALSE, row.names = 1)
geneTableObs <- read.table(obs_file_date,
header = FALSE, row.names = 1)
if(nrow(geneTableExp) != nrow(geneTableObs) | any(rownames(geneTableExp) != rownames(geneTableObs))){
  stop('Invalid row number or row names of the 2 input files!')
}
nGene <- nrow(geneTableExp)
nType <- ncol(geneTableExp)
if(nType == 36){
  ifIndel <- F
}else if(nType == 37){
  nType <- 38
  ifIndel <- T
}else stop('Invalid column number of .exp file!')
if(ncol(geneTableObs) %% nType == 0){
  nPeople <- ncol(geneTableObs) / nType
}else stop('Invalid column number of .obs file!')

if(ifIndel){
  arrGeneTableExp <- cbind(as.matrix(geneTableExp), geneTableExp[, ncol(geneTableExp)])
  dimnames(arrGeneTableExp) <- list(geneName = rownames(geneTableExp), type = c(colnames(geneTableExp)[-ncol(geneTableExp)], 'Fs_indel', 'nFs_indel'))
}else{
  arrGeneTableExp <- as.matrix(geneTableExp)
}
arrGeneTableObs <- array(as.matrix(geneTableObs), dim = c(nGene, nType, nPeople))
rm(geneTableObs)
gc()
dimnames(arrGeneTableObs) <- list(geneName = rownames(geneTableExp), type = colnames(arrGeneTableExp), peopleIdx = paste('p', 1:nPeople, sep = ''))
for(i in 1:nPeople){
  if(any(temp <- arrGeneTableObs[, , i] > arrGeneTableExp))arrGeneTableObs[, , i][temp] <- arrGeneTableExp[temp]
}
M <- arrGeneTableExp[, 1:9]
N <- arrGeneTableExp[, -(1:9)]
obsData <- list(m = arrGeneTableObs[, 1:9, ], n = arrGeneTableObs[, -(1:9), ])
rm(arrGeneTableObs)
gc()
etaEstimate <- apply(obsData$m, 2:3, sum) / apply(M, 2, sum)
meanEtaPrior <- apply(etaEstimate, 1, mean)
varEtaPrior <- apply(etaEstimate, 1, var)
alphaPrior <- (meanEtaPrior * (1 - meanEtaPrior) / varEtaPrior - 1) * meanEtaPrior
betaPrior <- (meanEtaPrior * (1 - meanEtaPrior) / varEtaPrior - 1) * (1 - meanEtaPrior)
etaEstimate <- (apply(obsData$m, 2:3, sum) + alphaPrior) / (apply(M, 2, sum)
+ alphaPrior + betaPrior) * geneRatio
rm(meanEtaPrior,varEtaPrior,alphaPrior,betaPrior)
gc()
if(sum(is.na(etaEstimate))>0){
  etaEstimate[is.na(etaEstimate)] <- min(etaEstimate[!is.na(etaEstimate)])
}
if(ifIndel){
  etaEstimateIndel <- apply(obsData$m, 3, sum) / sum(M)
  meanEtaPriorIndel <- mean(etaEstimateIndel)
  varEtaPriorIndel <- var(etaEstimateIndel)
  alphaPriorIndel <- (meanEtaPriorIndel * (1 - meanEtaPriorIndel ) / varEtaPriorIndel - 1) * meanEtaPriorIndel
  betaPriorIndel <- (meanEtaPriorIndel * (1 - meanEtaPriorIndel) / varEtaPriorIndel - 1) * (1 - meanEtaPriorIndel)
  etaEstimateIndel <- (apply(obsData$m, 3, sum) + alphaPriorIndel) / (sum(M) + alphaPriorIndel + betaPriorIndel)
  etaEstimateIndel <- etaEstimateIndel * indelRatio * geneRatio
  etaEstimate <- rbind(etaEstimate, etaEstimate, etaEstimate, etaEstimateIndel, etaEstimateIndel)
  rownames(etaEstimate) <- colnames(N)
}else{
  etaEstimate <- rbind(etaEstimate, etaEstimate, etaEstimate)
  rownames(etaEstimate) <- colnames(N)
}
rm(etaEstimateIndel,meanEtaPriorIndel,varEtaPriorIndel,alphaPriorIndel,betaPriorIndel)
gc()
nTypeTest <- nType - 9
gene_name <- row.names(geneTableExp)
gene_name <- as.character(gene_name)

para_all<-array(NA,c(nGene,nTypeTest))

#pre
for (k in 1:nGene){
  if(gene_name[k]%in%pre_gene_name){
    para_all[k,]<-pre_para_all[gene_name[k],]
  }
  else{
    para_all[k,]<-pre_para_mean
  }
}
if (is.na(sum(para_all))){
  for (k in 1:nGene){
    para_all[k,]<-c(rep(1,9),rep(7,9),rep(5,9),7,3)
  }
  para_all[1,]
}

para_file<-paste(date_pre,sample_num,sep="")
para_file_name<-paste(para_file,"para.tmp",sep="_")

write.table(para_all,para_file_name,quote = F,sep = "\t",row.names = F,col.names = F)

LRT <- pValueLRT <- array(NA, c(nGene, nTypeTest))
denominator_help <- array(NA,c(nGene, nTypeTest))
dimnames(denominator_help) <- dimnames(N)
dimnames(LRT) <- dimnames(pValueLRT) <- dimnames(N)
for(k in 1:nGene)
{
  for(j in 1:nTypeTest)
  {
    denominator_help[k,j] <- fun_LRTnew_den(etaEstimate[j, ]) * para_all[k,j] * para_all[k,j]

  }
}
denominator <- rowSums(denominator_help)
denominator_ok <- rep(NA,nGene)
for(k in 1:nGene){
  denominator_ok[k]<-sqrt(denominator[k])
}
rm(denominator,denominator_help)
gc()
for(k in 1:nGene)
{
  for(j in 1:nTypeTest)
  {
    LRT[k, j] <- fun_LRTnew_num(obsData$n[k, j, ],N[k,j],etaEstimate[j, ]) * para_all[k,j] / denominator_ok[k]
  }
}
epsilon<-eps_test*sum(LRT<0)/length(LRT)
print(epsilon)
etaEstimate<-etaEstimate*(1+epsilon)

LRT <- pValueLRT <- array(NA, c(nGene, nTypeTest))
denominator_help <- array(NA,c(nGene, nTypeTest))
dimnames(denominator_help) <- dimnames(N)
dimnames(LRT) <- dimnames(pValueLRT) <- dimnames(N)
for(k in 1:nGene)
{
  for(j in 1:nTypeTest)
  {
    denominator_help[k,j] <- fun_LRTnew_den(etaEstimate[j, ]) * para_all[k,j] * para_all[k,j]

  }
}
denominator <- rowSums(denominator_help)
denominator_ok <- rep(NA,nGene)
for(k in 1:nGene){
  denominator_ok[k]<-sqrt(denominator[k])
}
rm(denominator,denominator_help)
gc()
for(k in 1:nGene)
{
  for(j in 1:nTypeTest)
  {
    LRT[k, j] <- fun_LRTnew_num(obsData$n[k, j, ],N[k,j],etaEstimate[j, ]) * para_all[k,j] / denominator_ok[k]
  }
}
eta_file<-paste(date_pre,sample_num,sep="")
eta_file_name<-paste(eta_file,"eta.tmp",sep="_")

write.table(etaEstimate,eta_file_name,quote = F,sep = "\t",row.names = F,col.names = F)

rm(para_all,denominator_ok,etaEstimate)
gc()
multipleLRT <- rowSums(LRT)

lrt_file<-paste(date_pre,sample_num,sep="")
lrt_file_name<-paste(lrt_file,"lrt.tmp",sep="_")

write.table(multipleLRT,lrt_file_name,quote = F,sep = "\t",row.names = F,col.names = F)

N_file<-paste(date_pre,sample_num,sep="")
N_file_name<-paste(N_file,"N.tmp",sep="_")

write.table(N,N_file_name,quote = F,sep = "\t",row.names = F,col.names = F)

rm(LRT)
gc()

nMutationSilent <- apply(obsData$m, 1, sum)
nMutationMissense <- apply(obsData$n[, 1:9, ], 1, sum)
nMutationNonsense <- apply(obsData$n[, 10:18, ], 1, sum)
nMutationSplicing <- apply(obsData$n[, 19:27, ], 1, sum)
if(ifIndel){
  nMutationFsIndel <- apply(obsData$n[, 28, ], 1, sum)
  nMutationNFsIndel <- apply(obsData$n[, 29, ], 1, sum)
  nMutationTotal <- nMutationSilent + nMutationMissense + nMutationNonsense + nMutationSplicing + nMutationFsIndel + nMutationNFsIndel
  outSummary <- data.frame(gene = rownames(N), total= nMutationTotal, silent = nMutationSilent,
                           missense = nMutationMissense, nonsense = nMutationNonsense, splicing = nMutationSplicing, Fs_indel = nMutationFsIndel,
                           nFs_indel = nMutationNFsIndel,
                           LRT = multipleLRT)
}else{
  nMutationTotal <- nMutationSilent + nMutationMissense + nMutationNonsense + nMutationSplicing
  outSummary <- data.frame(gene = rownames(N), total= nMutationTotal, silent = nMutationSilent,
                           missense = nMutationMissense, nonsense = nMutationNonsense, splicing = nMutationSplicing,
                           LRT = multipleLRT)
}
out_fi <- paste('out_file_',sample_num,sep = "")
out_file <- paste(out_fi,'.tmp',sep="")
out_file_date<-paste(date_pre,out_file,sep='')
write.table(outSummary, out_file_date, row.names = F, quote = F, sep = '\t')

--------------------------------------------------------------------------------
/training.tar.bz2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HelloYiHan/DriverML/f1e53f9922b1d8b26d347cc4ab5e777500e2c2d5/training.tar.bz2

--------------------------------------------------------------------------------
/write_eta.r:
--------------------------------------------------------------------------------
#!/usr/bin/env Rscript
rm(list = ls())
args <- commandArgs(trailingOnly = TRUE)
infileExp <- args[1]
infileObs <- args[2]
if(is.na(infileExp) | is.na(infileObs))stop('No input files in R!')
geneRatio <- 1
indelRatio <- 0.05
ifHeaderExp <- F
ifHeaderObs <- F
output=paste(unlist(strsplit(infileExp,split = "[.]"))[1],'-eta.tmp',sep='')
#*************************************************************************************#
geneTableExp <- read.table(infileExp, header = ifHeaderExp, row.names = 1)
geneTableObs <- read.table(infileObs, header = ifHeaderObs, row.names = 1)
#****************************************************************#
funLogLikelihood <- function(n, N, eta, alpha)
{
  logL <- sum(-N * (eta + alpha) + log((N * (eta + alpha))^n) - log(factorial(n)))
  return(logL)
}
#----------------------------------------
fun_LRTnew_num <- function(n,N,eta)
{
  logL <- sum(-N + (n / eta))
  return(logL)
}
fun_LRTnew_den <- function(eta)
{
  logL <- sum(1 / eta)
  return(logL)
}
#----------------------------------------------

if(nrow(geneTableExp) != nrow(geneTableObs) | any(rownames(geneTableExp) != rownames(geneTableObs)))stop('Invalid row number or row names of the 2 input files!')
nGene <- nrow(geneTableExp)
nType <- ncol(geneTableExp)
if(nType == 36){
  ifIndel <- F
}else if(nType == 37){
  nType <- 38
  ifIndel <- T
}else stop('Invalid column number of .exp file!')
if(ncol(geneTableObs) %% nType == 0){
  nPeople
<- ncol(geneTableObs) / nType
}else stop('Invalid column number of .obs file!')

if(ifIndel){
  arrGeneTableExp <- cbind(as.matrix(geneTableExp), geneTableExp[, ncol(geneTableExp)])
  dimnames(arrGeneTableExp) <- list(geneName = rownames(geneTableExp), type = c(colnames(geneTableExp)[-ncol(geneTableExp)], 'Fs_indel', 'nFs_indel'))
}else{
  arrGeneTableExp <- as.matrix(geneTableExp)
}

arrGeneTableObs <- array(as.matrix(geneTableObs), dim = c(nGene, nType, nPeople))
dimnames(arrGeneTableObs) <- list(geneName = rownames(geneTableExp), type = colnames(arrGeneTableExp), peopleIdx = paste('p', 1:nPeople, sep = ''))

for(i in 1:nPeople){
  if(any(temp <- arrGeneTableObs[, , i] > arrGeneTableExp))arrGeneTableObs[, , i][temp] <- arrGeneTableExp[temp]
}

M <- arrGeneTableExp[, 1:9]
N <- arrGeneTableExp[, -(1:9)]
obsData <- list(m = arrGeneTableObs[, 1:9, ], n = arrGeneTableObs[, -(1:9), ])

etaEstimate <- apply(obsData$m, 2:3, sum) / apply(M, 2, sum)
meanEtaPrior <- apply(etaEstimate, 1, mean)
write.table(meanEtaPrior,file=output,row.names=F,col.names=F,quote=F,sep='\t',append=F)
varEtaPrior <- apply(etaEstimate, 1, var)
alphaPrior <- (meanEtaPrior * (1 - meanEtaPrior) / varEtaPrior - 1) * meanEtaPrior
betaPrior <- (meanEtaPrior * (1 - meanEtaPrior) / varEtaPrior - 1) * (1 - meanEtaPrior)
etaEstimate <- (apply(obsData$m, 2:3, sum) + alphaPrior) / (apply(M, 2, sum) + alphaPrior + betaPrior) * geneRatio
etaEstimate[is.na(etaEstimate)] <- min(etaEstimate[!is.na(etaEstimate)])

if(ifIndel){
  etaEstimateIndel <- apply(obsData$m, 3, sum) / sum(M)
  meanEtaPriorIndel <- mean(etaEstimateIndel)
  write.table(meanEtaPriorIndel,file=output,row.names=F,col.names=F,quote=F,sep='\t',append=T)
  varEtaPriorIndel <- var(etaEstimateIndel)
  alphaPriorIndel <- (meanEtaPriorIndel * (1 - meanEtaPriorIndel ) / varEtaPriorIndel - 1) *
  meanEtaPriorIndel
  betaPriorIndel <- (meanEtaPriorIndel * (1 - meanEtaPriorIndel) / varEtaPriorIndel - 1) * (1 - meanEtaPriorIndel)
  etaEstimateIndel <- (apply(obsData$m, 3, sum) + alphaPriorIndel) / (sum(M) + alphaPriorIndel + betaPriorIndel)
  etaEstimateIndel <- etaEstimateIndel * indelRatio * geneRatio
  etaEstimate <- rbind(etaEstimate, etaEstimate, etaEstimate, etaEstimateIndel, etaEstimateIndel)
  rownames(etaEstimate) <- colnames(N)
}else{
  etaEstimate <- rbind(etaEstimate, etaEstimate, etaEstimate)
  rownames(etaEstimate) <- colnames(N)
}

--------------------------------------------------------------------------------
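
The prior-fitting lines of write_eta.r (`meanEtaPrior`, `varEtaPrior`, `alphaPrior`, `betaPrior`, and the second `etaEstimate` assignment), which sta.r repeats, can be read as a method-of-moments Beta fit followed by a shrinkage step. The equations below are a hedged restatement of that code, not a formula taken from the authors. The symbols $\hat\eta$, $\mu$, $v$, $r$ and the indices $g$ (gene), $t$ (one of the first nine column types) and $p$ (sample) are introduced here for illustration only. Treating those first nine columns as the silent categories is an assumption based on how sta.r sums `obsData$m` into `nMutationSilent`.

```latex
\begin{align*}
% Raw per-sample background rate: observed counts over expected counts, summed over genes
\hat\eta_{t,p} &= \frac{\sum_{g} m_{g,t,p}}{\sum_{g} M_{g,t}} \\
% Moments across the P samples (R's var() uses the P-1 denominator)
\mu_t &= \frac{1}{P}\sum_{p=1}^{P} \hat\eta_{t,p}, \qquad
v_t = \frac{1}{P-1}\sum_{p=1}^{P} \bigl(\hat\eta_{t,p} - \mu_t\bigr)^2 \\
% Method-of-moments fit of Beta(alpha_t, beta_t): alphaPrior and betaPrior in the code
\alpha_t &= \mu_t \left( \frac{\mu_t (1 - \mu_t)}{v_t} - 1 \right), \qquad
\beta_t = (1 - \mu_t) \left( \frac{\mu_t (1 - \mu_t)}{v_t} - 1 \right) \\
% Shrunken estimate, scaled by r = geneRatio
\tilde\eta_{t,p} &= r \, \frac{\sum_{g} m_{g,t,p} + \alpha_t}{\sum_{g} M_{g,t} + \alpha_t + \beta_t}
\end{align*}
```

As a quick check with made-up numbers, $\mu_t = 0.2$ and $v_t = 0.01$ give $\mu_t(1-\mu_t)/v_t - 1 = 15$, hence $\alpha_t = 3$ and $\beta_t = 12$. A Beta(3, 12) distribution has mean $3/15 = 0.2$ and variance $0.2 \cdot 0.8 / 16 = 0.01$, which recovers the inputs. When $v_t$ is zero or undefined (for example with a single sample), the ratio is not finite, which appears to be the case the `is.na` replacement in the scripts guards against.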