├── .DS_Store ├── example ├── input.bed ├── input.txt └── input.vcf ├── doc ├── .DS_Store ├── img │ ├── new.png │ ├── .DS_Store │ └── icages_pipeline.png ├── misc │ ├── .DS_Store │ ├── license.md │ ├── credit.md │ └── workflow.md ├── user-guide │ ├── .DS_Store │ ├── usage.md │ ├── startup.md │ ├── download.md │ └── example.md ├── mkdocs.yml └── index.md ├── img └── icages_pipeline.png ├── README.md ├── genomeLocusFinder.pl ├── bin ├── icagesJson.pl ├── icagesGene.pl ├── icagesDrug.pl ├── icagesMutation.pl ├── icagesMutationNew.pl └── DGIdb │ └── local │ └── lib.pm └── icages.pl /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/.DS_Store -------------------------------------------------------------------------------- /example/input.bed: -------------------------------------------------------------------------------- 1 | chr10 89677000 89690000 2 | chr8 38336000 38353000 3 | -------------------------------------------------------------------------------- /doc/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/.DS_Store -------------------------------------------------------------------------------- /doc/img/new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/img/new.png -------------------------------------------------------------------------------- /doc/img/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/img/.DS_Store -------------------------------------------------------------------------------- /doc/misc/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/misc/.DS_Store 
-------------------------------------------------------------------------------- /doc/user-guide/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/user-guide/.DS_Store -------------------------------------------------------------------------------- /img/icages_pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/img/icages_pipeline.png -------------------------------------------------------------------------------- /doc/img/icages_pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/img/icages_pipeline.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # iCAGES 2 | This is iCAGES command line that prioritizes personalized cancer driver mutations, genes and therapies. 3 | 4 | Please see http://icages.openbioinformatics.org/en/latest for documentation. 5 | 6 | 7 | -------------------------------------------------------------------------------- /doc/mkdocs.yml: -------------------------------------------------------------------------------- 1 | site_name: iCAGES Documentation 2 | site_url: http://icages.openbioinformatics.org/ 3 | repo_url: http://github.com/WangGenomicsLab/iCAGES/doc/ 4 | repo_name: GitHub 5 | site_description: Documentation for iCAGES software 6 | site_author: Coco Dong et. al. 7 | site_favicon: favicon.ico 8 | 9 | docs_dir: . 
10 | include_search: true 11 | #use_absolute_urls: true 12 | #use_directory_urls: false 13 | 14 | pages: 15 | - iCAGES: index.md 16 | - User Guide: 17 | - Download and install iCAGES: user-guide/download.md 18 | - Quick Start-Up Guide: user-guide/startup.md 19 | - Usage: user-guide/usage.md 20 | - Examples: user-guide/example.md 21 | - Misc: 22 | - How it works: misc/workflow.md 23 | - Credit: misc/credit.md 24 | - License: misc/license.md 25 | -------------------------------------------------------------------------------- /doc/misc/license.md: -------------------------------------------------------------------------------- 1 | ## License Agreement 2 | By using the software, you acknowledge that you agree to the terms below: 3 | 4 | For academic and non-profit use, you are free to fork, download, modify, distribute and use the software without restriction. 5 | 6 | For commercial use, you are required to contact [Stevens Institute of Innovation](https://stevens.usc.edu/contact-us/) at USC directly to discuss licensing options. 7 | 8 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 9 | -------------------------------------------------------------------------------- /doc/misc/credit.md: -------------------------------------------------------------------------------- 1 | ## Acknowledgements 2 | 3 | The iCAGES software was originally designed by Coco Dong, under the mentorship of Dr. Kai Wang. Other developers and significant contributors include Zeyu He, a graduate student from New York University who helped develop the web interface for iCAGES. 
Other members of our lab, including Yunfei Guo and Hui Yang, and many iCAGES users have provided feedback, bug reports, code snippets and suggestions to improve the functionality of iCAGES, and I am indebted to them for their invaluable help. 4 | 5 | ## Reference 6 | 7 | - Chengliang Dong, Yunfei Guo, Hui Yang, Zeyu He, Xiaoming Liu, Kai Wang [**iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing driver genes in personal cancer genomes**](http://genomemedicine.biomedcentral.com/articles/10.1186/s13073-016-0390-0). Genome Medicine. 2016. DOI: 10.1186/s13073-016-0390-0 8 | 9 | -------------------------------------------------------------------------------- /doc/index.md: -------------------------------------------------------------------------------- 1 | # iCAGES Documentation 2 | 3 | iCAGES is an efficient software tool that prioritizes personalized cancer driver mutations, genes and drugs. Given input in ANNOVAR input format, VCF format or BED format, iCAGES can generate a list of 4 | 5 | - iCAGES mutation scores for all point coding variants, non-coding variants and structural variations, which measure genomic cancer driving potential of these variants. 6 | 7 | - iCAGES gene scores for all mutated genes, which measure personal cancer driving potential of these genes. 8 | 9 | - iCAGES drug scores for all potential drugs targeting any gene mutated in this particular patient. 10 | 11 | Please click the menu items to navigate through this website. The web interface of iCAGES was previously accessible [here](http://icages.wglab.edu) (the server was shut down by the previous university, so if you want to run the web server, please download it from [here](https://github.com/WGLab/icages-server) and run it yourself). If you have questions, comments or bug reports, please post them in the Disqus comment form on this website (or email me or my mentor Dr. Kai Wang directly). Thank you very much for your help and support! 
12 | 13 | --- 14 | 15 | ![new](img/new.png) 2015Feb26: v1.0.0 initial stable release of iCAGES command line is available now! 16 | 17 | ![new](img/new.png) 2015Nov27: v1.0.1 stable release of iCAGES command line is available now! 18 | 19 | ![new](img/new.png) 2017Jan08: v1.0.2 stable release of iCAGES command line is available now! 20 | -------------------------------------------------------------------------------- /doc/misc/workflow.md: -------------------------------------------------------------------------------- 1 | ## How iCAGES works 2 | ![iCAGES pipeline](/img/icages_pipeline.png) 3 | 4 | A flowchart showing the iCAGES pipeline. The iCAGES package consists of three layers. The input file contains all variants identified from the patient; it can be either in ANNOVAR input format or in VCF format. The first layer of iCAGES prioritizes mutations. It computes three different feature scores for annotating each gene: a radial SVM score for each of its point coding mutations, a CNV normalized peak score for each of its structural variations and a FunSeq2 score for each of its point non-coding mutations. The second layer of iCAGES prioritizes cancer driver genes. It takes the three feature scores from the first layer, generates the corresponding Phenolyzer score for each mutated gene and computes an LR score for this gene (iCAGES gene score). The final layer of iCAGES prioritizes targeted drugs. It first queries DGIdb for potential drugs that interact with the patient's mutated genes and their neighbors. Next, it calculates the joint probability of each drug being the most effective (iCAGES drug score) from three feature scores: the iCAGES score of its direct/indirect target, the normalized BioSystems probability measuring the maximum relatedness of the drug's direct target with each mutated gene (final target) in the patient, and the PubChem active probability measuring the bioactivity of the drug. 
The final output of iCAGES consists of three major elements: a prioritized list of mutations, a prioritized list of genes with their iCAGES gene scores, and a prioritized list of targeted drugs with their iCAGES drug scores, all highlighted in red. 5 | 6 | 7 | 8 | --- 9 | 10 |
11 | 24 | 25 | 26 | -------------------------------------------------------------------------------- /doc/user-guide/usage.md: -------------------------------------------------------------------------------- 1 | ## SYNOPSIS 2 | 3 | icages.pl (input_file) [options] 4 | 5 | ## OPTIONS 6 | 7 | - -h, --help 8 | print help message 9 | 10 | - -m, --manual 11 | print manual message 12 | 13 | - -t, --tumor (TEXT) 14 | name of column that contains tumor mutations in your vcf file (if you have multiple samples with tumor mutations, please use this option to select tumor mutations that you want to analyze) 15 | 16 | - -g, --germline (TEXT) 17 | name of column that contains germline mutations in your vcf file (if you have multiple samples with germline mutations, please use this option to select germline mutations that you want to compare your tumor mutations against to generate somatic mutation profiles for the sample you want to analyze) 18 | 19 | - -i, --id (TEXT) 20 | name of column that contains somatic mutations in your multiple sample vcf file with only somatic mutations (if you have multiple samples with tumor and germline mutations, please use -g and -t options instead) 21 | 22 | - -s, --subtype (TEXT) 23 | subtype of the cancer, valid options include "ACC", "BLCA", "BRCA", "CESC", "CHOL", "ESCA", "GBM", "HNSC", "KICH", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUSC", "OV", "PAAD", "PCPG", "PRAD", "SARC", "SKCM", "STAD", "TGCT", "TGCA", "THYM", "UCEC", "UCS", "UVM" 24 | 25 | - --logdir 26 | directory for log files generated by iCAGES 27 | 28 | - --tempdir (TEXT) 29 | directory for temporary files generated by iCAGES 30 | 31 | - --outputdir (TEXT) 32 | directory for output files generated by iCAGES 33 | 34 | - -p, --prefix (TEXT) 35 | prefix of all files generated by iCAGES 36 | 37 | - -b, --bed (TEXT) 38 | additional bed file specifying the location of structural variations in the sample. 
Note that if you only have a bed file for the patient of interest, please use it directly as the input file without the -b option, with the command icages.pl input.bed 39 | 40 | - --buildver (TEXT) 41 | reference genome version, valid options include "hg19", "hg38" and "hg18" 42 | 43 | - -e, --expression (TEXT) 44 | bed file describing gene expression patterns; the columns are chromosome, start, end and log fold change 45 | 46 | --- 47 | 48 |
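As an illustration of how the options above combine, the following sketch prints a complete command line for a VCF file that holds both a tumor and a germline column. The `icages_cmd` helper, the sample column names, the output directory and the prefix are hypothetical; only the flags themselves come from the option list.

```shell
# Hypothetical wrapper (not part of iCAGES): print the icages.pl command
# for a tumor/germline VCF so it can be reviewed before it is run.
# Arguments: VCF file, tumor column name, germline column name, cancer subtype.
icages_cmd() {
    local vcf="$1" tumor="$2" germline="$3" subtype="$4"
    # Input file first, then options, following the SYNOPSIS.
    echo "icages.pl $vcf -t $tumor -g $germline -s $subtype --buildver hg19 --outputdir ./icages_out -p patient1"
}

icages_cmd patient.vcf TUMOR NORMAL BRCA
```

Printing the command instead of executing it keeps the sketch free of side effects; paste the printed line into a shell once the column names match your VCF header.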
49 | 62 | 63 | 64 | -------------------------------------------------------------------------------- /example/input.txt: -------------------------------------------------------------------------------- 1 | 1 12919840 12919840 T C 2 | 1 35332717 35332717 C A 3 | 1 55148456 55148456 G T 4 | 1 70504789 70504789 C T 5 | 1 167059520 167059520 A T 6 | 1 182496864 182496864 A T 7 | 1 197073351 197073351 C T 8 | 1 216373211 216373211 G T 9 | 10 37490170 37490170 G A 10 | 10 56089432 56089432 A C 11 | 10 69957135 69957135 A T 12 | 10 125601918 125601918 C A 13 | 10 125804220 125804220 C A 14 | 10 134649656 134649656 C A 15 | 11 5172737 5172737 G T 16 | 11 5905898 5905898 G T 17 | 11 6942832 6942832 C A 18 | 11 16068172 16068172 C T 19 | 11 44297060 44297060 G A 20 | 11 48328274 48328274 G C 21 | 11 89409330 89409330 A G 22 | 11 107965601 107965601 T G 23 | 11 120180145 120180145 C A 24 | 12 32481489 32481489 G A 25 | 12 67699695 67699695 G T 26 | 12 96883571 96883571 T C 27 | 12 119968739 119968739 G C 28 | 13 32340125 32340125 C A 29 | 13 36521559 36521559 G A 30 | 14 20529107 20529107 C T 31 | 14 36154270 36154270 T C 32 | 14 59112264 59112264 G C 33 | 14 63174963 63174963 C A 34 | 14 94007023 94007023 A T 35 | 14 95557552 95557552 G A 36 | 14 105410443 105410443 C G 37 | 15 20739706 20739706 T C 38 | 15 30054548 30054548 A G 39 | 16 20376770 20376770 G T 40 | 16 66422268 66422268 C A 41 | 17 7125396 7125396 A G 42 | 17 21731175 21731175 C A 43 | 17 26958542 26958542 C T 44 | 17 41836054 41836054 C T 45 | 18 14852153 14852153 G T 46 | 18 55319937 55319937 G A 47 | 18 58039082 58039082 C A 48 | 18 61170854 61170854 C T 49 | 19 13249152 13249152 G T 50 | 19 52938422 52938422 C A 51 | 19 53770704 53770704 C G 52 | 19 53911857 53911857 C T 53 | 19 57286834 57286834 A T 54 | 2 84940277 84940277 G T 55 | 2 112945031 112945031 A T 56 | 2 136552307 136552307 T G 57 | 2 157332699 157332699 G T 58 | 2 179588352 179588352 C A 59 | 2 187627271 187627271 G T 60 | 2 189863400 189863400 
G T 61 | 2 189910616 189910616 A G 62 | 2 201533514 201533514 T C 63 | 2 238245127 238245127 C A 64 | 20 23473695 23473695 C G 65 | 20 29956465 29956465 G A 66 | 20 47266080 47266080 G C 67 | 22 30661003 30661003 G C 68 | 3 10157495 10157495 A G 69 | 3 11064098 11064098 A T 70 | 3 13413467 13413467 C A 71 | 3 51690015 51690015 C T 72 | 3 52417509 52417509 G T 73 | 3 63821060 63821060 T A 74 | 3 100087931 100087931 T C 75 | 3 112998777 112998777 C T 76 | 3 118621763 118621763 G T 77 | 4 13617077 13617077 C A 78 | 4 20533629 20533629 A G 79 | 4 147830310 147830310 G C 80 | 4 173269688 173269688 T A 81 | 5 19483542 19483542 G T 82 | 5 37326041 37326041 C A 83 | 5 55110949 55110949 C T 84 | 5 75596562 75596562 C A 85 | 5 140536091 140536091 C G 86 | 5 140616128 140616128 C G 87 | 5 140723876 140723876 G T 88 | 5 159686516 159686516 G A 89 | 6 7368953 7368953 G C 90 | 6 49808692 49808692 G C 91 | 6 109752383 109752383 C T 92 | 6 137329736 137329736 C T 93 | 7 47867026 47867026 G T 94 | 7 69583198 69583198 C A 95 | 7 93518498 93518498 G A 96 | 7 149475905 149475905 C A 97 | 8 3076901 3076901 G A 98 | 8 11995670 11995670 G T 99 | 8 37704629 37704629 C T 100 | 8 71068967 71068967 C A 101 | 8 77764912 77764912 G C 102 | 8 77775920 77775920 C A 103 | 8 79510674 79510674 A C 104 | 8 88363968 88363968 C A 105 | 8 89179974 89179974 G A 106 | 8 105503431 105503431 C A 107 | 8 118819526 118819526 G A 108 | 8 118831993 118831993 C A 109 | 9 17409334 17409334 A G 110 | 9 21201831 21201831 G T 111 | 9 43627065 43627065 C T 112 | 9 79325065 79325065 C A 113 | 9 104171019 104171019 A T 114 | 9 105767273 105767273 C A 115 | X 47426121 47426121 C G 116 | X 48213490 48213490 C A 117 | X 54020144 54020144 G T 118 | X 55249114 55249114 G T 119 | X 64140039 64140039 T A 120 | X 71802303 71802303 A T 121 | X 75004601 75004601 C G 122 | X 76939716 76939716 G A 123 | X 78216983 78216983 T A 124 | X 100513426 100513426 G T 125 | X 118717207 118717207 C T 126 | X 130416606 130416606 G T 127 | X 
134305608 134305608 G A 128 | X 148037166 148037166 G T 129 | X 38280273 38280274 CA CA 130 | 12 29908693 29908693 C - 131 | 19 53612719 53612719 - T 132 | 2 229997 229997 - GAT 133 | 6 7886214 7886216 GAA - 134 | X 56592088 56592088 - T 135 | -------------------------------------------------------------------------------- /doc/user-guide/startup.md: -------------------------------------------------------------------------------- 1 | ## Important notice 2 | 3 | As iCAGES depends on modules that call the web API of DGIdb, please make sure that your PC/Mac is connected to the Internet. Thanks! 4 | 5 | ## ANNOVAR input 6 | 7 | For beginners, the easiest way to use iCAGES is to annotate somatic mutations in [ANNOVAR](http://annovar.openbioinformatics.org/) input format with reference genome version hg19. This example input file is provided in the iCAGES package. 8 | ``` 9 | [cocodong@biocluster ~/]$ head input.txt 10 | 1 12919840 12919840 T C 11 | 1 35332717 35332717 C A 12 | 1 55148456 55148456 G T 13 | 1 70504789 70504789 C T 14 | 1 167059520 167059520 A T 15 | 1 182496864 182496864 A T 16 | 1 197073351 197073351 C T 17 | 1 216373211 216373211 G T 18 | 10 37490170 37490170 G A 19 | 10 56089432 56089432 A C 20 | [cocodong@biocluster ~/]$ icages.pl input.txt 21 | 22 | ``` 23 | 24 | ## VCF input with one sample with his/her somatic mutations only 25 | 26 | If you have somatic mutations for one sample in VCF file format with reference genome version hg19, iCAGES can automatically detect the input format and analyze your data. This example input file is also provided in the iCAGES package. 
27 | ``` 28 | [cocodong@biocluster ~/]$ cat input.vcf 29 | ##fileformat=VCFv4.1 30 | ##FORMAT= 31 | ##contig= 32 | ##contig= 33 | ##contig= 34 | ##contig= 35 | ##contig= 36 | ##contig= 37 | ##contig= 38 | ##contig= 39 | ##contig= 40 | ##contig= 41 | ##contig= 42 | ##contig= 43 | ##contig= 44 | ##contig= 45 | ##contig= 46 | ##contig= 47 | ##contig= 48 | ##contig= 49 | ##contig= 50 | ##contig= 51 | ##contig= 52 | ##contig= 53 | ##contig= 54 | ##contig= 55 | ##contig= 56 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor 57 | 1 12919840 . T C . . . GT 1|1 58 | 1 35332717 . C A . . . GT 1|1 59 | 1 55148456 . G T . . . GT 1|1 60 | 1 70504789 . C T . . . GT 1|1 61 | 1 167059520 . A T . . . GT 1|1 62 | 1 182496864 . A T . . . GT 1|1 63 | 1 197073351 . C T . . . GT 1|1 64 | 1 216373211 . G T . . . GT 1|1 65 | 10 37490170 . G A . . . GT 1|1 66 | 10 56089432 . A C . . . GT 1|1 67 | ... 68 | [cocodong@biocluster ~/]$ icages.pl input.vcf 69 | ``` 70 | 71 | ## BED input with one sample with his/her somatic structural variations only 72 | 73 | If you only have BED files for all structural variations detected in this patient with reference genome version hg19, iCAGES can also automatically detect the input format of your data and carry on downstream analysis. This exemplary input file is also provided in iCAGES package. 74 | 75 | ``` 76 | [cocodong@biocluster ~/]$ head input.bed 77 | chr12 85865797 85887628 78 | chr20 15052592 15071191 79 | chr16 87340388 87349798 80 | chr2 213000509 213007522 81 | [cocodong@biocluster ~/]$ icages.pl input.bed 82 | ``` 83 | 84 | 85 | 86 | --- 87 | 88 |
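The three input types shown above differ visibly in their first line, which can be used as a quick check before a run. The sketch below is only an illustration of that idea and is not how iCAGES itself detects formats; the `guess_input_format` helper is hypothetical and assumes the layouts of the bundled example files (a `##fileformat=VCF` header, three BED columns, five ANNOVAR columns).

```shell
# Hypothetical helper (not part of iCAGES): guess the input type of a file
# from its first line, based on the layouts of the example files.
guess_input_format() {
    if head -n 1 "$1" | grep -q '^##fileformat=VCF'; then
        echo vcf
    else
        # Count whitespace-separated fields on the first line.
        case "$(head -n 1 "$1" | awk '{print NF}')" in
            3) echo bed ;;       # chromosome, start, end
            5) echo annovar ;;   # chromosome, start, end, ref, alt
            *) echo unknown ;;
        esac
    fi
}

# Example: guess_input_format input.vcf
```

Real BED and ANNOVAR files may carry extra columns, so an `unknown` result here does not mean that iCAGES will reject the file.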
89 | 102 | 103 | 104 | -------------------------------------------------------------------------------- /example/input.vcf: -------------------------------------------------------------------------------- 1 | ##fileformat=VCFv4.1 2 | ##FORMAT= 3 | ##contig= 4 | ##contig= 5 | ##contig= 6 | ##contig= 7 | ##contig= 8 | ##contig= 9 | ##contig= 10 | ##contig= 11 | ##contig= 12 | ##contig= 13 | ##contig= 14 | ##contig= 15 | ##contig= 16 | ##contig= 17 | ##contig= 18 | ##contig= 19 | ##contig= 20 | ##contig= 21 | ##contig= 22 | ##contig= 23 | ##contig= 24 | ##contig= 25 | ##contig= 26 | ##contig= 27 | ##contig= 28 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor 29 | 1 12919840 . T C . . . GT 1|1 30 | 1 35332717 . C A . . . GT 1|1 31 | 1 55148456 . G T . . . GT 1|1 32 | 1 70504789 . C T . . . GT 1|1 33 | 1 167059520 . A T . . . GT 1|1 34 | 1 182496864 . A T . . . GT 1|1 35 | 1 197073351 . C T . . . GT 1|1 36 | 1 216373211 . G T . . . GT 1|1 37 | 10 37490170 . G A . . . GT 1|1 38 | 10 56089432 . A C . . . GT 1|1 39 | 10 69957135 . A T . . . GT 1|1 40 | 10 125601918 . C A . . . GT 1|1 41 | 10 125804220 . C A . . . GT 1|1 42 | 10 134649656 . C A . . . GT 1|1 43 | 11 5172737 . G T . . . GT 1|1 44 | 11 5905898 . G T . . . GT 1|1 45 | 11 6942832 . C A . . . GT 1|1 46 | 11 16068172 . C T . . . GT 1|1 47 | 11 44297060 . G A . . . GT 1|1 48 | 11 48328274 . G C . . . GT 1|1 49 | 11 89409330 . A G . . . GT 1|1 50 | 11 107965601 . T G . . . GT 1|1 51 | 11 120180145 . C A . . . GT 1|1 52 | 12 32481489 . G A . . . GT 1|1 53 | 12 67699695 . G T . . . GT 1|1 54 | 12 96883571 . T C . . . GT 1|1 55 | 12 119968739 . G C . . . GT 1|1 56 | 13 32340125 . C A . . . GT 1|1 57 | 13 36521559 . G A . . . GT 1|1 58 | 14 20529107 . C T . . . GT 1|1 59 | 14 36154270 . T C . . . GT 1|1 60 | 14 59112264 . G C . . . GT 1|1 61 | 14 63174963 . C A . . . GT 1|1 62 | 14 94007023 . A T . . . GT 1|1 63 | 14 95557552 . G A . . . GT 1|1 64 | 14 105410443 . C G . . . GT 1|1 65 | 15 20739706 . T C . . . 
GT 1|1 66 | 15 30054548 . A G . . . GT 1|1 67 | 16 20376770 . G T . . . GT 1|1 68 | 16 66422268 . C A . . . GT 1|1 69 | 17 7125396 . A G . . . GT 1|1 70 | 17 21731175 . C A . . . GT 1|1 71 | 17 26958542 . C T . . . GT 1|1 72 | 17 41836054 . C T . . . GT 1|1 73 | 18 14852153 . G T . . . GT 1|1 74 | 18 55319937 . G A . . . GT 1|1 75 | 18 58039082 . C A . . . GT 1|1 76 | 18 61170854 . C T . . . GT 1|1 77 | 19 13249152 . G T . . . GT 1|1 78 | 19 52938422 . C A . . . GT 1|1 79 | 19 53770704 . C G . . . GT 1|1 80 | 19 53911857 . C T . . . GT 1|1 81 | 19 57286834 . A T . . . GT 1|1 82 | 2 84940277 . G T . . . GT 1|1 83 | 2 112945031 . A T . . . GT 1|1 84 | 2 136552307 . T G . . . GT 1|1 85 | 2 157332699 . G T . . . GT 1|1 86 | 2 179588352 . C A . . . GT 1|1 87 | 2 187627271 . G T . . . GT 1|1 88 | 2 189863400 . G T . . . GT 1|1 89 | 2 189910616 . A G . . . GT 1|1 90 | 2 201533514 . T C . . . GT 1|1 91 | 2 238245127 . C A . . . GT 1|1 92 | 20 23473695 . C G . . . GT 1|1 93 | 20 29956465 . G A . . . GT 1|1 94 | 20 47266080 . G C . . . GT 1|1 95 | 22 30661003 . G C . . . GT 1|1 96 | 3 10157495 . A G . . . GT 1|1 97 | 3 11064098 . A T . . . GT 1|1 98 | 3 13413467 . C A . . . GT 1|1 99 | 3 51690015 . C T . . . GT 1|1 100 | 3 52417509 . G T . . . GT 1|1 101 | 3 63821060 . T A . . . GT 1|1 102 | 3 100087931 . T C . . . GT 1|1 103 | 3 112998777 . C T . . . GT 1|1 104 | 3 118621763 . G T . . . GT 1|1 105 | 4 13617077 . C A . . . GT 1|1 106 | 4 20533629 . A G . . . GT 1|1 107 | 4 147830310 . G C . . . GT 1|1 108 | 4 173269688 . T A . . . GT 1|1 109 | 5 19483542 . G T . . . GT 1|1 110 | 5 37326041 . C A . . . GT 1|1 111 | 5 55110949 . C T . . . GT 1|1 112 | 5 75596562 . C A . . . GT 1|1 113 | 5 140536091 . C G . . . GT 1|1 114 | 5 140616128 . C G . . . GT 1|1 115 | 5 140723876 . G T . . . GT 1|1 116 | 5 159686516 . G A . . . GT 1|1 117 | 6 7368953 . G C . . . GT 1|1 118 | 6 49808692 . G C . . . GT 1|1 119 | 6 109752383 . C T . . . GT 1|1 120 | 6 137329736 . C T . . . 
GT 1|1 121 | 7 47867026 . G T . . . GT 1|1 122 | 7 69583198 . C A . . . GT 1|1 123 | 7 93518498 . G A . . . GT 1|1 124 | 7 149475905 . C A . . . GT 1|1 125 | 8 3076901 . G A . . . GT 1|1 126 | 8 11995670 . G T . . . GT 1|1 127 | 8 37704629 . C T . . . GT 1|1 128 | 8 71068967 . C A . . . GT 1|1 129 | 8 77764912 . G C . . . GT 1|1 130 | 8 77775920 . C A . . . GT 1|1 131 | 8 79510674 . A C . . . GT 1|1 132 | 8 88363968 . C A . . . GT 1|1 133 | 8 89179974 . G A . . . GT 1|1 134 | 8 105503431 . C A . . . GT 1|1 135 | 8 118819526 . G A . . . GT 1|1 136 | 8 118831993 . C A . . . GT 1|1 137 | 9 17409334 . A G . . . GT 1|1 138 | 9 21201831 . G T . . . GT 1|1 139 | 9 43627065 . C T . . . GT 1|1 140 | 9 79325065 . C A . . . GT 1|1 141 | 9 104171019 . A T . . . GT 1|1 142 | 9 105767273 . C A . . . GT 1|1 143 | X 47426121 . C G . . . GT 1|1 144 | X 48213490 . C A . . . GT 1|1 145 | X 54020144 . G T . . . GT 1|1 146 | X 55249114 . G T . . . GT 1|1 147 | X 64140039 . T A . . . GT 1|1 148 | X 71802303 . A T . . . GT 1|1 149 | X 75004601 . C G . . . GT 1|1 150 | X 76939716 . G A . . . GT 1|1 151 | X 78216983 . T A . . . GT 1|1 152 | X 100513426 . G T . . . GT 1|1 153 | X 118717207 . C T . . . GT 1|1 154 | X 130416606 . G T . . . GT 1|1 155 | X 134305608 . G A . . . GT 1|1 156 | X 148037166 . G T . . . GT 1|1 157 | X 38280273 38280274 CA CA . . . GT 1|1 158 | 12 29908693 29908693 C - . . . GT 1|1 159 | 19 53612719 53612719 - T . . . GT 1|1 160 | 2 229997 229997 - GAT . . . GT 1|1 161 | 6 7886214 7886216 GAA - . . . GT 1|1 162 | X 56592088 56592088 - T . . . GT 1|1 163 | -------------------------------------------------------------------------------- /doc/user-guide/download.md: -------------------------------------------------------------------------------- 1 | ## iCAGES main package 2 | 3 | Please join the iCAGES mailing list at google groups [here](https://groups.google.com/forum/?hl=en#!forum/icages) to receive announcements on software updates. 
4 | 5 | The latest version of iCAGES (2017Jan08) can be downloaded [here](https://github.com/WGLab/icages/releases/tag/v1.0.2). 6 | 7 | iCAGES is written in Perl and can be run as a standalone application on diverse hardware systems where standard Perl modules are installed. 8 | 9 | ## Download 10 | 11 | ``` 12 | wget https://github.com/WGLab/icages/archive/refs/tags/v1.0.2.tar.gz -O icages-1.0.2.tar.gz 13 | ``` 14 | 15 | ## Installation 16 | 17 | - Unzip downloaded file 18 | 19 | ``` 20 | tar -zxvf icages-(version).tar.gz 21 | mv icages-(version) icages 22 | ``` 23 | 24 | - Download and unzip database files 25 | 26 | ``` 27 | cd icages/ 28 | wget http://www.openbioinformatics.org/annovar/download/icages/db.tar.gz 29 | tar -zxvf db.tar.gz 30 | ``` 31 | 32 | The `db.tar.gz` file contains resources for the hg19 genome build. 33 | 34 | You can also get the file from 35 | http://www.openbioinformatics.org/annovar/download/icages/hg18_db.tar.gz 36 | and http://www.openbioinformatics.org/annovar/download/icages/hg38_db.tar.gz depending on the genome build. (Note: due to shortage of storage space, these two files have been taken offline; if you need them, please contact the PI to set up a temporary download link). 37 | 38 | 39 | - Install the necessary Perl packages. If you have root access, please use the cpanm command to install the JSON, HTTP::Request and LWP packages 40 | 41 | ``` 42 | cpanm JSON 43 | cpanm HTTP::Request 44 | cpanm LWP 45 | ``` 46 | 47 | - Install the first dependency for iCAGES, ANNOVAR. Please visit the [ANNOVAR](http://www.openbioinformatics.org/annovar/annovar_download.html) website and download it. If your current directory is icages, then please move the annovar/ directory to the ./bin directory 48 | ``` 49 | mv path-to-annovar/annovar/ ./bin/ 50 | ``` 51 | 52 | - Install the second dependency for iCAGES, DGIdb. 
If your current directory is icages, then please create a directory named DGIdb under the ./bin directory if it does not already exist. Please visit [DGIdb](http://dgidb.genome.wustl.edu/) to read about it and download the corresponding perl script from [here](https://raw.github.com/genome/dgi-db/master/files/perl_example.pl) to the ./bin/DGIdb directory 53 | 54 | ``` 55 | mkdir -p ./bin/DGIdb 56 | wget https://raw.github.com/genome/dgi-db/master/files/perl_example.pl -O ./bin/DGIdb/getDrugList.pl 57 | ``` 58 | 59 | - Please make some modifications to this getDrugList.pl file. First, add the following lines after "parse_opts();" 60 | 61 | ``` 62 | my $output; 63 | open (OUT, ">$output") or die "iCAGES: cannot open file $output for writing the drugs recommended for cancer driver genes\n"; 64 | ``` 65 | 66 | - Then, add the following line after "'help' => \$help," 67 | 68 | ``` 69 | 'output:s' => \$output 70 | ``` 71 | 72 | - Next, comment out the following line 73 | 74 | ``` 75 | print "gene_name\tdrug_name\tinteraction_type\tsource\tgene_categories\n"; 76 | ``` 77 | into 78 | ``` 79 | # print "gene_name\tdrug_name\tinteraction_type\tsource\tgene_categories\n"; 80 | ``` 81 | 82 | - Next, change the following line 83 | 84 | ``` 85 | print "$gene_name\t$drug_name\t$interaction_type\t$source\t$gene_categories\n"; 86 | ``` 87 | into 88 | ``` 89 | print OUT "$gene_name\t$drug_name\t$interaction_type\t$source\t$gene_categories\n"; 90 | ``` 91 | 92 | - And change the following lines 93 | 94 | ``` 95 | print "\n" . 'Unmatched search term: ', $_->{searchTerm}, "\n"; 96 | print 'Possible suggestions: ', join(",", @{$_->{suggestions}}), "\n"; 97 | ``` 98 | into 99 | ``` 100 | print OUT "\n" . 'Unmatched search term: ', $_->{searchTerm}, "\n"; 101 | print OUT 'Possible suggestions: ', join(",", @{$_->{suggestions}}), "\n"; 102 | ``` 103 | 104 | - Install the third dependency for iCAGES, vcftools. 
Assuming you are already in the icages-(version)/bin/ directory, download vcftools from sourceforge 105 | 106 | ``` 107 | wget http://iweb.dl.sourceforge.net/project/vcftools/vcftools_0.1.12b.tar.gz 108 | tar -zxvf vcftools_0.1.12b.tar.gz 109 | mv vcftools_0.1.12b/ vcftools/ 110 | rm vcftools_0.1.12b.tar.gz 111 | ``` 112 | 113 | - Install vcftools by typing the following commands 114 | 115 | ``` 116 | cd vcftools 117 | make 118 | ``` 119 | 120 | - Install the fourth dependency for iCAGES, bedtools. Assuming you are already in the icages-(version)/bin/ directory, 121 | 122 | ``` 123 | wget https://codeload.github.com/arq5x/bedtools2/tar.gz/v2.25.0 -O v2.25.0.tar.gz 124 | tar -zxvf v2.25.0.tar.gz 125 | mv bedtools2-2.25.0 bedtools 126 | rm v2.25.0.tar.gz 127 | cd bedtools 128 | make 129 | ``` 130 | 131 | ## Additional databases 132 | 133 | The initial databases for iCAGES only include the hg19 human reference genome. In order to annotate variants against the hg18 or hg38 reference genome, please download the additional databases compiled for these two reference versions. (Note: due to shortage of storage space, these two files have been taken offline; if you need them, please contact the PI to set up a temporary download link). 134 | 135 | - hg18 136 | 137 | ``` 138 | cd icages/db/ 139 | wget http://www.openbioinformatics.org/annovar/download/icages/hg18_db.tar.gz 140 | tar -zxvf hg18_db.tar.gz 141 | ``` 142 | 143 | - hg38 144 | 145 | ``` 146 | cd icages/db/ 147 | wget http://www.openbioinformatics.org/annovar/download/icages/hg38_db.tar.gz 148 | tar -zxvf hg38_db.tar.gz 149 | ``` 150 | 151 | 152 | 153 | --- 154 | 155 |
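After working through the installation steps, it can help to confirm that every dependency landed where the steps put it. The sketch below is a hypothetical check, not something shipped with iCAGES; the path list is taken from the `mv`, `mkdir` and `tar` commands in this guide, and it assumes that `db.tar.gz` unpacks into a `db/` directory.

```shell
# Hypothetical post-install check (not shipped with iCAGES): report which of
# the paths created by the installation steps are missing under an icages
# directory. Returns 0 when everything is present and 1 otherwise.
check_icages_install() {
    local root="$1" missing=0 path
    for path in db bin/annovar bin/DGIdb/getDrugList.pl bin/vcftools bin/bedtools; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_icages_install ./icages || echo "installation incomplete"
```

The check only tests that the paths exist; it does not verify that vcftools and bedtools were built successfully by `make`.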
156 | 169 | 170 | 171 | -------------------------------------------------------------------------------- /genomeLocusFinder.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use Fcntl qw/SEEK_END SEEK_SET/; 5 | use Data::Dumper; 6 | #use Time::HiRes qw/gettimeofday tv_interval/; 7 | 8 | my $BINSIZE = 10; #bin size for indexing 9 | my $ENDOFFILE = "END_OF_GENOMELOCUSFINDER_INDEX"; 10 | my $DEBUG = 0; 11 | my $suffix = "binsize${BINSIZE}gidx"; 12 | my $usage = "Usage: $0 13 | index index a sorted tab-delimited text file 14 | find retrieve a record for some loci\n"; 15 | die $usage unless @ARGV >= 1; 16 | my $subprogram = shift @ARGV; 17 | if ($subprogram eq 'index') { 18 | &index(@ARGV); 19 | } elsif ($subprogram eq 'find') { 20 | &find(@ARGV); 21 | } else { 22 | die $usage; 23 | } 24 | 25 | ##########WARNING################################################################## 26 | #typical use cases for this design 27 | #small vocabulary in 1st column 28 | #large database file 29 | #large query file 30 | #only a few annotations for each genomic locus 31 | #not many overlapping loci (e.g. 
no chr1:1-10, chr1:1-100, chr1:1-1000 etc) 32 | ####MIGHT NOT WORK WELL UNDER OTHER CIRCUMSTANCES################################ 33 | ##################SUBROUTINES####################### 34 | sub index { 35 | #generate hierarchical index for a sorted tab-delimited txt file 36 | my $usage = "$0 index \n"; 37 | die $usage unless @_ == 2; 38 | warn "Start indexing...\n"; 39 | # my $t0 = [gettimeofday]; 40 | my $fai = shift; 41 | my $in = shift; 42 | my $out = "$in.$suffix"; 43 | #input format 44 | #1 10001 10001 T A 0.18521432 45 | #types of binary representation: Q: unsigned 64-bit integer, 8 bytes 46 | #index layout 47 | ############# 48 | #bin index: 8-byte unsigned integer marking start of each $BINSIZE bp bin in the file being indexed 49 | #key index: (number of keys)*8 bytes of unsigned integers marking start position of each bin index 50 | # section and length associated with each key (for chromosomes, this is equivalent to chr 51 | # length), the order of key indices is same as the order of vocabulary 52 | #vocabulary: a list of keys separated by tabs 53 | #offset: 8-byte unsigned integer, marks total bytes for vocabulary 54 | #######################An example index file################################ 55 | #----------------------------------------------------------------- 56 | #(bin index section) 57 | #(chr1 bins) 58 | #0x0000 0000 0000 0001 59 | #0x0000 0000 0000 00FA 60 | #... 61 | #(chr2 bins) 62 | #... 63 | #----------------------------------------------------------------- 64 | #(key index section) 65 | #(chr1 bin start in this index)0x00000001 ||(chr1 length)0xE9A6740 66 | #(chr2 bin start in this index)0x000000EB ||(chr2 length)0xE7EED8D 67 | #... 68 | #----------------------------------------------------------------- 69 | #(vocabulary section) 70 | #chr1\tchr2\t... 
71 | #----------------------------------------------------------------- 72 | #(the offset)0x000010AE 73 | #----------------------------------------------------------------- 74 | my %key_length = &readFastaIndex($fai); 75 | my $vocabulary_string; 76 | my %key_start; 77 | my $offset; 78 | 79 | open IN,'<',$in or die "$in: $!"; 80 | open OUT,'>',$out or die "$out: $!"; 81 | binmode OUT; #index is in binary format for space efficiency 82 | 83 | my $current_db_position = 0; #current position in database file 84 | my $count_total = 0; 85 | my $previous_key; 86 | my ($previous_start, $previous_end); 87 | my $walker = 0; #measures how far we have walked on each chr, in unit of $BINSIZE 88 | #every time we pass $BINSIZE, record position 89 | #reset for each new chr 90 | while () { 91 | $count_total++; 92 | # warn "NOTICE: $count_total records processed in ".tv_interval($t0)." seconds\n" if $count_total % 1_000_000 == 0; 93 | #example database record 94 | #0 1 2 3 4 5 95 | #1 10001 10001 T A 0.18521432 96 | #assume database is sorted 97 | chomp; 98 | my @f = split /\t/; 99 | die "ERROR: expect at least 5 fields at line $. of $in\n" unless @f >= 5; 100 | die "ERROR: Database $in not sorted in ascending order at line $. of $in.\n" if defined $previous_start and defined $previous_key and 101 | $previous_key eq $f[0] and (($f[1] < $previous_start) or ($previous_start == $f[1] and $f[2] < $previous_end)); 102 | die "ERROR: start larger than end at line $. of $in\n" unless $f[2] >= $f[1]; 103 | die "ERROR: range out of predefined bounds of $f[0] at line $. 
of $in.\n" 104 | if $key_length{$f[0]} < $f[1] or $key_length{$f[0]} < $f[2]; 105 | 106 | if (!defined $previous_key or $previous_key ne $f[0]) { 107 | #we got a new key 108 | $key_start{$f[0]} = tell OUT; 109 | if (defined $previous_key) { 110 | #modify length according to how far we have walked 111 | $key_length{$previous_key} = $walker; 112 | } 113 | $walker = 0; 114 | } 115 | 116 | while ($f[1] >= $walker) { 117 | print $current_db_position,"\n" if $DEBUG; 118 | print OUT pack("Q",$current_db_position); 119 | $walker += $BINSIZE; 120 | } 121 | 122 | $current_db_position = tell IN; 123 | $previous_key = $f[0]; 124 | ($previous_start,$previous_end) = @f[1,2]; 125 | } 126 | close IN; 127 | #record last key's end position 128 | $key_length{$previous_key} = $walker; 129 | 130 | #print out key starts and lengths 131 | for my $key(sort keys %key_start) { 132 | print OUT pack("Q",$key_start{$key}); 133 | print OUT pack("Q",$key_length{$key}); 134 | print $key_start{$key},"\n" if $DEBUG; 135 | print $key_length{$key},"\n" if $DEBUG; 136 | } 137 | #print vocabulary 138 | $vocabulary_string = join("\t",sort keys %key_start); 139 | print OUT pack("A".length($vocabulary_string), $vocabulary_string); 140 | print $vocabulary_string,"\n" if $DEBUG; 141 | 142 | #print offset 143 | $offset = length($vocabulary_string); 144 | print OUT pack("Q", $offset); 145 | print $offset,"\n" if $DEBUG; 146 | 147 | #print EOF mark 148 | print OUT pack("A".length($ENDOFFILE), $ENDOFFILE); 149 | close OUT; 150 | 151 | warn "Indexing done. Bin size: $BINSIZE. Assume ASCII Encoding for characters.\n"; 152 | # warn "Processed $count_total records in ".tv_interval($t0)." 
seconds\n"; 153 | } 154 | sub find { 155 | #retrieve recording overlapping with queries 156 | my $usage = "$0 find \n"; 157 | die $usage unless @_ == 4; 158 | # my $t0 = [gettimeofday]; 159 | my $db = shift; 160 | my $index_file = "$db.$suffix"; 161 | my $query = shift; 162 | my $databaseName = shift; 163 | my $outputFile = shift; 164 | warn "Start querying $db...\n"; 165 | #how to find the record for a specific locus? 166 | #first read the offset at the last 4 bytes 167 | #then load the vocabulary into hash in memory (size is determined by offset) 168 | #then locate key index (each key has one start 4 bytes, one length 4 bytes, total 8 bytes). 169 | #store the key indeces in hash 170 | #locate bin index by key index and genomic coordinate 171 | #go to the file, read from bin start until no more records 172 | my %key_index = &loadKeyIndex($index_file); 173 | #example query 174 | #1 114438528 114438528 G A TCGA-06-0155-01B-01D-1492-08 175 | open DB,'<',$db or die "$db: $!"; 176 | open INDEX,'<',$index_file or die "$index_file: $!"; 177 | open QUERY,'<',$query or die "$query: $!"; 178 | open OUT, ">", $outputFile or die "$outputFile: $!"; 179 | 180 | my $count_match = 0; 181 | my $count_total = 0; 182 | my $count_total_db_lookup = 1; 183 | 184 | while () { 185 | $count_total++; 186 | # warn "NOTICE: $count_total records processed in ".tv_interval($t0)." seconds\n" if $count_total % 1_000_000 == 0; 187 | # print "query: $_" ; 188 | chomp; 189 | my @f = split /\t/; 190 | die "ERROR: expect at least 5 fields at line $. 
of $query\n" unless @f >= 5; 191 | warn "QUERY: @f\n" if $DEBUG; 192 | my ($chr, $start, $end, $ref, $alt) = @f[0..4]; 193 | # chr start 194 | if ($chr =~ /chr/) { 195 | $chr =~ /chr(.*)/; 196 | $chr = $1; 197 | } 198 | if ((not exists $key_index{$chr}) or $start > $key_index{$chr}->[1]) { 199 | warn "NOTICE: @f no match found\n" if $DEBUG; 200 | } else { 201 | seek INDEX, $key_index{$chr}->[0] + 8*(int $start/$BINSIZE), SEEK_SET; 202 | my $position_in_db; 203 | read INDEX, $position_in_db, 8; 204 | $position_in_db = unpack("Q", $position_in_db); 205 | seek DB, $position_in_db, SEEK_SET; 206 | while () { 207 | #search within database 208 | #until no more annotations can be possibly found 209 | $count_total_db_lookup++; 210 | # print; 211 | chomp; 212 | my @db_fields = split /\t/; 213 | warn "DB:@db_fields\n" if $DEBUG; 214 | 215 | 216 | if ($db_fields[0] eq $chr and $db_fields[1] == $start and $db_fields[2] == $end 217 | and $db_fields[3] eq $ref and $db_fields[4] eq $alt) { 218 | print OUT join("\t", $databaseName, $db_fields[5], @f[0..4]), "\n"; 219 | $count_match++; 220 | } 221 | if ($db_fields[0] ne $chr or $db_fields[1] > $start) { 222 | last; #if chromosomes are not equal, or the start coordinate is larger than query start 223 | #there is no need to continue searching 224 | } 225 | } 226 | } 227 | } 228 | close QUERY; 229 | close DB; 230 | close INDEX; 231 | warn "Querying done. Found $count_match matches in $count_total queries.\n"; 232 | # warn "Processed $count_total records in ".tv_interval($t0)." 
seconds\n"; 233 | warn "Average database lookup: ".($count_total_db_lookup/$count_total)."\n"; 234 | } 235 | sub loadKeyIndex { 236 | #load vocabulary and key indices of the index file 237 | my $idx = shift; 238 | 239 | my $offset; 240 | my %key_index; #values are [start_position, length] 241 | my @vocabulary; 242 | my $buffer; 243 | open IN,'<',$idx or die "$idx: $!"; 244 | binmode IN; 245 | 246 | #check end 247 | seek IN, -length($ENDOFFILE), SEEK_END or die "$!"; 248 | read IN, $buffer, length($ENDOFFILE) or die "$!"; 249 | $buffer = unpack("A".length($ENDOFFILE), $buffer); 250 | die "Incomplete or corrupted index file, please re-index the database.\n" unless $buffer eq $ENDOFFILE; 251 | 252 | #offset 253 | seek IN, -8-length($ENDOFFILE), SEEK_END or die "$!"; 254 | read IN, $offset, 8 or die "$!"; 255 | $offset = unpack("Q",$offset); 256 | print "offset: $offset\n" if $DEBUG; 257 | 258 | #vocabulary 259 | ##reopen the filehandle because last time we read to the end 260 | #open IN,'<',$idx or die "$!"; 261 | #binmode IN; 262 | seek IN, -(8 + $offset + length($ENDOFFILE)), SEEK_END or die "$!"; 263 | read IN, $buffer, $offset; 264 | @vocabulary = split /\t/,(unpack ("A".$offset,$buffer)); 265 | print "@vocabulary" if $DEBUG; 266 | 267 | #key index 268 | seek IN, -(length($ENDOFFILE) + 8 + $offset + 16*@vocabulary), SEEK_END; 269 | for my $key(@vocabulary) { 270 | my ($start, $len); 271 | read IN, $start, 8; 272 | read IN, $len, 8; 273 | $start = unpack("Q", $start); 274 | $len = unpack("Q", $len); 275 | $key_index{$key} = [$start, $len]; 276 | } 277 | 278 | print Dumper(%key_index) if $DEBUG; 279 | close IN; 280 | return %key_index; 281 | } 282 | sub readFastaIndex { 283 | my $in = shift; 284 | #chr1 243000000 80 120 285 | my %vocabulary; 286 | open IN,'<',$in or die "$in: $!"; 287 | while (<IN>) { 288 | my @f = split /\t/; 289 | $vocabulary{$f[0]} = $f[1]; 290 | } 291 | close IN; 292 | return %vocabulary; 293 | } 294 |
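The index layout described in the comments of `genomeLocusFinder.pl` can be illustrated with a small, self-contained sketch. It is not part of the script: the helper name `bin_slot_offset` and the sample values are assumptions made for illustration. It shows how a query start coordinate maps to the byte offset of its 8-byte slot in the bin index, and how `pack("Q")` and `unpack("Q")` round-trip a database position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $BINSIZE = 10;    # same bin size as genomeLocusFinder.pl

# Hypothetical helper (not in the original script): byte offset, within the
# index file, of the slot holding the database position for the bin that
# contains $start. $section_start is where this key's bin index section begins.
sub bin_slot_offset {
    my ($section_start, $start) = @_;
    return $section_start + 8 * int($start / $BINSIZE);
}

# Each slot is one unsigned 64-bit integer (pack template "Q", 8 bytes).
my $slot        = pack("Q", 250);       # illustrative database position
my $db_position = unpack("Q", $slot);

printf "slot length: %d bytes\n", length($slot);
printf "db position: %d\n", $db_position;
printf "offset for start 10001: %d\n", bin_slot_offset(0, 10001);
```

With a bin size of 10, a query starting at 10001 falls in bin 1000, so its slot begins 8000 bytes into that key's section. The `Q` template requires a Perl built with 64-bit integer support.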
-------------------------------------------------------------------------------- /bin/icagesJson.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use List::Util qw(min max); 5 | use Pod::Usage; 6 | use Getopt::Long; 7 | use JSON; 8 | 9 | ###################################################################################################################################### 10 | ######################################################## variable declaration ######################################################## 11 | ###################################################################################################################################### 12 | 13 | my $rawInputFile = $ARGV[0]; 14 | my $icagesLocation = $ARGV[1]; 15 | my $prefix = $ARGV[2]; 16 | my ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef); 17 | my $json; 18 | my $nowString; 19 | $nowString = localtime(); 20 | 21 | ###################################################################################################################################### 22 | ########################################################### main #################################################################### 23 | ###################################################################################################################################### 24 | 25 | ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef) = &loadResult($rawInputFile, $prefix); 26 | $json = &createJson($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef); 27 | &printJson($rawInputFile, $json, $prefix); 28 | 29 | ###################################################################################################################################### 30 | ############################################################# subroutines ############################################################ 31 | 
###################################################################################################################################### 32 | 33 | sub loadResult { 34 | print "NOTICE: start loading three output files from iCAGES\n"; 35 | my ($rawInputFile, $prefix, $icagesMutationsFile, $icagesGenesFile, $icagesDrugsFile); 36 | my ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef); 37 | my (%icagesMutations, %icagesGenes, %icagesDrugs); 38 | my ($missenseCount, $noncodingCount, $structuralVariationCount); 39 | my ($geneCount, $driverCount, $cgcCount, $keggCount, $drugCount); 40 | my %logInformation; 41 | $rawInputFile = shift; 42 | $prefix = shift; 43 | $icagesMutationsFile = $rawInputFile . $prefix. ".annovar.icagesMutations.csv"; 44 | $icagesGenesFile = $rawInputFile . $prefix. ".annovar.icagesGenes.csv"; 45 | $icagesDrugsFile = $rawInputFile . $prefix. ".annovar.icagesDrugs.csv"; 46 | ($icagesMutationsRef, $missenseCount, $noncodingCount, $structuralVariationCount) = &loadMutations($icagesMutationsFile); 47 | ($icagesGenesRef, $geneCount, $driverCount, $cgcCount, $keggCount) = &loadGenes($icagesGenesFile); 48 | ($icagesDrugsRef, $drugCount) = &loadDrugs($icagesDrugsFile); 49 | %icagesMutations = %{$icagesMutationsRef}; 50 | %icagesGenes = %{$icagesGenesRef}; 51 | %icagesDrugs = %{$icagesDrugsRef}; 52 | %logInformation = ("Gene_count" => $geneCount, "Driver_count" => $driverCount, "CGC_count" => $cgcCount, "KEGG_count" => $keggCount, "Missense_count" => $missenseCount, "Noncoding_count" => $noncodingCount, "Structural_variation_count" => $structuralVariationCount, "Drug_count" => $drugCount); 53 | return (\%icagesMutations, \%icagesGenes, \%icagesDrugs, \%logInformation); 54 | } 55 | 56 | sub createJson { 57 | print "NOTICE: start creating JSON file\n"; 58 | my ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef); 59 | my (%icagesMutations, %icagesGenes, %icagesDrugs); 60 | my %json; 61 | my $json; 62 | $icagesMutationsRef = shift; 63 | 
$icagesGenesRef = shift; 64 | $icagesDrugsRef = shift; 65 | $logInformationRef = shift; 66 | %icagesMutations = %{$icagesMutationsRef}; 67 | %icagesGenes = %{$icagesGenesRef}; 68 | %icagesDrugs = %{$icagesDrugsRef}; 69 | foreach my $gene (sort keys %icagesGenes){ 70 | $icagesGenes{$gene}{"Mutation"} = $icagesMutations{$gene}; 71 | $icagesGenes{$gene}{"Children"} = $icagesDrugs{$gene}; 72 | push @{$json{"Output"}}, $icagesGenes{$gene}; 73 | } 74 | $json{"Log"} = $logInformationRef; 75 | $json = encode_json \%json; 76 | return $json; 77 | } 78 | 79 | sub printJson { 80 | my ($rawInputFile, $json, $prefix, $icagesJsonFile); 81 | $rawInputFile = shift; 82 | $json = shift; 83 | $prefix = shift; 84 | $icagesJsonFile = $rawInputFile . $prefix . ".icages.json"; 85 | open(OUT, ">$icagesJsonFile") or die ; 86 | print OUT "$json\n"; 87 | close OUT; 88 | print "NOTICE: end runing iCAGES packge at $nowString\n"; 89 | } 90 | 91 | sub loadMutations { 92 | print "NOTICE: start processing icagesMutations.csv file to generate JSON file\n"; 93 | my ($icagesMutationsFile, $title); 94 | my ($missenseCount, $noncodingCount, $structuralVariationCount); 95 | my %icagesMutations; 96 | $icagesMutationsFile = shift; 97 | open(MUT, "$icagesMutationsFile") or die; 98 | $title = ; 99 | $missenseCount = 0; 100 | $noncodingCount = 0; 101 | $structuralVariationCount = 0; 102 | while(){ 103 | chomp; 104 | my @line = split(",", $_); 105 | my $geneName = $line[0]; 106 | my $chrmosomeNumber = $line[1]; 107 | my $start = $line[2]; 108 | my $end = $line[3]; 109 | my $reference = $line[4]; 110 | my $alternative = $line[5]; 111 | my $category = $line[6]; 112 | if($category eq "point coding"){ 113 | $missenseCount ++; 114 | }elsif($category eq "point noncoding"){ 115 | $noncodingCount ++; 116 | }elsif($category eq "structural variation"){ 117 | $structuralVariationCount ++; 118 | } 119 | my $mutationSyntax = $line[7]; 120 | my $proteinSyntax = $line[8]; 121 | my $scoreCategory = $line[9]; 122 | my 
$mutationScore = $line[10] eq "NA" ? "NA" : sprintf("%.2f", $line[10]); 123 | 124 | my %anonymousHash = ("Chromosome" => $chrmosomeNumber, "Start_position" => $start, "End_position" => $end, "Reference_allele" => $reference, "Alternative_allele" => $alternative, "Mutation_syntax" => $mutationSyntax, "Protein_syntax" => $proteinSyntax, "Mutation_category" => $category, "Score_category" => $scoreCategory, "Driver_mutation_score" => $mutationScore); 125 | push @{$icagesMutations{$geneName}}, \%anonymousHash; 126 | } 127 | close MUT; 128 | return (\%icagesMutations, $missenseCount, $noncodingCount, $structuralVariationCount); 129 | } 130 | 131 | sub loadGenes { 132 | print "NOTICE: start processing icagesGenes.csv file to generate JSON file\n"; 133 | my ($icagesGenesFile, $title); 134 | my ($geneCount, $driverCount, $cgcCount, $keggCount); 135 | $geneCount = 0; 136 | $driverCount = 0; 137 | $cgcCount = 0; 138 | $keggCount = 0; 139 | $icagesGenesFile = shift; 140 | my $keggRef; 141 | my %kegg; 142 | my %icagesGenes; 143 | my $DBLocation = $icagesLocation . "db/"; 144 | $keggRef = loadDatabase($DBLocation); 145 | %kegg = %{$keggRef}; 146 | open(GENE, "$icagesGenesFile") or die; 147 | $title = ; 148 | # limit the number of maximum genes of 20 149 | while(){ 150 | chomp; 151 | last if $geneCount == 49; 152 | $geneCount ++; 153 | my @line = split(",", $_); 154 | my $geneName = $line[0]; 155 | my $category = $line[1]; 156 | my $phenolyzer = $line[5] eq "NA" ? "NA" : sprintf("%.2f", $line[5]); 157 | my $icagesGeneScore = $line[6] eq "NA" ? "NA" : sprintf("%.2f", $line[6]); 158 | my $url; 159 | if ($category eq "Cancer Gene Census"){ 160 | $cgcCount ++; 161 | $url = "http://cancer.sanger.ac.uk/cosmic/gene/overview?ln=" . $geneName; 162 | }elsif($category eq "KEGG Cancer Pathway"){ 163 | $keggCount ++; 164 | $url = "http://www.genome.jp/dbget-bin/www_bget?hsa:" . $kegg{$geneName}; 165 | }else{ 166 | $url = "http://www.genecards.org/cgi-bin/carddisp.pl?gene=". $geneName. 
"&search=96a88d95c4a24ffc2ac1129a92af7b02"; 167 | } 168 | my %anonymousHash = ("Gene_url" => $url, "Name" => $geneName, "Category" => $category, "Phenolyzer_score" => $phenolyzer, "iCAGES_gene_score" => $icagesGeneScore); 169 | $icagesGenes{$geneName} = \%anonymousHash; 170 | } 171 | my @sortediCAGESGeneScore = sort {$icagesGenes{$b}{"iCAGES_gene_score"} <=> $icagesGenes{$a}{"iCAGES_gene_score"}} keys %icagesGenes; 172 | my $percentile = int(($#sortediCAGESGeneScore + 1) * 0.20); 173 | # my $criticalValue = $icagesGenes{$sortediCAGESGeneScore[$percentile-1]}{"iCAGES_gene_score"}; 174 | my $criticalValue = 0.11; 175 | foreach my $gene (sort keys %icagesGenes){ 176 | if($icagesGenes{$gene}{"iCAGES_gene_score"} > $criticalValue){ 177 | $icagesGenes{$gene}{"Driver"} = "TRUE"; 178 | $driverCount ++; 179 | }else{ 180 | $icagesGenes{$gene}{"Driver"} = "FALSE"; 181 | } 182 | } 183 | close GENE; 184 | return (\%icagesGenes, $geneCount, $driverCount, $cgcCount, $keggCount); 185 | } 186 | 187 | 188 | 189 | sub loadDrugs { 190 | print "NOTICE: start processing icagesGenes.csv file to generate JSON file\n"; 191 | my ($icagesDrugsFile, $title); 192 | my %icagesDrugs; 193 | my $drug_count; 194 | $drug_count = 0; 195 | $icagesDrugsFile = shift; 196 | open(DRUG, "$icagesDrugsFile") or die; 197 | $title = ; 198 | while(){ 199 | $drug_count ++; 200 | chomp; 201 | my @line = split(",", $_); 202 | my $drugName = $line[0]; 203 | my $finalTarget = $line[1]; 204 | my $directTarget = $line[2]; 205 | my $maxBioSystemsScore = $line[4] eq "NA" ? "NA" : sprintf("%.2f", $line[4]); 206 | my $maxActivityScore = $line[5] eq "NA" ? "NA" : sprintf("%.2f", $line[5]); 207 | my $icagesDrugScore = $line[6] eq "NA" ? "NA" : sprintf("%.2f", $line[6]); 208 | # add fda(7,8) and clinical trial(9,10,11,12) 209 | my $FDA_tag = ( $line[8] eq "NA" and $line[9] eq "NA" ) ? "FALSE" : "TRUE" ; 210 | my $CT_tag = ( $line[10] eq "NA" and $line[11] eq "NA" and $line[12] eq "NA" and $line[13] eq "NA" ) ? 
"FALSE" : "TRUE" ; 211 | my (%fda, %ct); 212 | my %anonymousHash; 213 | if($FDA_tag eq "FALSE" and $CT_tag eq "FALSE"){ 214 | %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => "FALSE", "CT_tag" => "FALSE"); 215 | }elsif($FDA_tag eq "TRUE" and $CT_tag eq "FALSE"){ 216 | %fda = ("Status" => $line[8], "Active_ingredient" => $line[9]); 217 | %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => "TRUE", "CT_tag" => "FALSE", "FDA_Info" => \%fda); 218 | }elsif($CT_tag eq "TRUE" and $FDA_tag eq "FALSE"){ 219 | %ct = ("Name" => $line[10], "Organization" => $line[11], "Phase" => $line[12] , "URL" => $line[13]); 220 | %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => "FALSE", "CT_tag" => "TRUE", "CT_Children"=> [\%ct]); 221 | }else{ 222 | %fda = ("Status" => $line[8], "Active_ingredient" => $line[9]); 223 | %ct= ("Name" => $line[10], "Organization" => $line[11], "Phase" => $line[12] , "URL" => $line[13]); 224 | %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => 
"TRUE", "CT_tag" => "TRUE", "FDA_Info" => \%fda, "CT_Children" => [\%ct]); 225 | } 226 | 227 | push @{$icagesDrugs{$finalTarget}}, \%anonymousHash; 228 | } 229 | close DRUG; 230 | return (\%icagesDrugs, $drug_count); 231 | } 232 | 233 | 234 | sub loadDatabase { 235 | print "NOTICE: start loading CGC, KEGG cancer pathway databases"; 236 | my ($DBLocation, $cgcFile, $keggFile); 237 | my (%cgc, %kegg); 238 | $DBLocation = shift; 239 | $keggFile = $DBLocation . "kegg.gene"; 240 | open(KEGG, "$keggFile") or die ; 241 | while(){ 242 | chomp; 243 | my @line = split; 244 | $kegg{$line[0]} = $line[1]; 245 | } 246 | close KEGG; 247 | return (\%kegg); 248 | } 249 | 250 | 251 | -------------------------------------------------------------------------------- /bin/icagesGene.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use List::Util qw(min max); 5 | use Pod::Usage; 6 | use Getopt::Long; 7 | 8 | ###################################################################################################################################### 9 | ######################################################## variable declaration ######################################################## 10 | ###################################################################################################################################### 11 | 12 | my ($rawInputFile, $icagesLocation, $subtype, $prefix); 13 | my %phenolyzer; 14 | 15 | ###################################################################################################################################### 16 | ########################################################### main #################################################################### 17 | ###################################################################################################################################### 18 | 19 | $rawInputFile = $ARGV[0]; 20 | $icagesLocation = $ARGV[1]; 21 | 
$subtype = $ARGV[2]; 22 | $prefix = $ARGV[3]; 23 | %phenolyzer = &loadPhenolyzer($icagesLocation); 24 | &processMutation($rawInputFile, $icagesLocation, \%phenolyzer, $subtype, $prefix); 25 | 26 | 27 | ###################################################################################################################################### 28 | ############################################################# subroutines ############################################################ 29 | ###################################################################################################################################### 30 | 31 | sub loadPhenolyzer { 32 | print "NOTICE: start loading Phenolyzer\n"; 33 | my %phenolyzer; 34 | my ($icagesLocation, $DBLocation, $phenolyzerDB); 35 | $icagesLocation = shift; 36 | $DBLocation = $icagesLocation . "db/"; 37 | $phenolyzerDB = $DBLocation . "phenolyzer.score"; 38 | open(PHE, "$phenolyzerDB") or die "ERROR: cannot open $phenolyzerDB\n"; 39 | while(){ 40 | chomp; 41 | my @line = split("\t", $_); 42 | $phenolyzer{$line[0]} = $line[1]; 43 | } 44 | return %phenolyzer; 45 | } 46 | 47 | 48 | sub processMutation{ 49 | print "NOTICE: start process mutation files from iCAGES layer one\n"; 50 | my ($rawInputFile, $icagesLocation, $DBLocation, $icagesMutations, $icagesGenes, $ref, $subtype); 51 | my (%phenolyzer, %icagesGenes, %icagesPrint); 52 | my (%cgc, %kegg); 53 | my ($cgcRef, $keggRef); 54 | $rawInputFile = shift; 55 | $icagesLocation = shift; 56 | $ref = shift; 57 | $subtype = shift; 58 | $prefix = shift; 59 | %phenolyzer = %{$ref}; 60 | $DBLocation = $icagesLocation . "db/"; 61 | ($cgcRef, $keggRef) = loadDatabase($DBLocation); 62 | %cgc = %{$cgcRef}; 63 | %kegg = %{$keggRef}; 64 | $icagesMutations = $rawInputFile . $prefix . ".annovar.icagesMutations.csv"; 65 | $icagesGenes = $rawInputFile . $prefix . 
".annovar.icagesGenes.csv"; 66 | open(MUTATIONS, "$icagesMutations") or die "ERROR: cannot open $icagesMutations\n"; 67 | my $header = ; 68 | open(GENES, ">$icagesGenes") or die "ERROR: cannot open $icagesGenes\n"; 69 | while(){ 70 | chomp; 71 | my @line = split(",", $_); 72 | next unless defined $line[0]; 73 | if(defined $icagesGenes{$line[0]}{$line[9]}){ 74 | if($line[10] eq "NA" or $icagesGenes{$line[0]}{$line[9]} eq "NA"){ 75 | next; 76 | }else{ 77 | $icagesGenes{$line[0]}{$line[9]} = max($line[10], $icagesGenes{$line[0]}{$line[9]}); 78 | }; 79 | }else{ 80 | $icagesGenes{$line[0]}{$line[9]} = $line[10]; 81 | }; 82 | }; 83 | 84 | ####### count genes 85 | my $geneCount = 0; 86 | my $cgcCount = 0; 87 | my $keggCount = 0; 88 | 89 | 90 | foreach my $gene (sort keys %icagesGenes){ 91 | $geneCount ++; 92 | my ($radialSVM, $funseq, $cnv, $phenolyzer, $icagesGene, $category); 93 | if (exists $icagesGenes{$gene}{"radial SVM"} and $icagesGenes{$gene}{"radial SVM"} ne "NA"){ 94 | $radialSVM = $icagesGenes{$gene}{"radial SVM"}; 95 | }else{ 96 | $radialSVM = 0; 97 | }; 98 | if (exists $icagesGenes{$gene}{"FunSeq2"} and $icagesGenes{$gene}{"FunSeq2"} ne "NA"){ 99 | $funseq = $icagesGenes{$gene}{"FunSeq2"} ; 100 | }else{ 101 | $funseq = 0; 102 | }; 103 | if (exists $icagesGenes{$gene}{"CNV normalized signal"} and $icagesGenes{$gene}{"CNV normalized signal"} ne "NA"){ 104 | $cnv = $icagesGenes{$gene}{"CNV normalized signal"} ; 105 | }else{ 106 | $cnv = 0; 107 | }; 108 | 109 | next if $cnv==0 and $funseq==0 and $radialSVM==0; 110 | 111 | if (exists $phenolyzer{$gene}){ 112 | $phenolyzer = $phenolyzer{$gene}; 113 | }else{ 114 | $phenolyzer = 0; 115 | }; 116 | if (exists $cgc{$gene}){ 117 | $cgcCount ++; 118 | $category = "Cancer Gene Census"; 119 | }elsif(exists $kegg{$gene}){ 120 | $keggCount ++; 121 | $category = "KEGG Cancer Pathway"; 122 | }else{ 123 | $category = "Other Category"; 124 | } 125 | if($subtype eq "ACC"){ 126 | $icagesGene = 1.36771 -0.42382 * $radialSVM -0.93636 
* $cnv -0.15453 * $funseq -2.43656 * $phenolyzer; 127 | }elsif($subtype eq "BLCA"){ 128 | $icagesGene = 1.058757 -0.038520 * $radialSVM -0.398104 * $cnv -0.003739 * $funseq -0.909022 * $phenolyzer; 129 | }elsif($subtype eq "BRCA"){ 130 | $icagesGene = 1.093442 -0.075614 * $radialSVM -0.479842* $cnv -0.010534 * $funseq -1.022521 * $phenolyzer; 131 | }elsif($subtype eq "CESC"){ 132 | $icagesGene = 1.16180 -0.14885 * $radialSVM -1.32652 * $cnv -0.05598 * $funseq -1.32015 * $phenolyzer; 133 | }elsif($subtype eq "CHOL"){ 134 | $icagesGene = -0.14641 + 0.18845 * $radialSVM + 0.50381 * $cnv +0.13216 * $funseq + 0.61047 * $phenolyzer; 135 | }elsif($subtype eq "COADREAD") { 136 | $icagesGene = -0.062618 + 0.041399 * $radialSVM + 1.315360 * $cnv + 0.008455 * $funseq + 0.990089 * $phenolyzer; 137 | }elsif($subtype eq "COAD"){ 138 | $icagesGene = -0.091516 + 0.069962 * $radialSVM + 1.101200 * $cnv + 0.030804 * $funseq + 1.044726 * $phenolyzer; 139 | }elsif($subtype eq "DLBC") { 140 | $icagesGene = -0.12036 + 0.13884 * $radialSVM + 0.53773 * $cnv + 0.01512 * $funseq + 0.85130 * $phenolyzer; 141 | }elsif($subtype eq "ESCA"){ 142 | $icagesGene = -0.13015 + 0.11817 * $radialSVM + 0.82129 * $cnv + 0.01265 * $funseq + 1.07817 * $phenolyzer; 143 | }elsif($subtype eq "GBM"){ 144 | $icagesGene = -0.12775 + 0.12841 * $radialSVM + 0.45996 * $cnv + 0.02754 * $funseq + 0.92774 * $phenolyzer; 145 | }elsif($subtype eq "GBMLGG"){ 146 | $icagesGene = -0.07477 + 0.05322 * $radialSVM + 1.18877 * $cnv -0.01261 * $funseq + 1.08833 * $phenolyzer; 147 | } elsif($subtype eq "HNSC"){ 148 | $icagesGene = 1.058015 -0.037751 * $radialSVM -0.581648 * $cnv -0.004731 * $funseq -0.946285 * $phenolyzer; 149 | }elsif($subtype eq "KICH"){ 150 | $icagesGene = -0.12298 + 0.16503 * $radialSVM + 0.29233 * $cnv + 0.10256 * $funseq + 0.46457 * $phenolyzer; 151 | }elsif($subtype eq "KIPAN") { 152 | $icagesGene = 1.093222 -0.073644 * $radialSVM -0.345129 * $cnv - 0.028881 * $funseq -1.146782 * $phenolyzer; 153 | 
}elsif($subtype eq "KIRC"){ 154 | $icagesGene = 1.13778 -0.12890 * $radialSVM -0.54436 * $cnv -0.01729 * $funseq -1.44148 * $phenolyzer; 155 | }elsif($subtype eq "KIRP"){ 156 | $icagesGene = -0.07488 + 0.06861 * $radialSVM + 1.23478 * $cnv +0.01712 * $funseq + 0.96863 * $phenolyzer; 157 | }elsif($subtype eq "LAML"){ 158 | $icagesGene = -0.05995 + 0.15925 * $radialSVM + 0.12114 * $cnv + 0.02738 * $funseq + 0.02367 * $phenolyzer; 159 | }elsif($subtype eq "LGG"){ 160 | $icagesGene = -0.11611 + 0.10186 * $radialSVM + 0.95189 * $cnv + 0.04495 * $funseq + 1.11112 * $phenolyzer; 161 | }elsif($subtype eq "LIHC"){ 162 | $icagesGene = 1.113955 -0.100192 * $radialSVM -0.773660 * $cnv -0.047867 * $funseq -1.076089 * $phenolyzer; 163 | }elsif($subtype eq "LUAD") { 164 | $icagesGene = -0.096116 + 0.074096 * $radialSVM + 1.462135 * $cnv + 0.032892 * $funseq + 1.035469 * $phenolyzer; 165 | }elsif($subtype eq "LUSC"){ 166 | $icagesGene = 1.13182 -0.11578 * $radialSVM -0.53788 * $cnv -0.03912 * $funseq -1.15891 * $phenolyzer; 167 | }elsif($subtype eq "OV"){ 168 | $icagesGene = 1.18324 -0.17481 * $radialSVM -0.72561 * $cnv -0.06436 * $funseq -1.42091 * $phenolyzer; 169 | }elsif($subtype eq "PAAD"){ 170 | $icagesGene = 1.16658 -0.15211 * $radialSVM -0.87784 * $cnv -0.08511 * $funseq -1.33967 * $phenolyzer; 171 | }elsif($subtype eq "PCPG"){ 172 | $icagesGene = -0.12564 + 0.19600 * $radialSVM + 0.08097 * $cnv + 0.06315 * $funseq + 0.28951 * $phenolyzer; 173 | }elsif($subtype eq "PRAD"){ 174 | $icagesGene = 1.124113 -0.105090 * $radialSVM -0.299424 * $cnv -0.001517 * $funseq -1.375397 * $phenolyzer; 175 | }elsif($subtype eq "READ") { 176 | $icagesGene = -0.12973 + 0.12316 * $radialSVM + 1.52412 * $cnv + 0.03016 * $funseq + 0.98996 * $phenolyzer; 177 | }elsif($subtype eq "SARC"){ 178 | $icagesGene = -0.077275 + 0.081598 * $radialSVM + 0.562010 * $cnv -0.006907 * $funseq + 0.772565 * $phenolyzer; 179 | }elsif($subtype eq "SKCM"){ 180 | $icagesGene = 1.076621 -0.051706 * $radialSVM 
-0.283882 * $cnv -0.015553 * $funseq -1.104187 * $phenolyzer; 181 | }elsif($subtype eq "STAD"){ 182 | $icagesGene = 1.028806 -0.010729 * $radialSVM -0.753154 * $cnv + 0.007523 * $funseq -0.807513 * $phenolyzer; 183 | }elsif($subtype eq "STES") { 184 | $icagesGene = 1.0288425 -0.0127624 * $radialSVM -0.5998427 * $cnv + 0.0004369 * $funseq -0.7377184 * $phenolyzer; 185 | }elsif($subtype eq "TGCT"){ 186 | $icagesGene = -0.12173 + 0.14025 * $radialSVM -0.01375 * $cnv + 0.01167 * $funseq + 0.73381 * $phenolyzer; 187 | }elsif($subtype eq "THCA"){ 188 | $icagesGene = -0.106440 + 0.125886 * $radialSVM + 0.753607 * $cnv + 0.008624 * $funseq + 0.705268 * $phenolyzer; 189 | }elsif($subtype eq "THYM"){ 190 | $icagesGene = -0.07786 + 0.15530 * $radialSVM + 0.25656 * $cnv + 0.04185 * $funseq + 0.15939 * $phenolyzer; 191 | }elsif($subtype eq "UCEC"){ 192 | $icagesGene = -0.064307 + 0.038792 * $radialSVM + 0.992520 * $cnv + 0.006978 * $funseq + 1.058414 * $phenolyzer; 193 | }elsif($subtype eq "UCS"){ 194 | $icagesGene = -0.132534 + 0.153048 * $radialSVM + 0.500817 * $cnv -0.004412 * $funseq + 0.766499 * $phenolyzer; 195 | }else{ 196 | $icagesGene = -8.0124 + 5.5577 * $radialSVM + 267.3371 * $cnv + 0.3741 * $funseq + 10.0949 * $phenolyzer; 197 | # $icagesGene = 0.455825 + 0.053429 * $radialSVM + 0.267669 * $cnv + 0.054989 * $funseq -0.117 * $phenolyzer; 198 | } 199 | $icagesGene = 1/(1+exp(-$icagesGene)); 200 | $icagesPrint{$gene}{"score"} = $icagesGene; 201 | # add driver information requested by user 202 | my $driver = "No"; 203 | if ($icagesGene >= 0.11) { 204 | $driver = "Yes"; 205 | } 206 | $icagesPrint{$gene}{"content"} = "$gene,$category,$radialSVM,$funseq,$cnv,$phenolyzer,$icagesGene,$driver"; 207 | }; 208 | print GENES "geneName,category,radialSVM,funseq,cnv,phenolyzer,icagesGeneScore,driver\n"; 209 | foreach my $gene (sort {$icagesPrint{$b}{"score"} <=> $icagesPrint{$a}{"score"}} keys %icagesPrint){ 210 | print GENES "$icagesPrint{$gene}{\"content\"}\n"; 211 | } 212 | 213 
| my $logFile = $rawInputFile . $prefix . ".annovar.icages.log"; 214 | open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n"; 215 | 216 | print LOG "########### iCAGES Gene Summary ###########\n"; 217 | print LOG "## basic information\n"; 218 | print LOG "Total: $geneCount\n"; 219 | print LOG "Cancer Gene Census Gene: $cgcCount\n"; 220 | print LOG "KEGG Pathway Gene: $keggCount\n\n"; 221 | 222 | } 223 | 224 | 225 | sub loadDatabase { 226 | print "NOTICE: start loading CGC, KEGG cancer pathway databases\n"; 227 | my ($DBLocation, $cgcFile, $keggFile); 228 | my (%cgc, %kegg); 229 | $DBLocation = shift; 230 | $cgcFile = $DBLocation . "cgc.gene"; 231 | $keggFile = $DBLocation . "kegg.gene"; 232 | open(CGC, "$cgcFile") or die "ERROR: cannot open $cgcFile\n"; 233 | open(KEGG, "$keggFile") or die "ERROR: cannot open $keggFile\n"; 234 | while(<CGC>){ 235 | chomp; 236 | my @line = split; 237 | $cgc{$line[0]} = 1; 238 | } 239 | while(<KEGG>){ 240 | chomp; 241 | my @line = split; 242 | $kegg{$line[0]} = 1; 243 | } 244 | close CGC; 245 | close KEGG; 246 | return (\%cgc, \%kegg); 247 | } 248 | 249 | 250 | 251 | -------------------------------------------------------------------------------- /doc/user-guide/example.md: -------------------------------------------------------------------------------- 1 | ## ANNOVAR input with new prefix and new directory for output 2 | 3 | To change the prefix, use option `-p` or `--prefix`; to change the directory where your output is generated, use option `--outputdir`. The syntax is the same for other input formats, such as VCF and BED. 
4 | 5 | ``` 6 | [cocodong@biocluster ~/]$ head input.txt 7 | 1 12919840 12919840 T C 8 | 1 35332717 35332717 C A 9 | 1 55148456 55148456 G T 10 | 1 70504789 70504789 C T 11 | 1 167059520 167059520 A T 12 | 1 182496864 182496864 A T 13 | 1 197073351 197073351 C T 14 | 1 216373211 216373211 G T 15 | 10 37490170 37490170 G A 16 | 10 56089432 56089432 A C 17 | [cocodong@biocluster ~/]$ icages.pl input.txt -p newname --outputdir newoutputdir 18 | 19 | ``` 20 | 21 | ## ANNOVAR input annotated with hg38 22 | 23 | To change the reference genome version, use option `--buildver`. The syntax is the same for other input formats, such as VCF and BED. 24 | 25 | ``` 26 | [cocodong@biocluster ~/]$ head input.txt 27 | 1 12919840 12919840 T C 28 | 1 35332717 35332717 C A 29 | 1 55148456 55148456 G T 30 | 1 70504789 70504789 C T 31 | 1 167059520 167059520 A T 32 | 1 182496864 182496864 A T 33 | 1 197073351 197073351 C T 34 | 1 216373211 216373211 G T 35 | 10 37490170 37490170 G A 36 | 10 56089432 56089432 A C 37 | [cocodong@biocluster ~/]$ icages.pl input.txt --buildver hg38 38 | 39 | ``` 40 | 41 | ## VCF input with one sample which contains both germline mutations and mutations in his/her tumor 42 | 43 | If you do not have somatic mutations for one sample in a VCF file, but instead have a VCF file that contains both germline mutations and tumor mutations for this sample, you can specify the header for tumor mutations using option `-t` or `--tumor` and the header for germline mutations using option `-g` or `--germline`. iCAGES will extract somatic mutations from this VCF file and carry out the downstream analysis. In this example, the input file is a VCF file that contains tumor mutations with header "tumor" and germline mutations with header "germline", all annotated against reference genome version hg19. 
44 | ``` 45 | [cocodong@biocluster ~/]$ cat input.vcf 46 | ##fileformat=VCFv4.1 47 | ##FORMAT= 48 | ##contig= 49 | ##contig= 50 | ##contig= 51 | ##contig= 52 | ##contig= 53 | ##contig= 54 | ##contig= 55 | ##contig= 56 | ##contig= 57 | ##contig= 58 | ##contig= 59 | ##contig= 60 | ##contig= 61 | ##contig= 62 | ##contig= 63 | ##contig= 64 | ##contig= 65 | ##contig= 66 | ##contig= 67 | ##contig= 68 | ##contig= 69 | ##contig= 70 | ##contig= 71 | ##contig= 72 | ##contig= 73 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor germline 74 | 1 12919840 . T C . . . GT 1|1 0|0 75 | 1 35332717 . C A . . . GT 1|1 0|0 76 | 1 55148456 . G T . . . GT 1|1 0|0 77 | 1 70504789 . C T . . . GT 1|1 0|0 78 | 1 167059520 . A T . . . GT 1|1 0|0 79 | 1 182496864 . A T . . . GT 1|1 0|0 80 | 1 197073351 . C T . . . GT 1|1 0|0 81 | 1 216373211 . G T . . . GT 1|1 0|0 82 | 10 37490170 . G A . . . GT 1|1 0|0 83 | 10 56089432 . A C . . . GT 1|1 0|0 84 | ... 85 | [cocodong@biocluster ~/]$ icages.pl input.vcf -t tumor -g germline 86 | ``` 87 | 88 | ## VCF input with multiple samples which contains both germline mutations and tumor mutations 89 | 90 | iCAGES is a personalized cancer driver analysis pipeline, so it analyzes only ONE patient at a time. If you have a VCF file that contains both germline mutations and tumor mutations for multiple individuals, you can specify the header for tumor mutations of the patient of interest using option `-t` or `--tumor` and the header for germline mutations of that patient using option `-g` or `--germline`. iCAGES will extract somatic mutations for this individual from the VCF file and carry out the downstream analysis. In this example, the input file is a VCF file that contains mutations from two individuals, Sample1 and Sample2, each with both tumor mutations and germline mutations under slightly different headers, all annotated against reference genome version hg19. 
By specifying the headers for Sample1, iCAGES analyzes this sample for you. 91 | ``` 92 | [cocodong@biocluster ~/]$ cat input.vcf 93 | ##fileformat=VCFv4.1 94 | ##FORMAT= 95 | ##contig= 96 | ##contig= 97 | ##contig= 98 | ##contig= 99 | ##contig= 100 | ##contig= 101 | ##contig= 102 | ##contig= 103 | ##contig= 104 | ##contig= 105 | ##contig= 106 | ##contig= 107 | ##contig= 108 | ##contig= 109 | ##contig= 110 | ##contig= 111 | ##contig= 112 | ##contig= 113 | ##contig= 114 | ##contig= 115 | ##contig= 116 | ##contig= 117 | ##contig= 118 | ##contig= 119 | ##contig= 120 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1Tumor Sample1Germline Sample2Tumor Sample2Germline 121 | 1 12919840 . T C . . . GT 1|1 0|0 1|1 0|0 122 | 1 35332717 . C A . . . GT 1|1 0|0 1|1 0|0 123 | 1 55148456 . G T . . . GT 1|1 0|0 1|1 0|0 124 | 1 70504789 . C T . . . GT 1|1 0|0 1|1 0|0 125 | 1 167059520 . A T . . . GT 1|1 0|0 1|1 0|0 126 | 1 182496864 . A T . . . GT 1|1 0|0 1|1 0|0 127 | 1 197073351 . C T . . . GT 1|1 0|0 1|1 0|0 128 | 1 216373211 . G T . . . GT 1|1 0|0 1|1 0|0 129 | 10 37490170 . G A . . . GT 1|1 0|0 1|1 0|0 130 | 10 56089432 . A C . . . GT 1|1 0|0 1|1 0|0 131 | ... 132 | [cocodong@biocluster ~/]$ icages.pl input.vcf -t Sample1Tumor -g Sample1Germline 133 | ``` 134 | 135 | ## VCF input with multiple samples which contains only somatic mutations 136 | 137 | Again, iCAGES is a personalized cancer driver analysis pipeline, so it analyzes only ONE patient at a time. If you have a VCF file that contains somatic mutations for multiple individuals, you can specify the header for the patient of interest using option `-i` or `--id`. iCAGES will extract somatic mutations for this individual from the VCF file and carry out the downstream analysis. In this example, the input file is a VCF file that contains somatic mutations from two individuals, Sample1 and Sample2, all annotated against reference genome version hg19. 
By specifying the header for Sample1, iCAGES analyzes this sample for you. 138 | ``` 139 | [cocodong@biocluster ~/]$ cat input.vcf 140 | ##fileformat=VCFv4.1 141 | ##FORMAT= 142 | ##contig= 143 | ##contig= 144 | ##contig= 145 | ##contig= 146 | ##contig= 147 | ##contig= 148 | ##contig= 149 | ##contig= 150 | ##contig= 151 | ##contig= 152 | ##contig= 153 | ##contig= 154 | ##contig= 155 | ##contig= 156 | ##contig= 157 | ##contig= 158 | ##contig= 159 | ##contig= 160 | ##contig= 161 | ##contig= 162 | ##contig= 163 | ##contig= 164 | ##contig= 165 | ##contig= 166 | ##contig= 167 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 168 | 1 12919840 . T C . . . GT 1|1 0|0 169 | 1 35332717 . C A . . . GT 1|1 1|1 170 | 1 55148456 . G T . . . GT 1|1 0|0 171 | 1 70504789 . C T . . . GT 1|1 1|1 172 | 1 167059520 . A T . . . GT 1|1 0|0 173 | 1 182496864 . A T . . . GT 0|0 0|0 174 | 1 197073351 . C T . . . GT 1|1 1|1 175 | 1 216373211 . G T . . . GT 1|1 0|0 176 | 10 37490170 . G A . . . GT 1|1 0|0 177 | 10 56089432 . A C . . . GT 1|1 0|0 178 | ... 179 | [cocodong@biocluster ~/]$ icages.pl input.vcf -i Sample1 180 | ``` 181 | 182 | ## VCF input with multiple samples and BED files with additional structural variations 183 | 184 | VCF has immature support for annotating structural variations. To better annotate personal cancer mutation profiles, iCAGES supports an additional BED file input, which profiles structural variations, through option `-b` or `--bed`. iCAGES combines information from the VCF file and the BED file for downstream analysis. In this example, the input files are a VCF file that contains somatic mutations from two individuals, Sample1 and Sample2, all annotated against reference genome version hg19, and a BED file that contains coordinates of structural variations. This example BED file is also provided in the package. 
185 | ``` 186 | [cocodong@biocluster ~/]$ cat input.vcf 187 | ##fileformat=VCFv4.1 188 | ##FORMAT= 189 | ##contig= 190 | ##contig= 191 | ##contig= 192 | ##contig= 193 | ##contig= 194 | ##contig= 195 | ##contig= 196 | ##contig= 197 | ##contig= 198 | ##contig= 199 | ##contig= 200 | ##contig= 201 | ##contig= 202 | ##contig= 203 | ##contig= 204 | ##contig= 205 | ##contig= 206 | ##contig= 207 | ##contig= 208 | ##contig= 209 | ##contig= 210 | ##contig= 211 | ##contig= 212 | ##contig= 213 | ##contig= 214 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 215 | 1 12919840 . T C . . . GT 1|1 0|0 216 | 1 35332717 . C A . . . GT 1|1 1|1 217 | 1 55148456 . G T . . . GT 1|1 0|0 218 | 1 70504789 . C T . . . GT 1|1 1|1 219 | 1 167059520 . A T . . . GT 1|1 0|0 220 | 1 182496864 . A T . . . GT 0|0 0|0 221 | 1 197073351 . C T . . . GT 1|1 1|1 222 | 1 216373211 . G T . . . GT 1|1 0|0 223 | 10 37490170 . G A . . . GT 1|1 0|0 224 | 10 56089432 . A C . . . GT 1|1 0|0 225 | ... 226 | [cocodong@biocluster ~/]$ cat input.bed 227 | chr10 89677000 89690000 228 | chr8 38336000 38353000 229 | [cocodong@biocluster ~/]$ icages.pl input.vcf -i Sample1 -b input.bed 230 | ``` 231 | 232 | 233 | 234 | 235 | --- 236 | 237 |
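The `-t`, `-g` and `-i` options in the examples above take the exact sample column names from the VCF header. The following is a minimal sketch for listing those names before calling `icages.pl`. It assumes a tab-delimited VCF, in which the sample columns start at column 10 of the `#CHROM` line; the helper name `list_vcf_samples` is hypothetical and is not part of iCAGES.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with iCAGES): print the sample column
# names of a VCF file, one per line.
# Assumption: the file is tab-delimited, so the "#CHROM" header line holds
# eight fixed columns plus FORMAT, and sample names begin at column 10.
list_vcf_samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}
```

For the multi-sample file shown earlier, `list_vcf_samples input.vcf` should list `Sample1Tumor`, `Sample1Germline`, `Sample2Tumor` and `Sample2Germline`, any of which can then be passed to `-t` or `-g`.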
238 | 251 | 252 | 253 | 254 | -------------------------------------------------------------------------------- /icages.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use Pod::Usage; 5 | use Getopt::Long; 6 | 7 | ###################################################################################################################################### 8 | ######################################################## variable declaration ######################################################## 9 | ###################################################################################################################################### 10 | my ($icagesMutation, $icagesGene, $icagesDrug, $icagesJson); 11 | my ($inputFile, $inputDir, $icagesLocation, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression); 12 | 13 | ###################################################################################################################################### 14 | ############################################################# main ################################################################## 15 | ###################################################################################################################################### 16 | ($inputDir, $icagesLocation, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression) = &processArguments(); 17 | &checkReady($icagesLocation); 18 | # use yunfei's new index function 19 | $icagesMutation = $icagesLocation. "bin/icagesMutationNew.pl"; 20 | $icagesGene = $icagesLocation . "bin/icagesGene.pl"; 21 | $icagesDrug = $icagesLocation . "bin/icagesDrug.pl"; 22 | $icagesJson = $icagesLocation . 
"bin/icagesJson.pl"; 23 | $inputFile = $ARGV[0]; 24 | &genLogFile($inputFile, $inputDir, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression); 25 | !system("perl $icagesMutation $inputFile $inputDir $icagesLocation $tumor $germline $id $prefix $bed $hg19 $expression") or die "ERROR: cannot call icagesMutation module\n"; 26 | !system("perl $icagesGene $inputDir $icagesLocation $subtype $prefix ") or die "ERROR: cannot call icagesGene module\n"; 27 | !system("perl $icagesDrug $inputDir $icagesLocation $prefix") or die "ERROR: cannot call icagesDrug module\n"; 28 | !system("perl $icagesJson $inputDir $icagesLocation $prefix") or die "ERROR: cannot call icagesJson module\n"; 29 | &moveFiles($inputDir, $prefix, $logDir, $outputDir, $tempDir); 30 | 31 | ###################################################################################################################################### 32 | ########################################################## subroutines ############################################################### 33 | ###################################################################################################################################### 34 | 35 | sub genLogFile{ 36 | my ($inputFile, $inputDir, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression); 37 | $inputFile = shift; 38 | $inputDir = shift; 39 | $tumor = shift; 40 | $germline = shift; 41 | $id = shift; 42 | $subtype = shift; 43 | $logDir = shift; 44 | $outputDir = shift; 45 | $tempDir = shift; 46 | $prefix = shift; 47 | $bed = shift; 48 | $hg19 = shift; 49 | $expression = shift; 50 | 51 | my $annovarInputFile = $inputDir . "/" . $prefix . ".annovar"; 52 | my $logFile = $annovarInputFile . 
".icages.log"; 53 | open(LOG, ">$logFile") or die "ERROR: cannot open log file\n"; 54 | print LOG "########### iCAGES Parameter Summary ###########\n"; 55 | print LOG "## basic information\n"; 56 | print LOG "Input file:\t$inputFile\n"; 57 | print LOG "Input directory:\t$inputDir\n"; 58 | print LOG "Temp directory:\t$tempDir\n"; 59 | print LOG "Output directory:\t$outputDir\n\n"; 60 | 61 | 62 | print LOG "## sample information\n"; 63 | print LOG "Prefix:\t$prefix\n"; 64 | print LOG "Subtype:\t$subtype\n\n"; 65 | 66 | 67 | print LOG "## advanced information\n"; 68 | print LOG "Tumor sample id (if any):\t$tumor\n"; 69 | print LOG "Germline sample id (if any):\t$germline\n"; 70 | print LOG "Sample id:\t$id\n"; 71 | print LOG "BED file (if any):\t$bed\n"; 72 | print LOG "HG version:\t$hg19\n"; 73 | print LOG "Expression file:\t$expression\n\n"; 74 | close LOG; 75 | } 76 | 77 | 78 | sub moveFiles{ 79 | my ($inputDir, $prefix, $logDir, $outputDir, $tempDir); 80 | my ( $outputFile, $tempFile, $logFile, $jsonFile); 81 | $inputDir = shift; 82 | $prefix = shift; 83 | $logDir = shift; 84 | $outputDir = shift; 85 | $tempDir = shift; 86 | if($inputDir ne $outputDir){ 87 | $outputFile = "$inputDir/$prefix*icages*.csv"; 88 | $jsonFile = "$inputDir/$prefix*.json"; 89 | !system("mv $outputFile $outputDir") or die "ERROR: cannot move iCAGES file\n"; 90 | !system("mv $jsonFile $outputDir") or die "ERROR: cannot move iCAGES file\n"; 91 | } 92 | if($inputDir ne $logDir){ 93 | $logFile = "$inputDir/$prefix*.log"; 94 | !system("mv $logFile $logDir") or die "ERROR: cannot move iCAGES file\n"; 95 | } 96 | if($inputDir ne $tempDir){ 97 | $tempFile = "$inputDir/$prefix.*"; 98 | !system("mv $tempFile $tempDir") or die "ERROR: cannot move iCAGES file\n"; 99 | } 100 | } 101 | 102 | sub checkReady() { 103 | my $icagesLocation = shift; 104 | my $dbLocation = $icagesLocation . 
"db"; 105 | if(-d $dbLocation){ 106 | return 1; 107 | }else{ 108 | die "ERROR: please download iCAGES database first https://github.com/WangGenomicsLab/icages \n"; 109 | } 110 | } 111 | 112 | sub processArguments { 113 | my ($help, $manual, $tumor, $germline, $id, $subtype, $logDir, $outputDir, $tempDir, $prefix, $inputDir, $inputLocation, $icagesLocation, $bed, $expression, $hg); 114 | ################### initialize arguments ################## 115 | GetOptions( 'help|h' => \$help, 116 | 'manual|man|m' => \$manual, 117 | 'tumor|t=s' => \$tumor, # name for tumor in the vcf file 118 | 'germline|g=s' => \$germline , # name for germline in the vcf file 119 | 'id|i=s' => \$id, # sample identifier for the person of interest for multiple sample vcf file 120 | 'subtype|s=s' => \$subtype, # cancer subtype 121 | 'logdir=s' => \$logDir, # log directory 122 | 'outputdir=s' => \$outputDir, 123 | 'tempdir=s' => \$tempDir, 124 | 'prefix|p=s' => \$prefix, 125 | 'bed|b=s' => \$bed, # bed file describing structural variations 126 | 'expression|e=s' => \$expression, 127 | 'buildver=s' => \$hg 128 | )or pod2usage (); 129 | ################### locations ######################## 130 | if($hg and $hg ne "hg19" and $hg ne "hg38" and $hg ne "hg18"){ 131 | pod2usage (); 132 | } 133 | @ARGV == 1 or pod2usage (); # check only has one argument 134 | $inputLocation = $ARGV[0]; 135 | $inputDir = $inputLocation; 136 | if($inputDir =~ /\//){ 137 | $inputDir =~ /(.*\/)(.*?)$/; 138 | $inputDir = $1; 139 | }else{ 140 | $inputDir = "./" ; 141 | } 142 | $icagesLocation = "$0"; 143 | $icagesLocation =~ /(.*)icages\.pl/; 144 | $icagesLocation = $1; 145 | $icagesLocation = "./" if $icagesLocation eq ""; 146 | ###### all directories should end up with / ### 147 | if(!$prefix){ 148 | $prefix = $ARGV[0]; 149 | if($prefix =~ /\//){ 150 | $prefix =~ /(.*\/)(.*?)$/; 151 | $prefix = $2; 152 | } 153 | } 154 | if(!$tumor){ 155 | $tumor = "NA"; 156 | } 157 | if(!$germline){ 158 | $germline = "NA"; 159 | } 160 | 
if(!$id){ 161 | $id = "NA"; 162 | } 163 | if(!$subtype){ 164 | $subtype = "NA"; 165 | } 166 | if(!$bed){ 167 | $bed = "NA"; 168 | } 169 | if(!$expression){ 170 | $expression = "NA"; 171 | } 172 | if(!$hg){ 173 | $hg = "hg19"; 174 | } 175 | if(!$logDir){ 176 | if(!$outputDir){ 177 | $logDir = $inputDir; 178 | }else{ 179 | $logDir = $outputDir; 180 | } 181 | } 182 | if(!$outputDir){ 183 | $outputDir = $inputDir; 184 | } 185 | if(!$tempDir){ 186 | if(!$outputDir){ 187 | $tempDir = $inputDir; 188 | }else{ 189 | $tempDir = $outputDir; 190 | } 191 | } 192 | if(-d $logDir){ 193 | if(!($logDir =~ /\/$/)){ 194 | $logDir = $logDir . "/"; 195 | } 196 | }else{ 197 | if(!($logDir =~ /\/$/)){ 198 | $logDir = $logDir . "/"; 199 | } 200 | mkdir($logDir) or die "ERROR: no such directory for log files\n"; 201 | } 202 | if(-d $outputDir){ 203 | if(!($outputDir =~ /\/$/)){ 204 | $outputDir = $outputDir ."/"; 205 | } 206 | }else{ 207 | if(!($outputDir =~ /\/$/)){ 208 | $outputDir = $outputDir ."/"; 209 | } 210 | mkdir($outputDir) or die "ERROR: no such directory for output files\n"; 211 | } 212 | if(-d $tempDir){ 213 | if(!($tempDir =~ /\/$/)){ 214 | $tempDir = $tempDir ."/"; 215 | } 216 | }else{ 217 | if(!($tempDir =~ /\/$/)){ 218 | $tempDir = $tempDir ."/"; 219 | } 220 | mkdir($tempDir) or die "ERROR: no such directory for temp files\n"; 221 | } 222 | 223 | ######################## arguments ######################## 224 | $help and pod2usage (-verbose=>1, -exitval=>1, -output=>\*STDOUT); 225 | $manual and pod2usage (-verbose=>2, -exitval=>1, -output=>\*STDOUT); 226 | return ($inputDir, $icagesLocation, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg, $expression); 227 | } 228 | 229 | ###################################################################################################################################### 230 | ############################################################ manual page 
############################################################# 231 | ###################################################################################################################################### 232 | 233 | =head1 NAME 234 | 235 | iCAGES (integrated CAncer GEnome Score) command line package for web interface. 236 | 237 | =head1 SYNOPSIS 238 | 239 | icages.pl [options] 240 | 241 | Options: 242 | -h, --help print help message 243 | -m, --manual print manual message 244 | -t, --tumor name of column that contains tumor mutations in your vcf file (if you have multiple samples with tumor mutations, please use this option to select tumor mutations that you want to analyze) 245 | -g, --germline name of column that contains germline mutations in your vcf file (if you have multiple samples with germline mutations, please use this option to select germline mutations that you want to compare your tumor mutations against to generate somatic mutation profiles for the sample you want to analyze) 246 | -i, --id name of column that contains somatic mutations in your multiple sample vcf file with only somatic mutations (if you have multiple samples with tumor and germline mutations, please use -g and -t options instead) 247 | -s, --subtype subtype of the cancer, valid options include "ACC", "BLCA", "BRCA", "CESC", "CHOL", "ESCA", "GBM", "HNSC", "KICH", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUSC", "OV", "PAAD", "PCPG", "PRAD", "SARC", "SKCM", "STAD", "TGCT", "THCA", "THYM", "UCEC", "UCS", "UVM" 248 | --logdir directory for log files generated by iCAGES 249 | --tempdir directory for temporary files generated by iCAGES 250 | --outputdir directory for output files generated by iCAGES 251 | -p, --prefix prefix of all files generated by iCAGES 252 | -b, --bed additional bed file specifying the location of structural variations in the sample 253 | --buildver reference genome version, valid options include "hg19" (default), "hg38" and "hg18" 254 | -e, --expression bed file 
describing gene expression patterns, the columns are chromosome, start, end, log fold changes 255 | 256 | Function: iCAGES predicts cancer driver genes given somatic mutations (in ANNOVAR/VCF format) from a patient. 257 | 258 | Example: icages.pl /path/to/input.vcf 259 | 260 | Installation: before using iCAGES, please first install it by 'perl icagesInitiate.pl' command. 261 | 262 | Version: 1.0 263 | 264 | Last update: Wed Feb 25 12:51:17 PST 2015 265 | 266 | =head1 OPTIONS 267 | 268 | =over 8 269 | 270 | =item B<--help> 271 | 272 | print a brief usage message and detailed explanation of options. 273 | 274 | =item B<--manual> 275 | 276 | print the manual page and exit. 277 | 278 | =back 279 | 280 | =cut 281 | 282 | -------------------------------------------------------------------------------- /bin/icagesDrug.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use List::Util qw(min max); 5 | use Pod::Usage; 6 | use Getopt::Long; 7 | 8 | ###################################################################################################################################### 9 | ######################################################## variable declaration ######################################################## 10 | ###################################################################################################################################### 11 | 12 | my ($rawInputFile, $icagesLocation, $prefix); 13 | my (%biosystem, %neighbors, %activity, %onc, %sup, %icagesGenes); 14 | # fda and clinical 15 | # fda target 16 | my (%fda, %clin, %fda_target, $fda_target_ref, $fdaRef, $clinRef); 17 | my ($biosystemRef, $activityRef, $oncRef, $supRef, $icagesGenesRef, $neighborsRef, $supDrugRef, $oncDrugRef, $otherDrugRef); 18 | 19 | 20 | ###################################################################################################################################### 21 | 
########################################################### main #################################################################### 22 | ###################################################################################################################################### 23 | 24 | $rawInputFile = $ARGV[0]; 25 | $icagesLocation = $ARGV[1]; 26 | $prefix = $ARGV[2]; 27 | # add DGIdb database genes to reduce number of genes to query DGIdb 28 | ($biosystemRef, $activityRef, $oncRef, $supRef, $fdaRef, $clinRef, $fda_target_ref, $supDrugRef, $oncDrugRef, $otherDrugRef) = &loadDatabase($icagesLocation); 29 | 30 | %biosystem = %{$biosystemRef}; 31 | %activity = %{$activityRef}; 32 | %onc = %{$oncRef}; 33 | %sup = %{$supRef}; 34 | %fda = %{$fdaRef}; 35 | %clin = %{$clinRef}; 36 | %fda_target = %{$fda_target_ref}; 37 | # %dgiGenes = %{$dgiGenesRef}; 38 | $icagesGenesRef = &getiCAGES($rawInputFile, $prefix); 39 | %icagesGenes = %{$icagesGenesRef}; 40 | 41 | $neighborsRef = &getNeighbors(\%icagesGenes, \%biosystem, \%onc, \%sup); 42 | %neighbors = %{$neighborsRef}; 43 | &getDrugs ($rawInputFile, $icagesLocation, \%neighbors, \%onc, \%sup, $prefix, \%fda_target, $supDrugRef, $oncDrugRef, $otherDrugRef); 44 | &processDrugs($rawInputFile, \%neighbors, \%activity, $prefix , \%fda, \%clin); # add fda and clin 45 | 46 | ###################################################################################################################################### 47 | ############################################################# subroutines ############################################################ 48 | ###################################################################################################################################### 49 | 50 | sub loadDatabase { 51 | print "NOTICE: start loading Databases\n"; 52 | my (%biosystem, %activity, %onc, %sup, %fda, %clin, %fda_target_ref, %supDrugDB, %oncDrugDB, %otherDrugDB); 53 | my ($icagesLocation, $DBLocation, $biosystemDB, $activityDB, 
$oncDB, $supDB, $fdaDB, $clinicalDB, $supDrugDB, $oncDrugDB, $otherDrugDB); 54 | $icagesLocation = shift; 55 | $DBLocation = $icagesLocation . "db/"; 56 | $biosystemDB = $DBLocation . "biosystem.score"; 57 | $activityDB = $DBLocation . "drug.score"; 58 | $oncDB = $DBLocation . "oncogene.gene"; 59 | $supDB = $DBLocation . "suppressor.gene"; 60 | # fda and clinical trial, note that only one clinical trial would be provided, for more information, please visit myclinicaltrial.com 61 | $fdaDB = $DBLocation . "FDA_cancer.txt"; 62 | $clinicalDB = $DBLocation . "ClinicalTrial.txt"; 63 | # $dgiGenesDB = $DBLocation . "DGIdb.genes"; 64 | $supDrugDB = $DBLocation . "suppressor.drug"; 65 | $oncDrugDB = $DBLocation . "oncogene.drug"; 66 | $otherDrugDB = $DBLocation . "othergene.drug"; 67 | open(BIO, "$biosystemDB") or die "ERROR: cannot open $biosystemDB\n"; 68 | open(ACT, "$activityDB") or die "ERROR: cannot open $activityDB\n"; 69 | open(ONC, "$oncDB") or die "ERROR: cannot open $oncDB\n"; 70 | open(SUP, "$supDB") or die "ERROR: cannot open $supDB\n"; 71 | # fda and clinical trial 72 | open(FDA, "$fdaDB") or die "ERROR: cannot open $fdaDB\n"; 73 | open(CLIN, "$clinicalDB") or die "ERROR: cannot open $clinicalDB\n"; 74 | # open(DGI, "$dgiGenesDB") or die "ERROR: cannot open $dgiGenesDB\n"; 75 | open(SUPDRUG, "$supDrugDB") or die "ERROR: cannot open $supDrugDB\n"; 76 | open(ONCDRUG, "$oncDrugDB") or die "ERROR: cannot open $oncDrugDB\n"; 77 | open(OTHERDRUG, "$otherDrugDB") or die "ERROR: cannot open $otherDrugDB\n"; 78 | 79 | while (<SUPDRUG>) { 80 | chomp; 81 | my $line = $_; 82 | my @line = split("\t", $_); 83 | next unless defined $line[0] and defined $line[1]; 84 | $supDrugDB{$line[0]}{$line[1]} = $line; 85 | # print "$line[0]\t$line[1]\t$line\n"; 86 | } 87 | 88 | while (<ONCDRUG>) { 89 | chomp; 90 | my $line = $_; 91 | my @line = split("\t", $_); 92 | next unless defined $line[0] and defined $line[1]; 93 | $oncDrugDB{$line[0]}{$line[1]} = $line; 94 | # print "$line[0]\t$line[1]\t$line\n"; 
95 | } 96 | 97 | while (<OTHERDRUG>) { 98 | chomp; 99 | my $line = $_; 100 | my @line = split("\t", $_); 101 | next unless defined $line[0] and defined $line[1]; 102 | $otherDrugDB{$line[0]}{$line[1]} = $line; 103 | # print "$line[0]\t$line[1]\t$line\n"; 104 | } 105 | 106 | # while(<DGI>){ 107 | # chomp; 108 | # $dgiGenes{$_} = 1; 109 | #} 110 | 111 | while(<BIO>){ 112 | chomp; 113 | my @line = split("\t", $_); 114 | $biosystem{$line[0]}{$line[1]} = $line[2]; 115 | } 116 | 117 | while(<ACT>){ 118 | chomp; 119 | my @line = split("\t", $_); 120 | $activity{$line[0]} = $line[1]; 121 | } 122 | 123 | while(<ONC>){ 124 | chomp; 125 | my @line = split("\t", $_); 126 | $onc{$line[0]} = 1; 127 | } 128 | 129 | while(<SUP>){ 130 | chomp; 131 | my @line = split("\t", $_); 132 | $sup{$line[0]} = 1; 133 | } 134 | 135 | while(<FDA>){ 136 | chomp; 137 | my @line = split("\t", $_); 138 | # drugname: subtype,tradename 139 | $line[0] = "NA" unless defined $line[0]; 140 | $line[1] = "NA" unless defined $line[1]; 141 | $line[3] = "NA" unless defined $line[3]; 142 | my $content = $line[1] . "," . $line[3]; 143 | $fda{$line[0]} = $content; 144 | my @target = split(";", $line[2]); 145 | for(0..$#target){ 146 | $fda_target{$target[$_]}{$line[0]} = 1; 147 | } 148 | } 149 | 150 | while(<CLIN>){ 151 | chomp; 152 | my @line = split("\t", $_); 153 | # drugname: trialname,organization,phase,url 154 | $line[0] = "NA" unless defined $line[0]; 155 | $line[1] = "NA" unless defined $line[1]; 156 | $line[2] = "NA" unless defined $line[2]; 157 | $line[3] = "NA" unless defined $line[3]; 158 | $line[4] = "NA" unless defined $line[4]; 159 | my $content = $line[1] . "," . $line[2] . "," . $line[3] . "," . 
$line[4]; 160 | $clin{$line[0]} = $content; 161 | } 162 | 163 | close FDA; 164 | close CLIN; 165 | close SUP; 166 | close ONC; 167 | close ACT; 168 | close BIO; 169 | close SUPDRUG; 170 | close ONCDRUG; 171 | close OTHERDRUG; 172 | return (\%biosystem, \%activity, \%onc, \%sup, \%fda, \%clin, \%fda_target, \%supDrugDB, \%oncDrugDB, \%otherDrugDB); 173 | } 174 | 175 | sub getiCAGES{ 176 | print "NOTICE: start process gene files from iCAGES layer two\n"; 177 | my ($rawInputFile, $icagesGenes, $prefix); 178 | my %icagesGenes; 179 | $rawInputFile = shift; 180 | $prefix = shift; 181 | $icagesGenes = $rawInputFile . $prefix . ".annovar.icagesGenes.csv"; 182 | open(GENES, "$icagesGenes") or die "ERROR: cannot open $icagesGenes\n"; 183 | my $header = <GENES>; 184 | while(<GENES>){ 185 | chomp; 186 | my @line = split(",", $_); 187 | $icagesGenes{$line[0]} = $line[5]; 188 | } 189 | return \%icagesGenes; 190 | } 191 | 192 | sub getNeighbors{ 193 | print "NOTICE: start getting top five neighbors for mutated genes\n"; 194 | my (%icagesGenes, %biosystem, %neighbors, %onc, %sup); 195 | my ($icagesGenesRef, $biosystemRef, $oncRef, $supRef); 196 | my $index; 197 | $icagesGenesRef = shift; 198 | $biosystemRef = shift; 199 | $oncRef = shift; 200 | $supRef = shift; 201 | # $dgiGenesRef = shift; 202 | %icagesGenes = %{$icagesGenesRef}; 203 | %biosystem = %{$biosystemRef}; 204 | %onc = %{$oncRef}; 205 | %sup = %{$supRef}; 206 | 207 | foreach my $gene (sort keys %icagesGenes){ 208 | $index = 0; 209 | $neighbors{$gene}{$gene}{"biosystem"} = 1; 210 | $neighbors{$gene}{$gene}{"icages"} = $icagesGenes{$gene}; 211 | $neighbors{$gene}{$gene}{"product"} = $icagesGenes{$gene}; 212 | foreach my $neighbor (sort { $biosystem{$gene}{$b} <=> $biosystem{$gene}{$a} } keys %{$biosystem{$gene}}){ 213 | # if(exists $onc{$gene} or exists $sup{$gene}){ 214 | # last if $index == 10; 215 | # }else{ 216 | last if $index == 5; 217 | # } 218 | $index ++; 219 | $neighbors{$neighbor}{$gene}{"biosystem"} = $biosystem{$gene}{$neighbor}; 220 | 
$neighbors{$neighbor}{$gene}{"icages"} = $icagesGenes{$gene}; 221 | $neighbors{$neighbor}{$gene}{"product"} = $icagesGenes{$gene} * $biosystem{$gene}{$neighbor}; 222 | } 223 | } 224 | return \%neighbors; 225 | } 226 | 227 | sub getDrugs{ 228 | print "NOTICE: start getting drugs for seed genes\n"; 229 | my (%neighbors, %onc, %sup); 230 | my ($neighborsRef, $oncRef, $supRef); 231 | my (@seeds, @onc, @sup, @other); 232 | my ($onc, $sup, $other); 233 | $onc=""; 234 | $sup=""; 235 | $other=""; 236 | # fda 237 | my $fda_target_ref; 238 | my %fda_target; 239 | 240 | my ($rawInputFile, $supFile, $oncFile, $otherFile, $icagesLocation, $callDgidb, $prefix); 241 | $rawInputFile = shift; 242 | $icagesLocation = shift; 243 | $neighborsRef = shift; 244 | $oncRef = shift; 245 | $supRef = shift; 246 | $prefix = shift; 247 | $fda_target_ref = shift; 248 | # three kinds of drugs 249 | my $supDrugRef = shift; 250 | my $oncDrugRef = shift; 251 | my $otherDrugRef = shift; 252 | my %supDrug; 253 | my %oncDrug; 254 | my %otherDrug; 255 | %supDrug = %{$supDrugRef}; 256 | %oncDrug = %{$oncDrugRef}; 257 | %otherDrug = %{$otherDrugRef}; 258 | 259 | %neighbors = %{$neighborsRef}; 260 | %onc = %{$oncRef}; 261 | %sup = %{$supRef}; 262 | %fda_target = %{$fda_target_ref}; 263 | @seeds = keys %neighbors; 264 | $callDgidb = $icagesLocation . "bin/DGIdb/getDrugList.pl"; 265 | $supFile = $rawInputFile . $prefix . ".suppressor.drug"; 266 | $oncFile = $rawInputFile . $prefix.".oncogene.drug"; 267 | $otherFile = $rawInputFile . $prefix. 
".other.drug"; 268 | 269 | # find FDA drugs for genes in case DGIdb missed it 270 | my %all_genes; 271 | for(0..$#seeds){ 272 | # print "$seeds[$_]\n"; 273 | if(exists $sup{$seeds[$_]} and $seeds[$_] =~ /[a-zA-Z0-9]+/){ 274 | # if(exists $dgiGenes{$seeds[$_]}){ 275 | push @sup, $seeds[$_]; 276 | # } 277 | $all_genes{$seeds[$_]} = 1; 278 | }elsif(exists $onc{$seeds[$_]} and $seeds[$_] =~ /[a-zA-Z0-9]+/){ 279 | # if(exists $dgiGenes{$seeds[$_]}){ 280 | push @onc, $seeds[$_]; 281 | # } 282 | $all_genes{$seeds[$_]} = 1; 283 | }else{ 284 | if($seeds[$_] =~ /[a-zA-Z0-9]+/){ 285 | # if(exists $dgiGenes{$seeds[$_]}){ 286 | push @other, $seeds[$_]; 287 | # } 288 | $all_genes{$seeds[$_]} = 1; 289 | } 290 | } 291 | } 292 | $sup = join(",", @sup); 293 | $onc = join(",", @onc); 294 | $other = join(",", @other); 295 | 296 | open(SUP, ">$supFile") or die; 297 | open(ONC, ">$oncFile") or die; 298 | open(OTHER, ">$otherFile") or die; 299 | 300 | 301 | if($sup ne "" ){ 302 | # print "iCAGES: $callDgidb --genes='$sup' --interaction_type='activator,other/unknown,n/a,inducer,positive allosteric modulator,potentiator,stimulator' --source_trust_levels='Expert curated' --output='$supFile'"; 303 | for(0..$#sup) { 304 | # print "$sup[$_]\n"; 305 | if (exists $supDrug{$sup[$_]}) { 306 | foreach my $key (sort keys %{$supDrug{$sup[$_]}}) { 307 | print SUP "$supDrug{$sup[$_]}{$key}\n"; 308 | } 309 | } 310 | } 311 | 312 | # !system("$callDgidb --genes='$sup' --interaction_type='activator,other/unknown,n/a,inducer,positive allosteric modulator,potentiator,stimulator' --source_trust_levels='Expert curated' --output='$supFile'") or warn "ERROR: cannot gt drugs\n$callDgidb --genes='$sup' --interaction_type='activator,other/unknown,n/a,inducer,positive allosteric modulator,potentiator,stimulator' --source_trust_levels='Expert curated' --output='$supFile'"; 313 | } 314 | if($onc ne ""){ 315 | # print "iCAGES: $callDgidb --genes='$onc' 
--interaction_type='agonist,antisense,competitive,immunotherapy,inhibitory allosteric modulator,inverse agonist,negative modulator,partial agonist,partial antagonist,vaccine,inhibitor,suppressor,antibody,antagonist,blocker,other/unknown,n/a' --source_trust_levels='Expert curated' --output='$oncFile'"; 316 | #!system("$callDgidb --genes='$onc' --interaction_type='agonist,antisense,competitive,immunotherapy,inhibitory allosteric modulator,inverse agonist,negative modulator,partial agonist,partial antagonist,vaccine,inhibitor,suppressor,antibody,antagonist,blocker,other/unknown,n/a' --source_trust_levels='Expert curated' --output='$oncFile'") or warn "ERROR: cannot get drugs\n$callDgidb --genes='$onc' --interaction_type='agonist,antisense,competitive,immunotherapy,inhibitory allosteric modulator,inverse agonist,negative modulator,partial agonist,partial antagonist,vaccine,inhibitor,suppressor,antibody,antagonist,blocker,other/unknown,n/a' --source_trust_levels='Expert curated' --output='$oncFile'\n"; 317 | 318 | for(0..$#onc) { 319 | # print "$onc[$_]\n"; 320 | if (exists $oncDrug{$onc[$_]}) { 321 | foreach my $key (sort keys %{$oncDrug{$onc[$_]}}) { 322 | print ONC "$oncDrug{$onc[$_]}{$key}\n"; 323 | } 324 | } 325 | } 326 | } 327 | if($other ne ""){ 328 | # print "$callDgidb --genes='$other' --source_trust_levels='Expert curated' --output='$otherFile'"; 329 | 330 | #!system("$callDgidb --genes='$other' --source_trust_levels='Expert curated' --output='$otherFile'") or warn "ERROR: cannot get drugs\n$callDgidb --genes='$other' --source_trust_levels='Expert curated' --output='$otherFile'\n"; 331 | 332 | for(0..$#other) { 333 | # print "$other[$_]\n"; 334 | if (exists $otherDrug{$other[$_]}) { 335 | foreach my $key (sort keys %{$otherDrug{$other[$_]}}) { 336 | print OTHER "$otherDrug{$other[$_]}{$key}\n"; 337 | } 338 | } 339 | } 340 | 341 | } 342 | 343 | # create an output file 344 | my $fdaDrug = $rawInputFile . $prefix. 
".fda.drug"; 345 | open(FDADRUG, ">$fdaDrug") or die "ERROR: cannot create $fdaDrug for outputing FDA drugs\n"; 346 | foreach my $gene (sort keys %all_genes){ 347 | if(exists $fda_target{$gene}){ 348 | foreach my $drug (sort keys %{$fda_target{$gene}}){ 349 | print FDADRUG "$gene\t$drug\n"; 350 | } 351 | } 352 | } 353 | } 354 | 355 | 356 | sub processDrugs{ 357 | print "NOTICE: start processing drugs from DGIdb\n"; 358 | my ($rawInputFile, $matchFile, $allDrugs, $icagesDrugs, $prefix); 359 | my (%neighbors, %activity, %icagesDrug, %icagesPrint); 360 | my ($neighborsRef, $activityRef); 361 | my ($oncDrugFile, $supDrugFile, $otherDrugFile); 362 | # fda and clin 363 | my ($fdaRef, $clinRef); 364 | my (%fda, %clin); 365 | # fda drug list 366 | my $FDADrugFile; 367 | 368 | $rawInputFile = shift; 369 | $neighborsRef = shift; 370 | $activityRef = shift; 371 | $prefix = shift; 372 | $fdaRef = shift; 373 | $clinRef = shift; 374 | %fda = %{$fdaRef}; 375 | %clin = %{$clinRef}; 376 | %neighbors = %{$neighborsRef}; 377 | %activity = %{$activityRef}; 378 | $matchFile = $rawInputFile . $prefix . ".*.drug"; 379 | $oncDrugFile = $rawInputFile . $prefix . ".oncogene.drug"; 380 | $supDrugFile = $rawInputFile . $prefix . ".suppressor.drug"; 381 | $otherDrugFile = $rawInputFile . $prefix. ".other.drug"; 382 | $FDADrugFile = $rawInputFile . $prefix. ".fda.drug"; 383 | $allDrugs = $rawInputFile . $prefix . ".drug.all"; 384 | $icagesDrugs = $rawInputFile . $prefix . ".annovar.icagesDrugs.csv"; 385 | if(! -e $oncDrugFile){ 386 | !system("touch $oncDrugFile") or die "ERROR: cannot create $oncDrugFile\n"; 387 | } 388 | if(! -e $supDrugFile){ 389 | !system("touch $supDrugFile") or die "ERROR: cannot create $supDrugFile\n"; 390 | } 391 | if(! -e $otherDrugFile){ 392 | !system("touch $otherDrugFile") or die "ERROR: cannot create $otherDrugFile\n"; 393 | } 394 | if(! 
-e $FDADrugFile){ 395 | !system("touch $FDADrugFile") or die "ERROR: cannot create $FDADrugFile\n"; 396 | } 397 | !system("cat $oncDrugFile $supDrugFile $otherDrugFile $FDADrugFile > $allDrugs") or die "ERROR: cannot create combined drug file $allDrugs\n"; 398 | open(DRUG, "$allDrugs") or die "ERROR: cannot open drug file $allDrugs\n"; 399 | open(OUT, ">$icagesDrugs") or die "ERROR: cannot open $icagesDrugs\n"; 400 | while(<DRUG>){ 401 | chomp; 402 | my @line = split("\t", $_); 403 | next unless defined $line[1]; 404 | my $neighbor = $line[0]; 405 | my $index = 0; 406 | foreach my $target (sort { $neighbors{$neighbor}{$b}{"product"} <=> $neighbors{$neighbor}{$a}{"product"} } keys %{$neighbors{$neighbor}}){ 407 | last if $index == 1; 408 | if(exists $icagesDrug{$line[1]}{$neighbor}){ 409 | if($neighbors{$neighbor}{$target}{"product"} > $icagesDrug{$line[1]}{$neighbor}{$target}{"biosystem"} * $icagesDrug{$line[1]}{$neighbor}{$target}{"icages"}){ 410 | $icagesDrug{$line[1]}{$neighbor}{$target}{"biosystem"} = $neighbors{$neighbor}{$target}{"biosystem"}; 411 | $icagesDrug{$line[1]}{$neighbor}{$target}{"icages"} = $neighbors{$neighbor}{$target}{"icages"} ; 412 | if(exists $activity{$line[1]}){ 413 | $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = $activity{$line[1]}; 414 | }else{ 415 | $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 0; 416 | } 417 | # $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 1 if exists $fda{$line[1]} ; 418 | # $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = max(0.5, $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"}) if exists $clin{$line[1]}; 419 | } 420 | }else{ 421 | $icagesDrug{$line[1]}{$neighbor}{$target}{"biosystem"} = $neighbors{$neighbor}{$target}{"biosystem"}; 422 | $icagesDrug{$line[1]}{$neighbor}{$target}{"icages"} = $neighbors{$neighbor}{$target}{"icages"} ; 423 | if(exists $activity{$line[1]}){ 424 | $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = $activity{$line[1]}; 425 | }else{ 426 | 
$icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 0; 427 | } 428 | # $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 1 if exists $fda{$line[1]} ; 429 | # $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = max(0.5, $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"}) if exists $clin{$line[1]}; 430 | 431 | } 432 | $index ++; 433 | } 434 | } 435 | 436 | ##### count drug 437 | my $drugCount = 0; 438 | my $gooddrugCount = 0; 439 | foreach my $drug (sort keys %icagesDrug){ 440 | foreach my $neighbor (sort keys %{$icagesDrug{$drug}}){ 441 | foreach my $final (sort keys %{$icagesDrug{$drug}{$neighbor}}){ 442 | my $icagesDrug = $icagesDrug{$drug}{$neighbor}{$final}{"biosystem"} * $icagesDrug{$drug}{$neighbor}{$final}{"icages"} * $icagesDrug{$drug}{$neighbor}{$final}{"activity"}; 443 | my $tier = 3; 444 | if(exists $fda{$drug}){ 445 | $tier = 1; 446 | }elsif(exists $clin{$drug}){ 447 | $tier = 2; 448 | } 449 | $icagesPrint{$tier}{$drug}{"score"} = $icagesDrug; 450 | 451 | if( $icagesDrug{$drug}{$neighbor}{$final}{"activity"} == 0){ 452 | $icagesDrug{$drug}{$neighbor}{$final}{"activity"} = "NA"; 453 | } 454 | if($icagesDrug == 0){ 455 | $icagesDrug{$drug}{$neighbor}{$final}{"drug"} ="NA"; 456 | } 457 | $icagesPrint{$tier}{$drug}{"content"} = "$drug,$final,$neighbor,$icagesDrug{$drug}{$neighbor}{$final}{\"icages\"},$icagesDrug{$drug}{$neighbor}{$final}{\"biosystem\"},$icagesDrug{$drug}{$neighbor}{$final}{\"activity\"},$icagesDrug,$tier"; 458 | } 459 | } 460 | } 461 | print OUT "drugName,finalTarget,directTarget,iCAGESGeneScore,maxBioSystemsScore,maxActivityScore,icagesDrugScore,tier,FDA_approvedSubtype,FDA_activeIngredient,CLT_name,CLT_organization,CLT_phase,CLT_url\n"; 462 | foreach my $tier (sort {$a <=> $b} keys %icagesPrint){ 463 | foreach my $drug (sort {$icagesPrint{$tier}{$b}{"score"} <=> $icagesPrint{$tier}{$a}{"score"}} keys %{$icagesPrint{$tier}}){ 464 | $drugCount ++; 465 | $gooddrugCount ++ if $icagesPrint{$tier}{$drug}{"score"} > 0.5; 466 | # 
check fda and clinical trial 467 | my $printContent = $icagesPrint{$tier}{$drug}{"content"}; 468 | if(exists $fda{$drug}){ 469 | $printContent .= "," . $fda{$drug}; 470 | }else{ 471 | $printContent .= ",NA,NA"; 472 | } 473 | if(exists $clin{$drug}){ 474 | $printContent .= "," . $clin{$drug}; 475 | }else{ 476 | $printContent .= ",NA,NA,NA,NA"; 477 | } 478 | print OUT "$printContent\n"; 479 | } 480 | } 481 | close OUT; 482 | close DRUG; 483 | 484 | my $logFile = $rawInputFile . $prefix . ".annovar.icages.log"; 485 | open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n"; 486 | print LOG "########### iCAGES Drug Summary ###########\n"; 487 | print LOG "## basic information\n"; 488 | print LOG "Total: $drugCount\n"; 489 | print LOG "Good drug (iCAGES drug score >= 0.5): $gooddrugCount\n"; 490 | } 491 | 492 | 493 | 494 | 495 | 496 | 497 | 498 | -------------------------------------------------------------------------------- /bin/icagesMutation.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use List::Util qw(min max); 5 | use Pod::Usage; 6 | use Getopt::Long; 7 | 8 | ###################################################################################################################################### 9 | ######################################################## variable declaration ######################################################## 10 | ###################################################################################################################################### 11 | 12 | my ($annovarInputFile,$rawInputFile, $inputDir ,$icagesLocation , $tumor ,$germline, $id ,$prefix, $bed, $hg, $expression); 13 | my $nowString; 14 | my (%sup, %onc); 15 | 16 | ###################################################################################################################################### 17 | ########################################################### main 
#################################################################### 18 | ###################################################################################################################################### 19 | $rawInputFile = $ARGV[0]; 20 | $inputDir = $ARGV[1]; 21 | $icagesLocation = $ARGV[2]; 22 | $tumor = $ARGV[3]; 23 | $germline = $ARGV[4]; 24 | $id = $ARGV[5]; 25 | $prefix = $ARGV[6]; 26 | $bed = $ARGV[7]; 27 | $hg = $ARGV[8]; 28 | $expression = $ARGV[9]; 29 | $nowString = localtime(); 30 | $annovarInputFile = &runAnnovar($rawInputFile, $inputDir ,$icagesLocation ,$tumor ,$germline ,$id, $prefix, $bed, $hg , $expression); 31 | &processAnnovar($annovarInputFile, $hg, $icagesLocation); 32 | 33 | ###################################################################################################################################### 34 | ############################################################# subroutines ############################################################ 35 | ###################################################################################################################################### 36 | 37 | sub runAnnovar { 38 | print "NOTICE: start running iCAGES package at $nowString\n"; 39 | my ($rawInputFile, $annovarInputFile); #ANNOVAR input files 40 | my ($icagesLocation, $callAnnotateVariation, $DBLocation); #ANNOVAR commands 41 | my ($tumor, $germline ,$id ,$prefix); #VCF conversion parameters & prefix for output 42 | my ($radialSVMDB, $radialSVMIndex, $funseq2DB, $funseq2Index, $refGeneDB, $refGeneIndex, $cnvDB); #ANNOVAR DB files: iCAGES score (index), refGene (fasta), dbSNP 43 | my ($log, $annovarLog); 44 | my $nowString = localtime; #SYSTEM local time 45 | my $bed; # location of bed file 46 | my $expression; # location of expression log change file 47 | $rawInputFile = shift; 48 | $inputDir = shift; 49 | $icagesLocation = shift; 50 | $tumor = shift; 51 | $germline = shift; 52 | $id = shift; 53 | $prefix = shift; 54 | $bed = shift; 55 | $hg = 
shift; 56 | $expression = shift; 57 | $callAnnotateVariation = $icagesLocation . "bin/annovar/annotate_variation.pl"; 58 | $annovarInputFile = $inputDir . "/" . $prefix . ".annovar"; 59 | $DBLocation = $icagesLocation . "db/"; 60 | &formatConvert($rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id , $bed , $expression); 61 | &divideMutation($annovarInputFile); 62 | &loadDatabase($DBLocation); 63 | &annotateMutation($icagesLocation, $annovarInputFile, $hg); 64 | return $annovarInputFile; 65 | } 66 | 67 | 68 | sub processAnnovar{ 69 | print "NOTICE: start processing output from ANNOVAR\n"; 70 | my $annovarInputFile = shift; 71 | my $hg = shift; 72 | my $icagesLocation = shift; 73 | # still have to load onc gene set and suppressor gene set 74 | my $oncDB = $icagesLocation . "/db/oncogene.gene"; 75 | my $supDB = $icagesLocation . "/db/suppressor.gene"; 76 | my (%onc, %sup); 77 | open(ONCDB, "$oncDB") or die "ERROR: cannot open $oncDB\n"; 78 | open(SUPDB, "$supDB") or die "ERROR: cannot open $supDB\n"; 79 | while(<ONCDB>){ 80 | chomp; 81 | $onc{$_} =1; 82 | } 83 | close ONCDB; 84 | while(<SUPDB>){ 85 | chomp; 86 | $sup{$_} = 1; 87 | } 88 | close SUPDB; 89 | my $annovarVariantFunction = $annovarInputFile . ".variant_function"; 90 | # create a temp file to store bed file of variant function : chr start end score 91 | my $genebed = $annovarInputFile . ".variant_function.bed"; 92 | open(GENEFORBED, "$annovarVariantFunction") or die "ERROR: cannot open $annovarVariantFunction\n"; 93 | open(GENEBED, ">$genebed") or die "ERROR: cannot create $genebed file\n"; 94 | my $lastline = "" ; 95 | while(<GENEFORBED>){ 96 | chomp; 97 | my @line = split("\t", $_); 98 | my $printout = "$line[2]\t$line[3]\t$line[4]"; 99 | if($printout eq $lastline){ 100 | next; 101 | }else{ 102 | print GENEBED "$line[2]\t$line[3]\t$line[4]\n"; 103 | } 104 | $lastline = $printout; 105 | } 106 | close GENEBED; 107 | close GENEFORBED; 108 | my $annovarExonVariantFunction = $annovarInputFile . 
".exonic_variant_function"; 109 | my $annovarRadialSVM = $annovarInputFile . ".snp." . $hg . "_iCAGES_dropped"; 110 | my $annovarCNV = $annovarInputFile . ".cnv." . $hg . "_cnv"; 111 | # create a temp file to store bed file of cnv with this format : chr start end score 112 | my $cnvbed = $annovarInputFile . ".cnv." . $hg . "_cnv.bed"; 113 | 114 | # create a final file to store the final result of bedtools intersect 115 | my $cnvfinal = $annovarInputFile . ".cnv.final"; 116 | 117 | my $annovarFunseq2 = $annovarInputFile . ".snp." . $hg . "_funseq2_dropped"; 118 | my $icagesMutations = $annovarInputFile . ".icagesMutations.csv"; 119 | # add bedtools 120 | my $bedtools = $icagesLocation . "/bin/bedtools/bin/bedtools"; 121 | open(GENE, "$annovarVariantFunction") or die "ERROR: cannot open file $annovarVariantFunction\n"; 122 | open(EXON, "$annovarExonVariantFunction") or die "ERROR: cannot open file $annovarExonVariantFunction\n"; 123 | if(!-e $annovarCNV){ 124 | !system("touch $annovarCNV") or die "ERROR: cannot create file $annovarCNV\n"; 125 | } 126 | if(!-e $annovarRadialSVM){ 127 | !system("touch $annovarRadialSVM") or die "ERROR: cannot create file $annovarRadialSVM\n"; 128 | } 129 | if(!-e $annovarFunseq2){ 130 | !system("touch $annovarFunseq2") or die "ERROR: cannot create file $annovarFunseq2\n"; 131 | } 132 | 133 | open(CNV, "$annovarCNV") or die "ERROR: cannot open file $annovarCNV\n"; 134 | open(RADIAL, "$annovarRadialSVM") or die "ERROR: cannot open file $annovarRadialSVM\n"; 135 | open(FUNSEQ, "$annovarFunseq2") or die "ERROR: cannot open file $annovarFunseq2\n"; 136 | open(OUT, ">$icagesMutations") or die "ERROR: cannot open file $icagesMutations\n"; 137 | my (%radialSVM, %funseq, %cnv, %exon); 138 | my (%pointcoding); 139 | my %icagesMutations; 140 | 141 | ######## count location information 142 | my $exonCount = 0; 143 | my $intronCount = 0; 144 | my $noncodingRNACount = 0; 145 | my $intergenicCount = 0; 146 | my $otherCount = 0; 147 | 148 | ######## 
count annotation information 149 | my $radialSVMCount = 0; 150 | my $funseqCount = 0; 151 | my $cnvCount = 0; 152 | 153 | while(<RADIAL>){ 154 | chomp; 155 | my @line; 156 | my $key; 157 | @line = split(/\t/, $_); 158 | $key = "$line[2],$line[3],$line[4],$line[5],$line[6]"; 159 | $radialSVM{$key} = $line[1]; 160 | } 161 | while(<FUNSEQ>){ 162 | chomp; 163 | my @line; 164 | my $key; 165 | @line = split(/\t/, $_); 166 | $key = "$line[2],$line[3],$line[4],$line[5],$line[6]"; 167 | $funseq{$key} = $line[1]; 168 | } 169 | 170 | # cnv cannot be processed using key and value 171 | # create a file to temporarily store 172 | 173 | open(CNVBED, ">$cnvbed") or die "ERROR: cannot open $cnvbed for write:\n"; 174 | 175 | while(<CNV>){ 176 | chomp; 177 | my @line; 178 | my $key; 179 | my $score; 180 | @line = split(/\t/, $_); 181 | $score = $line[1]; 182 | $score =~ /Score=(.*);/; 183 | $score = $1; 184 | # chr start end score 185 | print CNVBED "$line[2]\t$line[3]\t$line[4]\t$score\n"; 186 | } 187 | close CNVBED; 188 | 189 | 190 | # get intersect 191 | if(-z $cnvbed ){ 192 | !system("touch $cnvfinal") or die "ERROR: cannot create file $cnvfinal\n"; 193 | }else{ 194 | !system("$bedtools intersect -a $cnvbed -b $genebed -wa > $cnvfinal") or die "ERROR: cannot find intersect using bedtools, please check whether or not you have installed bedtools\n"; 195 | } 196 | 197 | open(CNVFINAL, "$cnvfinal") or die "ERROR: cannot open $cnvfinal for read:\n"; 198 | while(<CNVFINAL>){ 199 | chomp; 200 | my @line = split("\t", $_); 201 | # note that this key is different for snv 202 | my $key = "$line[0],$line[1],$line[2]"; 203 | $cnv{$key} = $line[3]; 204 | } 205 | 206 | while(<EXON>){ 207 | chomp; 208 | my (@line, @syntax, @content); 209 | my ($key, $mut, $pro); 210 | @line = split(/\t/, $_); 211 | @syntax = split(",", $line[2]); 212 | @content = split(":", $syntax[0]); 213 | $mut = $content[3]; 214 | $pro = $content[4]; 215 | $key = "$line[3],$line[4],$line[5],$line[6],$line[7]"; 216 | $exon{$key}{"mutationSyntax"} = $mut; 217 | 
$exon{$key}{"proteinSyntax"} = $pro; 218 | if($line[4] == $line[5]){ 219 | $pointcoding{$key} = 1; 220 | } 221 | } 222 | while(<GENE>){ 223 | chomp; 224 | my @line; 225 | my ($key, $gene); #hash key used for fetch radial SVM score from %radialSVM: mutation->radialSVM 226 | my ($category, $mutationSyntax, $proteinSyntax, $scoreCategory, $score); 227 | @line = split(/\t/, $_); 228 | $gene = $line[1]; 229 | next unless defined $gene; 230 | $key = "$line[2],$line[3],$line[4],$line[5],$line[6]"; 231 | # note that structural variation key is different !!! chr,start,end; 232 | my $structKey = "$line[2],$line[3],$line[4]"; 233 | my %cnvScore ; # we also need this hash to store cnv score for each gene 234 | 235 | next unless defined $key; 236 | next unless defined $structKey; 237 | 238 | 239 | ####### process gene for noncoding variants 240 | my @printGene; 241 | if($gene =~ /(.*?)\(dist=(.*?)\),(.*?)\(dist=(.*?)\)/){ 242 | my $gene1 = $1; 243 | my $gene2 = $3; 244 | my $dist1 = $2; 245 | my $dist2 = $4; 246 | if($dist1 eq "NONE" and $dist2 eq "NONE"){ 247 | $printGene[0] = $gene1; 248 | }elsif($dist1 eq "NONE"){ 249 | $printGene[0] = $gene2; 250 | }elsif($dist2 eq "NONE"){ 251 | $printGene[0] = $gene1; 252 | }elsif($dist1 <= $dist2){ 253 | $printGene[0] = $gene1; 254 | }else{ 255 | $printGene[0] = $gene2; 256 | } 257 | }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\),([A-Z|0-9|-]+?)\(.*\)$/){ 258 | $printGene[0] = $1; 259 | $printGene[1] = $2; 260 | }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\);([A-Z|0-9|-]+?)\(.*\)$/){ 261 | $printGene[0] = $1; 262 | $printGene[1] = $2; 263 | }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\)$/){ 264 | $printGene[0] = $1; 265 | }elsif($gene =~ /;/ or $gene =~ /,/){ 266 | my @gene = split(/;|,/, $gene); 267 | for(0..$#gene){ 268 | $printGene[$_] = $gene[$_]; 269 | } 270 | }else{ 271 | $printGene[0] = $gene; 272 | } 273 | 274 | 275 | if($line[0] =~ /^exonic/ || $line[0] =~ /^splicing/ ){ 276 | $exonCount ++; 277 | }elsif($line[0] =~ /^intron/){ 278 | $intronCount ++; 279 | 
}elsif($line[0] =~ /^ncRNA/){ 280 | $noncodingRNACount ++; 281 | }elsif($line[0] =~ /^intergenic/){ 282 | $intergenicCount ++; 283 | }else{ 284 | $otherCount ++; 285 | } 286 | 287 | if ($line[3] == $line[4]){ 288 | if(exists $pointcoding{$key}){ 289 | $radialSVMCount ++; 290 | $category = "point coding"; 291 | if(defined $exon{$key}{"mutationSyntax"}){ 292 | $mutationSyntax = $exon{$key}{"mutationSyntax"}; 293 | }else{ 294 | $mutationSyntax = "NA"; 295 | } 296 | if(defined $exon{$key}{"proteinSyntax"}){ 297 | $proteinSyntax = $exon{$key}{"proteinSyntax"}; 298 | }else{ 299 | $proteinSyntax = "NA"; 300 | } 301 | $scoreCategory = "radial SVM"; 302 | if(exists $radialSVM{$key}){ 303 | $score = $radialSVM{$key} 304 | }else{ 305 | $score = "NA"; 306 | } 307 | }else{ 308 | $funseqCount ++; 309 | $category = "point noncoding"; 310 | $mutationSyntax = "NA"; 311 | $proteinSyntax = "NA"; 312 | $scoreCategory = "FunSeq2"; 313 | if(exists $funseq{$key}){ 314 | $score = $funseq{$key} 315 | }else{ 316 | $score = "NA"; 317 | } 318 | } 319 | }else{ 320 | $cnvCount ++ ; 321 | $category = "structural variation"; 322 | if(exists $exon{$key}{"mutationSyntax"} and exists $exon{$key}{"proteinSyntax"}){ 323 | $mutationSyntax = $exon{$key}{"mutationSyntax"}; 324 | $proteinSyntax = $exon{$key}{"proteinSyntax"}; 325 | }else{ 326 | $mutationSyntax = "NA"; 327 | $proteinSyntax = "NA"; 328 | } 329 | $scoreCategory = "CNV normalized signal"; 330 | for(0..$#printGene){ 331 | if(exists $cnv{$structKey} and (exists $onc{$printGene[$_]} or exists $sup{$printGene[$_]})){ 332 | $score = $cnv{$structKey}; 333 | }else{ 334 | $score = "NA"; 335 | } 336 | $cnvScore{$printGene[$_]} = $score; 337 | } 338 | } 339 | 340 | for(0..$#printGene){ 341 | $icagesMutations{$printGene[$_]}{$key}{"category"} = $category; 342 | $icagesMutations{$printGene[$_]}{$key}{"mutationSyntax"} = $mutationSyntax; 343 | $icagesMutations{$printGene[$_]}{$key}{"proteinSyntax"} = $proteinSyntax; 344 | 
$icagesMutations{$printGene[$_]}{$key}{"scoreCategory"} = $scoreCategory; 345 | if($category eq "structural variation"){ 346 | $icagesMutations{$printGene[$_]}{$key}{"score"} = $cnvScore{$printGene[$_]}; 347 | }else{ 348 | $icagesMutations{$printGene[$_]}{$key}{"score"} = $score; 349 | } 350 | } 351 | } 352 | print OUT "geneName,chrmosomeNumber,start,end,reference,alternative,category,mutationSyntax,proteinSyntax,scoreCategory,mutationScore\n"; 353 | foreach my $gene (sort keys %icagesMutations){ 354 | foreach my $mutation (sort keys %{$icagesMutations{$gene}}){ 355 | if (defined $gene and defined $mutation and defined $icagesMutations{$gene}{$mutation}{"category"} and defined $icagesMutations{$gene}{$mutation}{"mutationSyntax"} and defined $icagesMutations{$gene}{$mutation}{"proteinSyntax"} and defined $icagesMutations{$gene}{$mutation}{"scoreCategory"} and defined $icagesMutations{$gene}{$mutation}{"score"}){ 356 | print OUT "$gene,$mutation,$icagesMutations{$gene}{$mutation}{\"category\"},$icagesMutations{$gene}{$mutation}{\"mutationSyntax\"},$icagesMutations{$gene}{$mutation}{\"proteinSyntax\"},$icagesMutations{$gene}{$mutation}{\"scoreCategory\"},$icagesMutations{$gene}{$mutation}{\"score\"}\n" ; 357 | }else{ 358 | print "$gene,$mutation,$icagesMutations{$gene}{$mutation}{\"category\"},$icagesMutations{$gene}{$mutation}{\"mutationSyntax\"},$icagesMutations{$gene}{$mutation}{\"proteinSyntax\"},$icagesMutations{$gene}{$mutation}{\"scoreCategory\"},$icagesMutations{$gene}{$mutation}{\"score\"}\n" ; 359 | } 360 | } 361 | } 362 | 363 | my $logFile = $annovarInputFile . 
".icages.log"; 364 | open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n"; 365 | 366 | print LOG "## location information\n"; 367 | print LOG "exonic/splice: $exonCount\n"; 368 | print LOG "intronic: $intronCount\n"; 369 | print LOG "noncoding RNA: $noncodingRNACount\n"; 370 | print LOG "intergenic: $intergenicCount\n"; 371 | print LOG "other: $otherCount\n\n"; 372 | 373 | print LOG "## annotation information\n"; 374 | print LOG "point coding variants with radialSVM annotation: $radialSVMCount\n"; 375 | print LOG "point noncoding variants with FunSeq2 annotation: $funseqCount\n"; 376 | print LOG "Indels and SVs with CNV signal annotation: $cnvCount\n\n"; 377 | } 378 | 379 | 380 | sub formatConvert{ 381 | # $rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id, $bed, $expression 382 | print "NOTICE: start input file format checking and converting format if needed\n"; 383 | my ($rawInputFile, $annovarInputFile, $icagesLocation ); 384 | my ( $tumor, $germline , $id, $bed, $prefix , $expression); # parameters for vcf conversion 385 | my $callConvertToAnnovar; 386 | my $callvcftools; 387 | my $formatCheckFirstLine; 388 | my $isbedFormat = 0; # check whether or not this file is in bed format 389 | $rawInputFile = shift; 390 | $annovarInputFile = shift; 391 | $icagesLocation = shift; 392 | $tumor = shift; 393 | $germline = shift; 394 | $id = shift; 395 | $bed = shift; 396 | $expression = shift; 397 | open(IN, "$rawInputFile") or die "ERROR: cannot open $rawInputFile\n"; 398 | $formatCheckFirstLine = <IN>; 399 | chomp $formatCheckFirstLine; 400 | my $multipleSampleCheck = 0; 401 | if($formatCheckFirstLine =~ /^#/){ 402 | # check the whole file to see if this is multiple sample 403 | 404 | while(<IN>){ 405 | chomp; 406 | my $line = $_; 407 | my @line = split; 408 | if($line[0] =~ /^#CHROM/){ 409 | if($#line > 9){ 410 | $multipleSampleCheck = 1; 411 | } 412 | last; 413 | } 414 | } 415 | }else{ 416 | my @line = split(/\t| /, $formatCheckFirstLine); 
417 | if($#line == 2 ){ 418 | $isbedFormat = 1; 419 | }elsif($#line != 2 and defined $line[3] and defined $line[4] ){ 420 | if($line[3] !~ /[a|t|c|g|A|T|C|G|-]+/ or $line[4] !~ /[a|t|c|g|A|T|C|G|-]+/){ 421 | $isbedFormat = 1; 422 | } 423 | } 424 | } 425 | close IN; 426 | $callConvertToAnnovar = $icagesLocation . "bin/annovar/convert2annovar.pl"; 427 | $callvcftools = $icagesLocation . "bin/vcftools/bin/vcftools"; 428 | if($formatCheckFirstLine =~ /^##fileformat=VCF/){ #VCF 429 | if($multipleSampleCheck and $tumor eq "NA" and $germline eq "NA" and $id eq "NA"){ 430 | die "ERROR: your vcf file contains multiple samples, please specify a valid sample identifier\n"; 431 | } 432 | if($tumor ne "NA" and $germline ne "NA"){ 433 | !system("$callvcftools --recode --vcf $rawInputFile --indv $tumor --out $rawInputFile.$tumor") or die "ERROR: please specify a valid sample identifier for tumor sample\n"; 434 | !system("$callvcftools --recode --vcf $rawInputFile --indv $germline --out $rawInputFile.$germline") or die "ERROR: please specify a valid sample identifier for germline variants\n"; 435 | !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$tumor.recode.vcf > $rawInputFile.$tumor.ann") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n"; 436 | !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$germline.recode.vcf > $rawInputFile.$germline.ann") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n"; 437 | !system("cat $rawInputFile.$tumor.ann $rawInputFile.$germline.ann | uniq -u > $annovarInputFile") or die "ERROR: cannot generate somatic variants input file for iCAGES, please double check the format of your input files\n"; 438 | }elsif($id ne "NA"){ 439 | !system("$callvcftools --recode --vcf $rawInputFile --indv $id --out $rawInputFile.$id") or die "ERROR: please specify a valid sample identifier for somatic variants of your interest\n"; 440 | !system("$callConvertToAnnovar -format vcf4 
$rawInputFile.$id.recode.vcf > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n"; 441 | }else{ 442 | !system("$callConvertToAnnovar -format vcf4 $rawInputFile > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n"; 443 | } 444 | 445 | }elsif($isbedFormat){ 446 | # BED 447 | print "iCAGES: your input file is likely to be a bed file and iCAGES is converting it to ANNOVAR input format\n"; 448 | # !system("$callConvertToAnnovar -format bed $rawInputFile > $annovarInputFile") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n"; 449 | my @cnv; 450 | open(INPUTCNV, "$rawInputFile") or die "ERROR: cannot open $rawInputFile\n"; 451 | my $checkFiveFields = 0; 452 | while(<INPUTCNV>){ 453 | chomp; 454 | push @cnv, $_; 455 | my @line = split; 456 | $checkFiveFields = 1 if $#line == 4; 457 | } 458 | close INPUTCNV; 459 | open(OUTPUTCNV, ">$annovarInputFile") or die "ERROR: cannot create $annovarInputFile\n"; 460 | for(0..$#cnv){ 461 | if( $checkFiveFields == 1){ 462 | print OUTPUTCNV "$cnv[$_]\n"; 463 | }else{ 464 | print OUTPUTCNV "$cnv[$_]\t0\t0\n"; 465 | } 466 | } close OUTPUTCNV; # flush the converted file before it is appended to or re-read 467 | }else{ 468 | #ANNOVAR 469 | !system("cp $rawInputFile $annovarInputFile") or die "ERROR: cannot use input file $rawInputFile\n"; 470 | } 471 | if($bed ne "NA"){ 472 | # there is a bug in annovar convert2annovar.pl for bed 473 | # !system("$callConvertToAnnovar -format bed $bed > $bed.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n"; 474 | # !system('awk \'{print $1 "\t" $2 "\t" $3 "\t0\t0" }\' $bed > $bed.out') or die "ERROR: cannot convert your BED file input into ANNOVAR input format\n"; 475 | open(ANNBED, ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n"; 476 | open(BED, "$bed") or die "iCAGES: cannot open ANNOVAR input file generated by your input BED file\n"; 477 | my @bed; 478 | while(<BED>){ 479 | chomp; 480 | 
push @bed, $_; 481 | } 482 | close BED; 483 | for(0..$#bed){ 484 | print ANNBED "$bed[$_]\t0\t0\n"; 485 | } 486 | close ANNBED; 487 | } 488 | if($expression ne "NA"){ 489 | !system("$callConvertToAnnovar -format bed $expression > $expression.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n"; 490 | open(ANNBED, ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n"; 491 | open(EXP, "$expression.out") or die "iCAGES: cannot open ANNOVAR input file generated by your input BED file\n"; 492 | my @bed; 493 | while(<EXP>){ 494 | chomp; 495 | push @bed, $_; 496 | } 497 | close EXP; 498 | for(0..$#bed){ 499 | print ANNBED "$bed[$_]\n"; 500 | } 501 | close ANNBED; 502 | } 503 | } 504 | 505 | 506 | sub divideMutation{ 507 | print "NOTICE: start dividing mutations to SNP and structural variation\n"; 508 | my ($annovarInputFile, $snpFile, $cnvFile); 509 | $annovarInputFile = shift; 510 | $snpFile = $annovarInputFile . ".snp"; 511 | $cnvFile = $annovarInputFile . ".cnv"; 512 | 513 | open(OUT, "$annovarInputFile") or die "iCAGES: cannot open input file $annovarInputFile\n"; 514 | open(SNP, ">$snpFile") or die "iCAGES: cannot open input file $snpFile\n"; 515 | open(CNV, ">$cnvFile") or die "iCAGES: cannot open input file $cnvFile\n"; 516 | 517 | #### add variant information into log file 518 | my $logFile = $annovarInputFile . 
".icages.log"; 519 | open(LOG, ">$logFile") or die "iCAGES: cannot open file $logFile\n"; 520 | my $variantCount = 0; 521 | my $snvCount = 0; 522 | my $cnvCount = 0; 523 | 524 | while(<OUT>){ 525 | chomp; 526 | $variantCount ++; 527 | my $printLine = $_; 528 | my @line = split(/\t/, $_); 529 | if ($line[1] == $line[2] and $line[3] ne "-" and $line[4] ne "-"){ 530 | $snvCount ++; 531 | print SNP "$printLine\n"; 532 | }else{ 533 | $cnvCount ++; 534 | print CNV "$printLine\n"; 535 | } 536 | } 537 | close OUT; 538 | close SNP; 539 | close CNV; 540 | 541 | print LOG "########### iCAGES Variant Summary ###########\n"; 542 | print LOG "## basic information\n"; 543 | print LOG "Total: $variantCount\n"; 544 | print LOG "SNVs: $snvCount\n"; 545 | print LOG "Indels and Structural variants: $cnvCount\n\n"; 546 | 547 | } 548 | 549 | sub loadDatabase{ 550 | print "NOTICE: start loading databases\n"; 551 | my $DBLocation = shift; 552 | my $supLocation = $DBLocation . "suppressor.gene"; 553 | my $oncLocation = $DBLocation . "oncogene.gene"; 554 | print "NOTICE: start extracting suppressor genes\n"; 555 | open(SUP, "$supLocation") or die "cannot open $supLocation\n"; 556 | while(<SUP>){ 557 | chomp; 558 | $sup{$_} = 1; 559 | } 560 | close SUP; 561 | print "NOTICE: start extracting oncogenes\n"; 562 | open(ONC, "$oncLocation") or die "cannot open $oncLocation\n"; 563 | while(<ONC>){ 564 | chomp; 565 | $onc{$_} = 1; 566 | } 567 | close ONC; 568 | } 569 | 570 | 571 | 572 | sub annotateMutation{ 573 | my ($DBLocation, $icagesLocation, $callAnnovar, $annovarInputFile, $snpFile, $cnvFile, $hg); 574 | $icagesLocation = shift; 575 | $annovarInputFile = shift; 576 | $hg = shift; 577 | $DBLocation = $icagesLocation . "db/"; 578 | $snpFile = $annovarInputFile . ".snp"; 579 | $cnvFile = $annovarInputFile . ".cnv"; 580 | $callAnnovar = $icagesLocation . 
"bin/annovar/annotate_variation.pl"; 581 | my @children_pids; 582 | $children_pids[0] = fork(); 583 | if($children_pids[0] == 0){ 584 | print "NOTICE: start to run ANNOVAR region annotation to annotate structural variations or variants associated with CNV changes\n"; 585 | if(-s $cnvFile){ 586 | !system("$callAnnovar -regionanno -build $hg -out $cnvFile -dbtype cnv $cnvFile $DBLocation -scorecolumn 4 --colsWanted 0") or die "ERROR: cannot call structural varation\n"; 587 | }else{ 588 | print "NOTICE: CNV file has 0 size\n"; 589 | } 590 | exit 0; 591 | } 592 | $children_pids[1] = fork(); 593 | if($children_pids[1] == 0){ 594 | print "NOTICE: start to run ANNOVAR index function to fetch radial SVM score for each mutation \n"; 595 | if(-s $snpFile){ 596 | !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype iCAGES $snpFile $DBLocation") or die "ERROR: cannot call icages\n"; 597 | }else{ 598 | print "NOTICE: SNV file has 0 size\n"; 599 | } 600 | exit 0; 601 | } 602 | $children_pids[2] = fork(); 603 | if($children_pids[2] == 0){ 604 | print "NOTICE: start to run ANNOVAR index function to fetch funseq score for each mutation \n"; 605 | if(-s $snpFile){ 606 | !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype funseq2 $snpFile $DBLocation") or die "ERROR: cannot call funseq2\n"; 607 | }else{ 608 | print "NOTICE: SNV file has 0 size, skip funseq score annotation\n"; 609 | } 610 | exit 0; 611 | } 612 | $children_pids[3] = fork(); 613 | if($children_pids[3] == 0){ 614 | print "NOTICE: start annotating each mutaiton using ANNOVAR\n"; 615 | !system("$callAnnovar -out $annovarInputFile -build $hg $annovarInputFile $DBLocation") or die "ERROR: cannot call annovar\n"; 616 | exit 0; 617 | } 618 | for (0.. 
$#children_pids){ 619 | waitpid($children_pids[$_], 0); 620 | } 621 | } 622 | 623 | 624 | 625 | -------------------------------------------------------------------------------- /bin/icagesMutationNew.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use List::Util qw(min max); 5 | use Pod::Usage; 6 | use Getopt::Long; 7 | 8 | ###################################################################################################################################### 9 | ######################################################## variable declaration ######################################################## 10 | ###################################################################################################################################### 11 | 12 | my ($annovarInputFile,$rawInputFile, $inputDir ,$icagesLocation , $tumor ,$germline, $id ,$prefix, $bed, $hg, $expression); 13 | my $nowString; 14 | my (%sup, %onc); 15 | 16 | ###################################################################################################################################### 17 | ########################################################### main #################################################################### 18 | ###################################################################################################################################### 19 | $rawInputFile = $ARGV[0]; 20 | $inputDir = $ARGV[1]; 21 | $icagesLocation = $ARGV[2]; 22 | $tumor = $ARGV[3]; 23 | $germline = $ARGV[4]; 24 | $id = $ARGV[5]; 25 | $prefix = $ARGV[6]; 26 | $bed = $ARGV[7]; 27 | $hg = $ARGV[8]; 28 | $expression = $ARGV[9]; 29 | $nowString = localtime(); 30 | $annovarInputFile = &runAnnovar($rawInputFile, $inputDir ,$icagesLocation ,$tumor ,$germline ,$id, $prefix, $bed, $hg , $expression); 31 | &processAnnovar($annovarInputFile, $hg, $icagesLocation); 32 | 33 | 
###################################################################################################################################### 34 | ############################################################# subroutines ############################################################ 35 | ###################################################################################################################################### 36 | 37 | sub runAnnovar { 38 | print "NOTICE: start running iCAGES package at $nowString\n"; 39 | my ($rawInputFile, $annovarInputFile); #ANNOVAR input files 40 | my ($icagesLocation, $callAnnotateVariation, $DBLocation); #ANNOVAR commands 41 | my ($tumor, $germline ,$id ,$prefix); #VCF conversion parameters & prefix for output 42 | my ($radialSVMDB, $radialSVMIndex, $funseq2DB, $funseq2Index, $refGeneDB, $refGeneIndex, $cnvDB); #ANNOVAR DB files: iCAGES score (index), refGene (fasta), dbSNP 43 | my ($log, $annovarLog); 44 | my $nowString = localtime; #SYSTEM local time 45 | my $bed; # location of bed file 46 | my $expression; # location of expression log change file 47 | $rawInputFile = shift; 48 | $inputDir = shift; 49 | $icagesLocation = shift; 50 | $tumor = shift; 51 | $germline = shift; 52 | $id = shift; 53 | $prefix = shift; 54 | $bed = shift; 55 | $hg = shift; 56 | $expression = shift; 57 | $callAnnotateVariation = $icagesLocation . "bin/annovar/annotate_variation.pl"; 58 | 59 | # ANNOVAR may already be installed elsewhere 60 | if (! -e $callAnnotateVariation) { 61 | $callAnnotateVariation = "annotate_variation.pl"; 62 | print ("WARNING: cannot find ANNOVAR in ./bin/ directory, assuming that you have installed ANNOVAR\n"); 63 | } 64 | $annovarInputFile = $inputDir . "/" . $prefix . ".annovar"; 65 | $DBLocation = $icagesLocation . 
"db/"; 66 | # $DBLocation = "/ssd/icages-humandb/"; 67 | &formatConvert($rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id , $bed , $expression); 68 | &divideMutation($annovarInputFile); 69 | &loadDatabase($DBLocation); 70 | &annotateMutation($icagesLocation, $annovarInputFile, $hg); 71 | return $annovarInputFile; 72 | } 73 | 74 | 75 | sub processAnnovar{ 76 | print "NOTICE: start processing output from ANNOVAR\n"; 77 | my $annovarInputFile = shift; 78 | my $hg = shift; 79 | my $icagesLocation = shift; 80 | # still have to load onc gene set and suppressor gene set 81 | my $oncDB = $icagesLocation . "/db/oncogene.gene"; 82 | my $supDB = $icagesLocation . "/db/suppressor.gene"; 83 | my (%onc, %sup); 84 | open(ONCDB, "$oncDB") or die "ERROR: cannot open $oncDB\n"; 85 | open(SUPDB, "$supDB") or die "ERROR: cannot open $supDB\n"; 86 | while(<ONCDB>){ 87 | chomp; 88 | $onc{$_} =1; 89 | } 90 | close ONCDB; 91 | while(<SUPDB>){ 92 | chomp; 93 | $sup{$_} = 1; 94 | } 95 | close SUPDB; 96 | my $annovarVariantFunction = $annovarInputFile . ".variant_function"; 97 | # create a temp file to store bed file of variant function : chr start end score 98 | my $genebed = $annovarInputFile . ".variant_function.bed"; 99 | open(GENEFORBED, "$annovarVariantFunction") or die "ERROR: cannot open $annovarVariantFunction\n"; 100 | open(GENEBED, ">$genebed") or die "ERROR: cannot create $genebed file\n"; 101 | my $lastline = "" ; 102 | while(<GENEFORBED>){ 103 | chomp; 104 | my @line = split("\t", $_); 105 | my $printout = "$line[2]\t$line[3]\t$line[4]"; 106 | if($printout eq $lastline){ 107 | next; 108 | }else{ 109 | print GENEBED "$line[2]\t$line[3]\t$line[4]\n"; 110 | } 111 | $lastline = $printout; 112 | } 113 | close GENEBED; 114 | close GENEFORBED; 115 | my $annovarExonVariantFunction = $annovarInputFile . ".exonic_variant_function"; 116 | my $annovarRadialSVM = $annovarInputFile . ".snp." . $hg . "_iCAGES_dropped"; 117 | my $annovarCNV = $annovarInputFile . ".cnv." . $hg . 
"_cnv"; 118 | # create a temp file to store bed file of cnv with this format : chr start end score 119 | my $cnvbed = $annovarInputFile . ".cnv." . $hg . "_cnv.bed"; 120 | 121 | # create a final file to store the final result of bedtools intersect 122 | my $cnvfinal = $annovarInputFile . ".cnv.final"; 123 | 124 | my $annovarFunseq2 = $annovarInputFile . ".snp." . $hg . "_funseq2_dropped"; 125 | my $icagesMutations = $annovarInputFile . ".icagesMutations.csv"; 126 | # add bedtools 127 | my $bedtools = $icagesLocation . "/bin/bedtools/bin/bedtools"; 128 | 129 | # bedtools have already installed 130 | if (! -e $bedtools) { 131 | $bedtools = "bedtools"; 132 | print ("WARNING: did not find bedtools in ./bin/ directory, assuming you have installed bedtools\n"); 133 | } 134 | open(GENE, "$annovarVariantFunction") or die "ERROR: cannot open file $annovarVariantFunction\n"; 135 | open(EXON, "$annovarExonVariantFunction") or die "ERROR: cannot open file $annovarExonVariantFunction\n"; 136 | if(!-e $annovarCNV){ 137 | !system("touch $annovarCNV") or die "ERROR: cannot create file $annovarCNV\n"; 138 | } 139 | if(!-e $annovarRadialSVM){ 140 | !system("touch $annovarRadialSVM") or die "ERROR: cannot create file $annovarRadialSVM\n"; 141 | } 142 | if(!-e $annovarFunseq2){ 143 | !system("touch $annovarFunseq2") or die "ERROR: cannot create file $annovarFunseq2\n"; 144 | } 145 | 146 | open(CNV, "$annovarCNV") or die "ERROR: cannot open file $annovarCNV\n"; 147 | open(RADIAL, "$annovarRadialSVM") or die "ERROR: cannot open file $annovarRadialSVM\n"; 148 | open(FUNSEQ, "$annovarFunseq2") or die "ERROR: cannot open file $annovarFunseq2\n"; 149 | open(OUT, ">$icagesMutations") or die "ERROR: cannot open file $icagesMutations\n"; 150 | my (%radialSVM, %funseq, %cnv, %exon); 151 | my (%pointcoding); 152 | my %icagesMutations; 153 | 154 | ######## count location information 155 | my $exonCount = 0; 156 | my $intronCount = 0; 157 | my $noncodingRNACount = 0; 158 | my $intergenicCount = 0; 
159 | my $otherCount = 0; 160 | 161 | ######## count annotation information 162 | my $radialSVMCount = 0; 163 | my $funseqCount = 0; 164 | my $cnvCount = 0; 165 | 166 | while(<RADIAL>){ 167 | chomp; 168 | my @line; 169 | my $key; 170 | @line = split(/\t/, $_); 171 | $key = "$line[2],$line[3],$line[4],$line[5],$line[6]"; 172 | $radialSVM{$key} = $line[1]; 173 | } 174 | while(<FUNSEQ>){ 175 | chomp; 176 | my @line; 177 | my $key; 178 | @line = split(/\t/, $_); 179 | $key = "$line[2],$line[3],$line[4],$line[5],$line[6]"; 180 | $funseq{$key} = $line[1]; 181 | } 182 | 183 | # cnv cannot be processed using key and value 184 | # create a file to temporarily store 185 | 186 | open(CNVBED, ">$cnvbed") or die "ERROR: cannot open $cnvbed for write:\n"; 187 | 188 | while(<CNV>){ 189 | chomp; 190 | my @line; 191 | my $key; 192 | my $score; 193 | @line = split(/\t/, $_); 194 | $score = $line[1]; 195 | $score =~ /Score=(.*);/; 196 | $score = $1; 197 | # chr start end score 198 | print CNVBED "$line[2]\t$line[3]\t$line[4]\t$score\n"; 199 | } 200 | close CNVBED; 201 | 202 | 203 | # get intersect 204 | if(-z $cnvbed ){ 205 | # print "-x\n"; 206 | !system("touch $cnvfinal") or die "ERROR: cannot create file $cnvfinal\n"; 207 | }else{ 208 | # print "not -x\n"; 209 | !system("$bedtools intersect -a $cnvbed -b $genebed -wa > $cnvfinal") or die "ERROR: cannot find intersect using bedtools, please check whether or not you have installed bedtools\n"; 210 | } 211 | 212 | open(CNVFINAL, "$cnvfinal") or die "ERROR: cannot open $cnvfinal for read:\n"; 213 | while(<CNVFINAL>){ 214 | chomp; 215 | my @line = split("\t", $_); 216 | # note that this key is different for snv 217 | my $key = "$line[0],$line[1],$line[2]"; 218 | $cnv{$key} = $line[3]; 219 | } 220 | 221 | while(<EXON>){ 222 | chomp; 223 | my (@line, @syntax, @content); 224 | my ($key, $mut, $pro); 225 | @line = split(/\t+/, $_); 226 | @syntax = split(",", $line[2]); 227 | @content = split(":", $syntax[0]); 228 | $mut = $content[3]; 229 | $pro = $content[4]; 230 | $key = 
"$line[3],$line[4],$line[5],$line[6],$line[7]"; 231 | $exon{$key}{"mutationSyntax"} = $mut; 232 | $exon{$key}{"proteinSyntax"} = $pro; 233 | if($line[4] == $line[5]){ 234 | $pointcoding{$key} = 1; 235 | } 236 | } 237 | while(<GENE>){ 238 | chomp; 239 | my @line; 240 | my ($key, $gene); #hash key used for fetch radial SVM score from %radialSVM: mutation->radialSVM 241 | my ($category, $mutationSyntax, $proteinSyntax, $scoreCategory, $score); 242 | @line = split(/\t+/, $_); 243 | $gene = $line[1]; 244 | next unless defined $gene; 245 | $key = "$line[2],$line[3],$line[4],$line[5],$line[6]"; 246 | # note that structural variation key is different !!! chr,start,end; 247 | my $structKey = "$line[2],$line[3],$line[4]"; 248 | my %cnvScore ; # we also need this hash to store cnv score for each gene 249 | 250 | next unless defined $key; 251 | next unless defined $structKey; 252 | 253 | 254 | ####### process gene for noncoding variants 255 | my @printGene; 256 | if($gene =~ /(.*?)\(dist=(.*?)\),(.*?)\(dist=(.*?)\)/){ 257 | my $gene1 = $1; 258 | my $gene2 = $3; 259 | my $dist1 = $2; 260 | my $dist2 = $4; 261 | if($dist1 eq "NONE" and $dist2 eq "NONE"){ 262 | $printGene[0] = $gene1; 263 | }elsif($dist1 eq "NONE"){ 264 | $printGene[0] = $gene2; 265 | }elsif($dist2 eq "NONE"){ 266 | $printGene[0] = $gene1; 267 | }elsif($dist1 <= $dist2){ 268 | $printGene[0] = $gene1; 269 | }else{ 270 | $printGene[0] = $gene2; 271 | } 272 | }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\),([A-Z|0-9|-]+?)\(.*\)$/){ 273 | $printGene[0] = $1; 274 | $printGene[1] = $2; 275 | }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\);([A-Z|0-9|-]+?)\(.*\)$/){ 276 | $printGene[0] = $1; 277 | $printGene[1] = $2; 278 | }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\)$/){ 279 | $printGene[0] = $1; 280 | }elsif($gene =~ /;/ or $gene =~ /,/){ 281 | my @gene = split(/;|,/, $gene); 282 | for(0..$#gene){ 283 | $printGene[$_] = $gene[$_]; 284 | } 285 | }elsif($gene ne ""){ 286 | $printGene[0] = $gene; 287 | } 288 | 289 | 290 | if($line[0] =~ /^exonic/ || 
$line[0] =~ /^splicing/ ){ 291 | $exonCount ++; 292 | }elsif($line[0] =~ /^intron/){ 293 | $intronCount ++; 294 | }elsif($line[0] =~ /^ncRNA/){ 295 | $noncodingRNACount ++; 296 | }elsif($line[0] =~ /^intergenic/){ 297 | $intergenicCount ++; 298 | }else{ 299 | $otherCount ++; 300 | } 301 | 302 | if ($line[3] == $line[4]){ 303 | if(exists $pointcoding{$key}){ 304 | $radialSVMCount ++; 305 | $category = "point coding"; 306 | if(defined $exon{$key}{"mutationSyntax"}){ 307 | $mutationSyntax = $exon{$key}{"mutationSyntax"}; 308 | }else{ 309 | $mutationSyntax = "NA"; 310 | } 311 | if(defined $exon{$key}{"proteinSyntax"}){ 312 | $proteinSyntax = $exon{$key}{"proteinSyntax"}; 313 | }else{ 314 | $proteinSyntax = "NA"; 315 | } 316 | $scoreCategory = "radial SVM"; 317 | if(exists $radialSVM{$key}){ 318 | $score = $radialSVM{$key} 319 | }else{ 320 | $score = "NA"; 321 | } 322 | }else{ 323 | $funseqCount ++; 324 | $category = "point noncoding"; 325 | $mutationSyntax = "NA"; 326 | $proteinSyntax = "NA"; 327 | $scoreCategory = "FunSeq2"; 328 | if(exists $funseq{$key}){ 329 | $score = $funseq{$key} 330 | }else{ 331 | $score = "NA"; 332 | } 333 | } 334 | }else{ 335 | $cnvCount ++ ; 336 | $category = "structural variation"; 337 | if(exists $exon{$key}{"mutationSyntax"} and exists $exon{$key}{"proteinSyntax"}){ 338 | $mutationSyntax = $exon{$key}{"mutationSyntax"}; 339 | $proteinSyntax = $exon{$key}{"proteinSyntax"}; 340 | }else{ 341 | $mutationSyntax = "NA"; 342 | $proteinSyntax = "NA"; 343 | } 344 | $scoreCategory = "CNV normalized signal"; 345 | for(0..$#printGene){ 346 | if(exists $cnv{$structKey} and (exists $onc{$printGene[$_]} or exists $sup{$printGene[$_]})){ 347 | $score = $cnv{$structKey}; 348 | }else{ 349 | $score = "NA"; 350 | } 351 | $cnvScore{$printGene[$_]} = $score; 352 | } 353 | } 354 | 355 | for(0..$#printGene){ 356 | $icagesMutations{$printGene[$_]}{$key}{"category"} = $category; 357 | $icagesMutations{$printGene[$_]}{$key}{"mutationSyntax"} = $mutationSyntax; 358 | 
$icagesMutations{$printGene[$_]}{$key}{"proteinSyntax"} = $proteinSyntax; 359 | $icagesMutations{$printGene[$_]}{$key}{"scoreCategory"} = $scoreCategory; 360 | if($category eq "structural variation"){ 361 | $icagesMutations{$printGene[$_]}{$key}{"score"} = $cnvScore{$printGene[$_]}; 362 | }else{ 363 | $icagesMutations{$printGene[$_]}{$key}{"score"} = $score; 364 | } 365 | } 366 | } 367 | print OUT "geneName,chrmosomeNumber,start,end,reference,alternative,category,mutationSyntax,proteinSyntax,scoreCategory,mutationScore\n"; 368 | foreach my $gene (sort keys %icagesMutations){ 369 | foreach my $mutation (sort keys %{$icagesMutations{$gene}}){ 370 | if (defined $gene and defined $mutation and defined $icagesMutations{$gene}{$mutation}{"category"} and defined $icagesMutations{$gene}{$mutation}{"mutationSyntax"} and defined $icagesMutations{$gene}{$mutation}{"proteinSyntax"} and defined $icagesMutations{$gene}{$mutation}{"scoreCategory"} and defined $icagesMutations{$gene}{$mutation}{"score"}){ 371 | print OUT "$gene,$mutation,$icagesMutations{$gene}{$mutation}{\"category\"},$icagesMutations{$gene}{$mutation}{\"mutationSyntax\"},$icagesMutations{$gene}{$mutation}{\"proteinSyntax\"},$icagesMutations{$gene}{$mutation}{\"scoreCategory\"},$icagesMutations{$gene}{$mutation}{\"score\"}\n" ; 372 | }else{ 373 | print "$gene,$mutation,$icagesMutations{$gene}{$mutation}{\"category\"},$icagesMutations{$gene}{$mutation}{\"mutationSyntax\"},$icagesMutations{$gene}{$mutation}{\"proteinSyntax\"},$icagesMutations{$gene}{$mutation}{\"scoreCategory\"},$icagesMutations{$gene}{$mutation}{\"score\"}\n" ; 374 | } 375 | } 376 | } 377 | 378 | my $logFile = $annovarInputFile . 
".icages.log"; 379 | open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n"; 380 | 381 | print LOG "## location information\n"; 382 | print LOG "exonic/splice: $exonCount\n"; 383 | print LOG "intronic: $intronCount\n"; 384 | print LOG "noncoding RNA: $noncodingRNACount\n"; 385 | print LOG "intergenic: $intergenicCount\n"; 386 | print LOG "other: $otherCount\n\n"; 387 | 388 | print LOG "## annotation information\n"; 389 | print LOG "point coding variants with radialSVM annotation: $radialSVMCount\n"; 390 | print LOG "point noncoding variants with FunSeq2 annotation: $funseqCount\n"; 391 | print LOG "Indels and SVs with CNV signal annotation: $cnvCount\n\n"; 392 | close OUT; 393 | } 394 | 395 | 396 | sub formatConvert{ 397 | # $rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id, $bed, $expression 398 | print "NOTICE: start input file format checking and converting format if needed\n"; 399 | my ($rawInputFile, $annovarInputFile, $icagesLocation ); 400 | my ( $tumor, $germline , $id, $bed, $prefix , $expression); # parameters for vcf conversion 401 | my $callConvertToAnnovar; 402 | my $callvcftools; 403 | my $formatCheckFirstLine; 404 | my $isbedFormat = 0; # check whether or not this file is in bed format 405 | $rawInputFile = shift; 406 | $annovarInputFile = shift; 407 | $icagesLocation = shift; 408 | $tumor = shift; 409 | $germline = shift; 410 | $id = shift; 411 | $bed = shift; 412 | $expression = shift; 413 | open(IN, "$rawInputFile") or die "ERROR: cannot open $rawInputFile\n"; 414 | $formatCheckFirstLine = <IN>; 415 | chomp $formatCheckFirstLine; 416 | my $multipleSampleCheck = 0; 417 | if($formatCheckFirstLine =~ /^#/){ 418 | # check the whole file to see if this is multiple sample 419 | 420 | while(<IN>){ 421 | chomp; 422 | my $line = $_; 423 | my @line = split; 424 | if($line[0] =~ /^#CHROM/){ 425 | if($#line > 9){ 426 | $multipleSampleCheck = 1; 427 | } 428 | last; 429 | } 430 | } 431 | 432 | }else{ 433 | my @line = split(/\t| /, 
$formatCheckFirstLine); 434 | if($#line == 2 ){ 435 | $isbedFormat = 1; 436 | }elsif($#line != 2 and defined $line[3] and defined $line[4] ){ 437 | if($line[3] !~ /[a|t|c|g|A|T|C|G|-]+/ or $line[4] !~ /[a|t|c|g|A|T|C|G|-]+/){ 438 | $isbedFormat = 1; 439 | } 440 | } 441 | if ($isbedFormat == 1) { 442 | print "iCAGES: your file is likely to be a BED format\n"; 443 | 444 | } 445 | } 446 | close IN; 447 | $callConvertToAnnovar = $icagesLocation . "bin/annovar/convert2annovar.pl"; 448 | # the user may already have ANNOVAR installed 449 | if (! -e $callConvertToAnnovar) { 450 | $callConvertToAnnovar = "convert2annovar.pl"; 451 | print("WARNING: there is no ANNOVAR installed in the bin directory, assuming that you have already installed ANNOVAR in your system\n"); 452 | } 453 | 454 | 455 | $callvcftools = $icagesLocation . "bin/vcftools/bin/vcftools"; 456 | # the user may already have vcftools installed 457 | if (! -e $callvcftools ){ 458 | $callvcftools = "vcftools"; 459 | print("WARNING: there is no vcftools installed in the bin directory, assuming that you have already installed vcftools in your system\n"); 460 | } 461 | 462 | 463 | if($formatCheckFirstLine =~ /^##fileformat=VCF/){ #VCF 464 | if($multipleSampleCheck and $tumor eq "NA" and $germline eq "NA" and $id eq "NA"){ 465 | die "ERROR: your vcf file contains multiple samples please specify a valid sample identifier \n"; 466 | } 467 | if($tumor ne "NA" and $germline ne "NA"){ 468 | !system("$callvcftools --recode --vcf $rawInputFile --indv $tumor --out $rawInputFile.$tumor") or die "ERROR: please specify a valid sample identifier for tumor sample\n"; 469 | !system("$callvcftools --recode --vcf $rawInputFile --indv $germline --out $rawInputFile.$germline") or die "ERROR: please specify a valid sample identifier for germline variants\n"; 470 | !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$tumor.recode.vcf > $rawInputFile.$tumor.ann") or die "ERROR: cannot execute convert2annovar.pl for converting 
VCF file\n"; 471 | !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$germline.recode.vcf > $rawInputFile.$germline.ann") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n"; 472 | !system("cat $rawInputFile.$tumor.ann $rawInputFile.$germline.ann | uniq -u > $annovarInputFile") or die "ERROR: cannot generate somatic variants input file for iCAGES, please double check the format of your input files\n"; 473 | }elsif($id ne "NA"){ 474 | !system("$callvcftools --recode --vcf $rawInputFile --indv $id --out $rawInputFile.$id") or die "ERROR: please specify a valid sample identifier for somatic variants of your interest\n"; 475 | !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$id.recode.vcf > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n"; 476 | }else{ 477 | !system("$callConvertToAnnovar -format vcf4 $rawInputFile > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n"; 478 | } 479 | 480 | }elsif($isbedFormat){ 481 | # BED 482 | print "iCAGES: your input file is likely to be a bed file and iCAGES is converting it to ANNOVAR input format\n"; 483 | # !system("$callConvertToAnnovar -format bed $rawInputFile > $annovarInputFile") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n"; 484 | my @cnv; 485 | open(INPUTCNV, "$rawInputFile") or die; 486 | my $checkFiveFields = 0; 487 | while(<INPUTCNV>){ 488 | chomp; 489 | push @cnv, $_; 490 | my @line = split(/\t| /, $_); 491 | $checkFiveFields = 1 if $#line == 4; 492 | } 493 | close INPUTCNV; 494 | open(OUTPUTCNV, ">$annovarInputFile") or die; 495 | for(0..$#cnv){ 496 | if( $checkFiveFields == 1){ 497 | print OUTPUTCNV "$cnv[$_]\n"; 498 | }else{ 499 | print OUTPUTCNV "$cnv[$_]\t0\t0\n"; 500 | } 501 | } 502 | close OUTPUTCNV; 503 | }else{ 504 | #ANNOVAR 505 | !system("cp $rawInputFile $annovarInputFile") or die "ERROR: cannot use input file 
$rawInputFile\n"; 506 | } 507 | if($bed ne "NA"){ 508 | # there is a bug in annovar convert2annovar.pl for bed 509 | # !system("$callConvertToAnnovar -format bed $bed > $bed.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n"; 510 | # !system('awk \'{print $1 "\t" $2 "\t" $3 "\t0\t0" }\' $bed > $bed.out') or die "ERROR: cannot convert your BED file input into ANNOVAR input format\n"; 511 | open(ANNBED, ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n"; 512 | open(BED, "$bed") or die "iCAGES: cannot open ANNOVAR input file generated by your input BED file\n"; 513 | my @bed; 514 | while(<BED>){ 515 | chomp; 516 | push @bed, $_; 517 | } 518 | close BED; 519 | for(0..$#bed){ 520 | print ANNBED "$bed[$_]\t0\t0\n"; 521 | } 522 | close ANNBED; 523 | } 524 | if($expression ne "NA"){ 525 | !system("$callConvertToAnnovar -format bed $expression > $expression.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n"; 526 | open(ANNBED, ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n"; 527 | open(EXP, "$expression.out") or die "iCAGES: cannot open ANNOVAR input file generated by your input BED file\n"; 528 | my @bed; 529 | while(<EXP>){ 530 | chomp; 531 | push @bed, $_; 532 | } 533 | close EXP; 534 | for(0..$#bed){ 535 | print ANNBED "$bed[$_]\n"; 536 | } 537 | close ANNBED; 538 | } 539 | } 540 | 541 | 542 | sub divideMutation{ 543 | print "NOTICE: start dividing mutations to SNP and structural variation\n"; 544 | my ($annovarInputFile, $snpFile, $cnvFile); 545 | $annovarInputFile = shift; 546 | $snpFile = $annovarInputFile . ".snp"; 547 | $cnvFile = $annovarInputFile . 
".cnv"; 548 | 549 | open(OUT, "<$annovarInputFile") or die "iCAGES: cannot open input file $annovarInputFile\n"; 550 | open(SNP, ">$snpFile") or die "iCAGES: cannot open input file $snpFile\n"; 551 | open(CNV, ">$cnvFile") or die "iCAGES: cannot open input file $cnvFile\n"; 552 | 553 | #### add variant information into log file 554 | my $logFile = $annovarInputFile . ".icages.log"; 555 | open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n"; 556 | my $variantCount = 0; 557 | my $snvCount = 0; 558 | my $cnvCount = 0; 559 | 560 | while(<OUT>){ 561 | chomp; 562 | $variantCount ++; 563 | my $printLine = $_; 564 | # get rid of ^M sign 565 | if ($printLine =~ /\r/){ 566 | $printLine =~ s/\r//g; 567 | } 568 | 569 | my @line = split(/\t| /, $printLine); 570 | 571 | if ($line[3] ne "" and $line[4] ne "" and $line[1] == $line[2] and $line[3] ne "-" and $line[4] ne "-"){ 572 | $snvCount ++; 573 | print SNP "$printLine\n"; 574 | }else{ 575 | $cnvCount ++; 576 | print CNV "$printLine\n"; 577 | } 578 | } 579 | 580 | close OUT; 581 | close SNP; 582 | close CNV; 583 | 584 | print LOG "########### iCAGES Variant Summary ###########\n"; 585 | print LOG "## basic information\n"; 586 | print LOG "Total: $variantCount\n"; 587 | print LOG "SNVs: $snvCount\n"; 588 | print LOG "Indels and Structural variants: $cnvCount\n\n"; 589 | 590 | close LOG; 591 | } 592 | 593 | sub loadDatabase{ 594 | print "NOTICE: start loading databases\n"; 595 | my $DBLocation = shift; 596 | my $supLocation = $DBLocation . "suppressor.gene"; 597 | my $oncLocation = $DBLocation . 
"oncogene.gene"; 598 | print "NOTICE: start extracting suppressor genes\n"; 599 | open(SUP, "$supLocation") or die "cannot open $supLocation\n"; 600 | while(<SUP>){ 601 | chomp; 602 | $sup{$_} = 1; 603 | } 604 | close SUP; 605 | print "NOTICE: start extracting oncogenes\n"; 606 | open(ONC, "$oncLocation") or die "cannot open $oncLocation\n"; 607 | while(<ONC>){ 608 | chomp; 609 | $onc{$_} = 1; 610 | } 611 | close ONC; 612 | } 613 | 614 | 615 | 616 | sub annotateMutation{ 617 | my ($DBLocation, $icagesLocation, $callAnnovar, $annovarInputFile, $snpFile, $cnvFile, $hg); 618 | $icagesLocation = shift; 619 | $annovarInputFile = shift; 620 | $hg = shift; 621 | $DBLocation = $icagesLocation . "db/"; 622 | $snpFile = $annovarInputFile . ".snp"; 623 | $cnvFile = $annovarInputFile . ".cnv"; 624 | $callAnnovar = $icagesLocation . "bin/annovar/annotate_variation.pl"; 625 | # the user may already have ANNOVAR installed 626 | if (! -e $callAnnovar) { 627 | $callAnnovar = "annotate_variation.pl"; 628 | print("WARNING: did not find ANNOVAR in ./bin/ directory, assuming you have installed ANNOVAR\n"); 629 | } 630 | 631 | my @children_pids; 632 | $children_pids[0] = fork(); 633 | if($children_pids[0] == 0){ 634 | print "NOTICE: start to run ANNOVAR region annotation to annotate structural variations or variants associated with CNV changes\n"; 635 | if(-s $cnvFile){ 636 | !system("$callAnnovar -regionanno -build $hg -out $cnvFile -dbtype cnv $cnvFile $DBLocation -scorecolumn 4 --colsWanted 0") or die "ERROR: cannot call structural varation\n"; 637 | }else{ 638 | print "NOTICE: CNV file has 0 size\n"; 639 | } 640 | exit 0; 641 | } 642 | $children_pids[1] = fork(); 643 | if($children_pids[1] == 0){ 644 | print "NOTICE: start to run ANNOVAR index function to fetch radial SVM score for each mutation \n"; 645 | if(-s $snpFile){ 646 | # use yunfei's new query function instead 647 | # print ("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_iCAGES.txt $snpFile iCAGES $snpFile." . $hg . 
"_iCAGES_dropped"); 648 | 649 | # !system("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_iCAGES.txt $snpFile iCAGES $snpFile." . $hg . "_iCAGES_dropped") or die "ERROR: cannot call icages\n"; 650 | !system("$icagesLocation/genomeLocusFinder.pl find $DBLocation" . $hg . "_iCAGES.txt $snpFile iCAGES $snpFile." . $hg . "_iCAGES_dropped") or die "ERROR: cannot call icages\n"; 651 | 652 | # !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype iCAGES $snpFile $DBLocation") or die "ERROR: cannot call icages\n"; 653 | }else{ 654 | print "NOTICE: SNV file has 0 size\n"; 655 | } 656 | exit 0; 657 | } 658 | $children_pids[2] = fork(); 659 | if($children_pids[2] == 0){ 660 | print "NOTICE: start to run ANNOVAR index function to fetch funseq score for each mutation \n"; 661 | if(-s $snpFile){ 662 | # use yunfei's new query function instead 663 | # print ("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_funseq2.txt $snpFile funseq2 $snpFile." . $hg . "_funseq2_dropped\n"); 664 | # !system("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_funseq2.txt $snpFile funseq2 $snpFile." . $hg . "_funseq2_dropped") or die "ERROR: cannot call funseq2\n"; 665 | 666 | !system("$icagesLocation/genomeLocusFinder.pl find $DBLocation" . $hg . "_funseq2.txt $snpFile funseq2 $snpFile." . $hg . 
"_funseq2_dropped") or die "ERROR: cannot call funseq2\n"; 667 | 668 | # !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype funseq2 $snpFile $DBLocation") or die "ERROR: cannot call funseq2\n"; 669 | }else{ 670 | print "NOTICE: SNV file has 0 size, skip funseq score annotation\n"; 671 | } 672 | exit 0; 673 | } 674 | $children_pids[3] = fork(); 675 | if($children_pids[3] == 0){ 676 | print "NOTICE: start annotating each mutaiton using ANNOVAR\n"; 677 | !system("$callAnnovar -out $annovarInputFile -build $hg $annovarInputFile $DBLocation") or die "ERROR: cannot call annovar\n"; 678 | exit 0; 679 | } 680 | for (0.. $#children_pids){ 681 | waitpid($children_pids[$_], 0); 682 | } 683 | } 684 | 685 | 686 | 687 | -------------------------------------------------------------------------------- /bin/DGIdb/local/lib.pm: -------------------------------------------------------------------------------- 1 | package local::lib; 2 | use 5.006; 3 | use strict; 4 | use warnings; 5 | use Config; 6 | 7 | our $VERSION = '2.000014'; 8 | $VERSION = eval $VERSION; 9 | 10 | BEGIN { 11 | *_WIN32 = ($^O eq 'MSWin32' || $^O eq 'NetWare' || $^O eq 'symbian') 12 | ? sub(){1} : sub(){0}; 13 | # punt on these systems 14 | *_USE_FSPEC = ($^O eq 'MacOS' || $^O eq 'VMS' || $INC{'File/Spec.pm'}) 15 | ? sub(){1} : sub(){0}; 16 | } 17 | our $_DIR_JOIN = _WIN32 ? '\\' : '/'; 18 | our $_DIR_SPLIT = (_WIN32 || $^O eq 'cygwin') ? qr{[\\/]} 19 | : qr{/}; 20 | our $_ROOT = _WIN32 ? do { 21 | my $UNC = qr{[\\/]{2}[^\\/]+[\\/][^\\/]+}; 22 | qr{^(?:$UNC|[A-Za-z]:|)$_DIR_SPLIT}; 23 | } : qr{^/}; 24 | our $_PERL; 25 | 26 | sub _cwd { 27 | my $drive = shift; 28 | if (!$_PERL) { 29 | ($_PERL) = $^X =~ /(.+)/; # $^X is internal how could it be tainted?! 
30 | if (_is_abs($_PERL)) { 31 | } 32 | elsif (-x $Config{perlpath}) { 33 | $_PERL = $Config{perlpath}; 34 | } 35 | else { 36 | ($_PERL) = 37 | map { /(.*)/ } 38 | grep { -x $_ } 39 | map { join($_DIR_JOIN, $_, $_PERL) } 40 | split /\Q$Config{path_sep}\E/, $ENV{PATH}; 41 | } 42 | } 43 | local @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; 44 | my $cmd = $drive ? "eval { Cwd::getdcwd(q($drive)) }" 45 | : 'getcwd'; 46 | my $cwd = `"$_PERL" -MCwd -le "print $cmd"`; 47 | chomp $cwd; 48 | if (!length $cwd && $drive) { 49 | $cwd = $drive; 50 | } 51 | $cwd =~ s/$_DIR_SPLIT?$/$_DIR_JOIN/; 52 | $cwd; 53 | } 54 | 55 | sub _catdir { 56 | if (_USE_FSPEC) { 57 | require File::Spec; 58 | File::Spec->catdir(@_); 59 | } 60 | else { 61 | my $dir = join($_DIR_JOIN, @_); 62 | $dir =~ s{($_DIR_SPLIT)(?:\.?$_DIR_SPLIT)+}{$1}g; 63 | $dir; 64 | } 65 | } 66 | 67 | sub _is_abs { 68 | if (_USE_FSPEC) { 69 | require File::Spec; 70 | File::Spec->file_name_is_absolute($_[0]); 71 | } 72 | else { 73 | $_[0] =~ $_ROOT; 74 | } 75 | } 76 | 77 | sub _rel2abs { 78 | my ($dir, $base) = @_; 79 | return $dir 80 | if _is_abs($dir); 81 | 82 | $base = _WIN32 && $dir =~ s/^([A-Za-z]:)// ? _cwd("$1") 83 | : $base ? $base 84 | : _cwd; 85 | return _catdir($base, $dir); 86 | } 87 | 88 | sub import { 89 | my ($class, @args) = @_; 90 | push @args, @ARGV 91 | if $0 eq '-'; 92 | 93 | my @steps; 94 | my %opts; 95 | my $shelltype; 96 | 97 | while (@args) { 98 | my $arg = shift @args; 99 | # check for lethal dash first to stop processing before causing problems 100 | # the fancy dash is U+2212 or \xE2\x88\x92 101 | if ($arg =~ /\xE2\x88\x92/ or $arg =~ /−/) { 102 | die <<'DEATH'; 103 | WHOA THERE! It looks like you've got some fancy dashes in your commandline! 104 | These are *not* the traditional -- dashes that software recognizes. You 105 | probably got these by copy-pasting from the perldoc for this module as 106 | rendered by a UTF8-capable formatter. 
This most typically happens on an OS X 107 | terminal, but can happen elsewhere too. Please try again after replacing the 108 | dashes with normal minus signs. 109 | DEATH 110 | } 111 | elsif ($arg eq '--self-contained') { 112 | die <<'DEATH'; 113 | FATAL: The local::lib --self-contained flag has never worked reliably and the 114 | original author, Mark Stosberg, was unable or unwilling to maintain it. As 115 | such, this flag has been removed from the local::lib codebase in order to 116 | prevent misunderstandings and potentially broken builds. The local::lib authors 117 | recommend that you look at the lib::core::only module shipped with this 118 | distribution in order to create a more robust environment that is equivalent to 119 | what --self-contained provided (although quite possibly not what you originally 120 | thought it provided due to the poor quality of the documentation, for which we 121 | apologise). 122 | DEATH 123 | } 124 | elsif( $arg =~ /^--deactivate(?:=(.*))?$/ ) { 125 | my $path = defined $1 ? $1 : shift @args; 126 | push @steps, ['deactivate', $path]; 127 | } 128 | elsif ( $arg eq '--deactivate-all' ) { 129 | push @steps, ['deactivate_all']; 130 | } 131 | elsif ( $arg =~ /^--shelltype(?:=(.*))?$/ ) { 132 | $shelltype = defined $1 ? 
$1 : shift @args; 133 | } 134 | elsif ( $arg eq '--no-create' ) { 135 | $opts{no_create} = 1; 136 | } 137 | elsif ( $arg =~ /^--/ ) { 138 | die "Unknown import argument: $arg"; 139 | } 140 | else { 141 | push @steps, ['activate', $arg]; 142 | } 143 | } 144 | if (!@steps) { 145 | push @steps, ['activate', undef]; 146 | } 147 | 148 | my $self = $class->new(%opts); 149 | 150 | for (@steps) { 151 | my ($method, @args) = @$_; 152 | $self = $self->$method(@args); 153 | } 154 | 155 | if ($0 eq '-') { 156 | print $self->environment_vars_string($shelltype); 157 | exit 0; 158 | } 159 | else { 160 | $self->setup_local_lib; 161 | } 162 | } 163 | 164 | sub new { 165 | my $class = shift; 166 | bless {@_}, $class; 167 | } 168 | 169 | sub clone { 170 | my $self = shift; 171 | bless {%$self, @_}, ref $self; 172 | } 173 | 174 | sub inc { $_[0]->{inc} ||= \@INC } 175 | sub libs { $_[0]->{libs} ||= [ \'PERL5LIB' ] } 176 | sub bins { $_[0]->{bins} ||= [ \'PATH' ] } 177 | sub roots { $_[0]->{roots} ||= [ \'PERL_LOCAL_LIB_ROOT' ] } 178 | sub extra { $_[0]->{extra} ||= {} } 179 | sub no_create { $_[0]->{no_create} } 180 | 181 | my $_archname = $Config{archname}; 182 | my $_version = $Config{version}; 183 | my @_inc_version_list = reverse split / /, $Config{inc_version_list}; 184 | my $_path_sep = $Config{path_sep}; 185 | 186 | sub _as_list { 187 | my $list = shift; 188 | grep length, map { 189 | !(ref $_ && ref $_ eq 'SCALAR') ? $_ : ( 190 | defined $ENV{$$_} ? split(/\Q$_path_sep/, $ENV{$$_}) 191 | : () 192 | ) 193 | } ref $list ? @$list : $list; 194 | } 195 | sub _remove_from { 196 | my ($list, @remove) = @_; 197 | return @$list 198 | if !@remove; 199 | my %remove = map { $_ => 1 } @remove; 200 | grep !$remove{$_}, _as_list($list); 201 | } 202 | 203 | my @_lib_subdirs = ( 204 | [$_version, $_archname], 205 | [$_version], 206 | [$_archname], 207 | (@_inc_version_list ? 
\@_inc_version_list : ()), 208 | [], 209 | ); 210 | 211 | sub install_base_bin_path { 212 | my ($class, $path) = @_; 213 | return _catdir($path, 'bin'); 214 | } 215 | sub install_base_perl_path { 216 | my ($class, $path) = @_; 217 | return _catdir($path, 'lib', 'perl5'); 218 | } 219 | sub install_base_arch_path { 220 | my ($class, $path) = @_; 221 | _catdir($class->install_base_perl_path($path), $_archname); 222 | } 223 | 224 | sub lib_paths_for { 225 | my ($class, $path) = @_; 226 | my $base = $class->install_base_perl_path($path); 227 | return map { _catdir($base, @$_) } @_lib_subdirs; 228 | } 229 | 230 | sub _mm_escape_path { 231 | my $path = shift; 232 | $path =~ s/\\/\\\\\\\\/g; 233 | if ($path =~ s/ /\\ /g) { 234 | $path = qq{"\\"$path\\""}; 235 | } 236 | return $path; 237 | } 238 | 239 | sub _mb_escape_path { 240 | my $path = shift; 241 | $path =~ s/\\/\\\\/g; 242 | return qq{"$path"}; 243 | } 244 | 245 | sub installer_options_for { 246 | my ($class, $path) = @_; 247 | return ( 248 | PERL_MM_OPT => 249 | defined $path ? "INSTALL_BASE="._mm_escape_path($path) : undef, 250 | PERL_MB_OPT => 251 | defined $path ? "--install_base "._mb_escape_path($path) : undef, 252 | ); 253 | } 254 | 255 | sub active_paths { 256 | my ($self) = @_; 257 | $self = ref $self ? 
$self : $self->new; 258 | 259 | return grep { 260 | # screen out entries that aren't actually reflected in @INC 261 | my $active_ll = $self->install_base_perl_path($_); 262 | grep { $_ eq $active_ll } @{$self->inc}; 263 | } _as_list($self->roots); 264 | } 265 | 266 | 267 | sub deactivate { 268 | my ($self, $path) = @_; 269 | $self = $self->new unless ref $self; 270 | $path = $self->resolve_path($path); 271 | $path = $self->normalize_path($path); 272 | 273 | my @active_lls = $self->active_paths; 274 | 275 | if (!grep { $_ eq $path } @active_lls) { 276 | warn "Tried to deactivate inactive local::lib '$path'\n"; 277 | return $self; 278 | } 279 | 280 | my %args = ( 281 | bins => [ _remove_from($self->bins, 282 | $self->install_base_bin_path($path)) ], 283 | libs => [ _remove_from($self->libs, 284 | $self->install_base_perl_path($path)) ], 285 | inc => [ _remove_from($self->inc, 286 | $self->lib_paths_for($path)) ], 287 | roots => [ _remove_from($self->roots, $path) ], 288 | ); 289 | 290 | $args{extra} = { $self->installer_options_for($args{roots}[0]) }; 291 | 292 | $self->clone(%args); 293 | } 294 | 295 | sub deactivate_all { 296 | my ($self) = @_; 297 | $self = $self->new unless ref $self; 298 | 299 | my @active_lls = $self->active_paths; 300 | 301 | my %args; 302 | if (@active_lls) { 303 | %args = ( 304 | bins => [ _remove_from($self->bins, 305 | map $self->install_base_bin_path($_), @active_lls) ], 306 | libs => [ _remove_from($self->libs, 307 | map $self->install_base_perl_path($_), @active_lls) ], 308 | inc => [ _remove_from($self->inc, 309 | map $self->lib_paths_for($_), @active_lls) ], 310 | roots => [ _remove_from($self->roots, @active_lls) ], 311 | ); 312 | } 313 | 314 | $args{extra} = { $self->installer_options_for(undef) }; 315 | 316 | $self->clone(%args); 317 | } 318 | 319 | sub activate { 320 | my ($self, $path) = @_; 321 | $self = $self->new unless ref $self; 322 | $path = $self->resolve_path($path); 323 | $self->ensure_dir_structure_for($path) 324 | 
unless $self->no_create; 325 | 326 | $path = $self->normalize_path($path); 327 | 328 | my @active_lls = $self->active_paths; 329 | 330 | if (grep { $_ eq $path } @active_lls[1 .. $#active_lls]) { 331 | $self = $self->deactivate($path); 332 | } 333 | 334 | my %args; 335 | if (!@active_lls || $active_lls[0] ne $path) { 336 | %args = ( 337 | bins => [ $self->install_base_bin_path($path), @{$self->bins} ], 338 | libs => [ $self->install_base_perl_path($path), @{$self->libs} ], 339 | inc => [ $self->lib_paths_for($path), @{$self->inc} ], 340 | roots => [ $path, @{$self->roots} ], 341 | ); 342 | } 343 | 344 | $args{extra} = { $self->installer_options_for($path) }; 345 | 346 | $self->clone(%args); 347 | } 348 | 349 | sub normalize_path { 350 | my ($self, $path) = @_; 351 | $path = ( Win32::GetShortPathName($path) || $path ) 352 | if $^O eq 'MSWin32'; 353 | return $path; 354 | } 355 | 356 | sub build_environment_vars_for { 357 | my $self = $_[0]->new->activate($_[1]); 358 | $self->build_environment_vars; 359 | } 360 | sub build_activate_environment_vars_for { 361 | my $self = $_[0]->new->activate($_[1]); 362 | $self->build_environment_vars; 363 | } 364 | sub build_deactivate_environment_vars_for { 365 | my $self = $_[0]->new->deactivate($_[1]); 366 | $self->build_environment_vars; 367 | } 368 | sub build_deact_all_environment_vars_for { 369 | my $self = $_[0]->new->deactivate_all; 370 | $self->build_environment_vars; 371 | } 372 | sub build_environment_vars { 373 | my $self = shift; 374 | ( 375 | PATH => join($_path_sep, _as_list($self->bins)), 376 | PERL5LIB => join($_path_sep, _as_list($self->libs)), 377 | PERL_LOCAL_LIB_ROOT => join($_path_sep, _as_list($self->roots)), 378 | %{$self->extra}, 379 | ); 380 | } 381 | 382 | sub setup_local_lib_for { 383 | my $self = $_[0]->new->activate($_[1]); 384 | $self->setup_local_lib; 385 | } 386 | 387 | sub setup_local_lib { 388 | my $self = shift; 389 | 390 | # if Carp is already loaded, ensure Carp::Heavy is also loaded, to avoid 
391 | # $VERSION mismatch errors (Carp::Heavy loads Carp, so we do not need to 392 | # check in the other direction) 393 | require Carp::Heavy if $INC{'Carp.pm'}; 394 | 395 | $self->setup_env_hash; 396 | @INC = @{$self->inc}; 397 | } 398 | 399 | sub setup_env_hash_for { 400 | my $self = $_[0]->new->activate($_[1]); 401 | $self->setup_env_hash; 402 | } 403 | sub setup_env_hash { 404 | my $self = shift; 405 | my %env = $self->build_environment_vars; 406 | for my $key (keys %env) { 407 | if (defined $env{$key}) { 408 | $ENV{$key} = $env{$key}; 409 | } 410 | else { 411 | delete $ENV{$key}; 412 | } 413 | } 414 | } 415 | 416 | sub print_environment_vars_for { 417 | print $_[0]->environment_vars_string_for(@_[1..$#_]); 418 | } 419 | 420 | sub environment_vars_string_for { 421 | my $self = $_[0]->new->activate($_[1]); 422 | $self->environment_vars_string; 423 | } 424 | sub environment_vars_string { 425 | my ($self, $shelltype) = @_; 426 | 427 | $shelltype ||= $self->guess_shelltype; 428 | 429 | my $extra = $self->extra; 430 | my @envs = ( 431 | PATH => $self->bins, 432 | PERL5LIB => $self->libs, 433 | PERL_LOCAL_LIB_ROOT => $self->roots, 434 | map { $_ => $extra->{$_} } sort keys %$extra, 435 | ); 436 | $self->_build_env_string($shelltype, \@envs); 437 | } 438 | 439 | sub _build_env_string { 440 | my ($self, $shelltype, $envs) = @_; 441 | my @envs = @$envs; 442 | 443 | my $build_method = "build_${shelltype}_env_declaration"; 444 | 445 | my $out = ''; 446 | while (@envs) { 447 | my ($name, $value) = (shift(@envs), shift(@envs)); 448 | if ( 449 | ref $value 450 | && @$value == 1 451 | && ref $value->[0] 452 | && ref $value->[0] eq 'SCALAR' 453 | && ${$value->[0]} eq $name) { 454 | next; 455 | } 456 | $out .= $self->$build_method($name, $value); 457 | } 458 | my $wrap_method = "wrap_${shelltype}_output"; 459 | if ($self->can($wrap_method)) { 460 | return $self->$wrap_method($out); 461 | } 462 | return $out; 463 | } 464 | 465 | sub build_bourne_env_declaration { 466 | my 
($class, $name, $args) = @_; 467 | my $value = $class->_interpolate($args, '${%s}', qr/["\\\$!`]/, '\\%s'); 468 | 469 | if (!defined $value) { 470 | return qq{unset $name;\n}; 471 | } 472 | 473 | $value =~ s/(^|\G|$_path_sep)\$\{$name\}$_path_sep/$1\${$name}\${$name+$_path_sep}/g; 474 | $value =~ s/$_path_sep\$\{$name\}$/\${$name+$_path_sep}\${$name}/; 475 | 476 | qq{${name}="$value"; export ${name};\n} 477 | } 478 | 479 | sub build_csh_env_declaration { 480 | my ($class, $name, $args) = @_; 481 | my ($value, @vars) = $class->_interpolate($args, '${%s}', '"', '"\\%s"'); 482 | if (!defined $value) { 483 | return qq{unsetenv $name;\n}; 484 | } 485 | 486 | my $out = ''; 487 | for my $var (@vars) { 488 | $out .= qq{if ! \$?$name setenv $name '';\n}; 489 | } 490 | 491 | my $value_without = $value; 492 | if ($value_without =~ s/(?:^|$_path_sep)\$\{$name\}(?:$_path_sep|$)//g) { 493 | $out .= qq{if "\${$name}" != '' setenv $name "$value";\n}; 494 | $out .= qq{if "\${$name}" == '' }; 495 | } 496 | $out .= qq{setenv $name "$value_without";\n}; 497 | return $out; 498 | } 499 | 500 | sub build_cmd_env_declaration { 501 | my ($class, $name, $args) = @_; 502 | my $value = $class->_interpolate($args, '%%%s%%', qr(%), '%s'); 503 | if (!$value) { 504 | return qq{\@set $name=\n}; 505 | } 506 | 507 | my $out = ''; 508 | my $value_without = $value; 509 | if ($value_without =~ s/(?:^|$_path_sep)%$name%(?:$_path_sep|$)//g) { 510 | $out .= qq{\@if not "%$name%"=="" set "$name=$value"\n}; 511 | $out .= qq{\@if "%$name%"=="" }; 512 | } 513 | $out .= qq{\@set "$name=$value_without"\n}; 514 | return $out; 515 | } 516 | 517 | sub build_powershell_env_declaration { 518 | my ($class, $name, $args) = @_; 519 | my $value = $class->_interpolate($args, '$env:%s', '"', '`%s'); 520 | 521 | if (!$value) { 522 | return qq{Remove-Item -ErrorAction 0 Env:\\$name;\n}; 523 | } 524 | 525 | my $maybe_path_sep = qq{\$(if("\$env:$name"-eq""){""}else{"$_path_sep"})}; 526 | $value =~ 
s/(^|\G|$_path_sep)\$env:$name$_path_sep/$1\$env:$name"+$maybe_path_sep+"/g; 527 | $value =~ s/$_path_sep\$env:$name$/"+$maybe_path_sep+\$env:$name+"/; 528 | 529 | qq{\$env:$name = \$("$value");\n}; 530 | } 531 | sub wrap_powershell_output { 532 | my ($class, $out) = @_; 533 | return $out || " \n"; 534 | } 535 | 536 | sub build_fish_env_declaration { 537 | my ($class, $name, $args) = @_; 538 | my $value = $class->_interpolate($args, '$%s', qr/[\\"' ]/, '\\%s'); 539 | if (!defined $value) { 540 | return qq{set -e $name;\n}; 541 | } 542 | $value =~ s/$_path_sep/ /g; 543 | qq{set -x $name $value;\n}; 544 | } 545 | 546 | sub _interpolate { 547 | my ($class, $args, $var_pat, $escape, $escape_pat) = @_; 548 | return 549 | unless defined $args; 550 | my @args = ref $args ? @$args : $args; 551 | return 552 | unless @args; 553 | my @vars = map { $$_ } grep { ref $_ eq 'SCALAR' } @args; 554 | my $string = join $_path_sep, map { 555 | ref $_ eq 'SCALAR' ? sprintf($var_pat, $$_) : do { 556 | s/($escape)/sprintf($escape_pat, $1)/ge; $_; 557 | }; 558 | } @args; 559 | return wantarray ? 
($string, \@vars) : $string; 560 | } 561 | 562 | sub pipeline; 563 | 564 | sub pipeline { 565 | my @methods = @_; 566 | my $last = pop(@methods); 567 | if (@methods) { 568 | \sub { 569 | my ($obj, @args) = @_; 570 | $obj->${pipeline @methods}( 571 | $obj->$last(@args) 572 | ); 573 | }; 574 | } else { 575 | \sub { 576 | shift->$last(@_); 577 | }; 578 | } 579 | } 580 | 581 | sub resolve_path { 582 | my ($class, $path) = @_; 583 | 584 | $path = $class->${pipeline qw( 585 | resolve_relative_path 586 | resolve_home_path 587 | resolve_empty_path 588 | )}($path); 589 | 590 | $path; 591 | } 592 | 593 | sub resolve_empty_path { 594 | my ($class, $path) = @_; 595 | if (defined $path) { 596 | $path; 597 | } else { 598 | '~/perl5'; 599 | } 600 | } 601 | 602 | sub resolve_home_path { 603 | my ($class, $path) = @_; 604 | $path =~ /^~([^\/]*)/ or return $path; 605 | my $user = $1; 606 | my $homedir = do { 607 | if (! length($user) && defined $ENV{HOME}) { 608 | $ENV{HOME}; 609 | } 610 | else { 611 | require File::Glob; 612 | File::Glob::bsd_glob("~$user", File::Glob::GLOB_TILDE()); 613 | } 614 | }; 615 | unless (defined $homedir) { 616 | require Carp; require Carp::Heavy; 617 | Carp::croak( 618 | "Couldn't resolve homedir for " 619 | .(defined $user ? $user : 'current user') 620 | ); 621 | } 622 | $path =~ s/^~[^\/]*/$homedir/; 623 | $path; 624 | } 625 | 626 | sub resolve_relative_path { 627 | my ($class, $path) = @_; 628 | _rel2abs($path); 629 | } 630 | 631 | sub ensure_dir_structure_for { 632 | my ($class, $path) = @_; 633 | unless (-d $path) { 634 | warn "Attempting to create directory ${path}\n"; 635 | } 636 | require File::Basename; 637 | my @dirs; 638 | while(!-d $path) { 639 | push @dirs, $path; 640 | $path = File::Basename::dirname($path); 641 | } 642 | mkdir $_ for reverse @dirs; 643 | return; 644 | } 645 | 646 | sub guess_shelltype { 647 | my $shellbin 648 | = defined $ENV{SHELL} 649 | ? 
($ENV{SHELL} =~ /([\w.]+)$/)[-1] 650 | : ( $^O eq 'MSWin32' && exists $ENV{'!EXITCODE'} ) 651 | ? 'bash' 652 | : ( $^O eq 'MSWin32' && $ENV{PROMPT} && $ENV{COMSPEC} ) 653 | ? ($ENV{COMSPEC} =~ /([\w.]+)$/)[-1] 654 | : ( $^O eq 'MSWin32' && !$ENV{PROMPT} ) 655 | ? 'powershell.exe' 656 | : 'sh'; 657 | 658 | for ($shellbin) { 659 | return 660 | /csh$/ ? 'csh' 661 | : /fish/ ? 'fish' 662 | : /command(?:\.com)?$/i ? 'cmd' 663 | : /cmd(?:\.exe)?$/i ? 'cmd' 664 | : /4nt(?:\.exe)?$/i ? 'cmd' 665 | : /powershell(?:\.exe)?$/i ? 'powershell' 666 | : 'bourne'; 667 | } 668 | } 669 | 670 | 1; 671 | __END__ 672 | 673 | =encoding utf8 674 | 675 | =head1 NAME 676 | 677 | local::lib - create and use a local lib/ for perl modules with PERL5LIB 678 | 679 | =head1 SYNOPSIS 680 | 681 | In code - 682 | 683 | use local::lib; # sets up a local lib at ~/perl5 684 | 685 | use local::lib '~/foo'; # same, but ~/foo 686 | 687 | # Or... 688 | use FindBin; 689 | use local::lib "$FindBin::Bin/../support"; # app-local support library 690 | 691 | From the shell - 692 | 693 | # Install LWP and its missing dependencies to the '~/perl5' directory 694 | perl -MCPAN -Mlocal::lib -e 'CPAN::install(LWP)' 695 | 696 | # Just print out useful shell commands 697 | $ perl -Mlocal::lib 698 | PERL_MB_OPT='--install_base /home/username/perl5'; export PERL_MB_OPT; 699 | PERL_MM_OPT='INSTALL_BASE=/home/username/perl5'; export PERL_MM_OPT; 700 | PERL5LIB="/home/username/perl5/lib/perl5"; export PERL5LIB; 701 | PATH="/home/username/perl5/bin:$PATH"; export PATH; 702 | PERL_LOCAL_LIB_ROOT="/home/username/perl5:$PERL_LOCAL_LIB_ROOT"; export PERL_LOCAL_LIB_ROOT; 703 | 704 | From a .bashrc file - 705 | 706 | [ $SHLVL -eq 1 ] && eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)" 707 | 708 | =head2 The bootstrapping technique 709 | 710 | A typical way to install local::lib is using what is known as the 711 | "bootstrapping" technique. You would do this if your system administrator 712 | hasn't already installed local::lib. 
In this case, you'll need to install 713 | local::lib in your home directory. 714 | 715 | Even if you do have administrative privileges, you will still want to set up your 716 | environment variables, as discussed in step 4. Without this, you would still 717 | install the modules into the system CPAN installation and also your Perl scripts 718 | will not use the lib/ path you bootstrapped with local::lib. 719 | 720 | By default local::lib installs itself and the CPAN modules into ~/perl5. 721 | 722 | Windows users must also see L</Differences when using this module under Win32>. 723 | 724 | =over 4 725 | 726 | =item 1. 727 | 728 | Download and unpack the local::lib tarball from CPAN (search for "Download" 729 | on the CPAN page about local::lib). Do this as an ordinary user, not as root 730 | or administrator. Unpack the file in your home directory or in any other 731 | convenient location. 732 | 733 | =item 2. 734 | 735 | Run this: 736 | 737 | perl Makefile.PL --bootstrap 738 | 739 | If the system asks you whether it should automatically configure as much 740 | as possible, you would typically answer yes. 741 | 742 | In order to install local::lib into a directory other than the default, you need 743 | to specify the name of the directory when you call bootstrap, as follows: 744 | 745 | perl Makefile.PL --bootstrap=~/foo 746 | 747 | =item 3. 748 | 749 | Run this: (local::lib assumes you have make installed on your system) 750 | 751 | make test && make install 752 | 753 | =item 4. 754 | 755 | Now we need to set up the appropriate environment variables, so that Perl 756 | starts using our newly generated lib/ directory. 
If you are using bash or 757 | any other Bourne shell, you can add this to your shell startup script this 758 | way: 759 | 760 | echo '[ $SHLVL -eq 1 ] && eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"' >>~/.bashrc 761 | 762 | If you are using C shell, you can do this as follows: 763 | 764 | /bin/csh 765 | echo $SHELL 766 | /bin/csh 767 | echo 'eval `perl -I$HOME/perl5/lib/perl5 -Mlocal::lib`' >> ~/.cshrc 768 | 769 | If you passed a directory other than the default to bootstrap, you also need to 770 | give it as an import parameter to the local::lib module, like 771 | this: 772 | 773 | echo '[ $SHLVL -eq 1 ] && eval "$(perl -I$HOME/foo/lib/perl5 -Mlocal::lib=$HOME/foo)"' >>~/.bashrc 774 | 775 | After writing your shell configuration file, be sure to re-read it to get the 776 | changed settings into your current shell's environment. Bourne shells use 777 | C<. ~/.bashrc> for this, whereas C shells use C<source ~/.cshrc>. 778 | 779 | =back 780 | 781 | If you're on a slower machine, or are operating under draconian disk space 782 | limitations, you can disable the automatic generation of manpages from POD when 783 | installing modules by using the C<--no-manpages> argument when bootstrapping: 784 | 785 | perl Makefile.PL --bootstrap --no-manpages 786 | 787 | To avoid bootstrapping several times for several Perl module environments on the 788 | same account, for example if you use it for several different deployed 789 | applications independently, you can use one bootstrapped local::lib 790 | installation to install modules in different directories directly this way: 791 | 792 | cd ~/mydir1 793 | perl -Mlocal::lib=./ 794 | eval $(perl -Mlocal::lib=./) ### To set the environment for this shell alone 795 | printenv ### You will see that ~/mydir1 is in the PERL5LIB 796 | perl -MCPAN -e install ... ### whatever modules you want 797 | cd ../mydir2 798 | ... REPEAT ... 
799 | 800 | When used in a C<.bashrc> file, it is recommended that you protect against 801 | re-activating a directory in a sub-shell. This can be done by checking the 802 | C<$SHLVL> variable as shown in the synopsis. Without this, sub-shells created by 803 | the user or other programs will override changes made to the parent shell's 804 | environment. 805 | 806 | If you are working with several C<local::lib> environments, you may want to 807 | remove some of them from the current environment without disturbing the others. 808 | You can deactivate one environment like this (using bourne sh): 809 | 810 | eval $(perl -Mlocal::lib=--deactivate,~/path) 811 | 812 | which will generate and run the commands needed to remove C<~/path> from your 813 | various search paths. Whichever environment was B<activated most recently> will 814 | remain the target for module installations. That is, if you activate 815 | C<~/path_A> and then you activate C<~/path_B>, new modules you install will go 816 | in C<~/path_B>. If you deactivate C<~/path_B> then modules will be installed 817 | into C<~/path_A> -- but if you deactivate C<~/path_A> then they will still be 818 | installed in C<~/path_B> because C<~/path_B> was activated later. 819 | 820 | You can also ask C<local::lib> to clean itself completely out of the current 821 | shell's environment with the C<--deactivate-all> option. 822 | For multiple environments for multiple apps you may need to include a modified 823 | version of the C<< use FindBin >> instructions in the "In code" sample above. 824 | If you did something like the above, you have a set of Perl modules at C<< 825 | ~/mydir1/lib >>. If you have a script at C<< ~/mydir1/scripts/myscript.pl >>, 826 | you need to tell it where to find the modules you installed for it at C<< 827 | ~/mydir1/lib >>. 
828 | 829 | In C<< ~/mydir1/scripts/myscript.pl >>: 830 | 831 | use strict; 832 | use warnings; use FindBin; ### provides $FindBin::Bin used below 833 | use local::lib "$FindBin::Bin/.."; ### points to ~/mydir1 and local::lib finds lib 834 | use lib "$FindBin::Bin/../lib"; ### points to ~/mydir1/lib 835 | 836 | Put this before any BEGIN { ... } blocks that require the modules you installed. 837 | 838 | =head2 Differences when using this module under Win32 839 | 840 | To set up the proper environment variables for your current session of 841 | C<CMD.exe>, you can use this: 842 | 843 | C:\>perl -Mlocal::lib 844 | set PERL_MB_OPT=--install_base C:\DOCUME~1\ADMINI~1\perl5 845 | set PERL_MM_OPT=INSTALL_BASE=C:\DOCUME~1\ADMINI~1\perl5 846 | set PERL5LIB=C:\DOCUME~1\ADMINI~1\perl5\lib\perl5 847 | set PATH=C:\DOCUME~1\ADMINI~1\perl5\bin;%PATH% 848 | 849 | ### To set the environment for this shell alone 850 | C:\>perl -Mlocal::lib > %TEMP%\tmp.bat && %TEMP%\tmp.bat && del %TEMP%\tmp.bat 851 | ### instead of $(perl -Mlocal::lib=./) 852 | 853 | If you want the environment entries to persist, you'll need to add them to the 854 | Control Panel's System applet yourself or use L<App::local::lib::Win32Helper>. 855 | 856 | The "~" is translated to the user's profile directory (the directory named for 857 | the user under "Documents and Settings" (Windows XP or earlier) or "Users" 858 | (Windows Vista or later)) unless $ENV{HOME} exists. After that, the home 859 | directory is translated to a short name (which means the directory must exist) 860 | and the subdirectories are created. 861 | 862 | =head3 PowerShell 863 | 864 | local::lib also supports PowerShell, and can be used with the 865 | C<Invoke-Expression> cmdlet. 866 | 867 | Invoke-Expression "$(perl -Mlocal::lib)" 868 | 869 | =head1 RATIONALE 870 | 871 | The version of a Perl package on your machine is not always the version you 872 | need. Obviously, the best thing to do would be to update to the version you 873 | need. However, you might be in a situation where you're prevented from doing 874 | this. 
Perhaps you don't have system administrator privileges; or perhaps you 875 | are using a package management system such as Debian, and nobody has yet gotten 876 | around to packaging up the version you need. 877 | 878 | local::lib solves this problem by allowing you to create your own directory of 879 | Perl packages downloaded from CPAN (in a multi-user system, this would typically 880 | be within your own home directory). The existing system Perl installation is 881 | not affected; you simply invoke Perl with special options so that Perl uses the 882 | packages in your own local package directory rather than the system packages. 883 | local::lib arranges things so that your locally installed version of the Perl 884 | packages takes precedence over the system installation. 885 | 886 | If you are using a package management system (such as Debian), you don't need to 887 | worry about Debian and CPAN stepping on each other's toes. Your local version 888 | of the packages will be written to an entirely separate directory from those 889 | installed by Debian. 890 | 891 | =head1 DESCRIPTION 892 | 893 | This module provides a quick, convenient way of bootstrapping a user-local Perl 894 | module library located within the user's home directory. It also constructs and 895 | prints out for the user the list of environment variables using the syntax 896 | appropriate for the user's current shell (as specified by the C<SHELL> 897 | environment variable), suitable for directly adding to one's shell 898 | configuration file. 899 | 900 | More generally, local::lib allows for the bootstrapping and usage of a 901 | directory containing Perl modules outside of Perl's C<@INC>. This makes it 902 | easier to ship an application with an app-specific copy of a Perl module, or 903 | collection of modules. This is useful when an upstream maintainer hasn't 904 | applied a patch to a module of theirs that you need for your application. 
905 | 906 | On import, local::lib sets the following environment variables to appropriate 907 | values: 908 | 909 | =over 4 910 | 911 | =item PERL_MB_OPT 912 | 913 | =item PERL_MM_OPT 914 | 915 | =item PERL5LIB 916 | 917 | =item PATH 918 | 919 | =item PERL_LOCAL_LIB_ROOT 920 | 921 | =back 922 | 923 | When possible, these will be appended to instead of overwritten entirely. 924 | 925 | These values are then available for reference by any code after import. 926 | 927 | =head1 CREATING A SELF-CONTAINED SET OF MODULES 928 | 929 | See L<lib::core::only> for one way to do this - but note that 930 | there are a number of caveats, and the best approach is always to perform a 931 | build against a clean perl (i.e. site and vendor as close to empty as possible). 932 | 933 | =head1 IMPORT OPTIONS 934 | 935 | Options are values that can be passed to the C<local::lib> import besides the 936 | directory to use. They are specified as C<use local::lib '--option'[, path];> 937 | or C<perl -Mlocal::lib=--option[,path]>. 938 | 939 | =head2 --deactivate 940 | 941 | Remove the chosen path (or the default path) from the module search paths if it 942 | was added by C<local::lib>, instead of adding it. 943 | 944 | =head2 --deactivate-all 945 | 946 | Remove all directories that were added to search paths by C<local::lib> from the 947 | search paths. 948 | 949 | =head2 --shelltype 950 | 951 | Specify the shell type to use for output. By default, the shell will be 952 | detected based on the environment. Should be one of: C<bourne>, C<csh>, 953 | C<cmd>, or C<powershell>. 954 | 955 | =head2 --no-create 956 | 957 | Prevents C<local::lib> from creating directories when activating dirs. This is 958 | likely to cause issues on Win32 systems. 959 | 960 | =head1 CLASS METHODS 961 | 962 | =head2 ensure_dir_structure_for 963 | 964 | =over 4 965 | 966 | =item Arguments: $path 967 | 968 | =item Return value: None 969 | 970 | =back 971 | 972 | Attempts to create the given path, and all required parent directories. Throws 973 | an exception on failure. 
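The directory-creation behaviour described for ensure_dir_structure_for can be sketched with core modules only. This is a minimal illustration under stated assumptions, not the module's own code: `sketch_ensure_dirs` is a hypothetical name, and unlike the `mkdir $_ for reverse @dirs` loop in the source above it checks every `mkdir` call so that a failure raises an exception, as the description says.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper (not part of local::lib): create $path and any
# missing parent directories, dying on the first mkdir that fails.
sub sketch_ensure_dirs {
    my ($path) = @_;
    my @missing;
    # Walk upwards until an existing directory is found.
    while (!-d $path) {
        push @missing, $path;
        my $parent = dirname($path);
        last if $parent eq $path;    # reached the top, nothing left to climb
        $path = $parent;
    }
    # Create the outermost missing directory first, then its children.
    for my $dir (reverse @missing) {
        mkdir $dir or die "Cannot create directory $dir: $!\n";
    }
    return;
}
```

Calling `sketch_ensure_dirs("$ENV{HOME}/perl5/lib/perl5")` would create each missing level in turn; calling it again on an existing directory does nothing.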
974 | 975 | =head2 print_environment_vars_for 976 | 977 | =over 4 978 | 979 | =item Arguments: $path 980 | 981 | =item Return value: None 982 | 983 | =back 984 | 985 | Prints to standard output the variables listed above, properly set to use the 986 | given path as the base directory. 987 | 988 | =head2 build_environment_vars_for 989 | 990 | =over 4 991 | 992 | =item Arguments: $path 993 | 994 | =item Return value: %environment_vars 995 | 996 | =back 997 | 998 | Returns a hash with the variables listed above, properly set to use the 999 | given path as the base directory. 1000 | 1001 | =head2 setup_env_hash_for 1002 | 1003 | =over 4 1004 | 1005 | =item Arguments: $path 1006 | 1007 | =item Return value: None 1008 | 1009 | =back 1010 | 1011 | Constructs the C<%ENV> keys for the given path, by calling 1012 | L</build_environment_vars_for>. 1013 | 1014 | =head2 active_paths 1015 | 1016 | =over 4 1017 | 1018 | =item Arguments: None 1019 | 1020 | =item Return value: @paths 1021 | 1022 | =back 1023 | 1024 | Returns a list of active C<local::lib> paths, according to the 1025 | C<PERL_LOCAL_LIB_ROOT> environment variable and verified against 1026 | what is really in C<@INC>. 1027 | 1028 | =head2 install_base_perl_path 1029 | 1030 | =over 4 1031 | 1032 | =item Arguments: $path 1033 | 1034 | =item Return value: $install_base_perl_path 1035 | 1036 | =back 1037 | 1038 | Returns a path describing where to install the Perl modules for this local 1039 | library installation. Appends the directories C<lib> and C<perl5> to the given 1040 | path. 1041 | 1042 | =head2 lib_paths_for 1043 | 1044 | =over 4 1045 | 1046 | =item Arguments: $path 1047 | 1048 | =item Return value: @lib_paths 1049 | 1050 | =back 1051 | 1052 | Returns the list of paths perl will search for libraries, given a base path. 1053 | This includes the base path itself, the architecture specific subdirectory, and 1054 | perl version specific subdirectories. These paths may not all exist. 
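To make the list returned by lib_paths_for more concrete, here is a small sketch that builds a comparable list from `%Config`. It is an approximation under stated assumptions: `sketch_lib_paths` is a hypothetical name, it always uses `File::Spec`, and it leaves out the `inc_version_list` entries that the `@_lib_subdirs` table in the source above also includes.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical stand-in for lib_paths_for: candidate library directories
# under "$root/lib/perl5", most specific first.
sub sketch_lib_paths {
    my ($root) = @_;
    my $base = File::Spec->catdir($root, 'lib', 'perl5');
    my @subdirs = (
        [ $Config{version}, $Config{archname} ],  # version and architecture
        [ $Config{version} ],                     # version only
        [ $Config{archname} ],                    # architecture only
        [],                                       # the base directory itself
    );
    return map { File::Spec->catdir($base, @$_) } @subdirs;
}

# Print the candidates for an example root; none of them has to exist.
print "$_\n" for sketch_lib_paths('/tmp/demo');
```

The exact directory names depend on the perl that runs the script, since the version and architecture strings come from its own `%Config`.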
1055 | 1056 | =head2 install_base_bin_path 1057 | 1058 | =over 4 1059 | 1060 | =item Arguments: $path 1061 | 1062 | =item Return value: $install_base_bin_path 1063 | 1064 | =back 1065 | 1066 | Returns a path describing where to install the executable programs for this 1067 | local library installation. Appends the directory C<bin> to the given path. 1068 | 1069 | =head2 installer_options_for 1070 | 1071 | =over 4 1072 | 1073 | =item Arguments: $path 1074 | 1075 | =item Return value: %installer_env_vars 1076 | 1077 | =back 1078 | 1079 | Returns a hash of environment variables that should be set to cause 1080 | installation into the given path. 1081 | 1082 | =head2 resolve_empty_path 1083 | 1084 | =over 4 1085 | 1086 | =item Arguments: $path 1087 | 1088 | =item Return value: $base_path 1089 | 1090 | =back 1091 | 1092 | Builds and returns the base path into which to set up the local module 1093 | installation. Defaults to C<~/perl5>. 1094 | 1095 | =head2 resolve_home_path 1096 | 1097 | =over 4 1098 | 1099 | =item Arguments: $path 1100 | 1101 | =item Return value: $home_path 1102 | 1103 | =back 1104 | 1105 | Attempts to find the user's home directory. If installed, uses C<File::HomeDir> 1106 | for this purpose. If no definite answer is available, throws an exception. 1107 | 1108 | =head2 resolve_relative_path 1109 | 1110 | =over 4 1111 | 1112 | =item Arguments: $path 1113 | 1114 | =item Return value: $absolute_path 1115 | 1116 | =back 1117 | 1118 | Translates the given path into an absolute path. 1119 | 1120 | =head2 resolve_path 1121 | 1122 | =over 4 1123 | 1124 | =item Arguments: $path 1125 | 1126 | =item Return value: $absolute_path 1127 | 1128 | =back 1129 | 1130 | Calls the following in a pipeline, passing the result from the previous to the 1131 | next, in an attempt to find where to configure the environment for a local 1132 | library installation: L</resolve_empty_path>, L</resolve_home_path>, 1133 | L</resolve_relative_path>.
Passes the given path argument to 1134 | L</resolve_empty_path> which then returns a result that is passed to 1135 | L</resolve_home_path>, which then has its result passed to 1136 | L</resolve_relative_path>. The result of this final call is returned from 1137 | L</resolve_path>. 1138 | 1139 | =head1 OBJECT INTERFACE 1140 | 1141 | =head2 new 1142 | 1143 | =over 4 1144 | 1145 | =item Arguments: %attributes 1146 | 1147 | =item Return value: $local_lib 1148 | 1149 | =back 1150 | 1151 | Constructs a new C<local::lib> object, representing the current state of 1152 | C<@INC> and the relevant environment variables. 1153 | 1154 | =head1 ATTRIBUTES 1155 | 1156 | =head2 roots 1157 | 1158 | An arrayref representing active C<local::lib> directories. 1159 | 1160 | =head2 inc 1161 | 1162 | An arrayref representing C<@INC>. 1163 | 1164 | =head2 libs 1165 | 1166 | An arrayref representing the PERL5LIB environment variable. 1167 | 1168 | =head2 bins 1169 | 1170 | An arrayref representing the PATH environment variable. 1171 | 1172 | =head2 extra 1173 | 1174 | A hashref of extra environment variables (e.g. C<PERL_MM_OPT> and 1175 | C<PERL_MB_OPT>) 1176 | 1177 | =head2 no_create 1178 | 1179 | If set, C<local::lib> will not try to create directories when activating them. 1180 | 1181 | =head1 OBJECT METHODS 1182 | 1183 | =head2 clone 1184 | 1185 | =over 4 1186 | 1187 | =item Arguments: %attributes 1188 | 1189 | =item Return value: $local_lib 1190 | 1191 | =back 1192 | 1193 | Constructs a new C<local::lib> object based on the existing one, overriding the 1194 | specified attributes. 1195 | 1196 | =head2 activate 1197 | 1198 | =over 4 1199 | 1200 | =item Arguments: $path 1201 | 1202 | =item Return value: $new_local_lib 1203 | 1204 | =back 1205 | 1206 | Constructs a new instance with the specified path active. 1207 | 1208 | =head2 deactivate 1209 | 1210 | =over 4 1211 | 1212 | =item Arguments: $path 1213 | 1214 | =item Return value: $new_local_lib 1215 | 1216 | =back 1217 | 1218 | Constructs a new instance with the specified path deactivated.
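Because C<activate> and C<deactivate> each return a new instance rather than modifying the receiver, the object interface can be chained without touching the running process's environment. The sketch below assumes a local::lib release that provides the object interface documented here; the wrapper name C<demo_object_interface> is hypothetical, and the demo skips itself when the module is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical wrapper so the example degrades gracefully without local::lib.
sub demo_object_interface {
    # local::lib is a CPAN module, not core Perl. require() does not call
    # import(), so loading it this way leaves %ENV and @INC alone.
    return unless eval { require local::lib; 1 };

    my $dir = tempdir(CLEANUP => 1);

    # Snapshot of the current state, then two derived, independent objects.
    my $base     = local::lib->new;
    my $active   = $base->activate($dir);        # new object, $dir active
    my $inactive = $active->deactivate($dir);    # new object, $dir removed

    # Ask the "active" object which variables it would set.
    my %env = $active->build_environment_vars;
    return sort keys %env;
}

# Guard against older releases that lack the object interface.
my @keys = eval { demo_object_interface() };
print @keys
    ? "variables that would be set: @keys\n"
    : "local::lib object interface not available; nothing to show\n";
```

Nothing is applied to the current process until a method such as C<setup_local_lib> is called on one of the objects.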
1219 | 1220 | =head2 deactivate_all 1221 | 1222 | =over 4 1223 | 1224 | =item Arguments: None 1225 | 1226 | =item Return value: $new_local_lib 1227 | 1228 | =back 1229 | 1230 | Constructs a new instance with all C<local::lib> directories deactivated. 1231 | 1232 | =head2 environment_vars_string 1233 | 1234 | =over 4 1235 | 1236 | =item Arguments: [ $shelltype ] 1237 | 1238 | =item Return value: $shell_env_string 1239 | 1240 | =back 1241 | 1242 | Returns a string to set up the C<local::lib>, meant to be run by a shell. 1243 | 1244 | =head2 build_environment_vars 1245 | 1246 | =over 4 1247 | 1248 | =item Arguments: None 1249 | 1250 | =item Return value: %environment_vars 1251 | 1252 | =back 1253 | 1254 | Returns a hash with the variables listed above, properly set to use the 1255 | given path as the base directory. 1256 | 1257 | =head2 setup_env_hash 1258 | 1259 | =over 4 1260 | 1261 | =item Arguments: None 1262 | 1263 | =item Return value: None 1264 | 1265 | =back 1266 | 1267 | Constructs the C<%ENV> keys for the given path, by calling 1268 | L</build_environment_vars>. 1269 | 1270 | =head2 setup_local_lib 1271 | 1272 | Constructs the C<%ENV> hash using L</setup_env_hash>, and sets up C<@INC>. 1273 | 1274 | =head1 A WARNING ABOUT UNINST=1 1275 | 1276 | Be careful about using local::lib in combination with "make install UNINST=1". 1277 | The idea of this feature is that it will uninstall an old version of a module 1278 | before installing a new one. However it lacks a safety check that the old 1279 | version and the new version will go in the same directory. Used in combination 1280 | with local::lib, you can potentially delete a globally accessible version of a 1281 | module while installing the new version in a local place. Only combine "make 1282 | install UNINST=1" and local::lib if you understand these possible consequences. 1283 | 1284 | =head1 LIMITATIONS 1285 | 1286 | =over 4 1287 | 1288 | =item * Directory names with spaces in them are not well supported by the perl 1289 | toolchain and the programs it uses.
Pure-perl distributions should support 1290 | spaces, but problems are more likely with dists that require compilation. A 1291 | workaround is to move your local::lib to a directory with spaces 1292 | B<after> you have installed all modules inside your local::lib bootstrap. But be 1293 | aware that you can't update or install CPAN modules after the move. 1294 | 1295 | =item * Rather basic shell detection. Right now anything with csh in its name is 1296 | assumed to be a C shell or something compatible, and everything else is assumed 1297 | to be Bourne, except on Win32 systems. If the C<SHELL> environment variable is 1298 | not set, a Bourne-compatible shell is assumed. 1299 | 1300 | =item * Kills any existing PERL_MM_OPT or PERL_MB_OPT. 1301 | 1302 | =item * Should probably auto-fixup CPAN config if not already done. 1303 | 1304 | =item * On VMS and MacOS Classic (pre-OS X), local::lib loads L<File::Spec>. 1305 | This means any L<File::Spec> version installed in the local::lib will be 1306 | ignored by scripts using local::lib. A workaround for this is using 1307 | C<use lib "$local_lib/lib/perl5";> instead of using C<local::lib> directly. 1308 | 1309 | =item * Conflicts with L<ExtUtils::MakeMaker>'s C<PREFIX> option. 1310 | C<local::lib> uses the C<INSTALL_BASE> option, as it has more predictable and 1311 | sane behavior. If something attempts to use the C<PREFIX> option when running 1312 | a F<Makefile.PL>, L<ExtUtils::MakeMaker> will refuse to run, as the two 1313 | options conflict. This can be worked around by temporarily unsetting the 1314 | C<PERL_MM_OPT> environment variable. 1315 | 1316 | =item * Conflicts with L<Module::Build>'s C<--prefix> option. Similar to the 1317 | previous limitation, but any C<--prefix> option specified will be ignored. 1318 | This can be worked around by temporarily unsetting the C<PERL_MB_OPT> 1319 | environment variable. 1320 | 1321 | =back 1322 | 1323 | Patches very much welcome for any of the above. 1324 | 1325 | =over 4 1326 | 1327 | =item * On Win32 systems, does not have a way to write the created environment 1328 | variables to the registry, so that they can persist through a reboot.
1329 | 1330 | =back 1331 | 1332 | =head1 TROUBLESHOOTING 1333 | 1334 | If you've configured local::lib to install CPAN modules somewhere in to your 1335 | home directory, and at some point later you try to install a module with C, but it fails with an error like: C and buried within the install log is an 1339 | error saying C<'INSTALL_BASE' is not a known MakeMaker parameter name>, then 1340 | you've somehow lost your updated ExtUtils::MakeMaker module. 1341 | 1342 | To remedy this situation, rerun the bootstrapping procedure documented above. 1343 | 1344 | Then, run C 1345 | 1346 | Finally, re-run C and it should install without problems. 1347 | 1348 | =head1 ENVIRONMENT 1349 | 1350 | =over 4 1351 | 1352 | =item SHELL 1353 | 1354 | =item COMSPEC 1355 | 1356 | local::lib looks at the user's C<SHELL> environment variable when printing out 1357 | commands to add to the shell configuration file. 1358 | 1359 | On Win32 systems, C<COMSPEC> is also examined. 1360 | 1361 | =back 1362 | 1363 | =head1 SEE ALSO 1364 | 1365 | =over 4 1366 | 1367 | =item * L 1368 | 1369 | =back 1370 | 1371 | =head1 SUPPORT 1372 | 1373 | IRC: 1374 | 1375 | Join #local-lib on irc.perl.org. 1376 | 1377 | =head1 AUTHOR 1378 | 1379 | Matt S Trout http://www.shadowcat.co.uk/ 1380 | 1381 | auto_install fixes kindly sponsored by http://www.takkle.com/ 1382 | 1383 | =head1 CONTRIBUTORS 1384 | 1385 | Patches to correctly output commands for csh style shells, as well as some 1386 | documentation additions, contributed by Christopher Nehren. 1387 | 1388 | Doc patches for a custom local::lib directory, more cleanups in the English 1389 | documentation and a L<german documentation|POD2::DE::local::lib> contributed by 1390 | Torsten Raudssus.
1391 | 1392 | Hans Dieter Pearcey sent in some additional tests for ensuring 1393 | things will install properly, submitted a fix for the bug causing problems with 1394 | writing Makefiles during bootstrapping, contributed an example program, and 1395 | submitted yet another fix to ensure that local::lib can install and bootstrap 1396 | properly. Many, many thanks! 1397 | 1398 | pattern of Freenode IRC contributed the beginnings of the Troubleshooting 1399 | section. Many thanks! 1400 | 1401 | Patch to add Win32 support contributed by Curtis Jewell. 1402 | 1403 | Warnings for missing PATH/PERL5LIB (as when not running interactively) silenced 1404 | by a patch from Marco Emilio Poleggi. 1405 | 1406 | Mark Stosberg provided the code for the now deleted 1407 | '--self-contained' option. 1408 | 1409 | Documentation patches to make win32 usage clearer by 1410 | David Mertens (run4flat). 1411 | 1412 | Brazilian L<portuguese translation|POD2::PT_BR::local::lib> and minor doc 1413 | patches contributed by Breno G. de Oliveira. 1414 | 1415 | Improvements to stacking multiple local::lib dirs and removing them from the 1416 | environment later on contributed by Andrew Rodland. 1417 | 1418 | Patch for Carp version mismatch contributed by Hakim Cassimally 1419 | . 1420 | 1421 | Rewrite of internals and numerous bug fixes and added features contributed by 1422 | Graham Knop. 1423 | 1424 | =head1 COPYRIGHT 1425 | 1426 | Copyright (c) 2007 - 2013 the local::lib L</AUTHOR> and L</CONTRIBUTORS> as 1427 | listed above. 1428 | 1429 | =head1 LICENSE 1430 | 1431 | This is free software; you can redistribute it and/or modify it under 1432 | the same terms as the Perl 5 programming language system itself. 1433 | 1434 | =cut 1435 | --------------------------------------------------------------------------------