├── .DS_Store
├── example
│   ├── input.bed
│   ├── input.txt
│   └── input.vcf
├── doc
│   ├── .DS_Store
│   ├── img
│   │   ├── new.png
│   │   ├── .DS_Store
│   │   └── icages_pipeline.png
│   ├── misc
│   │   ├── .DS_Store
│   │   ├── license.md
│   │   ├── credit.md
│   │   └── workflow.md
│   ├── user-guide
│   │   ├── .DS_Store
│   │   ├── usage.md
│   │   ├── startup.md
│   │   ├── download.md
│   │   └── example.md
│   ├── mkdocs.yml
│   └── index.md
├── img
│   └── icages_pipeline.png
├── README.md
├── genomeLocusFinder.pl
├── bin
│   ├── icagesJson.pl
│   ├── icagesGene.pl
│   ├── icagesDrug.pl
│   ├── icagesMutation.pl
│   ├── icagesMutationNew.pl
│   └── DGIdb
│       └── local
│           └── lib.pm
└── icages.pl


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/.DS_Store


--------------------------------------------------------------------------------
/example/input.bed:
--------------------------------------------------------------------------------
1 | chr10	89677000	89690000
2 | chr8	38336000	38353000
3 | 


--------------------------------------------------------------------------------
/doc/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/.DS_Store


--------------------------------------------------------------------------------
/doc/img/new.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/img/new.png


--------------------------------------------------------------------------------
/doc/img/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/img/.DS_Store


--------------------------------------------------------------------------------
/doc/misc/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/misc/.DS_Store


--------------------------------------------------------------------------------
/doc/user-guide/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/user-guide/.DS_Store


--------------------------------------------------------------------------------
/img/icages_pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/img/icages_pipeline.png


--------------------------------------------------------------------------------
/doc/img/icages_pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WGLab/icages/HEAD/doc/img/icages_pipeline.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # iCAGES
2 | This is the iCAGES command-line tool, which prioritizes personalized cancer driver mutations, genes and therapies.
3 | 
4 | Please see http://icages.openbioinformatics.org/en/latest for documentation.
5 | 
6 | 
7 | 


--------------------------------------------------------------------------------
/doc/mkdocs.yml:
--------------------------------------------------------------------------------
 1 | site_name: iCAGES Documentation
 2 | site_url: http://icages.openbioinformatics.org/
 3 | repo_url: https://github.com/WGLab/icages
 4 | repo_name: GitHub
 5 | site_description: Documentation for iCAGES software
 6 | site_author: Coco Dong et al.
 7 | site_favicon: favicon.ico
 8 | 
 9 | docs_dir: .
10 | include_search: true
11 | #use_absolute_urls: true
12 | #use_directory_urls: false
13 | 
14 | pages:
15 | - iCAGES: index.md
16 | - User Guide:
17 |   - Download and install iCAGES: user-guide/download.md
18 |   - Quick Start-Up Guide: user-guide/startup.md
19 |   - Usage: user-guide/usage.md
20 |   - Examples: user-guide/example.md
21 | - Misc:
22 |   - How it works: misc/workflow.md
23 |   - Credit: misc/credit.md
24 |   - License: misc/license.md
25 | 


--------------------------------------------------------------------------------
/doc/misc/license.md:
--------------------------------------------------------------------------------
1 | ## License Agreement
2 | By using the software, you acknowledge that you agree to the terms below:
3 | 
4 | For academic and non-profit use, you are free to fork, download, modify, distribute and use the software without restriction.
5 | 
6 | For commercial use, you are required to contact [Stevens Institute of Innovation](https://stevens.usc.edu/contact-us/) at USC directly to discuss licensing options.
7 | 
8 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
9 | 


--------------------------------------------------------------------------------
/doc/misc/credit.md:
--------------------------------------------------------------------------------
1 | ## Acknowledgements
2 | 
3 | The iCAGES software was originally designed by Coco Dong, under the mentorship of Dr. Kai Wang. Other developers and significant contributors include Zeyu He, a graduate student at New York University who helped develop the web interface for iCAGES. Other members of our lab, including Yunfei Guo and Hui Yang, as well as many iCAGES users, have provided feedback, bug reports, code snippets and suggestions to improve the functionality of iCAGES, and I am indebted to them for their invaluable help.
4 | 
5 | ## Reference
6 | 
7 | - Chengliang Dong, Yunfei Guo, Hui Yang, Zeyu He, Xiaoming Liu, Kai Wang. [**iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing driver genes in personal cancer genomes**](http://genomemedicine.biomedcentral.com/articles/10.1186/s13073-016-0390-0). Genome Medicine. 2016. DOI: 10.1186/s13073-016-0390-0
8 | 
9 | 


--------------------------------------------------------------------------------
/doc/index.md:
--------------------------------------------------------------------------------
 1 | # iCAGES Documentation
 2 | 
 3 | iCAGES is an efficient software tool that prioritizes personalized cancer driver mutations, genes and drugs. Given input in ANNOVAR input format, VCF format or BED format, iCAGES generates:
 4 | 
 5 | - iCAGES mutation scores for all point coding variants, non-coding variants and structural variations, which measure the genomic cancer-driving potential of these variants.
 6 | 
 7 | - iCAGES gene scores for all mutated genes, which measure the personal cancer-driving potential of these genes.
 8 | 
 9 | - iCAGES drug scores for all potential drugs targeting any gene mutated in this particular patient.
10 | 
11 | Please use the menu items to navigate this website. The iCAGES web interface was previously available [here](http://icages.wglab.edu), but that server was shut down by the previous university; if you want to run the web server, please download it from [here](https://github.com/WGLab/icages-server) and run it yourself. If you have questions, comments or bug reports, please post them in the Disqus comment form on this website, or email me <coco90417@gmail.com> or my mentor Dr. Kai Wang <kaichop@gmail.com> directly. Thank you very much for your help and support!
12 | 
13 | ---
14 | 
15 | ![new](img/new.png) 2015Feb26: v1.0.0 initial stable release of iCAGES command line is available now! 
16 | 
17 | ![new](img/new.png) 2015Nov27: v1.0.1 stable release of iCAGES command line is available now! 
18 | 
19 | ![new](img/new.png) 2017Jan08: v1.0.2 stable release of iCAGES command line is available now! 
20 | 


--------------------------------------------------------------------------------
/doc/misc/workflow.md:
--------------------------------------------------------------------------------
 1 | ## How iCAGES works
 2 | ![iCAGES pipeline](/img/icages_pipeline.png)
 3 | 
 4 | The flowchart above shows the process of the iCAGES package, which consists of three layers. The input file contains all variants identified in the patient, in ANNOVAR input format or VCF format (or BED format for structural variations). The first layer of iCAGES prioritizes mutations. It computes three different feature scores to annotate each gene: a radial SVM score for each of its point coding mutations, a CNV normalized peak score for each of its structural variations, and a FunSeq2 score for each of its point non-coding mutations. The second layer of iCAGES prioritizes cancer driver genes. It takes the three feature scores from the first layer, generates the corresponding Phenolyzer score for each mutated gene, and computes an LR score for the gene (the iCAGES gene score). The third layer of iCAGES prioritizes targeted drugs. It first queries DGIdb for potential drugs that interact with the patient's mutated genes and their neighbors. Next, it calculates the joint probability of each drug being the most effective (the iCAGES drug score) from three feature scores: the iCAGES gene score of its direct or indirect target, a normalized BioSystems probability measuring the maximum relatedness of the drug's direct target to each mutated gene (final target) in the patient, and a PubChem active probability measuring the bioactivity of the drug. The final output of iCAGES, highlighted in red in the flowchart, consists of three major elements: a prioritized list of mutations, a prioritized list of genes with their iCAGES gene scores, and a prioritized list of targeted drugs with their iCAGES drug scores.
 5 | 
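To make the drug-prioritization step described above more concrete, here is a minimal shell sketch. It assumes (this is not taken from the iCAGES source) that the three per-drug feature scores are already available in a hypothetical tab-separated file with the columns drug, target gene score, BioSystems probability and PubChem active probability, and that the three features are treated as independent probabilities, so their joint probability is simply their product.

```shell
# rank_drugs: hypothetical helper, not part of iCAGES.
# Input (tab-separated, one drug per line):
#   drug  target_gene_score  biosystems_prob  pubchem_active_prob
# Assumption: the features are independent probabilities in [0, 1],
# so the joint probability is their product.
rank_drugs() {
    awk 'BEGIN { FS = "\t" }
         NF >= 4 { printf "%s\t%.4f\n", $1, $2 * $3 * $4 }' "$1" |
        # sort on the combined score (column 2), highest first
        LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `rank_drugs drug_features.tsv` (a hypothetical file name) prints each drug with its combined score, highest first. Treating the features as independent keeps the sketch short; the actual iCAGES drug score may weight or normalize them differently.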
 6 | 
 7 | 
 8 | ---
 9 | 
10 | <div id="disqus_thread"></div>
11 | <script type="text/javascript">
12 | /* * * CONFIGURATION VARIABLES * * */
13 | var disqus_shortname = 'icages';
14 | var disqus_identifier = 'workflow';
15 | var disqus_title = 'iCAGES Workflow';
16 | 
17 | /* * * DON'T EDIT BELOW THIS LINE * * */
18 | (function() {
19 | var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
20 | dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
21 | (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
22 | })();
23 | </script>
24 | <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a></noscript>
25 | 
26 | 


--------------------------------------------------------------------------------
/doc/user-guide/usage.md:
--------------------------------------------------------------------------------
 1 | ## SYNOPSIS
 2 | 
 3 | icages.pl (input_file) [options]
 4 | 
 5 | ## OPTIONS
 6 | 
 7 | - -h, --help              
 8 | print help message   
 9 | 
10 | - -m, --manual            
11 | print manual message
12 | 
13 | - -t, --tumor (TEXT)      
14 | name of the sample column in your VCF file that contains tumor mutations (if your VCF file has multiple samples with tumor mutations, use this option to select the tumor sample you want to analyze)
15 | 
16 | - -g, --germline (TEXT)   
17 | name of the sample column in your VCF file that contains germline mutations (if your VCF file has multiple samples with germline mutations, use this option to select the germline sample that your tumor mutations are compared against to generate the somatic mutation profile of the sample you want to analyze)
18 | 
19 | - -i, --id (TEXT)         
20 | name of the sample column to analyze in a multi-sample VCF file that contains only somatic mutations (if your VCF file has both tumor and germline samples, use the -t and -g options instead)
21 | 
22 | - -s, --subtype (TEXT)    
23 | subtype of the cancer, valid options include "ACC", "BLCA", "BRCA", "CESC", "CHOL", "ESCA", "GBM", "HNSC", "KICH", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUSC", "OV", "PAAD", "PCPG", "PRAD", "SARC", "SKCM", "STAD", "TGCT", "TGCA", "THYM", "UCEC", "UCS", "UVM"
24 | 
25 | - --logdir                
26 | directory for log files generated by iCAGES
27 | 
28 | - --tempdir (TEXT)        
29 | directory for temporary files generated by iCAGES
30 | 
31 | - --outputdir (TEXT)      
32 | directory for output files generated by iCAGES
33 | 
34 | - -p, --prefix (TEXT)     
35 | prefix of all files generated by iCAGES
36 | 
37 | - -b, --bed (TEXT)        
38 | additional BED file specifying the locations of structural variations in the sample. If a BED file is the only data you have for the patient of interest, use it directly as the input file instead of passing it with the -b option, e.g. icages.pl input.bed
39 | 
40 | - --buildver (TEXT)       
41 | reference genome version, valid options include "hg19", "hg38" and "hg18"
42 | 
43 | - -e, --expression (TEXT)                
44 | BED file describing gene expression changes, with four columns: chromosome, start, end and log fold change
45 | 
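As an illustration of the file expected by `-e/--expression`, the following shell sketch builds the four-column BED file from a hypothetical tab-separated expression table with the columns chrom, start, end, tumor_expr and normal_expr. The use of log2 and of a pseudocount of 1 are assumptions; the option description above does not specify the log base.

```shell
# make_expression_bed: hypothetical helper, not part of iCAGES.
# Input columns (tab-separated): chrom start end tumor_expr normal_expr
# Output columns: chrom start end log_fold_change
# Assumptions: log base 2, pseudocount of 1 to avoid log(0).
make_expression_bed() {
    awk 'BEGIN { FS = "\t" }
         NF >= 5 {
             lfc = log(($4 + 1) / ($5 + 1)) / log(2)
             printf "%s\t%s\t%s\t%.4f\n", $1, $2, $3, lfc
         }' "$1"
}
```

A possible invocation, with hypothetical file names: `make_expression_bed expression_table.tsv > expression.bed`, followed by `icages.pl input.txt -e expression.bed`.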
46 | ---
47 | 
48 | <div id="disqus_thread"></div>
49 | <script type="text/javascript">
50 | /* * * CONFIGURATION VARIABLES * * */
51 | var disqus_shortname = 'icages';
52 | var disqus_identifier = 'usage';
53 | var disqus_title = 'iCAGES Usage';
54 | 
55 | /* * * DON'T EDIT BELOW THIS LINE * * */
56 | (function() {
57 | var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
58 | dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
59 | (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
60 | })();
61 | </script>
62 | <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a></noscript>
63 | 
64 | 


--------------------------------------------------------------------------------
/example/input.txt:
--------------------------------------------------------------------------------
  1 | 1	12919840	12919840	T	C
  2 | 1	35332717	35332717	C	A
  3 | 1	55148456	55148456	G	T
  4 | 1	70504789	70504789	C	T
  5 | 1	167059520	167059520	A	T
  6 | 1	182496864	182496864	A	T
  7 | 1	197073351	197073351	C	T
  8 | 1	216373211	216373211	G	T
  9 | 10	37490170	37490170	G	A
 10 | 10	56089432	56089432	A	C
 11 | 10	69957135	69957135	A	T
 12 | 10	125601918	125601918	C	A
 13 | 10	125804220	125804220	C	A
 14 | 10	134649656	134649656	C	A
 15 | 11	5172737	5172737	G	T
 16 | 11	5905898	5905898	G	T
 17 | 11	6942832	6942832	C	A
 18 | 11	16068172	16068172	C	T
 19 | 11	44297060	44297060	G	A
 20 | 11	48328274	48328274	G	C
 21 | 11	89409330	89409330	A	G
 22 | 11	107965601	107965601	T	G
 23 | 11	120180145	120180145	C	A
 24 | 12	32481489	32481489	G	A
 25 | 12	67699695	67699695	G	T
 26 | 12	96883571	96883571	T	C
 27 | 12	119968739	119968739	G	C
 28 | 13	32340125	32340125	C	A
 29 | 13	36521559	36521559	G	A
 30 | 14	20529107	20529107	C	T
 31 | 14	36154270	36154270	T	C
 32 | 14	59112264	59112264	G	C
 33 | 14	63174963	63174963	C	A
 34 | 14	94007023	94007023	A	T
 35 | 14	95557552	95557552	G	A
 36 | 14	105410443	105410443	C	G
 37 | 15	20739706	20739706	T	C
 38 | 15	30054548	30054548	A	G
 39 | 16	20376770	20376770	G	T
 40 | 16	66422268	66422268	C	A
 41 | 17	7125396	7125396	A	G
 42 | 17	21731175	21731175	C	A
 43 | 17	26958542	26958542	C	T
 44 | 17	41836054	41836054	C	T
 45 | 18	14852153	14852153	G	T
 46 | 18	55319937	55319937	G	A
 47 | 18	58039082	58039082	C	A
 48 | 18	61170854	61170854	C	T
 49 | 19	13249152	13249152	G	T
 50 | 19	52938422	52938422	C	A
 51 | 19	53770704	53770704	C	G
 52 | 19	53911857	53911857	C	T
 53 | 19	57286834	57286834	A	T
 54 | 2	84940277	84940277	G	T
 55 | 2	112945031	112945031	A	T
 56 | 2	136552307	136552307	T	G
 57 | 2	157332699	157332699	G	T
 58 | 2	179588352	179588352	C	A
 59 | 2	187627271	187627271	G	T
 60 | 2	189863400	189863400	G	T
 61 | 2	189910616	189910616	A	G
 62 | 2	201533514	201533514	T	C
 63 | 2	238245127	238245127	C	A
 64 | 20	23473695	23473695	C	G
 65 | 20	29956465	29956465	G	A
 66 | 20	47266080	47266080	G	C
 67 | 22	30661003	30661003	G	C
 68 | 3	10157495	10157495	A	G
 69 | 3	11064098	11064098	A	T
 70 | 3	13413467	13413467	C	A
 71 | 3	51690015	51690015	C	T
 72 | 3	52417509	52417509	G	T
 73 | 3	63821060	63821060	T	A
 74 | 3	100087931	100087931	T	C
 75 | 3	112998777	112998777	C	T
 76 | 3	118621763	118621763	G	T
 77 | 4	13617077	13617077	C	A
 78 | 4	20533629	20533629	A	G
 79 | 4	147830310	147830310	G	C
 80 | 4	173269688	173269688	T	A
 81 | 5	19483542	19483542	G	T
 82 | 5	37326041	37326041	C	A
 83 | 5	55110949	55110949	C	T
 84 | 5	75596562	75596562	C	A
 85 | 5	140536091	140536091	C	G
 86 | 5	140616128	140616128	C	G
 87 | 5	140723876	140723876	G	T
 88 | 5	159686516	159686516	G	A
 89 | 6	7368953	7368953	G	C
 90 | 6	49808692	49808692	G	C
 91 | 6	109752383	109752383	C	T
 92 | 6	137329736	137329736	C	T
 93 | 7	47867026	47867026	G	T
 94 | 7	69583198	69583198	C	A
 95 | 7	93518498	93518498	G	A
 96 | 7	149475905	149475905	C	A
 97 | 8	3076901	3076901	G	A
 98 | 8	11995670	11995670	G	T
 99 | 8	37704629	37704629	C	T
100 | 8	71068967	71068967	C	A
101 | 8	77764912	77764912	G	C
102 | 8	77775920	77775920	C	A
103 | 8	79510674	79510674	A	C
104 | 8	88363968	88363968	C	A
105 | 8	89179974	89179974	G	A
106 | 8	105503431	105503431	C	A
107 | 8	118819526	118819526	G	A
108 | 8	118831993	118831993	C	A
109 | 9	17409334	17409334	A	G
110 | 9	21201831	21201831	G	T
111 | 9	43627065	43627065	C	T
112 | 9	79325065	79325065	C	A
113 | 9	104171019	104171019	A	T
114 | 9	105767273	105767273	C	A
115 | X	47426121	47426121	C	G
116 | X	48213490	48213490	C	A
117 | X	54020144	54020144	G	T
118 | X	55249114	55249114	G	T
119 | X	64140039	64140039	T	A
120 | X	71802303	71802303	A	T
121 | X	75004601	75004601	C	G
122 | X	76939716	76939716	G	A
123 | X	78216983	78216983	T	A
124 | X	100513426	100513426	G	T
125 | X	118717207	118717207	C	T
126 | X	130416606	130416606	G	T
127 | X	134305608	134305608	G	A
128 | X	148037166	148037166	G	T
129 | X	38280273	38280274	CA	CA
130 | 12	29908693	29908693	C	-
131 | 19	53612719	53612719	-	T
132 | 2	229997	229997	-	GAT
133 | 6	7886214	7886216	GAA	-
134 | X	56592088	56592088	-	T
135 | 


--------------------------------------------------------------------------------
/doc/user-guide/startup.md:
--------------------------------------------------------------------------------
  1 | ## Important notice
  2 | 
  3 | Because iCAGES uses modules that call the DGIdb web API, please make sure that your PC/Mac is connected to the Internet. Thanks!
  4 | 
  5 | ## ANNOVAR input 
  6 | 
  7 | For beginners, the easiest way to use iCAGES is to supply somatic mutations in [ANNOVAR](http://annovar.openbioinformatics.org/) input format with reference genome version hg19. This example input file is provided in the iCAGES package.
  8 | ```
  9 | [cocodong@biocluster ~/]$ head input.txt 
 10 | 1	12919840	12919840	T	C
 11 | 1	35332717	35332717	C	A
 12 | 1	55148456	55148456	G	T
 13 | 1	70504789	70504789	C	T
 14 | 1	167059520	167059520	A	T
 15 | 1	182496864	182496864	A	T
 16 | 1	197073351	197073351	C	T
 17 | 1	216373211	216373211	G	T
 18 | 10	37490170	37490170	G	A
 19 | 10	56089432	56089432	A	C
 20 | [cocodong@biocluster ~/]$ icages.pl input.txt
 21 | 
 22 | ```
 23 | 
 24 | ## VCF input: one sample with somatic mutations only
 25 | 
 26 | If you have somatic mutations for one sample in VCF format with reference genome version hg19, iCAGES can automatically detect the input format and analyze your data. This example input file is also provided in the iCAGES package.
 27 | ```
 28 | [cocodong@biocluster ~/]$ cat input.vcf
 29 | ##fileformat=VCFv4.1
 30 | ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
 31 | ##contig=<ID=1,length=249250621,assembly=b37>
 32 | ##contig=<ID=2,length=243199373,assembly=b37>
 33 | ##contig=<ID=3,length=198022430,assembly=b37>
 34 | ##contig=<ID=4,length=191154276,assembly=b37>
 35 | ##contig=<ID=5,length=180915260,assembly=b37>
 36 | ##contig=<ID=6,length=171115067,assembly=b37>
 37 | ##contig=<ID=7,length=159138663,assembly=b37>
 38 | ##contig=<ID=8,length=146364022,assembly=b37>
 39 | ##contig=<ID=9,length=141213431,assembly=b37>
 40 | ##contig=<ID=10,length=135534747,assembly=b37>
 41 | ##contig=<ID=11,length=135006516,assembly=b37>
 42 | ##contig=<ID=12,length=133851895,assembly=b37>
 43 | ##contig=<ID=13,length=115169878,assembly=b37>
 44 | ##contig=<ID=14,length=107349540,assembly=b37>
 45 | ##contig=<ID=15,length=102531392,assembly=b37>
 46 | ##contig=<ID=16,length=90354753,assembly=b37>
 47 | ##contig=<ID=17,length=81195210,assembly=b37>
 48 | ##contig=<ID=18,length=78077248,assembly=b37>
 49 | ##contig=<ID=19,length=59128983,assembly=b37>
 50 | ##contig=<ID=20,length=63025520,assembly=b37>
 51 | ##contig=<ID=21,length=48129895,assembly=b37>
 52 | ##contig=<ID=22,length=51304566,assembly=b37>
 53 | ##contig=<ID=X,length=155270560,assembly=b37>
 54 | ##contig=<ID=Y,length=59373566,assembly=b37>
 55 | ##contig=<ID=MT,length=16569,assembly=b37>
 56 | #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  tumor
 57 | 1       12919840        .       T       C       .       .       .       GT      1|1
 58 | 1       35332717        .       C       A       .       .       .       GT      1|1
 59 | 1       55148456        .       G       T       .       .       .       GT      1|1
 60 | 1       70504789        .       C       T       .       .       .       GT      1|1
 61 | 1       167059520       .       A       T       .       .       .       GT      1|1
 62 | 1       182496864       .       A       T       .       .       .       GT      1|1
 63 | 1       197073351       .       C       T       .       .       .       GT      1|1
 64 | 1       216373211       .       G       T       .       .       .       GT      1|1
 65 | 10      37490170        .       G       A       .       .       .       GT      1|1
 66 | 10      56089432        .       A       C       .       .       .       GT      1|1
 67 | ...
 68 | [cocodong@biocluster ~/]$ icages.pl input.vcf
 69 | ```
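The ANNOVAR and VCF examples above describe the same point mutations: for a single-nucleotide variant, the ANNOVAR columns are chromosome, start, end, reference allele and alternative allele, with start and end both equal to the VCF POS. The shell sketch below illustrates that mapping for SNVs only; indels (written with `-` in ANNOVAR format) and multi-allelic sites are skipped, and ANNOVAR's own `convert2annovar.pl` is the more complete tool for real data.

```shell
# vcf_snvs_to_annovar: hypothetical helper, not part of iCAGES.
# Converts single-nucleotide variants from a VCF file into the five
# ANNOVAR input columns: chromosome, start, end, reference, alternative.
# Header lines, indels and multi-allelic sites are skipped.
vcf_snvs_to_annovar() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" }
         /^#/ { next }
         $4 ~ /^[ACGTNacgtn]$/ && $5 ~ /^[ACGTNacgtn]$/ {
             # for an SNV, start and end are both the VCF POS
             print $1, $2, $2, $4, $5
         }' "$1"
}
```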
 70 | 
 71 | ## BED input: one sample with somatic structural variations only
 72 | 
 73 | If you only have a BED file of all structural variations detected in the patient, with reference genome version hg19, iCAGES can also automatically detect the input format and carry out the downstream analysis. An example BED input file is also provided in the iCAGES package.
 74 | 
 75 | ```
 76 | [cocodong@biocluster ~/]$ head input.bed
 77 | chr12	85865797	85887628
 78 | chr20	15052592	15071191
 79 | chr16	87340388	87349798
 80 | chr2	213000509	213007522
 81 | [cocodong@biocluster ~/]$ icages.pl input.bed
 82 | ```
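iCAGES detects the input format automatically. Its actual detection logic is not described on this page, so the following shell function is only a simplified, hypothetical heuristic that tells the three example formats apart by their header and by the column count of the first data line.

```shell
# detect_input_format: hypothetical heuristic, not the logic used by icages.pl.
#   VCF header (##fileformat=VCF... or #CHROM)   -> vcf
#   first data line with exactly 3 columns        -> bed
#   first data line with 5 or more columns        -> annovar
detect_input_format() {
    awk '/^##fileformat=VCF/ || /^#CHROM/ { print "vcf"; found = 1; exit }
         /^#/ || NF == 0 { next }
         {
             if (NF == 3) { print "bed" }
             else if (NF >= 5) { print "annovar" }
             else { print "unknown" }
             found = 1
             exit
         }
         END { if (!found) print "unknown" }' "$1"
}
```

Splitting on runs of whitespace (awk's default) keeps the check tolerant of stray double tabs; a real BED file with optional extra columns would need a stricter test than this sketch.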
 83 | 
 84 | 
 85 | 
 86 | ---
 87 | 
 88 | <div id="disqus_thread"></div>
 89 | <script type="text/javascript">
 90 | /* * * CONFIGURATION VARIABLES * * */
 91 | var disqus_shortname = 'icages';
 92 | var disqus_identifier = 'start';
 93 | var disqus_title = 'iCAGES Start Up Guide';
 94 | 
 95 | /* * * DON'T EDIT BELOW THIS LINE * * */
 96 | (function() {
 97 | var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
 98 | dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
 99 | (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
100 | })();
101 | </script>
102 | <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a></noscript>
103 | 
104 | 


--------------------------------------------------------------------------------
/example/input.vcf:
--------------------------------------------------------------------------------
  1 | ##fileformat=VCFv4.1
  2 | ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
  3 | ##contig=<ID=1,length=249250621,assembly=b37>
  4 | ##contig=<ID=2,length=243199373,assembly=b37>
  5 | ##contig=<ID=3,length=198022430,assembly=b37>
  6 | ##contig=<ID=4,length=191154276,assembly=b37>
  7 | ##contig=<ID=5,length=180915260,assembly=b37>
  8 | ##contig=<ID=6,length=171115067,assembly=b37>
  9 | ##contig=<ID=7,length=159138663,assembly=b37>
 10 | ##contig=<ID=8,length=146364022,assembly=b37>
 11 | ##contig=<ID=9,length=141213431,assembly=b37>
 12 | ##contig=<ID=10,length=135534747,assembly=b37>
 13 | ##contig=<ID=11,length=135006516,assembly=b37>
 14 | ##contig=<ID=12,length=133851895,assembly=b37>
 15 | ##contig=<ID=13,length=115169878,assembly=b37>
 16 | ##contig=<ID=14,length=107349540,assembly=b37>
 17 | ##contig=<ID=15,length=102531392,assembly=b37>
 18 | ##contig=<ID=16,length=90354753,assembly=b37>
 19 | ##contig=<ID=17,length=81195210,assembly=b37>
 20 | ##contig=<ID=18,length=78077248,assembly=b37>
 21 | ##contig=<ID=19,length=59128983,assembly=b37>
 22 | ##contig=<ID=20,length=63025520,assembly=b37>
 23 | ##contig=<ID=21,length=48129895,assembly=b37>
 24 | ##contig=<ID=22,length=51304566,assembly=b37>
 25 | ##contig=<ID=X,length=155270560,assembly=b37>
 26 | ##contig=<ID=Y,length=59373566,assembly=b37>
 27 | ##contig=<ID=MT,length=16569,assembly=b37>
 28 | #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	tumor
 29 | 1	12919840	.	T	C	.	.	.	GT	1|1
 30 | 1	35332717	.	C	A	.	.	.	GT	1|1
 31 | 1	55148456	.	G	T	.	.	.	GT	1|1
 32 | 1	70504789	.	C	T	.	.	.	GT	1|1
 33 | 1	167059520	.	A	T	.	.	.	GT	1|1
 34 | 1	182496864	.	A	T	.	.	.	GT	1|1
 35 | 1	197073351	.	C	T	.	.	.	GT	1|1
 36 | 1	216373211	.	G	T	.	.	.	GT	1|1
 37 | 10	37490170	.	G	A	.	.	.	GT	1|1
 38 | 10	56089432	.	A	C	.	.	.	GT	1|1
 39 | 10	69957135	.	A	T	.	.	.	GT	1|1
 40 | 10	125601918	.	C	A	.	.	.	GT	1|1
 41 | 10	125804220	.	C	A	.	.	.	GT	1|1
 42 | 10	134649656	.	C	A	.	.	.	GT	1|1
 43 | 11	5172737	.	G	T	.	.	.	GT	1|1
 44 | 11	5905898	.	G	T	.	.	.	GT	1|1
 45 | 11	6942832	.	C	A	.	.	.	GT	1|1
 46 | 11	16068172	.	C	T	.	.	.	GT	1|1
 47 | 11	44297060	.	G	A	.	.	.	GT	1|1
 48 | 11	48328274	.	G	C	.	.	.	GT	1|1
 49 | 11	89409330	.	A	G	.	.	.	GT	1|1
 50 | 11	107965601	.	T	G	.	.	.	GT	1|1
 51 | 11	120180145	.	C	A	.	.	.	GT	1|1
 52 | 12	32481489	.	G	A	.	.	.	GT	1|1
 53 | 12	67699695	.	G	T	.	.	.	GT	1|1
 54 | 12	96883571	.	T	C	.	.	.	GT	1|1
 55 | 12	119968739	.	G	C	.	.	.	GT	1|1
 56 | 13	32340125	.	C	A	.	.	.	GT	1|1
 57 | 13	36521559	.	G	A	.	.	.	GT	1|1
 58 | 14	20529107	.	C	T	.	.	.	GT	1|1
 59 | 14	36154270	.	T	C	.	.	.	GT	1|1
 60 | 14	59112264	.	G	C	.	.	.	GT	1|1
 61 | 14	63174963	.	C	A	.	.	.	GT	1|1
 62 | 14	94007023	.	A	T	.	.	.	GT	1|1
 63 | 14	95557552	.	G	A	.	.	.	GT	1|1
 64 | 14	105410443	.	C	G	.	.	.	GT	1|1
 65 | 15	20739706	.	T	C	.	.	.	GT	1|1
 66 | 15	30054548	.	A	G	.	.	.	GT	1|1
 67 | 16	20376770	.	G	T	.	.	.	GT	1|1
 68 | 16	66422268	.	C	A	.	.	.	GT	1|1
 69 | 17	7125396	.	A	G	.	.	.	GT	1|1
 70 | 17	21731175	.	C	A	.	.	.	GT	1|1
 71 | 17	26958542	.	C	T	.	.	.	GT	1|1
 72 | 17	41836054	.	C	T	.	.	.	GT	1|1
 73 | 18	14852153	.	G	T	.	.	.	GT	1|1
 74 | 18	55319937	.	G	A	.	.	.	GT	1|1
 75 | 18	58039082	.	C	A	.	.	.	GT	1|1
 76 | 18	61170854	.	C	T	.	.	.	GT	1|1
 77 | 19	13249152	.	G	T	.	.	.	GT	1|1
 78 | 19	52938422	.	C	A	.	.	.	GT	1|1
 79 | 19	53770704	.	C	G	.	.	.	GT	1|1
 80 | 19	53911857	.	C	T	.	.	.	GT	1|1
 81 | 19	57286834	.	A	T	.	.	.	GT	1|1
 82 | 2	84940277	.	G	T	.	.	.	GT	1|1
 83 | 2	112945031	.	A	T	.	.	.	GT	1|1
 84 | 2	136552307	.	T	G	.	.	.	GT	1|1
 85 | 2	157332699	.	G	T	.	.	.	GT	1|1
 86 | 2	179588352	.	C	A	.	.	.	GT	1|1
 87 | 2	187627271	.	G	T	.	.	.	GT	1|1
 88 | 2	189863400	.	G	T	.	.	.	GT	1|1
 89 | 2	189910616	.	A	G	.	.	.	GT	1|1
 90 | 2	201533514	.	T	C	.	.	.	GT	1|1
 91 | 2	238245127	.	C	A	.	.	.	GT	1|1
 92 | 20	23473695	.	C	G	.	.	.	GT	1|1
 93 | 20	29956465	.	G	A	.	.	.	GT	1|1
 94 | 20	47266080	.	G	C	.	.	.	GT	1|1
 95 | 22	30661003	.	G	C	.	.	.	GT	1|1
 96 | 3	10157495	.	A	G	.	.	.	GT	1|1
 97 | 3	11064098	.	A	T	.	.	.	GT	1|1
 98 | 3	13413467	.	C	A	.	.	.	GT	1|1
 99 | 3	51690015	.	C	T	.	.	.	GT	1|1
100 | 3	52417509	.	G	T	.	.	.	GT	1|1
101 | 3	63821060	.	T	A	.	.	.	GT	1|1
102 | 3	100087931	.	T	C	.	.	.	GT	1|1
103 | 3	112998777	.	C	T	.	.	.	GT	1|1
104 | 3	118621763	.	G	T	.	.	.	GT	1|1
105 | 4	13617077	.	C	A	.	.	.	GT	1|1
106 | 4	20533629	.	A	G	.	.	.	GT	1|1
107 | 4	147830310	.	G	C	.	.	.	GT	1|1
108 | 4	173269688	.	T	A	.	.	.	GT	1|1
109 | 5	19483542	.	G	T	.	.	.	GT	1|1
110 | 5	37326041	.	C	A	.	.	.	GT	1|1
111 | 5	55110949	.	C	T	.	.	.	GT	1|1
112 | 5	75596562	.	C	A	.	.	.	GT	1|1
113 | 5	140536091	.	C	G	.	.	.	GT	1|1
114 | 5	140616128	.	C	G	.	.	.	GT	1|1
115 | 5	140723876	.	G	T	.	.	.	GT	1|1
116 | 5	159686516	.	G	A	.	.	.	GT	1|1
117 | 6	7368953	.	G	C	.	.	.	GT	1|1
118 | 6	49808692	.	G	C	.	.	.	GT	1|1
119 | 6	109752383	.	C	T	.	.	.	GT	1|1
120 | 6	137329736	.	C	T	.	.	.	GT	1|1
121 | 7	47867026	.	G	T	.	.	.	GT	1|1
122 | 7	69583198	.	C	A	.	.	.	GT	1|1
123 | 7	93518498	.	G	A	.	.	.	GT	1|1
124 | 7	149475905	.	C	A	.	.	.	GT	1|1
125 | 8	3076901	.	G	A	.	.	.	GT	1|1
126 | 8	11995670	.	G	T	.	.	.	GT	1|1
127 | 8	37704629	.	C	T	.	.	.	GT	1|1
128 | 8	71068967	.	C	A	.	.	.	GT	1|1
129 | 8	77764912	.	G	C	.	.	.	GT	1|1
130 | 8	77775920	.	C	A	.	.	.	GT	1|1
131 | 8	79510674	.	A	C	.	.	.	GT	1|1
132 | 8	88363968	.	C	A	.	.	.	GT	1|1
133 | 8	89179974	.	G	A	.	.	.	GT	1|1
134 | 8	105503431	.	C	A	.	.	.	GT	1|1
135 | 8	118819526	.	G	A	.	.	.	GT	1|1
136 | 8	118831993	.	C	A	.	.	.	GT	1|1
137 | 9	17409334	.	A	G	.	.	.	GT	1|1
138 | 9	21201831	.	G	T	.	.	.	GT	1|1
139 | 9	43627065	.	C	T	.	.	.	GT	1|1
140 | 9	79325065	.	C	A	.	.	.	GT	1|1
141 | 9	104171019	.	A	T	.	.	.	GT	1|1
142 | 9	105767273	.	C	A	.	.	.	GT	1|1
143 | X	47426121	.	C	G	.	.	.	GT	1|1
144 | X	48213490	.	C	A	.	.	.	GT	1|1
145 | X	54020144	.	G	T	.	.	.	GT	1|1
146 | X	55249114	.	G	T	.	.	.	GT	1|1
147 | X	64140039	.	T	A	.	.	.	GT	1|1
148 | X	71802303	.	A	T	.	.	.	GT	1|1
149 | X	75004601	.	C	G	.	.	.	GT	1|1
150 | X	76939716	.	G	A	.	.	.	GT	1|1
151 | X	78216983	.	T	A	.	.	.	GT	1|1
152 | X	100513426	.	G	T	.	.	.	GT	1|1
153 | X	118717207	.	C	T	.	.	.	GT	1|1
154 | X	130416606	.	G	T	.	.	.	GT	1|1
155 | X	134305608	.	G	A	.	.	.	GT	1|1
156 | X	148037166	.	G	T	.	.	.	GT	1|1
157 | X	38280273	38280274	CA	CA	.	.	.	GT	1|1
158 | 12	29908693	29908693	C	-	.	.	.	GT	1|1
159 | 19	53612719	53612719	-	T	.	.	.	GT	1|1
160 | 2	229997	229997	-	GAT	.	.	.	GT	1|1
161 | 6	7886214	7886216	GAA	-	.	.	.	GT	1|1
162 | X	56592088	56592088	-	T	.	.	.	GT	1|1
163 | 


--------------------------------------------------------------------------------
/doc/user-guide/download.md:
--------------------------------------------------------------------------------
  1 | ## iCAGES main package
  2 | 
  3 | Please join the iCAGES mailing list at google groups [here](https://groups.google.com/forum/?hl=en#!forum/icages) to receive announcements on software updates.
  4 | 
  5 | The latest version of iCAGES (2017Jan08) can be downloaded [here](https://github.com/WGLab/icages/releases/tag/v1.0.2).
  6 | 
  7 | iCAGES is written in Perl and can be run as a standalone application on diverse hardware systems where standard Perl modules are installed.
  8 | 
  9 | ## Download
 10 | 
 11 | ```
 12 | wget https://github.com/WGLab/icages/archive/refs/tags/v1.0.2.tar.gz
 13 | ```
 14 | 
 15 | ## Installation
 16 | 
 17 | - Unpack the downloaded archive
 18 | 
 19 | ```
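 20 | tar -zxvf v1.0.2.tar.gz    # archive name as saved by the wget command above
 21 | mv icages-1.0.2 icages     # GitHub drops the leading "v" from the tag in the directory name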
 22 | ```
 23 | 
 24 | - Download and unzip database files
 25 | 
 26 | ```
 27 | cd icages/
 28 | wget http://www.openbioinformatics.org/annovar/download/icages/db.tar.gz
 29 | tar -zxvf db.tar.gz
 30 | ```
 31 | 
 32 | The `db.tar.gz` file contains resources for hg19 genome coordinates.
 33 | 
 34 | Files for the hg18 and hg38 builds were available from
 35 | http://www.openbioinformatics.org/annovar/download/icages/hg18_db.tar.gz
 36 | and http://www.openbioinformatics.org/annovar/download/icages/hg38_db.tar.gz, respectively. (Note: due to limited storage space, these two files have been taken offline; if you need them, please contact the PI to set up a temporary download link.)
 37 | 
 38 | 
 39 | - Install the required Perl modules. If you have root access, use `cpanm` to install the JSON, HTTP::Request and LWP modules:
 40 | 
 41 | ```
 42 | cpanm JSON
 43 | cpanm HTTP::Request
 44 | cpanm LWP
 45 | ```
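To confirm that these modules can actually be loaded before running iCAGES, a small check such as the following may help. It is a sketch, not part of iCAGES: the function name `missing_perl_modules` is hypothetical, and it simply asks `perl` to load each module with `-M` and reports the ones that fail.

```shell
# Hypothetical helper (not shipped with iCAGES): print each Perl module that
# cannot be loaded, one per line. Returns 1 if any module is missing.
missing_perl_modules() {
    status=0
    for module in "$@"; do
        # perl -M<Module> -e 1 loads the module and exits; a failure means it is absent
        if ! perl -M"$module" -e 1 2>/dev/null; then
            echo "$module"
            status=1
        fi
    done
    return $status
}
```

For example, `missing_perl_modules JSON HTTP::Request LWP || echo "install the modules listed above with cpanm"` prints nothing when all three modules are installed.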
 46 | 
 47 | - Install the first dependency of iCAGES, ANNOVAR. Visit the [ANNOVAR](http://www.openbioinformatics.org/annovar/annovar_download.html) website and download it. Then, from the icages directory, move the annovar/ directory into ./bin:
 48 | ```
 49 | mv path-to-annovar/annovar/ ./bin/
 50 | ```
 51 | 
 52 | - Install the second dependency of iCAGES, DGIdb. From the icages directory, create a directory named DGIdb under ./bin. Visit [DGIdb](http://dgidb.genome.wustl.edu/) to read about it, then download the corresponding Perl script from [here](https://raw.github.com/genome/dgi-db/master/files/perl_example.pl) into ./bin/DGIdb, saving it as getDrugList.pl:
 53 | 
 54 | ```
 55 | mkdir ./bin/DGIdb
 56 | wget https://raw.github.com/genome/dgi-db/master/files/perl_example.pl -O ./bin/DGIdb/getDrugList.pl
 57 | ```
 58 | 
 59 | - Next, modify getDrugList.pl. First, add the following lines after `parse_opts();`:
 60 | 
 61 | ``` 
 62 | my $output;
 63 | open (OUT, ">$output") or die "iCAGES: cannot open file $output for writing the drugs recommended for cancer driver genes\n";
 64 | ```
 65 | 
 66 | - Then, add the following line after `'help' => \$help,`:
 67 | 
 68 | ```
 69 | 'output:s'    => \$output 
 70 | ```
 71 | 
 72 | - Next, comment out the following line:
 73 | 
 74 | ```
 75 | print "gene_name\tdrug_name\tinteraction_type\tsource\tgene_categories\n";
 76 | ```
 77 | so that it reads:
 78 | ```
 79 | # print "gene_name\tdrug_name\tinteraction_type\tsource\tgene_categories\n";
 80 | ```
 81 | 
 82 | - Then, change the following line:
 83 | 
 84 | ```
 85 | print "$gene_name\t$drug_name\t$interaction_type\t$source\t$gene_categories\n"; 
 86 | ```
 87 | to:
 88 | ```
 89 | print OUT "$gene_name\t$drug_name\t$interaction_type\t$source\t$gene_categories\n"; 
 90 | ```
 91 | 
 92 | - Finally, change the following lines:
 93 | 
 94 | ```
 95 | print "\n" . 'Unmatched search term: ', $_->{searchTerm}, "\n";
 96 | print 'Possible suggestions: ', join(",", @{$_->{suggestions}}), "\n";
 97 | ```
 98 | to:
 99 | ```
100 | print OUT "\n" . 'Unmatched search term: ', $_->{searchTerm}, "\n";
101 | print OUT 'Possible suggestions: ', join(",", @{$_->{suggestions}}), "\n";
102 | ```
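Because these edits are made by hand, a quick grep-based sanity check can catch a missed step. The helper below is hypothetical and not part of iCAGES; it only looks for the text introduced by the edits above (`'output:s'`, `open (OUT`, three `print OUT` statements, and the commented-out header line). It does not prove that the script works against the DGIdb service.

```shell
# Hypothetical helper (not part of iCAGES): report which of the getDrugList.pl
# edits described above appear to be missing. Prints one line per problem and
# returns 1 if anything looks wrong.
check_dgidb_patch() {
    script="$1"
    [ -f "$script" ] || { echo "missing: $script"; return 1; }
    problems=0
    # The new command-line option must be registered with GetOptions
    grep -q "'output:s'" "$script" || { echo "missing: 'output:s' option"; problems=1; }
    # The output file must be opened for writing
    grep -q 'open (OUT' "$script" || { echo "missing: open (OUT, ...)"; problems=1; }
    # Three print statements should now write to OUT instead of STDOUT
    count=$(grep -c 'print OUT' "$script" || true)
    [ "$count" -ge 3 ] || { echo "expected at least 3 'print OUT' lines, found $count"; problems=1; }
    # The header print should be commented out
    if grep -q '^[[:space:]]*print "gene_name' "$script"; then
        echo "header print is not commented out"
        problems=1
    fi
    return $problems
}
```

From the icages directory, run `check_dgidb_patch ./bin/DGIdb/getDrugList.pl`; no output and a zero exit status mean every edit was found.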
103 | 
104 | - Install the third dependency of iCAGES, vcftools. From the icages/bin/ directory, download vcftools from SourceForge:
105 | 
106 | ```
107 | wget http://iweb.dl.sourceforge.net/project/vcftools/vcftools_0.1.12b.tar.gz
108 | tar -zxvf vcftools_0.1.12b.tar.gz 
109 | mv vcftools_0.1.12b/ vcftools/
110 | rm vcftools_0.1.12b.tar.gz
111 | ```
112 | 
113 | - Build vcftools:
114 | 
115 | ```
116 | cd vcftools
117 | make
118 | ```
119 | 
120 | - Install the fourth dependency of iCAGES, bedtools. From the icages/bin/ directory (run `cd ..` first if you are still in bin/vcftools/), download and build it:
121 | 
122 | ```
123 | wget https://codeload.github.com/arq5x/bedtools2/tar.gz/v2.25.0 -O v2.25.0.tar.gz
124 | tar -zxvf v2.25.0.tar.gz
125 | mv bedtools2-2.25.0 bedtools
126 | rm v2.25.0.tar.gz
127 | cd bedtools
128 | make
129 | ```
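Once all four dependencies are in place, the layout under `icages/` should match the paths used in the steps above. The helper below is a hypothetical sketch that checks those paths, plus `db/phenolyzer.score` and `db/kegg.gene`, which the iCAGES scripts read from the `db/` directory. It only tests that the files and directories exist, not that the builds succeeded.

```shell
# Hypothetical helper (not part of iCAGES): check that the files and
# directories created by the installation steps exist under an iCAGES root.
# Prints each missing path and returns 1 if anything is missing.
check_icages_layout() {
    root="${1:-.}"
    missing=0
    for path in db/phenolyzer.score db/kegg.gene \
                bin/annovar bin/DGIdb/getDrugList.pl bin/vcftools bin/bedtools; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $root/$path"
            missing=1
        fi
    done
    return $missing
}
```

Run it from the directory that contains `icages/`, for example `check_icages_layout icages`.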
130 | 
131 | ## Additional databases
132 | 
133 | The default iCAGES database package includes only the hg19 human reference genome. To annotate variants against the hg18 or hg38 reference genomes, download the additional databases compiled for those builds. (Note: because of limited storage space, these two files have been taken offline; if you need them, please contact the PI to set up a temporary download link.)
134 | 
135 | - hg18
136 | 
137 | ```
138 | cd icages/db/
139 | wget http://www.openbioinformatics.org/annovar/download/icages/hg18_db.tar.gz
140 | tar -zxvf hg18_db.tar.gz
141 | ```
142 | 
143 | - hg38
144 | 
145 | ```
146 | cd icages/db/
147 | wget http://www.openbioinformatics.org/annovar/download/icages/hg38_db.tar.gz
148 | tar -zxvf hg38_db.tar.gz
149 | ```
150 | 
151 | 
152 | 
153 | ---
154 | 
155 | <div id="disqus_thread"></div>
156 | <script type="text/javascript">
157 | /* * * CONFIGURATION VARIABLES * * */
158 | var disqus_shortname = 'icages';
159 | var disqus_identifier = 'download';
160 | var disqus_title = 'iCAGES Download';
161 | 
162 | /* * * DON'T EDIT BELOW THIS LINE * * */
163 | (function() {
164 | var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
165 | dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
166 | (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
167 | })();
168 | </script>
169 | <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a></noscript>
170 | 
171 | 


--------------------------------------------------------------------------------
/genomeLocusFinder.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | use strict;
  3 | use warnings;
  4 | use Fcntl qw/SEEK_END SEEK_SET/;
  5 | use Data::Dumper;
  6 | #use Time::HiRes qw/gettimeofday tv_interval/;
  7 | 
  8 | my $BINSIZE = 10; #bin size for indexing
  9 | my $ENDOFFILE = "END_OF_GENOMELOCUSFINDER_INDEX";
 10 | my $DEBUG = 0;
 11 | my $suffix = "binsize${BINSIZE}gidx";
 12 | my $usage = "Usage: $0 
 13 | index	index a sorted tab-delimited text file
 14 | find 	retrieve a record for some loci\n";
 15 | die $usage unless @ARGV >= 1;
 16 | my $subprogram = shift @ARGV;
 17 | if ($subprogram eq 'index') {
 18 |     &index(@ARGV);
 19 | } elsif ($subprogram eq 'find') {
 20 |     &find(@ARGV);
 21 | } else {
 22 |     die $usage;
 23 | }
 24 | 
 25 | ##########WARNING##################################################################
 26 | #typical use cases for this design
 27 | #small vocabulary in 1st column
 28 | #large database file
 29 | #large query file
 30 | #only a few annotations for each genomic locus
 31 | #not many overlapping loci (e.g. no chr1:1-10, chr1:1-100, chr1:1-1000 etc)
 32 | ####MIGHT NOT WORK WELL UNDER OTHER CIRCUMSTANCES################################
 33 | ##################SUBROUTINES#######################
 34 | sub index {
 35 |     #generate hierarchical index for a sorted tab-delimited txt file
 36 |     my $usage = "$0 index <fasta index file> <sorted BED file>\n";
 37 |     die $usage unless @_ == 2;
 38 |     warn "Start indexing...\n";
 39 | #    my $t0 = [gettimeofday];
 40 |     my $fai = shift;
 41 |     my $in = shift;
 42 |     my $out = "$in.$suffix";
 43 |     #input format
 44 |     #1	10001	10001	T	A	0.18521432
 45 |     #types of binary representation: Q: unsigned 64-bit integer, 8 bytes
 46 |     #index layout
 47 |     #############
 48 |     #bin index: 8-byte unsigned integer marking start of each $BINSIZE bp bin in the file being indexed
 49 |     #key index: two 8-byte unsigned integers per key (16 bytes per key), marking the start position of each bin index 
 50 |     #		section and the length associated with each key (for chromosomes, this is equivalent to chr
 51 |     #		length); the order of key indices is the same as the order of the vocabulary
 52 |     #vocabulary: a list of keys separated by tabs 
 53 |     #offset: 8-byte unsigned integer, marking total bytes for vocabulary
 54 |     #######################An example index file################################
 55 |     #-----------------------------------------------------------------
 56 |     #(bin index section)
 57 |     #(chr1 bins)
 58 |     #0x0000 0000 0000 0001
 59 |     #0x0000 0000 0000 00FA
 60 |     #...
 61 |     #(chr2 bins)
 62 |     #...
 63 |     #-----------------------------------------------------------------
 64 |     #(key index section)
 65 |     #(chr1 bin start in this index)0x00000001 ||(chr1 length)0xE9A6740
 66 |     #(chr2 bin start in this index)0x000000EB ||(chr2 length)0xE7EED8D
 67 |     #...
 68 |     #-----------------------------------------------------------------
 69 |     #(vocabulary section)
 70 |     #chr1\tchr2\t...
 71 |     #-----------------------------------------------------------------
 72 |     #(the offset)0x000010AE
 73 |     #-----------------------------------------------------------------
 74 |     my %key_length = &readFastaIndex($fai);
 75 |     my $vocabulary_string;
 76 |     my %key_start;
 77 |     my $offset;
 78 | 
 79 |     open IN,'<',$in or die "$in: $!";
 80 |     open OUT,'>',$out or die "$out: $!";
 81 |     binmode OUT; #index is in binary format for space efficiency
 82 | 
 83 |     my $current_db_position = 0; #current position in database file
 84 |     my $count_total = 0;
 85 |     my $previous_key;
 86 |     my ($previous_start, $previous_end);
 87 |     my $walker = 0; #measures how far we have walked on each chr, in unit of $BINSIZE
 88 |     #every time we pass $BINSIZE, record position
 89 |     #reset for each new chr
 90 |     while (<IN>) {
 91 | 	$count_total++;
 92 | #	warn "NOTICE: $count_total records processed in ".tv_interval($t0)." seconds\n" if $count_total % 1_000_000 == 0;
 93 | 	#example database record
 94 | 	#0	1	2	3	4	5
 95 | 	#1	10001	10001	T	A	0.18521432
 96 | 	#assume database is sorted
 97 | 	chomp;
 98 | 	my @f = split /\t/;
 99 | 	die "ERROR: expect at least 5 fields at line $. of $in\n" unless @f >= 5;
100 | 	die "ERROR: Database $in not sorted in ascending order at line $. of $in.\n" if defined $previous_start and defined $previous_key and
101 | 	$previous_key eq $f[0] and (($f[1] < $previous_start) or ($previous_start == $f[1] and $f[2] < $previous_end));
102 | 	die "ERROR: start larger than end at line $. of $in\n" unless $f[2] >= $f[1];
103 | 	die "ERROR: $f[0] at line $. of $in is not listed in $fai\n" unless exists $key_length{$f[0]};
104 | 	die "ERROR: range out of predefined bounds of $f[0] at line $. of $in.\n" if $key_length{$f[0]} < $f[1] or $key_length{$f[0]} < $f[2];
105 | 
106 | 	if (!defined $previous_key or $previous_key ne $f[0]) {
107 | 	    #we got a new key
108 | 	    $key_start{$f[0]} = tell OUT;
109 | 	    if (defined $previous_key) {
110 | 		#modify length according to how far we have walked
111 | 		$key_length{$previous_key} = $walker;
112 | 	    }
113 | 	    $walker = 0;
114 | 	}
115 | 
116 | 	while ($f[1] >= $walker) {
117 | 	    print $current_db_position,"\n" if $DEBUG;
118 | 	    print OUT pack("Q",$current_db_position);
119 | 	    $walker += $BINSIZE;
120 | 	}
121 | 
122 | 	$current_db_position = tell IN;
123 | 	$previous_key = $f[0];
124 | 	($previous_start,$previous_end) = @f[1,2];
125 |     }
126 |     close IN;
127 |     #record last key's end position
128 |     $key_length{$previous_key} = $walker;
129 | 
130 |     #print out key starts and lengths
131 |     for my $key(sort keys %key_start) {
132 | 	print OUT pack("Q",$key_start{$key});
133 | 	print OUT pack("Q",$key_length{$key});
134 | 	print $key_start{$key},"\n" if $DEBUG;
135 | 	print $key_length{$key},"\n" if $DEBUG;
136 |     }
137 |     #print vocabulary
138 |     $vocabulary_string = join("\t",sort keys %key_start);
139 |     print OUT pack("A".length($vocabulary_string), $vocabulary_string);
140 |     print $vocabulary_string,"\n" if $DEBUG;
141 | 
142 |     #print offset
143 |     $offset = length($vocabulary_string);
144 |     print OUT pack("Q", $offset);
145 |     print $offset,"\n" if $DEBUG;
146 | 
147 |     #print EOF mark
148 |     print OUT pack("A".length($ENDOFFILE), $ENDOFFILE);
149 |     close OUT;
150 | 
151 |     warn "Indexing done. Bin size: $BINSIZE. Assume ASCII Encoding for characters.\n";
152 | #   warn "Processed $count_total records in ".tv_interval($t0)." seconds\n";
153 | }
154 | sub find {
155 |     #retrieve recording overlapping with queries
156 |     my $usage = "$0 find <database> <query file> <database name> <output file>\n";
157 |     die $usage unless @_ == 4;
158 | #    my $t0 = [gettimeofday];
159 |     my $db = shift;
160 |     my $index_file = "$db.$suffix";
161 |     my $query = shift;
162 |     my $databaseName = shift;
163 |     my $outputFile = shift;
164 |     warn "Start querying $db...\n";
165 |     #how to find the record for a specific locus?
166 |     #first read the offset (the 8 bytes just before the end-of-file marker)
167 |     #then load the vocabulary into hash in memory (size is determined by offset)
168 |     #then locate key index (each key has an 8-byte start and an 8-byte length, 16 bytes in total).
169 |     #store the key indices in hash
170 |     #locate bin index by key index and genomic coordinate
171 |     #go to the file, read from bin start until no more records
172 |     my %key_index = &loadKeyIndex($index_file);
173 |     #example query
174 |     #1	114438528	114438528	G	A	TCGA-06-0155-01B-01D-1492-08
175 |     open DB,'<',$db or die "$db: $!";
176 |     open INDEX,'<',$index_file or die "$index_file: $!";
177 |     open QUERY,'<',$query or die "$query: $!";
178 |     open OUT, ">", $outputFile or die "$outputFile: $!";
179 |     
180 |     my $count_match = 0;
181 |     my $count_total = 0;
182 |     my $count_total_db_lookup = 1;
183 | 
184 |     while (<QUERY>) {
185 | 	$count_total++;
186 | #	warn "NOTICE: $count_total records processed in ".tv_interval($t0)." seconds\n" if $count_total % 1_000_000 == 0;
187 | #	print "query: $_" ;
188 | 	chomp;
189 | 	my @f = split /\t/;
190 | 	die "ERROR: expect at least 5 fields at line $. of $query\n" unless @f >= 5;
191 | 	warn "QUERY: @f\n" if $DEBUG;
192 | 	my ($chr, $start, $end, $ref, $alt) = @f[0..4];
193 | 	# chr start
194 | 	$chr =~ s/^chr//; #strip an optional "chr" prefix so names match the database
198 | 	if ((not exists $key_index{$chr}) or $start >= $key_index{$chr}->[1]) { #bins only cover positions below the stored length
199 | 	    warn "NOTICE: @f no match found\n" if $DEBUG;
200 | 	} else {
201 | 	    seek INDEX, $key_index{$chr}->[0] + 8*(int $start/$BINSIZE), SEEK_SET;
202 | 	    my $position_in_db;
203 | 	    read INDEX, $position_in_db, 8;
204 | 	    $position_in_db = unpack("Q", $position_in_db);
205 | 	    seek DB, $position_in_db, SEEK_SET;
206 | 	    while (<DB>) {
207 | 		#search within database
208 | 		#until no more annotations can be possibly found
209 | 		$count_total_db_lookup++;
210 | #		print;
211 | 		chomp;
212 | 		my @db_fields = split /\t/;
213 | 		warn "DB:@db_fields\n" if $DEBUG;
214 | 
215 | 
216 | 		if ($db_fields[0] eq $chr and $db_fields[1] == $start and $db_fields[2] == $end
217 | 			and $db_fields[3] eq $ref and $db_fields[4] eq $alt) {
218 | 		    print OUT join("\t", $databaseName, $db_fields[5], @f[0..4]), "\n";
219 | 		    $count_match++;
220 | 		}
221 | 		if ($db_fields[0] ne $chr or $db_fields[1] > $start) {
222 | 		    last; #if chromosomes are not equal, or the start coordinate is larger than query start
223 | 		    #there is no need to continue searching
224 | 		}
225 | 	    }
226 | 	}
227 |     }
228 |     close QUERY;
229 |     close DB;
230 |     close INDEX;
231 |     warn "Querying done. Found $count_match matches in $count_total queries.\n";
232 | #    warn "Processed $count_total records in ".tv_interval($t0)." seconds\n";
233 |     warn "Average database lookup: ".($count_total_db_lookup/$count_total)."\n" if $count_total;
234 | }
235 | sub loadKeyIndex {
236 |     #load vocabulary and key indeces of the index file
237 |     my $idx = shift;
238 | 
239 |     my $offset;
240 |     my %key_index; #values are [start_position, length]
241 |     my @vocabulary;
242 |     my $buffer;
243 |     open IN,'<',$idx or die "$!";
244 |     binmode IN;
245 | 
246 |     #check end
247 |     seek IN, -length($ENDOFFILE), SEEK_END or die "$!";
248 |     read IN, $buffer, length($ENDOFFILE) or die "$!";
249 |     $buffer = unpack("A".length($ENDOFFILE), $buffer);
250 |     die "Incomplete or corrupted index file, please re-index the database.\n" unless $buffer eq $ENDOFFILE;
251 | 
252 |     #offset
253 |     seek IN, -8-length($ENDOFFILE), SEEK_END or die "$!";
254 |     read IN, $offset, 8 or die "$!";
255 |     $offset = unpack("Q",$offset);
256 |     print "offset: $offset\n" if $DEBUG;
257 | 
258 |     #vocabulary
259 |     ##reopen the filehandle because last time we read to the end
260 |     #open IN,'<',$idx or die "$!";
261 |     #binmode IN;
262 |     seek IN, -(8 + $offset + length($ENDOFFILE)), SEEK_END or die "$!";
263 |     read IN, $buffer, $offset;
264 |     @vocabulary = split /\t/,(unpack ("A".$offset,$buffer));
265 |     print "@vocabulary" if $DEBUG;
266 | 
267 |     #key index
268 |     seek IN, -(length($ENDOFFILE) + 8 + $offset + 16*@vocabulary), SEEK_END;
269 |     for my $key(@vocabulary) {
270 | 	my ($start, $len);
271 | 	read IN, $start, 8;
272 | 	read IN, $len, 8;
273 | 	$start = unpack("Q", $start);
274 | 	$len = unpack("Q", $len);
275 | 	$key_index{$key} = [$start, $len];
276 |     }
277 | 
278 |     print Dumper(\%key_index) if $DEBUG;
279 |     close IN;
280 |     return %key_index;
281 | }
282 | sub readFastaIndex {
283 |     my $in = shift;
284 |     #chr1	243000000	80	120
285 |     my %vocabulary;
286 |     open IN,'<',$in or die "$in: $!";
287 |     while (<IN>) {
288 | 	my @f = split /\t/;
289 | 	$vocabulary{$f[0]} = $f[1];
290 |     }
291 |     close IN;
292 |     return %vocabulary;
293 | }
294 | 


--------------------------------------------------------------------------------
/bin/icagesJson.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | use strict;
  3 | use warnings;
  4 | use List::Util qw(min max);
  5 | use Pod::Usage;
  6 | use Getopt::Long;
  7 | use JSON;
  8 | 
  9 | ######################################################################################################################################
 10 | ######################################################## variable declaration ########################################################
 11 | ######################################################################################################################################
 12 | 
 13 | my $rawInputFile = $ARGV[0];
 14 | my $icagesLocation = $ARGV[1];
 15 | my $prefix = $ARGV[2];
 16 | my ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef);
 17 | my $json;
 18 | my $nowString;
 19 | $nowString = localtime();
 20 | 
 21 | ######################################################################################################################################
 22 | ########################################################### main  ####################################################################
 23 | ######################################################################################################################################
 24 | 
 25 | ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef) = &loadResult($rawInputFile, $prefix);
 26 | $json = &createJson($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef);
 27 | &printJson($rawInputFile, $json, $prefix);
 28 | 
 29 | ######################################################################################################################################
 30 | ############################################################# subroutines ############################################################
 31 | ######################################################################################################################################
 32 | 
 33 | sub loadResult {
 34 |     print "NOTICE: start loading three output files from iCAGES\n";
 35 |     my ($rawInputFile, $prefix, $icagesMutationsFile, $icagesGenesFile, $icagesDrugsFile);
 36 |     my ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef);
 37 |     my (%icagesMutations, %icagesGenes, %icagesDrugs);
 38 |     my ($missenseCount, $noncodingCount, $structuralVariationCount);
 39 |     my ($geneCount, $driverCount, $cgcCount, $keggCount, $drugCount);
 40 |     my %logInformation;
 41 |     $rawInputFile = shift;
 42 |     $prefix = shift;
 43 |     $icagesMutationsFile = $rawInputFile . $prefix. ".annovar.icagesMutations.csv";
 44 |     $icagesGenesFile = $rawInputFile . $prefix. ".annovar.icagesGenes.csv";
 45 |     $icagesDrugsFile = $rawInputFile . $prefix. ".annovar.icagesDrugs.csv";
 46 |     ($icagesMutationsRef, $missenseCount, $noncodingCount, $structuralVariationCount) = &loadMutations($icagesMutationsFile);
 47 |     ($icagesGenesRef, $geneCount, $driverCount, $cgcCount, $keggCount) = &loadGenes($icagesGenesFile);
 48 |     ($icagesDrugsRef, $drugCount) = &loadDrugs($icagesDrugsFile);
 49 |     %icagesMutations = %{$icagesMutationsRef};
 50 |     %icagesGenes = %{$icagesGenesRef};
 51 |     %icagesDrugs = %{$icagesDrugsRef};
 52 |     %logInformation = ("Gene_count" => $geneCount, "Driver_count" => $driverCount, "CGC_count" => $cgcCount, "KEGG_count" => $keggCount, "Missense_count" => $missenseCount, "Noncoding_count" => $noncodingCount, "Structural_variation_count" => $structuralVariationCount, "Drug_count" => $drugCount);
 53 |     return (\%icagesMutations, \%icagesGenes, \%icagesDrugs, \%logInformation);
 54 | }
 55 | 
 56 | sub createJson {
 57 |     print "NOTICE: start creating JSON file\n";
 58 |     my ($icagesMutationsRef, $icagesGenesRef, $icagesDrugsRef, $logInformationRef);
 59 |     my (%icagesMutations, %icagesGenes, %icagesDrugs);
 60 |     my %json;
 61 |     my $json;
 62 |     $icagesMutationsRef = shift;
 63 |     $icagesGenesRef = shift;
 64 |     $icagesDrugsRef = shift;
 65 |     $logInformationRef = shift;
 66 |     %icagesMutations = %{$icagesMutationsRef};
 67 |     %icagesGenes = %{$icagesGenesRef};
 68 |     %icagesDrugs = %{$icagesDrugsRef};
 69 |     foreach my $gene (sort keys %icagesGenes){
 70 |         $icagesGenes{$gene}{"Mutation"} = $icagesMutations{$gene};
 71 |         $icagesGenes{$gene}{"Children"} = $icagesDrugs{$gene};
 72 |         push @{$json{"Output"}}, $icagesGenes{$gene};
 73 |     }
 74 |     $json{"Log"} = $logInformationRef;
 75 |     $json = encode_json \%json;
 76 |     return $json;
 77 | }
 78 | 
 79 | sub printJson {
 80 |     my ($rawInputFile, $json, $prefix, $icagesJsonFile);
 81 |     $rawInputFile = shift;
 82 |     $json = shift;
 83 |     $prefix = shift;
 84 |     $icagesJsonFile = $rawInputFile . $prefix . ".icages.json";
 85 |     open(OUT, ">", $icagesJsonFile) or die "ERROR: cannot open $icagesJsonFile for writing: $!\n";
 86 |     print OUT "$json\n";
 87 |     close OUT;
 88 |     print "NOTICE: end running iCAGES package at $nowString\n";
 89 | }
 90 | 
 91 | sub loadMutations {
 92 |     print "NOTICE: start processing icagesMutations.csv file to generate JSON file\n";
 93 |     my ($icagesMutationsFile, $title);
 94 |     my ($missenseCount, $noncodingCount, $structuralVariationCount);
 95 |     my %icagesMutations;
 96 |     $icagesMutationsFile = shift;
 97 |     open(MUT, "<", $icagesMutationsFile) or die "ERROR: cannot open $icagesMutationsFile: $!\n";
 98 |     $title = <MUT>;
 99 |     $missenseCount = 0;
100 |     $noncodingCount = 0;
101 |     $structuralVariationCount = 0;
102 |     while(<MUT>){
103 |         chomp;
104 |         my @line = split(",", $_);
105 |         my $geneName = $line[0];
106 |         my $chrmosomeNumber = $line[1];
107 |         my $start = $line[2];
108 |         my $end = $line[3];
109 |         my $reference = $line[4];
110 |         my $alternative = $line[5];
111 |         my $category = $line[6];
112 |         if($category eq "point coding"){
113 |             $missenseCount ++;
114 |         }elsif($category eq  "point noncoding"){
115 |             $noncodingCount ++;
116 |         }elsif($category eq  "structural variation"){
117 |             $structuralVariationCount ++;
118 |         }
119 |         my $mutationSyntax = $line[7];
120 |         my $proteinSyntax = $line[8];
121 |         my $scoreCategory = $line[9];
122 |         my $mutationScore = $line[10] eq "NA" ? "NA" : sprintf("%.2f", $line[10]);
123 | 	
124 |         my %anonymousHash = ("Chromosome" => $chrmosomeNumber, "Start_position" => $start, "End_position" => $end, "Reference_allele" => $reference, "Alternative_allele" => $alternative, "Mutation_syntax" => $mutationSyntax, "Protein_syntax" => $proteinSyntax, "Mutation_category" => $category, "Score_category" => $scoreCategory, "Driver_mutation_score" => $mutationScore);
125 |         push @{$icagesMutations{$geneName}}, \%anonymousHash;
126 |     }
127 |     close MUT;
128 |     return (\%icagesMutations, $missenseCount, $noncodingCount, $structuralVariationCount);
129 | }
130 | 
131 | sub loadGenes {
132 |     print "NOTICE: start processing icagesGenes.csv file to generate JSON file\n";
133 |     my ($icagesGenesFile, $title);
134 |     my ($geneCount, $driverCount, $cgcCount, $keggCount);
135 |     $geneCount = 0;
136 |     $driverCount = 0;
137 |     $cgcCount = 0;
138 |     $keggCount = 0;
139 |     $icagesGenesFile = shift;
140 |     my $keggRef;
141 |     my %kegg;
142 |     my %icagesGenes;
143 |     my $DBLocation = $icagesLocation . "db/";
144 |     $keggRef = loadDatabase($DBLocation);
145 |     %kegg = %{$keggRef};
146 |     open(GENE, "<", $icagesGenesFile) or die "ERROR: cannot open $icagesGenesFile: $!\n";
147 |     $title = <GENE>;
148 |     # limit the output to at most 49 genes
149 |     while(<GENE>){
150 |         chomp;
151 | 	last if $geneCount == 49;
152 |         $geneCount ++;
153 |         my @line = split(",", $_);
154 |         my $geneName = $line[0];
155 |         my $category = $line[1];
156 | 	my $phenolyzer = $line[5] eq "NA" ? "NA" : sprintf("%.2f", $line[5]);
157 | 	my $icagesGeneScore = $line[6] eq "NA" ? "NA" : sprintf("%.2f", $line[6]);
158 |         my $url;
159 |         if ($category eq "Cancer Gene Census"){
160 |             $cgcCount ++;
161 |             $url = "http://cancer.sanger.ac.uk/cosmic/gene/overview?ln=" . $geneName;
162 |         }elsif($category eq "KEGG Cancer Pathway"){
163 |             $keggCount ++;
164 |             $url = "http://www.genome.jp/dbget-bin/www_bget?hsa:" . $kegg{$geneName};
165 |         }else{
166 |             $url = "http://www.genecards.org/cgi-bin/carddisp.pl?gene=". $geneName. "&search=96a88d95c4a24ffc2ac1129a92af7b02";
167 |         }
168 |         my %anonymousHash =  ("Gene_url" => $url, "Name" => $geneName, "Category" => $category, "Phenolyzer_score" => $phenolyzer, "iCAGES_gene_score" => $icagesGeneScore);
169 |         $icagesGenes{$geneName} = \%anonymousHash;
170 |     }
171 |     my @sortediCAGESGeneScore = sort {$icagesGenes{$b}{"iCAGES_gene_score"} <=> $icagesGenes{$a}{"iCAGES_gene_score"}} keys %icagesGenes;
172 |     my $percentile = int(($#sortediCAGESGeneScore + 1) * 0.20);
173 | #    my $criticalValue = $icagesGenes{$sortediCAGESGeneScore[$percentile-1]}{"iCAGES_gene_score"};
174 |     my $criticalValue = 0.11;
175 |     foreach my $gene (sort keys %icagesGenes){
176 |         if($icagesGenes{$gene}{"iCAGES_gene_score"} > $criticalValue){
177 |             $icagesGenes{$gene}{"Driver"} = "TRUE";
178 |             $driverCount ++;
179 |         }else{
180 |             $icagesGenes{$gene}{"Driver"} = "FALSE";
181 |         }
182 |     }
183 |     close GENE;
184 |     return (\%icagesGenes, $geneCount, $driverCount, $cgcCount, $keggCount);
185 | }
186 | 
187 | 
188 | 
189 | sub loadDrugs {
190 |     print "NOTICE: start processing icagesDrugs.csv file to generate JSON file\n";
191 |     my ($icagesDrugsFile, $title);
192 |     my %icagesDrugs;
193 |     my $drug_count;
194 |     $drug_count = 0;
195 |     $icagesDrugsFile = shift;
196 |     open(DRUG, "<", $icagesDrugsFile) or die "ERROR: cannot open $icagesDrugsFile: $!\n";
197 |     $title = <DRUG>;
198 |     while(<DRUG>){
199 |         $drug_count ++;
200 |         chomp;
201 |         my @line = split(",", $_);
202 |         my $drugName = $line[0];
203 |         my $finalTarget = $line[1];
204 |         my $directTarget = $line[2];
205 | 	my $maxBioSystemsScore = $line[4] eq "NA" ? "NA" : sprintf("%.2f", $line[4]);
206 | 	my $maxActivityScore = $line[5] eq "NA" ? "NA" : sprintf("%.2f", $line[5]);
207 | 	my $icagesDrugScore = $line[6] eq "NA" ? "NA" : sprintf("%.2f", $line[6]);
208 | 	# add FDA (columns 8-9) and clinical trial (columns 10-13) information
209 | 	my $FDA_tag = ( $line[8] eq "NA" and $line[9] eq "NA" ) ? "FALSE" : "TRUE" ;
210 | 	my $CT_tag = ( $line[10] eq "NA" and $line[11] eq "NA" and $line[12] eq "NA" and $line[13] eq "NA" ) ? "FALSE" : "TRUE" ;
211 | 	my (%fda, %ct);
212 |         my %anonymousHash;
213 | 	if($FDA_tag eq "FALSE" and $CT_tag eq "FALSE"){
214 | 	    %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => "FALSE", "CT_tag" => "FALSE");
215 | 	}elsif($FDA_tag eq "TRUE" and $CT_tag eq "FALSE"){
216 | 	    %fda = ("Status" => $line[8], "Active_ingredient" => $line[9]);
217 | 	    %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => "TRUE", "CT_tag" => "FALSE", "FDA_Info" => \%fda);
218 | 	}elsif($CT_tag eq "TRUE" and $FDA_tag eq "FALSE"){
219 | 	    %ct = ("Name" => $line[10], "Organization" => $line[11], "Phase" => $line[12] , "URL" => $line[13]);
220 | 	    %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => "FALSE", "CT_tag" => "TRUE", "CT_Children"=> [\%ct]);
221 | 	}else{
222 | 	    %fda = ("Status" => $line[8], "Active_ingredient" => $line[9]);
223 | 	    %ct= ("Name" => $line[10], "Organization" => $line[11], "Phase" => $line[12] , "URL" => $line[13]);
224 | 	    %anonymousHash = ("Drug_name" => $drugName, "Final_target_gene" => $finalTarget, "Direct_target_gene" => $directTarget, "BioSystems_probability" => $maxBioSystemsScore, "PubChem_active_probability" => $maxActivityScore, "iCAGES_drug_score" => $icagesDrugScore, "Target_mutation_tag" => "FALSE" , "FDA_tag" => "TRUE", "CT_tag" => "TRUE", "FDA_Info" => \%fda, "CT_Children" => [\%ct]);
225 | 	}
226 | 	
227 |         push @{$icagesDrugs{$finalTarget}}, \%anonymousHash;
228 |     }
229 |     close DRUG;
230 |     return (\%icagesDrugs, $drug_count);
231 | }
232 | 
233 | 
234 | sub loadDatabase {
235 |     print "NOTICE: start loading KEGG cancer pathway database\n";
236 |     my ($DBLocation, $cgcFile, $keggFile);
237 |     my (%cgc, %kegg);
238 |     $DBLocation = shift;
239 |     $keggFile = $DBLocation . "kegg.gene";
240 |     open(KEGG, "<", $keggFile) or die "ERROR: cannot open $keggFile: $!\n";
241 |     while(<KEGG>){
242 |         chomp;
243 |         my @line = split;
244 |         $kegg{$line[0]} = $line[1];
245 |     }
246 |     close KEGG;
247 |     return (\%kegg);
248 | }
249 | 
250 | 
251 | 


--------------------------------------------------------------------------------
/bin/icagesGene.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | use strict;
  3 | use warnings;
  4 | use List::Util qw(min max);	
  5 | use Pod::Usage;
  6 | use Getopt::Long;
  7 | 
  8 | ######################################################################################################################################
  9 | ######################################################## variable declaration ########################################################
 10 | ######################################################################################################################################
 11 | 
 12 | my ($rawInputFile, $icagesLocation, $subtype, $prefix);
 13 | my %phenolyzer;
 14 | 
 15 | ######################################################################################################################################
 16 | ########################################################### main  ####################################################################
 17 | ######################################################################################################################################
 18 | 
 19 | $rawInputFile = $ARGV[0];
 20 | $icagesLocation = $ARGV[1];
 21 | $subtype = $ARGV[2];
 22 | $prefix = $ARGV[3];
 23 | %phenolyzer = &loadPhenolyzer($icagesLocation);
 24 | &processMutation($rawInputFile, $icagesLocation, \%phenolyzer, $subtype, $prefix);
 25 | 
 26 | 
 27 | ######################################################################################################################################
 28 | ############################################################# subroutines ############################################################
 29 | ######################################################################################################################################
 30 | 
 31 | sub loadPhenolyzer {
 32 |     print "NOTICE: start loading Phenolyzer\n";
 33 |     my %phenolyzer;
 34 |     my ($icagesLocation, $DBLocation, $phenolyzerDB);
 35 |     $icagesLocation = shift;
 36 |     $DBLocation = $icagesLocation . "db/";
 37 |     $phenolyzerDB = $DBLocation . "phenolyzer.score";
 38 |     open(PHE, "$phenolyzerDB") or die "ERROR: cannot open $phenolyzerDB\n";
 39 |     while(<PHE>){
 40 |         chomp;
 41 |         my @line = split("\t", $_);
 42 |         $phenolyzer{$line[0]} = $line[1];
 43 |     }
 44 |     return %phenolyzer;
 45 | }
 46 | 
 47 | 
 48 | sub processMutation{
 49 |     print "NOTICE: start processing mutation files from iCAGES layer one\n";
 50 |     my ($rawInputFile, $icagesLocation, $DBLocation, $icagesMutations, $icagesGenes, $ref, $subtype);
 51 |     my (%phenolyzer, %icagesGenes, %icagesPrint);
 52 |     my (%cgc, %kegg);
 53 |     my ($cgcRef, $keggRef);
 54 |     $rawInputFile = shift;
 55 |     $icagesLocation = shift;
 56 |     $ref = shift;
 57 |     $subtype = shift;
 58 |     $prefix = shift;
 59 |     %phenolyzer = %{$ref};
 60 |     $DBLocation = $icagesLocation . "db/";
 61 |     ($cgcRef, $keggRef) = loadDatabase($DBLocation);
 62 |     %cgc = %{$cgcRef};
 63 |     %kegg = %{$keggRef};
 64 |     $icagesMutations = $rawInputFile . $prefix . ".annovar.icagesMutations.csv";
 65 |     $icagesGenes = $rawInputFile . $prefix . ".annovar.icagesGenes.csv";
 66 |     open(MUTATIONS, "$icagesMutations") or die "ERROR: cannot open $icagesMutations\n";
 67 |     my $header = <MUTATIONS>;
 68 |     open(GENES, ">$icagesGenes") or die "ERROR: cannot open $icagesGenes\n";
 69 |     while(<MUTATIONS>){
 70 |         chomp;
 71 |         my @line = split(",", $_);
 72 | 	next unless defined $line[0];
 73 |         if(defined $icagesGenes{$line[0]}{$line[9]}){
 74 |             if($line[10] eq "NA" or  $icagesGenes{$line[0]}{$line[9]} eq "NA"){
 75 |                 next;
 76 |             }else{
 77 |                 $icagesGenes{$line[0]}{$line[9]} = max($line[10], $icagesGenes{$line[0]}{$line[9]});
 78 |             };
 79 |         }else{
 80 |             $icagesGenes{$line[0]}{$line[9]} = $line[10];
 81 |         };
 82 |     };
 83 |     
 84 |     ####### count genes
 85 |     my $geneCount = 0;
 86 |     my $cgcCount = 0;
 87 |     my $keggCount = 0;
 88 |     
 89 | 
 90 |     foreach my $gene (sort keys %icagesGenes){
 91 |         $geneCount ++;
 92 |         my ($radialSVM, $funseq, $cnv, $phenolyzer, $icagesGene, $category);
 93 |         if (exists $icagesGenes{$gene}{"radial SVM"} and $icagesGenes{$gene}{"radial SVM"} ne "NA"){
 94 |             $radialSVM = $icagesGenes{$gene}{"radial SVM"};
 95 |         }else{
 96 |             $radialSVM = 0;
 97 |         };
 98 |         if (exists $icagesGenes{$gene}{"FunSeq2"} and $icagesGenes{$gene}{"FunSeq2"} ne "NA"){
 99 |             $funseq = $icagesGenes{$gene}{"FunSeq2"} ;
100 |         }else{
101 |             $funseq = 0;
102 |         };
103 |         if (exists $icagesGenes{$gene}{"CNV normalized signal"} and $icagesGenes{$gene}{"CNV normalized signal"} ne "NA"){
104 |             $cnv = $icagesGenes{$gene}{"CNV normalized signal"} ;
105 |         }else{
106 |             $cnv =  0;
107 |         };
108 | 
109 | 	next if $cnv==0 and $funseq==0 and $radialSVM==0;
110 | 	
111 |         if (exists $phenolyzer{$gene}){
112 |             $phenolyzer = $phenolyzer{$gene};
113 |         }else{
114 |             $phenolyzer = 0;
115 |         };
116 |         if (exists $cgc{$gene}){
117 |             $cgcCount ++;
118 |             $category = "Cancer Gene Census";
119 |         }elsif(exists $kegg{$gene}){
120 |             $keggCount ++;
121 |             $category = "KEGG Cancer Pathway";
122 |         }else{
123 |             $category = "Other Category";
124 |         }
125 |         if($subtype eq "ACC"){
126 |             $icagesGene = 1.36771 -0.42382 * $radialSVM -0.93636 * $cnv  -0.15453 * $funseq -2.43656 * $phenolyzer;
127 |         }elsif($subtype eq "BLCA"){
128 |             $icagesGene = 1.058757 -0.038520 * $radialSVM -0.398104 * $cnv -0.003739 * $funseq -0.909022 * $phenolyzer;
129 |         }elsif($subtype eq "BRCA"){
130 |             $icagesGene = 1.093442 -0.075614 * $radialSVM  -0.479842* $cnv -0.010534 * $funseq -1.022521 * $phenolyzer;
131 |         }elsif($subtype eq "CESC"){
132 |             $icagesGene = 1.16180 -0.14885 * $radialSVM -1.32652 * $cnv -0.05598 * $funseq -1.32015 * $phenolyzer;
133 |         }elsif($subtype eq "CHOL"){
134 |             $icagesGene = -0.14641 + 0.18845 * $radialSVM + 0.50381 * $cnv +0.13216 * $funseq + 0.61047 * $phenolyzer;
135 |         }elsif($subtype eq "COADREAD") {
136 | 	    $icagesGene = -0.062618 + 0.041399 * $radialSVM + 1.315360 * $cnv + 0.008455 * $funseq + 0.990089 * $phenolyzer;
137 | 	}elsif($subtype eq "COAD"){
138 |             $icagesGene = -0.091516 + 0.069962 * $radialSVM + 1.101200 * $cnv  + 0.030804 * $funseq + 1.044726 * $phenolyzer;
139 |         }elsif($subtype eq "DLBC") {
140 | 	    $icagesGene = -0.12036 + 0.13884 * $radialSVM + 0.53773 * $cnv  + 0.01512 * $funseq + 0.85130 * $phenolyzer;
141 | 	}elsif($subtype eq "ESCA"){
142 |             $icagesGene = -0.13015 + 0.11817 * $radialSVM + 0.82129 * $cnv + 0.01265 * $funseq + 1.07817 * $phenolyzer;
143 |         }elsif($subtype eq "GBM"){
144 |             $icagesGene = -0.12775 + 0.12841 * $radialSVM + 0.45996 * $cnv + 0.02754 * $funseq + 0.92774 * $phenolyzer;
145 |         }elsif($subtype eq "GBMLGG"){
146 | 	    $icagesGene = -0.07477 + 0.05322 * $radialSVM + 1.18877 * $cnv -0.01261 * $funseq + 1.08833 * $phenolyzer;
147 | 	} elsif($subtype eq "HNSC"){
148 |             $icagesGene = 1.058015 -0.037751 * $radialSVM -0.581648 * $cnv -0.004731 * $funseq -0.946285 * $phenolyzer;
149 |         }elsif($subtype eq "KICH"){
150 |             $icagesGene = -0.12298 + 0.16503 * $radialSVM + 0.29233 * $cnv + 0.10256 * $funseq + 0.46457 * $phenolyzer;
151 |         }elsif($subtype eq "KIPAN") {
152 | 	    $icagesGene = 1.093222 -0.073644 * $radialSVM -0.345129 * $cnv - 0.028881 * $funseq -1.146782 * $phenolyzer;
153 | 	}elsif($subtype eq "KIRC"){
154 |             $icagesGene = 1.13778 -0.12890 * $radialSVM -0.54436 * $cnv -0.01729 * $funseq -1.44148 * $phenolyzer;
155 |         }elsif($subtype eq "KIRP"){
156 |             $icagesGene = -0.07488 + 0.06861 * $radialSVM + 1.23478 * $cnv +0.01712 * $funseq + 0.96863 * $phenolyzer;
157 |         }elsif($subtype eq "LAML"){
158 |             $icagesGene = -0.05995 + 0.15925 * $radialSVM + 0.12114 * $cnv + 0.02738 * $funseq + 0.02367 * $phenolyzer;
159 |         }elsif($subtype eq "LGG"){
160 |             $icagesGene = -0.11611 + 0.10186 * $radialSVM + 0.95189 * $cnv + 0.04495 * $funseq + 1.11112 * $phenolyzer;
161 |         }elsif($subtype eq "LIHC"){
162 |             $icagesGene = 1.113955 -0.100192 * $radialSVM -0.773660 * $cnv  -0.047867 * $funseq -1.076089 * $phenolyzer;
163 |         }elsif($subtype eq "LUAD") {
164 | 	    $icagesGene = -0.096116 + 0.074096 * $radialSVM + 1.462135 * $cnv + 0.032892 * $funseq + 1.035469 * $phenolyzer;
165 | 	}elsif($subtype eq "LUSC"){
166 |             $icagesGene = 1.13182 -0.11578 * $radialSVM -0.53788 * $cnv -0.03912 * $funseq -1.15891 * $phenolyzer;
167 |         }elsif($subtype eq "OV"){
168 |             $icagesGene = 1.18324 -0.17481 * $radialSVM -0.72561 * $cnv -0.06436 * $funseq -1.42091 * $phenolyzer;
169 |         }elsif($subtype eq "PAAD"){
170 |             $icagesGene = 1.16658 -0.15211 * $radialSVM -0.87784 * $cnv -0.08511 * $funseq -1.33967 * $phenolyzer;
171 |         }elsif($subtype eq "PCPG"){
172 |             $icagesGene = -0.12564 + 0.19600 * $radialSVM + 0.08097 * $cnv  + 0.06315 * $funseq + 0.28951 * $phenolyzer;
173 |         }elsif($subtype eq "PRAD"){
174 |             $icagesGene = 1.124113 -0.105090 * $radialSVM -0.299424 * $cnv -0.001517 * $funseq -1.375397 * $phenolyzer;
175 |         }elsif($subtype eq "READ") {
176 | 	    $icagesGene = -0.12973 + 0.12316 * $radialSVM + 1.52412 * $cnv +  0.03016 * $funseq + 0.98996 * $phenolyzer;
177 | 	}elsif($subtype eq "SARC"){
178 |             $icagesGene = -0.077275 + 0.081598 * $radialSVM + 0.562010 * $cnv -0.006907 * $funseq + 0.772565 * $phenolyzer;
179 |         }elsif($subtype eq "SKCM"){
180 |             $icagesGene = 1.076621 -0.051706 * $radialSVM -0.283882 * $cnv -0.015553 * $funseq -1.104187 * $phenolyzer;
181 |         }elsif($subtype eq "STAD"){
182 |             $icagesGene = 1.028806 -0.010729 * $radialSVM -0.753154 * $cnv + 0.007523 * $funseq -0.807513 * $phenolyzer;
183 |         }elsif($subtype eq "STES") {
184 | 	    $icagesGene = 1.0288425 -0.0127624 * $radialSVM -0.5998427 * $cnv + 0.0004369 * $funseq -0.7377184 * $phenolyzer;
185 | 	}elsif($subtype eq "TGCT"){
186 |             $icagesGene = -0.12173 + 0.14025 * $radialSVM -0.01375 * $cnv + 0.01167 * $funseq + 0.73381 * $phenolyzer;
187 |         }elsif($subtype eq "THCA"){
188 |             $icagesGene = -0.106440 + 0.125886 * $radialSVM + 0.753607 * $cnv + 0.008624 * $funseq + 0.705268 * $phenolyzer;
189 |         }elsif($subtype eq "THYM"){
190 |             $icagesGene = -0.07786 + 0.15530 * $radialSVM + 0.25656 * $cnv + 0.04185 * $funseq + 0.15939 * $phenolyzer;
191 |         }elsif($subtype eq "UCEC"){
192 |             $icagesGene = -0.064307 + 0.038792 * $radialSVM + 0.992520 * $cnv + 0.006978 * $funseq + 1.058414 * $phenolyzer;
193 |         }elsif($subtype eq "UCS"){
194 |             $icagesGene = -0.132534 + 0.153048 * $radialSVM + 0.500817 * $cnv -0.004412 * $funseq + 0.766499 * $phenolyzer;
195 |         }else{
196 | 	    $icagesGene = -8.0124 + 5.5577 * $radialSVM + 267.3371 * $cnv + 0.3741 * $funseq + 10.0949 * $phenolyzer;
197 | #            $icagesGene = 0.455825 + 0.053429 * $radialSVM + 0.267669 * $cnv + 0.054989 * $funseq  -0.117 * $phenolyzer;
198 |         }
199 |         $icagesGene = 1/(1+exp(-$icagesGene));
200 |         $icagesPrint{$gene}{"score"} = $icagesGene;
201 | 	# add driver information requested by user
202 | 	my $driver = "No";
203 | 	if ($icagesGene >= 0.11) {
204 | 	    $driver = "Yes";
205 | 	}
206 |         $icagesPrint{$gene}{"content"} = "$gene,$category,$radialSVM,$funseq,$cnv,$phenolyzer,$icagesGene,$driver";
207 |     };
208 |     print GENES "geneName,category,radialSVM,funseq,cnv,phenolyzer,icagesGeneScore,driver\n";
209 |     foreach my $gene (sort {$icagesPrint{$b}{"score"} <=> $icagesPrint{$a}{"score"}} keys %icagesPrint){
210 |         print GENES "$icagesPrint{$gene}{\"content\"}\n";
211 |     }
212 |     
213 |     my $logFile = $rawInputFile . $prefix  . ".annovar.icages.log";
214 |     open(LOG, ">>$logFile") or die "ERROR: cannot open $logFile\n";
215 |     
216 |     print LOG "########### iCAGES Gene Summary ###########\n";
217 |     print LOG "## basic information\n";
218 |     print LOG "Total: $geneCount\n";
219 |     print LOG "Cancer Gene Census Gene: $cgcCount\n";
220 |     print LOG "KEGG Pathway Gene: $keggCount\n\n";
221 |     
222 | }
223 | 
224 | 
225 | sub loadDatabase {
226 |     print "NOTICE: start loading CGC, KEGG cancer pathway databases\n";
227 |     my ($DBLocation, $cgcFile, $keggFile);
228 |     my (%cgc, %kegg);
229 |     $DBLocation = shift;
230 |     $cgcFile = $DBLocation . "cgc.gene";
231 |     $keggFile = $DBLocation . "kegg.gene";
232 |     open(CGC, "$cgcFile") or die "ERROR: cannot open $cgcFile\n";
233 |     open(KEGG, "$keggFile") or die "ERROR: cannot open $keggFile\n";
234 |     while(<CGC>){
235 |         chomp;
236 |         my @line = split;
237 |         $cgc{$line[0]} = 1;
238 |     }
239 |     while(<KEGG>){
240 |         chomp;
241 |         my @line = split;
242 |         $kegg{$line[0]} = 1;
243 |     }
244 |     close CGC;
245 |     close KEGG;
246 |     return (\%cgc, \%kegg);
247 | }
248 | 
249 | 
250 | 
251 | 


--------------------------------------------------------------------------------
/doc/user-guide/example.md:
--------------------------------------------------------------------------------
  1 | ## ANNOVAR input with a new prefix and a new output directory
  2 | 
  3 | To change the prefix of the output files, use option `-p` or `--prefix`; to change the directory where output is written, use option `--outputdir`. The syntax is the same for other input formats, such as VCF and BED.
  4 | 
  5 | ```
  6 | [cocodong@biocluster ~/]$ head input.txt 
  7 | 1	12919840	12919840	T	C
  8 | 1	35332717	35332717	C	A
  9 | 1	55148456	55148456	G	T
 10 | 1	70504789	70504789	C	T
 11 | 1	167059520	167059520	A	T
 12 | 1	182496864	182496864	A	T
 13 | 1	197073351	197073351	C	T
 14 | 1	216373211	216373211	G	T
 15 | 10	37490170	37490170	G	A
 16 | 10	56089432	56089432	A	C
 17 | [cocodong@biocluster ~/]$ icages.pl input.txt -p newname --outputdir newoutputdir
 18 | 
 19 | ```
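As a rough illustration of where results land, the sketch below rebuilds the paths of the mutation-level and gene-level tables. The file names follow `bin/icagesGene.pl` (`<prefix>.annovar.icagesMutations.csv` and `<prefix>.annovar.icagesGenes.csv`), and `moveFiles` in `icages.pl` moves files matching `<prefix>*icages*.csv` from the input directory into the `--outputdir` directory. The subroutine `expected_outputs` is hypothetical and not part of iCAGES; other outputs are left out because their exact names are not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of iCAGES): build the paths of the mutation-
# and gene-level CSV files for a given prefix after they have been moved into
# the output directory. File names follow bin/icagesGene.pl.
sub expected_outputs {
    my ($prefix, $outputDir) = @_;
    $outputDir =~ s{/+\z}{};    # drop trailing slashes before joining
    my @files;
    for my $kind (qw(icagesMutations icagesGenes)) {
        push @files, "$outputDir/$prefix.annovar.$kind.csv";
    }
    return @files;
}

print "$_\n" for expected_outputs("newname", "newoutputdir");
```

For the command above, this prints `newoutputdir/newname.annovar.icagesMutations.csv` and `newoutputdir/newname.annovar.icagesGenes.csv`.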
 20 | 
 21 | ## ANNOVAR input annotated with hg38
 22 | 
 23 | To change the reference genome build (for example, from the default to hg38), use option `--buildver`. The syntax is the same for other input formats, such as VCF and BED.
 24 | 
 25 | ```
 26 | [cocodong@biocluster ~/]$ head input.txt 
 27 | 1	12919840	12919840	T	C
 28 | 1	35332717	35332717	C	A
 29 | 1	55148456	55148456	G	T
 30 | 1	70504789	70504789	C	T
 31 | 1	167059520	167059520	A	T
 32 | 1	182496864	182496864	A	T
 33 | 1	197073351	197073351	C	T
 34 | 1	216373211	216373211	G	T
 35 | 10	37490170	37490170	G	A
 36 | 10	56089432	56089432	A	C
 37 | [cocodong@biocluster ~/]$ icages.pl input.txt --buildver hg38
 38 | 
 39 | ```
 40 | 
 41 | ## VCF input for one sample containing both germline and tumor mutations
 42 | 
 43 | If you do not have somatic mutations for a sample, but you do have a VCF file that contains both germline mutations and tumor mutations for that sample, specify the header of the tumor column with option `-t` or `--tumor` and the header of the germline column with option `-g` or `--germline`. iCAGES then extracts the somatic mutations from this VCF file and carries out the downstream analysis. In this example, the input is a VCF file that contains tumor mutations under the header "tumor" and germline mutations under the header "germline", with coordinates based on reference genome hg19.
 44 | ```
 45 | [cocodong@biocluster ~/]$ cat input.vcf
 46 | ##fileformat=VCFv4.1
 47 | ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
 48 | ##contig=<ID=1,length=249250621,assembly=b37>
 49 | ##contig=<ID=2,length=243199373,assembly=b37>
 50 | ##contig=<ID=3,length=198022430,assembly=b37>
 51 | ##contig=<ID=4,length=191154276,assembly=b37>
 52 | ##contig=<ID=5,length=180915260,assembly=b37>
 53 | ##contig=<ID=6,length=171115067,assembly=b37>
 54 | ##contig=<ID=7,length=159138663,assembly=b37>
 55 | ##contig=<ID=8,length=146364022,assembly=b37>
 56 | ##contig=<ID=9,length=141213431,assembly=b37>
 57 | ##contig=<ID=10,length=135534747,assembly=b37>
 58 | ##contig=<ID=11,length=135006516,assembly=b37>
 59 | ##contig=<ID=12,length=133851895,assembly=b37>
 60 | ##contig=<ID=13,length=115169878,assembly=b37>
 61 | ##contig=<ID=14,length=107349540,assembly=b37>
 62 | ##contig=<ID=15,length=102531392,assembly=b37>
 63 | ##contig=<ID=16,length=90354753,assembly=b37>
 64 | ##contig=<ID=17,length=81195210,assembly=b37>
 65 | ##contig=<ID=18,length=78077248,assembly=b37>
 66 | ##contig=<ID=19,length=59128983,assembly=b37>
 67 | ##contig=<ID=20,length=63025520,assembly=b37>
 68 | ##contig=<ID=21,length=48129895,assembly=b37>
 69 | ##contig=<ID=22,length=51304566,assembly=b37>
 70 | ##contig=<ID=X,length=155270560,assembly=b37>
 71 | ##contig=<ID=Y,length=59373566,assembly=b37>
 72 | ##contig=<ID=MT,length=16569,assembly=b37>
 73 | #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	tumor	germline
 74 | 1	12919840	.	T	C	.	.	.	GT	1|1	0|0
 75 | 1	35332717	.	C	A	.	.	.	GT	1|1	0|0
 76 | 1	55148456	.	G	T	.	.	.	GT	1|1	0|0
 77 | 1	70504789	.	C	T	.	.	.	GT	1|1	0|0
 78 | 1	167059520	.	A	T	.	.	.	GT	1|1	0|0
 79 | 1	182496864	.	A	T	.	.	.	GT	1|1	0|0
 80 | 1	197073351	.	C	T	.	.	.	GT	1|1	0|0
 81 | 1	216373211	.	G	T	.	.	.	GT	1|1	0|0
 82 | 10	37490170	.	G	A	.	.	.	GT	1|1	0|0
 83 | 10	56089432	.	A	C	.	.	.	GT	1|1	0|0
 84 | ...
 85 | [cocodong@biocluster ~/]$ icages.pl input.vcf -t tumor -g germline
 86 | ```
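This page does not describe how the somatic calls are derived; that logic lives in `bin/icagesMutationNew.pl`, which is not shown here. A minimal sketch of one common rule, assuming a site is somatic when the tumor genotype carries a non-reference allele and the germline genotype does not, could look like this. In VCF, GT alleles are separated by `/` (unphased) or `|` (phased) and `.` marks a missing call. Both subroutine names are hypothetical, and the real iCAGES criteria may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if a GT string (e.g. "0|1", "1/1") carries at
# least one non-reference allele. Missing alleles (".") are ignored.
sub has_alt {
    my ($gt) = @_;
    return scalar grep { $_ ne '.' && $_ ne '0' } split /[\/|]/, $gt;
}

# Assumed rule, not the documented iCAGES logic:
# somatic = alternate allele in the tumor, none in the germline.
sub is_somatic {
    my ($tumorGT, $germlineGT) = @_;
    return has_alt($tumorGT) && !has_alt($germlineGT) ? 1 : 0;
}

# The first pair mirrors the example rows; the others are illustrative.
for my $pair (["1|1", "0|0"], ["1|1", "0|1"], ["0|0", "0|0"]) {
    printf "%s vs %s -> %s\n", @$pair, is_somatic(@$pair) ? "somatic" : "not somatic";
}
```

Under this rule, every record in the example above (tumor `1|1`, germline `0|0`) would be kept as somatic.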
 87 | 
 88 | ## VCF input with multiple samples which contains both germline mutations and tumor mutations
 89 | 
 90 | iCAGES is a personalized cancer driver analysis pipeline, so it analyzes only ONE patient at a time. If you have a VCF file that contains both germline and tumor mutations for multiple individuals, specify the tumor column header for the patient of interest with option `-t` or `--tumor` and the germline column header for that patient with option `-g` or `--germline`. iCAGES then extracts the somatic mutations for this individual from the VCF file and carries out the downstream analysis. In this example, the input is a VCF file that contains mutations from two individuals, Sample1 and Sample2, each with separate tumor and germline columns, with coordinates based on reference genome hg19. Specifying the headers for Sample1 tells iCAGES to analyze that sample.
 91 | ```
 92 | [cocodong@biocluster ~/]$ cat input.vcf
 93 | ##fileformat=VCFv4.1
 94 | ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
 95 | ##contig=<ID=1,length=249250621,assembly=b37>
 96 | ##contig=<ID=2,length=243199373,assembly=b37>
 97 | ##contig=<ID=3,length=198022430,assembly=b37>
 98 | ##contig=<ID=4,length=191154276,assembly=b37>
 99 | ##contig=<ID=5,length=180915260,assembly=b37>
100 | ##contig=<ID=6,length=171115067,assembly=b37>
101 | ##contig=<ID=7,length=159138663,assembly=b37>
102 | ##contig=<ID=8,length=146364022,assembly=b37>
103 | ##contig=<ID=9,length=141213431,assembly=b37>
104 | ##contig=<ID=10,length=135534747,assembly=b37>
105 | ##contig=<ID=11,length=135006516,assembly=b37>
106 | ##contig=<ID=12,length=133851895,assembly=b37>
107 | ##contig=<ID=13,length=115169878,assembly=b37>
108 | ##contig=<ID=14,length=107349540,assembly=b37>
109 | ##contig=<ID=15,length=102531392,assembly=b37>
110 | ##contig=<ID=16,length=90354753,assembly=b37>
111 | ##contig=<ID=17,length=81195210,assembly=b37>
112 | ##contig=<ID=18,length=78077248,assembly=b37>
113 | ##contig=<ID=19,length=59128983,assembly=b37>
114 | ##contig=<ID=20,length=63025520,assembly=b37>
115 | ##contig=<ID=21,length=48129895,assembly=b37>
116 | ##contig=<ID=22,length=51304566,assembly=b37>
117 | ##contig=<ID=X,length=155270560,assembly=b37>
118 | ##contig=<ID=Y,length=59373566,assembly=b37>
119 | ##contig=<ID=MT,length=16569,assembly=b37>
120 | #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1Tumor	Sample1Germline	Sample2Tumor	Sample2Germline
121 | 1	12919840	.	T	C	.	.	.	GT	1|1	0|0	1|1	0|0
122 | 1	35332717	.	C	A	.	.	.	GT	1|1	0|0	1|1	0|0
123 | 1	55148456	.	G	T	.	.	.	GT	1|1	0|0	1|1	0|0
124 | 1	70504789	.	C	T	.	.	.	GT	1|1	0|0	1|1	0|0
125 | 1	167059520	.	A	T	.	.	.	GT	1|1	0|0	1|1	0|0
126 | 1	182496864	.	A	T	.	.	.	GT	1|1	0|0	1|1	0|0
127 | 1	197073351	.	C	T	.	.	.	GT	1|1	0|0	1|1	0|0
128 | 1	216373211	.	G	T	.	.	.	GT	1|1	0|0	1|1	0|0
129 | 10	37490170	.	G	A	.	.	.	GT	1|1	0|0	1|1	0|0
130 | 10	56089432	.	A	C	.	.	.	GT	1|1	0|0	1|1	0|0
131 | ...
132 | [cocodong@biocluster ~/]$ icages.pl input.vcf -t Sample1Tumor -g Sample1Germline
133 | ```
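With several samples in one file, the requested headers first have to be matched to columns. The sketch below assumes the standard VCF layout, where sample columns follow the nine fixed columns `#CHROM` through `FORMAT`. `sample_columns` is a hypothetical name used for illustration, not an iCAGES function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map each requested sample name to its 0-based column
# index in the "#CHROM" header line of a VCF. Dies if a name is missing.
sub sample_columns {
    my ($headerLine, @wanted) = @_;
    chomp $headerLine;
    my @fields = split /\t/, $headerLine;
    my %index;
    # Columns 0-8 are #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT.
    for my $i (9 .. $#fields) {
        $index{$fields[$i]} = $i;
    }
    my @cols;
    for my $name (@wanted) {
        die "ERROR: sample '$name' not found in VCF header\n" unless exists $index{$name};
        push @cols, $index{$name};
    }
    return @cols;
}

my $header = join "\t", '#CHROM', qw(POS ID REF ALT QUAL FILTER INFO FORMAT
    Sample1Tumor Sample1Germline Sample2Tumor Sample2Germline);
my ($tumorCol, $germlineCol) = sample_columns($header, "Sample1Tumor", "Sample1Germline");
print "tumor column: $tumorCol, germline column: $germlineCol\n";   # prints 9 and 10
```

Failing loudly on an unknown header is a deliberate choice here: a typo in `-t` or `-g` should stop the run rather than silently analyze the wrong columns.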
134 | 
135 | ## VCF input with multiple samples which contains only somatic mutations
136 | 
137 | Again, iCAGES analyzes only ONE patient at a time. If you have a VCF file that contains somatic mutations for multiple individuals, specify the header for the patient of interest with option `-i` or `--id`. iCAGES then extracts the somatic mutations for this individual from the VCF file and carries out the downstream analysis. In this example, the input is a VCF file that contains somatic mutations from two individuals, Sample1 and Sample2, with coordinates based on reference genome hg19. Specifying the header for Sample1 tells iCAGES to analyze that sample.
138 | ```
139 | [cocodong@biocluster ~/]$ cat input.vcf
140 | ##fileformat=VCFv4.1
141 | ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
142 | ##contig=<ID=1,length=249250621,assembly=b37>
143 | ##contig=<ID=2,length=243199373,assembly=b37>
144 | ##contig=<ID=3,length=198022430,assembly=b37>
145 | ##contig=<ID=4,length=191154276,assembly=b37>
146 | ##contig=<ID=5,length=180915260,assembly=b37>
147 | ##contig=<ID=6,length=171115067,assembly=b37>
148 | ##contig=<ID=7,length=159138663,assembly=b37>
149 | ##contig=<ID=8,length=146364022,assembly=b37>
150 | ##contig=<ID=9,length=141213431,assembly=b37>
151 | ##contig=<ID=10,length=135534747,assembly=b37>
152 | ##contig=<ID=11,length=135006516,assembly=b37>
153 | ##contig=<ID=12,length=133851895,assembly=b37>
154 | ##contig=<ID=13,length=115169878,assembly=b37>
155 | ##contig=<ID=14,length=107349540,assembly=b37>
156 | ##contig=<ID=15,length=102531392,assembly=b37>
157 | ##contig=<ID=16,length=90354753,assembly=b37>
158 | ##contig=<ID=17,length=81195210,assembly=b37>
159 | ##contig=<ID=18,length=78077248,assembly=b37>
160 | ##contig=<ID=19,length=59128983,assembly=b37>
161 | ##contig=<ID=20,length=63025520,assembly=b37>
162 | ##contig=<ID=21,length=48129895,assembly=b37>
163 | ##contig=<ID=22,length=51304566,assembly=b37>
164 | ##contig=<ID=X,length=155270560,assembly=b37>
165 | ##contig=<ID=Y,length=59373566,assembly=b37>
166 | ##contig=<ID=MT,length=16569,assembly=b37>
167 | #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1	Sample2
168 | 1	12919840	.	T	C	.	.	.	GT	1|1	0|0
169 | 1	35332717	.	C	A	.	.	.	GT	1|1	1|1
170 | 1	55148456	.	G	T	.	.	.	GT	1|1	0|0
171 | 1	70504789	.	C	T	.	.	.	GT	1|1	1|1
172 | 1	167059520	.	A	T	.	.	.	GT	1|1	0|0
173 | 1	182496864	.	A	T	.	.	.	GT	0|0	0|0
174 | 1	197073351	.	C	T	.	.	.	GT	1|1	1|1
175 | 1	216373211	.	G	T	.	.	.	GT	1|1	0|0
176 | 10	37490170	.	G	A	.	.	.	GT	1|1	0|0
177 | 10	56089432	.	A	C	.	.	.	GT	1|1	0|0
178 | ...
179 | [cocodong@biocluster ~/]$ icages.pl input.vcf -i Sample1
180 | ```
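For a somatic-only file, selecting one individual can be as simple as keeping the records where that sample's genotype carries a non-reference allele. The sketch below assumes that rule and assumes GT is the first FORMAT key; the actual filtering in `bin/icagesMutationNew.pl` is not shown here and may differ. It is applied to three records from the example for `-i Sample1`, and `variants_for_sample` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "CHROM POS REF ALT" for every record in which
# the chosen sample's GT field carries a non-reference allele.
sub variants_for_sample {
    my ($sample, @vcfLines) = @_;
    my ($col, @kept);
    for my $line (@vcfLines) {
        next if $line =~ /^##/;                  # skip meta-information lines
        my @f = split /\t/, $line;
        if ($line =~ /^#CHROM/) {                # locate the sample column
            ($col) = grep { $f[$_] eq $sample } 9 .. $#f;
            die "ERROR: sample $sample not found\n" unless defined $col;
            next;
        }
        die "ERROR: data line before #CHROM header\n" unless defined $col;
        my ($gt) = split /:/, $f[$col];          # assumes GT is the first FORMAT key
        my $hasAlt = grep { $_ ne '.' && $_ ne '0' } split /[\/|]/, $gt;
        push @kept, "$f[0] $f[1] $f[3] $f[4]" if $hasAlt;
    }
    return @kept;
}

# Header plus three records copied from the example above.
my @vcf = (
    join("\t", '#CHROM', qw(POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2)),
    join("\t", qw(1 167059520 . A T . . . GT 1|1 0|0)),
    join("\t", qw(1 182496864 . A T . . . GT 0|0 0|0)),
    join("\t", qw(1 197073351 . C T . . . GT 1|1 1|1)),
);
print "$_\n" for variants_for_sample("Sample1", @vcf);
```

This prints `1 167059520 A T` and `1 197073351 C T`; position 182496864 is dropped because Sample1 is `0|0` there.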
181 | 
182 | ## VCF input with multiple samples and a BED file of additional structural variants
183 | 
184 | Annotation of structural variants in VCF format is still immature. To better annotate personal cancer mutation profiles, iCAGES accepts an additional BED file describing structural variants through option `-b` or `--bed`, and combines information from the VCF and BED files in the downstream analysis. In this example, the inputs are a VCF file that contains somatic mutations from two individuals, Sample1 and Sample2, with coordinates based on reference genome hg19, and a BED file that contains the coordinates of structural variants. This example BED file is also included in the package (`example/input.bed`).
185 | ```
186 | [cocodong@biocluster ~/]$ cat input.vcf
187 | ##fileformat=VCFv4.1
188 | ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
189 | ##contig=<ID=1,length=249250621,assembly=b37>
190 | ##contig=<ID=2,length=243199373,assembly=b37>
191 | ##contig=<ID=3,length=198022430,assembly=b37>
192 | ##contig=<ID=4,length=191154276,assembly=b37>
193 | ##contig=<ID=5,length=180915260,assembly=b37>
194 | ##contig=<ID=6,length=171115067,assembly=b37>
195 | ##contig=<ID=7,length=159138663,assembly=b37>
196 | ##contig=<ID=8,length=146364022,assembly=b37>
197 | ##contig=<ID=9,length=141213431,assembly=b37>
198 | ##contig=<ID=10,length=135534747,assembly=b37>
199 | ##contig=<ID=11,length=135006516,assembly=b37>
200 | ##contig=<ID=12,length=133851895,assembly=b37>
201 | ##contig=<ID=13,length=115169878,assembly=b37>
202 | ##contig=<ID=14,length=107349540,assembly=b37>
203 | ##contig=<ID=15,length=102531392,assembly=b37>
204 | ##contig=<ID=16,length=90354753,assembly=b37>
205 | ##contig=<ID=17,length=81195210,assembly=b37>
206 | ##contig=<ID=18,length=78077248,assembly=b37>
207 | ##contig=<ID=19,length=59128983,assembly=b37>
208 | ##contig=<ID=20,length=63025520,assembly=b37>
209 | ##contig=<ID=21,length=48129895,assembly=b37>
210 | ##contig=<ID=22,length=51304566,assembly=b37>
211 | ##contig=<ID=X,length=155270560,assembly=b37>
212 | ##contig=<ID=Y,length=59373566,assembly=b37>
213 | ##contig=<ID=MT,length=16569,assembly=b37>
214 | #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1	Sample2
215 | 1	12919840	.	T	C	.	.	.	GT	1|1	0|0
216 | 1	35332717	.	C	A	.	.	.	GT	1|1	1|1
217 | 1	55148456	.	G	T	.	.	.	GT	1|1	0|0
218 | 1	70504789	.	C	T	.	.	.	GT	1|1	1|1
219 | 1	167059520	.	A	T	.	.	.	GT	1|1	0|0
220 | 1	182496864	.	A	T	.	.	.	GT	0|0	0|0
221 | 1	197073351	.	C	T	.	.	.	GT	1|1	1|1
222 | 1	216373211	.	G	T	.	.	.	GT	1|1	0|0
223 | 10	37490170	.	G	A	.	.	.	GT	1|1	0|0
224 | 10	56089432	.	A	C	.	.	.	GT	1|1	0|0
225 | ...
226 | [cocodong@biocluster ~/]$ cat input.bed
227 | chr10	89677000	89690000
228 | chr8	38336000	38353000
229 | [cocodong@biocluster ~/]$ icages.pl input.vcf -i Sample1 -b input.bed
230 | ```
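BED coordinates are 0-based and half-open, so the interval `chr10 89677000 89690000` covers the 1-based positions 89,677,001 through 89,690,000. The sketch below, built around a hypothetical `in_bed` helper, shows how a 1-based VCF position could be tested against such intervals. It also strips the `chr` prefix, because the example VCF names chromosomes `1`, `10`, and so on, while the BED file uses `chr10` and `chr8`. How iCAGES itself merges the two inputs is handled in `bin/icagesMutationNew.pl`, which is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip an optional "chr" prefix so "chr10" and "10" compare equal.
sub norm_chrom { my ($c) = @_; $c =~ s/^chr//i; return $c; }

# Hypothetical helper: does 1-based position $pos on $chrom fall inside any
# BED interval? BED start is 0-based and end is exclusive, so a 1-based
# position p lies in [start, end) exactly when start < p <= end.
sub in_bed {
    my ($chrom, $pos, @bedLines) = @_;
    for my $line (@bedLines) {
        my ($c, $start, $end) = split ' ', $line;
        next unless defined $end;
        return 1 if norm_chrom($c) eq norm_chrom($chrom) && $start < $pos && $pos <= $end;
    }
    return 0;
}

# The two intervals from example/input.bed.
my @bed = ("chr10\t89677000\t89690000", "chr8\t38336000\t38353000");
for my $query (["10", 89677000], ["10", 89677001], ["8", 38353000], ["1", 12919840]) {
    printf "%s:%d -> %s\n", @$query, in_bed(@$query, @bed) ? "inside" : "outside";
}
```

The first two queries show the off-by-one boundary: 10:89677000 falls outside the first interval, while 10:89677001 falls inside it.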
231 | 
232 | 
233 | 
234 | 
252 | 
253 | 
254 | 


--------------------------------------------------------------------------------
/icages.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | use strict;
  3 | use warnings;
  4 | use Pod::Usage;
  5 | use Getopt::Long;
  6 | 
  7 | ######################################################################################################################################
  8 | ######################################################## variable declaration ########################################################
  9 | ######################################################################################################################################
 10 | my ($icagesMutation, $icagesGene, $icagesDrug, $icagesJson);
 11 | my ($inputFile, $inputDir, $icagesLocation, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression);
 12 | 
 13 | ######################################################################################################################################
 14 | ############################################################# main  ##################################################################
 15 | ######################################################################################################################################
 16 | ($inputDir, $icagesLocation, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression) =  &processArguments();
 17 | &checkReady($icagesLocation);
 18 | # use yunfei's new index function
 19 | $icagesMutation = $icagesLocation. "bin/icagesMutationNew.pl";
 20 | $icagesGene = $icagesLocation . "bin/icagesGene.pl";
 21 | $icagesDrug = $icagesLocation . "bin/icagesDrug.pl";
 22 | $icagesJson = $icagesLocation . "bin/icagesJson.pl";
 23 | $inputFile = $ARGV[0];
 24 | &genLogFile($inputFile, $inputDir, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression);
 25 | !system("perl $icagesMutation $inputFile $inputDir $icagesLocation $tumor $germline $id $prefix $bed $hg19 $expression") or die "ERROR: cannot call icagesMutation module\n";
 26 | !system("perl $icagesGene $inputDir $icagesLocation $subtype $prefix ") or die "ERROR: cannot call icagesGene module\n";
 27 | !system("perl $icagesDrug $inputDir $icagesLocation $prefix") or die "ERROR: cannot call icagesDrug module\n";
 28 | !system("perl $icagesJson $inputDir $icagesLocation $prefix") or die "ERROR: cannot call icagesJson module\n";
 29 | &moveFiles($inputDir, $prefix, $logDir, $outputDir, $tempDir);
 30 | 
 31 | ######################################################################################################################################
 32 | ########################################################## subroutines ###############################################################
 33 | ######################################################################################################################################
 34 | 
 35 | sub genLogFile{
 36 |     my ($inputFile, $inputDir, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg19, $expression);
 37 |     $inputFile = shift;
 38 |     $inputDir = shift;
 39 |     $tumor = shift;
 40 |     $germline = shift;
 41 |     $id = shift;
 42 |     $subtype = shift;
 43 |     $logDir = shift;
 44 |     $outputDir = shift;
 45 |     $tempDir = shift;
 46 |     $prefix = shift;
 47 |     $bed = shift;
 48 |     $hg19 = shift;
 49 |     $expression = shift;
 50 | 
 51 |     my $annovarInputFile = $inputDir . "/" . $prefix . ".annovar";
 52 |     my $logFile = $annovarInputFile . ".icages.log";
 53 |     open(LOG, ">$logFile") or die "ERROR: cannot open log file\n";
 54 |     print LOG "########### iCAGES Parameter Summary ###########\n";
 55 |     print LOG "## basic information\n";
 56 |     print LOG "Input file:\t$inputFile\n";
 57 |     print LOG "Input directory:\t$inputDir\n";
 58 |     print LOG "Temp directory:\t$tempDir\n";
 59 |     print LOG "Output directory:\t$outputDir\n\n";
 60 |     
 61 |     
 62 |     print LOG "## sample information\n";
 63 |     print LOG "Prefix:\t$prefix\n";
 64 |     print LOG "Subtype:\t$subtype\n\n";
 65 | 
 66 |     
 67 |     print LOG "## advanced information\n";
 68 |     print LOG "Tumor sample id (if any):\t$tumor\n";
 69 |     print LOG "Germline sample id (if any):\t$germline\n";
 70 |     print LOG "Sample id:\t$id\n";
 71 |     print LOG "BED file (if any):\t$bed\n";
 72 |     print LOG "HG version:\t$hg19\n";
 73 |     print LOG "Expression file:\t$expression\n\n";
 74 |     close LOG;
 75 | }
 76 | 
 77 | 
 78 | sub moveFiles{
 79 |     my ($inputDir, $prefix, $logDir, $outputDir, $tempDir);
 80 |     my ( $outputFile, $tempFile, $logFile, $jsonFile);
 81 |     $inputDir = shift;
 82 |     $prefix = shift;
 83 |     $logDir = shift;
 84 |     $outputDir = shift;
 85 |     $tempDir = shift;
 86 |     if($inputDir ne $outputDir){
 87 | 	$outputFile = "$inputDir/$prefix*icages*.csv";
 88 | 	$jsonFile = "$inputDir/$prefix*.json";
 89 | 	!system("mv $outputFile $outputDir") or die "ERROR: cannot move iCAGES output files to $outputDir\n";
 90 | 	!system("mv $jsonFile $outputDir") or die "ERROR: cannot move iCAGES JSON file to $outputDir\n";
 91 |     }
 92 |     if($inputDir ne $logDir){
 93 | 	$logFile = "$inputDir/$prefix*.log";
 94 | 	!system("mv $logFile $logDir") or die "ERROR: cannot move iCAGES log file to $logDir\n";
 95 |     }
 96 |     if($inputDir ne $tempDir){
 97 | 	$tempFile = "$inputDir/$prefix.*";
 98 | 	!system("mv $tempFile $tempDir") or die "ERROR: cannot move iCAGES temporary files to $tempDir\n";
 99 |     }
100 | }
101 | 
102 | sub checkReady {
103 |     my $icagesLocation = shift;
104 |     my $dbLocation = $icagesLocation . "db";
105 |     if(-d $dbLocation){
106 |         return 1;
107 |     }else{
108 |         die "ERROR: iCAGES database not found at $dbLocation; please download it first (see https://github.com/WangGenomicsLab/icages)\n";
109 |     }
110 | }
111 | 
112 | sub processArguments {
113 |     my ($help, $manual, $tumor, $germline, $id, $subtype, $logDir, $outputDir, $tempDir, $prefix, $inputDir, $inputLocation, $icagesLocation, $bed, $expression, $hg);
114 |     ################### initialize arguments ##################
115 |     GetOptions( 'help|h' => \$help,
116 |     'manual|man|m' => \$manual,
117 |     'tumor|t=s' => \$tumor,   # name of the tumor sample column in the VCF file
118 |     'germline|g=s' => \$germline ,   # name of the germline sample column in the VCF file
119 |     'id|i=s' => \$id,   # sample identifier of the patient of interest in a multi-sample VCF file
120 |     'subtype|s=s' => \$subtype, # cancer subtype
121 |     'logdir=s' => \$logDir, # directory for log files
122 |     'outputdir=s' => \$outputDir, # directory for output files
123 |     'tempdir=s' => \$tempDir, # directory for temporary files
124 |     'prefix|p=s' => \$prefix, # prefix for all generated files
125 |     'bed|b=s' => \$bed, # BED file describing structural variations
126 |     'expression|e=s' => \$expression, # BED file of gene expression log fold changes
127 |     'buildver=s' => \$hg # reference genome build: hg19 (default), hg38 or hg18
128 | 	)or pod2usage ();
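129 |     $help and pod2usage (-verbose=>1, -exitval=>1, -output=>\*STDOUT); # handle --help/--manual before the input check
130 |     $manual and pod2usage (-verbose=>2, -exitval=>1, -output=>\*STDOUT);
131 |     ################### locations ########################
132 |     pod2usage () if $hg and $hg ne "hg19" and $hg ne "hg38" and $hg ne "hg18";
133 |     @ARGV == 1 or pod2usage (); # exactly one input file is required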
134 |     $inputLocation = $ARGV[0];
135 |     $inputDir = $inputLocation;
136 |     if($inputDir =~ /\//){
137 | 	$inputDir =~ /(.*\/)(.*?)$/;
138 | 	$inputDir = $1;
139 |     }else{
140 | 	$inputDir = "./" ;
141 |     }
142 |     $icagesLocation = "$0";
143 |     # directory containing icages.pl ("" when it is run from that directory)
144 |     $icagesLocation = ($icagesLocation =~ /(.*)icages\.pl/) ? $1 : "";
145 |     $icagesLocation = "./" if $icagesLocation eq "";
146 |     ###### all directories should end with / ###
147 |     if(!$prefix){
148 | 	$prefix = $ARGV[0];
149 | 	if($prefix =~ /\//){
150 | 	    $prefix =~ /(.*\/)(.*?)$/;   
151 | 	    $prefix = $2;
152 | 	}
153 |     }
154 |     if(!$tumor){
155 | 	$tumor = "NA";
156 |     }
157 |     if(!$germline){
158 | 	$germline = "NA";
159 |     }
160 |     if(!$id){
161 | 	$id = "NA";
162 |     }
163 |     if(!$subtype){
164 | 	$subtype = "NA";
165 |     }
166 |     if(!$bed){
167 | 	$bed = "NA";
168 |     }
169 |     if(!$expression){
170 | 	$expression = "NA";
171 |     }
172 |     if(!$hg){
173 | 	$hg = "hg19";
174 |     }
175 |     if(!$logDir){
176 | 	if(!$outputDir){
177 | 	    $logDir = $inputDir;
178 | 	}else{
179 | 	    $logDir = $outputDir;
180 | 	}
181 |     }
182 |     if(!$outputDir){
183 | 	$outputDir = $inputDir;
184 |     }
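185 |     # temporary files default to the output directory, which by this
186 |     # point has itself been defaulted to the input directory when
187 |     # --outputdir is not given
188 |     if(!$tempDir){
189 | 	$tempDir = $outputDir;
190 |     }
191 | 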
192 |     if(-d $logDir){
193 |         if(!($logDir =~ /\/$/)){
194 | 	    $logDir = $logDir . "/";
195 | 	}	
196 |     }else{
197 | 	if(!($logDir =~ /\/$/)){
198 |             $logDir = $logDir . "/";
199 |         }
200 | 	mkdir($logDir) or die "ERROR: cannot create log directory $logDir: $!\n";
201 |     }
202 |     if(-d $outputDir){
203 | 	if(!($outputDir =~ /\/$/)){
204 |             $outputDir = $outputDir ."/";
205 |         }
206 |     }else{
207 | 	if(!($outputDir =~ /\/$/)){
208 |             $outputDir = $outputDir ."/";
209 |         }
210 | 	mkdir($outputDir) or die "ERROR: cannot create output directory $outputDir: $!\n";
211 |     }
212 |     if(-d $tempDir){
213 | 	if(!($tempDir =~ /\/$/)){
214 |             $tempDir = $tempDir ."/";
215 |         }
216 |     }else{
217 | 	if(!($tempDir =~ /\/$/)){
218 |             $tempDir = $tempDir ."/";
219 |         }
220 | 	mkdir($tempDir) or die "ERROR: cannot create temp directory $tempDir: $!\n";
221 |     }
222 | 
223 |     ######################## arguments ########################
224 |     $help and pod2usage (-verbose=>1, -exitval=>1, -output=>\*STDOUT);
225 |     $manual and pod2usage (-verbose=>2, -exitval=>1, -output=>\*STDOUT);
226 |     return ($inputDir, $icagesLocation, $tumor, $germline, $id, $subtype , $logDir, $outputDir, $tempDir, $prefix, $bed, $hg, $expression);
227 | }
228 | 
229 | ######################################################################################################################################
230 | ############################################################ manual page #############################################################
231 | ######################################################################################################################################
232 | 
233 | =head1 NAME
234 | 
235 |  iCAGES (integrated CAncer GEnome Score) command-line package for the web interface.
236 | 
237 | =head1 SYNOPSIS
238 | 
239 |  icages.pl [options] <input>
240 | 
241 |  Options:
242 |         -h, --help                      print help message
243 |         -m, --manual                    print manual message
244 |         -t, --tumor                     name of the sample column containing tumor mutations in your VCF file (if the file has several tumor samples, use this option to select the one to analyze)
245 |         -g, --germline                  name of the sample column containing germline mutations in your VCF file (if the file has several germline samples, use this option to select the one that tumor mutations are compared against to derive the somatic mutation profile)
246 |         -i, --id                        name of the sample column to analyze in a multi-sample VCF file that contains only somatic mutations (if the file has paired tumor and germline samples, use the -t and -g options instead)
247 |         -s, --subtype                   subtype of the cancer, valid options include "ACC", "BLCA", "BRCA", "CESC", "CHOL", "ESCA", "GBM", "HNSC", "KICH", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUSC", "OV", "PAAD", "PCPG", "PRAD", "SARC", "SKCM", "STAD", "TGCT", "TGCA", "THYM", "UCEC", "UCS", "UVM"
248 |         --logdir                        directory for log files generated by iCAGES
249 |         --tempdir                       directory for temporary files generated by iCAGES
250 |         --outputdir                     directory for output files generated by iCAGES
251 |         -p, --prefix                    prefix of all files generated by iCAGES
252 |         -b, --bed                       additional BED file specifying the locations of structural variations in the sample
253 |         --buildver                      reference genome version, valid options include "hg19" (default), "hg38" and "hg18"
254 |         -e, --expression                BED file describing gene expression, with columns chromosome, start, end and log fold change
255 |  
256 |  Function: iCAGES predicts cancer driver genes given somatic mutations (in ANNOVAR/VCF format) from a patient.
257 |  
258 |  Example: icages.pl /path/to/input.vcf
259 |  
260 |  Installation: before using iCAGES, please first install it with the 'perl icagesInitiate.pl' command.
261 |  
262 |  Version: 1.0
263 |  
264 |  Last update: Wed Feb 25 12:51:17 PST 2015
265 |  
266 | =head1 OPTIONS
267 | 
268 | =over 8
269 | 
270 | =item B<--help>
271 | 
272 |  print a brief usage message and detailed explanation of options.
273 | 
274 | =item B<--manual>
275 | 
276 |  print the manual page and exit.
277 | 
278 | =back
279 | 
280 | =cut
281 | 
282 | 
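`moveFiles` in `icages.pl` hands unquoted glob patterns to `mv` through the shell, so a directory or prefix containing spaces or shell metacharacters breaks the move, and a pattern that matches nothing makes `mv` fail and the script die. The sketch below shows one way to do the same step in pure Perl with `opendir` and `File::Copy::move`. The helpers `glob_to_regex` and `move_matching` are hypothetical names, not part of iCAGES, and only the `*` and `?` wildcards that `moveFiles` uses are supported.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper (not part of iCAGES): turn a shell-style pattern
# that uses only "*" and "?" into an anchored regex; every other
# character is matched literally.
sub glob_to_regex {
    my ($pattern) = @_;
    my $re = join "", map {
        $_ eq "*" ? ".*" : $_ eq "?" ? "." : quotemeta($_)
    } split //, $pattern;
    return qr/\A$re\z/;
}

# Hypothetical helper: move every regular file in $src_dir whose name
# matches $pattern into $dest_dir without going through a shell, so
# spaces or shell metacharacters in the paths are harmless.
# Returns the number of files moved; dies if an individual move fails.
sub move_matching {
    my ($src_dir, $pattern, $dest_dir) = @_;
    my $re = glob_to_regex($pattern);
    opendir(my $dh, $src_dir) or die "ERROR: cannot read directory $src_dir: $!\n";
    my @names = grep { $_ =~ $re && -f "$src_dir/$_" } readdir($dh);
    closedir($dh);
    for my $name (@names) {
        move("$src_dir/$name", "$dest_dir/$name")
            or die "ERROR: cannot move $src_dir/$name to $dest_dir: $!\n";
    }
    return scalar @names;
}
```

For example, `move_matching($inputDir, "$prefix*icages*.csv", $outputDir)` would play the role of the first `mv` call. Returning 0 when nothing matches, rather than dying, is a design choice of this sketch; a caller that requires at least one file can check the count.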

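`processArguments` resolves the working directories in a fixed order: the log directory falls back to `--outputdir` and then to the input directory, the output directory falls back to the input directory, and the temp directory falls back to the already resolved output directory. Each path is then given a trailing `/` and created with `mkdir` if it does not exist. The sketch below restates those rules as a single hypothetical `resolve_dirs` function. It deliberately substitutes `File::Path`'s `make_path` for `mkdir`, since `make_path` also creates missing parent directories while `mkdir` only creates the last path component.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical helper mirroring the defaulting rules in processArguments:
#   log dir    -> --logdir, else --outputdir, else the input directory
#   output dir -> --outputdir, else the input directory
#   temp dir   -> --tempdir, else the resolved output directory
# "||" matches the original truthiness checks (if(!$dir) ...).
# Every directory is created if missing and returned with a trailing "/".
sub resolve_dirs {
    my (%opt) = @_;
    my $input  = $opt{input_dir};
    my $output = $opt{output_dir} || $input;
    my $log    = $opt{log_dir}    || $opt{output_dir} || $input;
    my $temp   = $opt{temp_dir}   || $output;
    my %dirs = (log => $log, output => $output, temp => $temp);
    for my $key (sort keys %dirs) {
        my $dir = $dirs{$key};
        if (!-d $dir) {
            # make_path also creates intermediate directories
            make_path($dir, { error => \my $err });
            die "ERROR: cannot create $key directory $dir\n" if @$err;
        }
        $dirs{$key} = $dir =~ m{/\z} ? $dir : "$dir/";
    }
    return \%dirs;
}
```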

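In `bin/icagesDrug.pl`, `processDrugs` scores each drug as the product of the BioSystems score, the iCAGES gene score and the drug activity score, assigns tier 1 to FDA-approved drugs, tier 2 to drugs listed in a clinical trial and tier 3 to the rest, and writes drugs ordered by tier and then by descending score. A minimal sketch of that ranking follows, assuming (from the `maxBioSystemsScore` and `maxActivityScore` column names) that only the best-scoring target pair should be kept for each drug. `rank_drugs` and its input layout are hypothetical, and the alphabetical tie-break is an addition for reproducible output.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the drug ranking in processDrugs.
# $candidates: arrayref of hashrefs with keys drug, biosystem, icages, activity.
# $fda, $clin: hashrefs used as sets of drug names.
# Score = biosystem * icages * activity; tier 1 = FDA-approved,
# tier 2 = in a clinical trial, tier 3 = otherwise.
sub rank_drugs {
    my ($candidates, $fda, $clin) = @_;
    my %best;
    for my $c (@$candidates) {
        my $score = $c->{biosystem} * $c->{icages} * $c->{activity};
        # keep only the highest-scoring candidate per drug
        next if exists $best{$c->{drug}} && $best{$c->{drug}}{score} >= $score;
        my $tier = exists $fda->{$c->{drug}} ? 1 : exists $clin->{$c->{drug}} ? 2 : 3;
        $best{$c->{drug}} = { drug => $c->{drug}, score => $score, tier => $tier };
    }
    # lower tier first, then higher score, then drug name for stable output
    return sort {
        $a->{tier} <=> $b->{tier}
            or $b->{score} <=> $a->{score}
            or $a->{drug} cmp $b->{drug}
    } values %best;
}
```

Because tiers are compared first, an FDA-approved drug with a modest score is still listed ahead of a trial-only or unapproved drug with a higher score, which matches the tier-then-score loop in `processDrugs`.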
--------------------------------------------------------------------------------
/bin/icagesDrug.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | use strict;
  3 | use warnings;
  4 | use List::Util qw(min max);
  5 | use Pod::Usage;
  6 | use Getopt::Long;
  7 | 
  8 | ######################################################################################################################################
  9 | ######################################################## variable declaration ########################################################
 10 | ######################################################################################################################################
 11 | 
 12 | my ($rawInputFile, $icagesLocation, $prefix);
 13 | my (%biosystem, %neighbors, %activity, %onc, %sup, %icagesGenes);
 14 | # fda and clinical
 15 | # fda target
 16 | my (%fda, %clin, %fda_target, $fda_target_ref, $fdaRef, $clinRef);
 17 | my ($biosystemRef, $activityRef, $oncRef, $supRef, $icagesGenesRef, $neighborsRef, $supDrugRef, $oncDrugRef, $otherDrugRef);
 18 | 
 19 | 
 20 | ######################################################################################################################################
 21 | ########################################################### main  ####################################################################
 22 | ######################################################################################################################################
 23 | 
 24 | $rawInputFile = $ARGV[0];
 25 | $icagesLocation = $ARGV[1];
 26 | $prefix = $ARGV[2];
 27 | # add DGIdb database genes to reduce number of genes to query DGIdb
 28 | ($biosystemRef, $activityRef, $oncRef, $supRef, $fdaRef, $clinRef, $fda_target_ref, $supDrugRef, $oncDrugRef, $otherDrugRef) = &loadDatabase($icagesLocation);
 29 | 
 30 | %biosystem = %{$biosystemRef};
 31 | %activity = %{$activityRef};
 32 | %onc = %{$oncRef};
 33 | %sup = %{$supRef};
 34 | %fda = %{$fdaRef};
 35 | %clin = %{$clinRef};
 36 | %fda_target = %{$fda_target_ref};
 37 | # %dgiGenes = %{$dgiGenesRef};
 38 | $icagesGenesRef = &getiCAGES($rawInputFile, $prefix);
 39 | %icagesGenes = %{$icagesGenesRef};
 40 | 
 41 | $neighborsRef = &getNeighbors(\%icagesGenes, \%biosystem, \%onc, \%sup);
 42 | %neighbors = %{$neighborsRef};
 43 | &getDrugs ($rawInputFile, $icagesLocation, \%neighbors, \%onc, \%sup, $prefix, \%fda_target, $supDrugRef, $oncDrugRef, $otherDrugRef);
 44 | &processDrugs($rawInputFile, \%neighbors, \%activity, $prefix , \%fda, \%clin); # add fda and clin
 45 | 
 46 | ######################################################################################################################################
 47 | ############################################################# subroutines ############################################################
 48 | ######################################################################################################################################
 49 | 
 50 | sub loadDatabase {
 51 |     print "NOTICE: start loading databases\n";
 52 |     my (%biosystem, %activity, %onc, %sup, %fda, %clin, %fda_target, %supDrugDB, %oncDrugDB, %otherDrugDB);
 53 |     my ($icagesLocation, $DBLocation, $biosystemDB, $activityDB, $oncDB, $supDB, $fdaDB, $clinicalDB, $supDrugDB, $oncDrugDB, $otherDrugDB);
 54 |     $icagesLocation = shift;
 55 |     $DBLocation = $icagesLocation . "db/";
 56 |     $biosystemDB = $DBLocation . "biosystem.score";
 57 |     $activityDB = $DBLocation . "drug.score";
 58 |     $oncDB = $DBLocation . "oncogene.gene";
 59 |     $supDB = $DBLocation . "suppressor.gene";
 60 |     # FDA-approved drugs and clinical trials; note that only one clinical trial is kept per drug. For more information, please visit myclinicaltrial.com
 61 |     $fdaDB = $DBLocation . "FDA_cancer.txt";
 62 |     $clinicalDB = $DBLocation . "ClinicalTrial.txt";
 63 | #    $dgiGenesDB = $DBLocation . "DGIdb.genes";
 64 |     $supDrugDB = $DBLocation . "suppressor.drug";
 65 |     $oncDrugDB = $DBLocation . "oncogene.drug";
 66 |     $otherDrugDB = $DBLocation . "othergene.drug";
 67 |     open(BIO, "$biosystemDB") or die "ERROR: cannot open $biosystemDB\n";
 68 |     open(ACT, "$activityDB") or die "ERROR: cannot open $activityDB\n";
 69 |     open(ONC, "$oncDB") or die "ERROR: cannot open $oncDB\n";
 70 |     open(SUP, "$supDB") or die "ERROR: cannot open $supDB\n";
 71 |     # fda and clinical trial
 72 |     open(FDA, "$fdaDB") or die "ERROR: cannot open $fdaDB\n";
 73 |     open(CLIN, "$clinicalDB") or die "ERROR: cannot open $clinicalDB\n";
 74 |  #   open(DGI, "$dgiGenesDB") or die "ERROR: cannot open $dgiGenesDB\n";
 75 |     open(SUPDRUG, "$supDrugDB") or die "ERROR: cannot open $supDrugDB\n";
 76 |     open(ONCDRUG, "$oncDrugDB") or die "ERROR: cannot open $oncDrugDB\n";
 77 |     open(OTHERDRUG, "$otherDrugDB") or die "ERROR: cannot open $otherDrugDB\n";
 78 | 
 79 |     while (<SUPDRUG>) {
 80 |         chomp;
 81 |         my $line = $_;
 82 |         my @line = split("\t", $_);
 83 | 	next unless defined $line[0] and defined $line[1];
 84 |         $supDrugDB{$line[0]}{$line[1]} = $line;
 85 | #	print "$line[0]\t$line[1]\t$line\n";
 86 |     }
 87 | 
 88 |     while (<ONCDRUG>) {
 89 |         chomp;
 90 |         my $line = $_;
 91 |         my @line = split("\t", $_);
 92 | 	next unless defined $line[0] and defined $line[1];
 93 |         $oncDrugDB{$line[0]}{$line[1]} = $line;
 94 | #	print "$line[0]\t$line[1]\t$line\n";
 95 |     }
 96 |     
 97 |     while (<OTHERDRUG>) {
 98 |         chomp;
 99 |         my $line = $_;
100 |         my @line = split("\t", $_);
101 | 	next unless defined $line[0] and defined $line[1];
102 |         $otherDrugDB{$line[0]}{$line[1]} = $line;
103 | #	print "$line[0]\t$line[1]\t$line\n";
104 |     }
105 |     
106 |   #  while(<DGI>){
107 |    #     chomp;
108 |     #    $dgiGenes{$_} = 1;
109 |     #}
110 |     
111 |     while(<BIO>){
112 |         chomp;
113 |         my @line = split("\t", $_);
114 |         $biosystem{$line[0]}{$line[1]} = $line[2];
115 |     }
116 |     
117 |     while(<ACT>){
118 |         chomp;
119 |         my @line = split("\t", $_);
120 |         $activity{$line[0]} = $line[1];
121 |     }
122 |     
123 |     while(<ONC>){
124 |         chomp;
125 |         my @line = split("\t", $_);
126 |         $onc{$line[0]} = 1;
127 |     }
128 |     
129 |     while(<SUP>){
130 |         chomp;
131 |         my @line = split("\t", $_);
132 |         $sup{$line[0]} = 1;
133 |     }
134 |     
135 |     while(<FDA>){
136 |         chomp;
137 |         my @line = split("\t", $_);
138 |         # drugname: subtype,tradename
139 |         $line[0] = "NA" unless defined $line[0];
140 |         $line[1] = "NA" unless defined $line[1];
141 |         $line[3] = "NA" unless defined $line[3];
142 |         my $content = $line[1] . "," . $line[3];
143 |         $fda{$line[0]} = $content;
144 |         my @target = split(";", $line[2] // "");
145 |         for(0..$#target){
146 |             $fda_target{$target[$_]}{$line[0]} = 1;
147 |         }
148 |     }
149 |     
150 |     while(<CLIN>){
151 |         chomp;
152 |         my @line = split("\t", $_);
153 |         # drugname: trialname,organization,phase,url
154 |         $line[0] = "NA" unless defined $line[0];
155 |         $line[1] = "NA" unless defined $line[1];
156 |         $line[2] = "NA" unless defined $line[2];
157 |         $line[3] = "NA" unless defined $line[3];
158 |         $line[4] = "NA" unless defined $line[4];
159 |         my $content = $line[1] . "," . $line[2] . "," . $line[3] . "," . $line[4];
160 |         $clin{$line[0]} = $content;
161 |     }
162 |     
163 |     close FDA;
164 |     close CLIN;
165 |     close SUP;
166 |     close ONC;
167 |     close ACT;
168 |     close BIO;
169 |     close SUPDRUG;
170 |     close ONCDRUG;
171 |     close OTHERDRUG;
172 |     return (\%biosystem, \%activity, \%onc, \%sup, \%fda, \%clin, \%fda_target, \%supDrugDB, \%oncDrugDB, \%otherDrugDB);
173 | }
174 | 
175 | sub getiCAGES{
176 |     print "NOTICE: start processing gene files from iCAGES layer two\n";
177 |     my ($rawInputFile, $icagesGenes, $prefix);
178 |     my %icagesGenes;
179 |     $rawInputFile = shift;
180 |     $prefix = shift;
181 |     $icagesGenes = $rawInputFile . $prefix . ".annovar.icagesGenes.csv";
182 |     open(GENES, "$icagesGenes") or die "ERROR: cannot open $icagesGenes\n";
183 |     my $header = <GENES>;
184 |     while(<GENES>){
185 |         chomp;
186 |         my @line = split(",", $_);
187 |         $icagesGenes{$line[0]} = $line[5];
188 |     }
189 |     return \%icagesGenes;
190 | }
191 | 
192 | sub getNeighbors{
193 |     print "NOTICE: start getting top five neighbors for mutated genes\n";
194 |     my (%icagesGenes, %biosystem, %neighbors, %onc, %sup);
195 |     my ($icagesGenesRef, $biosystemRef, $oncRef, $supRef);
196 |     my $index;
197 |     $icagesGenesRef = shift;
198 |     $biosystemRef = shift;
199 |     $oncRef = shift;
200 |     $supRef = shift;
201 | #    $dgiGenesRef = shift;
202 |     %icagesGenes = %{$icagesGenesRef};
203 |     %biosystem = %{$biosystemRef};
204 |     %onc = %{$oncRef};
205 |     %sup = %{$supRef};
206 | 
207 |     foreach my $gene (sort keys %icagesGenes){
208 |         $index = 0;
209 |         $neighbors{$gene}{$gene}{"biosystem"} = 1;
210 |         $neighbors{$gene}{$gene}{"icages"} = $icagesGenes{$gene};
211 |         $neighbors{$gene}{$gene}{"product"} = $icagesGenes{$gene};
212 |         foreach my $neighbor (sort { $biosystem{$gene}{$b} <=> $biosystem{$gene}{$a} or $a cmp $b }  keys %{$biosystem{$gene}}){
213 | #	    if(exists $onc{$gene} or exists $sup{$gene}){
214 | #		last if $index == 10;
215 | #	    }else{
216 |             last if $index == 5;
217 | #	    }
218 |             $index ++;
219 |             $neighbors{$neighbor}{$gene}{"biosystem"} = $biosystem{$gene}{$neighbor};
220 |             $neighbors{$neighbor}{$gene}{"icages"} = $icagesGenes{$gene};
221 |             $neighbors{$neighbor}{$gene}{"product"} =  $icagesGenes{$gene} * $biosystem{$gene}{$neighbor};
222 |         }
223 |     }
224 |     return \%neighbors;
225 | }
226 | 
227 | sub getDrugs{
228 |     print "NOTICE: start getting drugs for seed genes\n";
229 |     my (%neighbors, %onc, %sup);
230 |     my ($neighborsRef, $oncRef, $supRef);
231 |     my (@seeds, @onc, @sup, @other);
232 |     my ($onc, $sup, $other);
233 |     $onc="";
234 |     $sup="";
235 |     $other="";
236 |     # fda
237 |     my $fda_target_ref;
238 |     my %fda_target;
239 | 
240 |     my ($rawInputFile, $supFile, $oncFile, $otherFile, $icagesLocation, $callDgidb, $prefix);
241 |     $rawInputFile = shift;
242 |     $icagesLocation = shift;
243 |     $neighborsRef = shift;
244 |     $oncRef = shift;
245 |     $supRef = shift;
246 |     $prefix = shift;
247 |     $fda_target_ref = shift;
248 |     # three kinds of drugs
249 |     my $supDrugRef = shift;
250 |     my $oncDrugRef = shift;
251 |     my $otherDrugRef = shift;
252 |     my %supDrug;
253 |     my %oncDrug;
254 |     my %otherDrug;
255 |     %supDrug = %{$supDrugRef};
256 |     %oncDrug = %{$oncDrugRef};
257 |     %otherDrug = %{$otherDrugRef};
258 |     
259 |     %neighbors = %{$neighborsRef};
260 |     %onc = %{$oncRef};
261 |     %sup = %{$supRef};
262 |     %fda_target = %{$fda_target_ref};
263 |     @seeds = keys %neighbors;
264 |     $callDgidb = $icagesLocation . "bin/DGIdb/getDrugList.pl";
265 |     $supFile = $rawInputFile . $prefix . ".suppressor.drug";
266 |     $oncFile = $rawInputFile . $prefix.".oncogene.drug";
267 |     $otherFile = $rawInputFile . $prefix. ".other.drug";
268 | 
269 |     # find FDA drugs for these genes in case DGIdb missed them
270 |     my %all_genes;
271 |     for(0..$#seeds){
272 | #        print "$seeds[$_]\n";
273 |         if(exists $sup{$seeds[$_]} and $seeds[$_] =~ /[a-zA-Z0-9]+/){
274 | #            if(exists $dgiGenes{$seeds[$_]}){
275 |                 push @sup, $seeds[$_];
276 | #            }
277 |             $all_genes{$seeds[$_]} = 1;
278 |         }elsif(exists $onc{$seeds[$_]} and $seeds[$_] =~ /[a-zA-Z0-9]+/){
279 | #            if(exists $dgiGenes{$seeds[$_]}){
280 |                 push @onc, $seeds[$_];
281 | #            }
282 |             $all_genes{$seeds[$_]} = 1;
283 |         }else{
284 |             if($seeds[$_] =~ /[a-zA-Z0-9]+/){
285 | #                if(exists $dgiGenes{$seeds[$_]}){
286 |                     push @other, $seeds[$_];
287 |  #               }
288 |             $all_genes{$seeds[$_]} = 1;
289 |             }
290 |         }
291 |     }
292 |     $sup = join(",", @sup);
293 |     $onc = join(",", @onc);
294 |     $other = join(",", @other);
295 |    
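296 |     open(SUP, ">$supFile") or die "ERROR: cannot create $supFile\n";
297 |     open(ONC, ">$oncFile") or die "ERROR: cannot create $oncFile\n";
298 |     open(OTHER, ">$otherFile") or die "ERROR: cannot create $otherFile\n";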
299 |     
300 |     
301 |     if($sup ne "" ){
302 | #	print "iCAGES: $callDgidb --genes='$sup' --interaction_type='activator,other/unknown,n/a,inducer,positive allosteric modulator,potentiator,stimulator' --source_trust_levels='Expert curated' --output='$supFile'";
303 |         for(0..$#sup) {
304 | #	    print "$sup[$_]\n";
305 |            if (exists $supDrug{$sup[$_]}) {
306 |                 foreach my $key (sort keys %{$supDrug{$sup[$_]}}) {
307 |                     print SUP "$supDrug{$sup[$_]}{$key}\n";
308 |                 }
309 |             }
310 |         }
311 |         
312 |         # !system("$callDgidb --genes='$sup' --interaction_type='activator,other/unknown,n/a,inducer,positive allosteric modulator,potentiator,stimulator' --source_trust_levels='Expert curated' --output='$supFile'") or warn "ERROR: cannot get drugs\n$callDgidb --genes='$sup' --interaction_type='activator,other/unknown,n/a,inducer,positive allosteric modulator,potentiator,stimulator' --source_trust_levels='Expert curated' --output='$supFile'";
313 |     }
314 |     if($onc ne ""){
315 | #	print "iCAGES: $callDgidb --genes='$onc' --interaction_type='agonist,antisense,competitive,immunotherapy,inhibitory allosteric modulator,inverse agonist,negative modulator,partial agonist,partial antagonist,vaccine,inhibitor,suppressor,antibody,antagonist,blocker,other/unknown,n/a' --source_trust_levels='Expert curated' --output='$oncFile'";
316 |         #!system("$callDgidb --genes='$onc' --interaction_type='agonist,antisense,competitive,immunotherapy,inhibitory allosteric modulator,inverse agonist,negative modulator,partial agonist,partial antagonist,vaccine,inhibitor,suppressor,antibody,antagonist,blocker,other/unknown,n/a' --source_trust_levels='Expert curated' --output='$oncFile'") or warn "ERROR: cannot get drugs\n$callDgidb --genes='$onc' --interaction_type='agonist,antisense,competitive,immunotherapy,inhibitory allosteric modulator,inverse agonist,negative modulator,partial agonist,partial antagonist,vaccine,inhibitor,suppressor,antibody,antagonist,blocker,other/unknown,n/a' --source_trust_levels='Expert curated' --output='$oncFile'\n";
317 |         
318 |         for(0..$#onc) {
319 | #	    print "$onc[$_]\n";
320 |             if (exists $oncDrug{$onc[$_]}) {
321 |                 foreach my $key (sort keys %{$oncDrug{$onc[$_]}}) {
322 |                     print ONC "$oncDrug{$onc[$_]}{$key}\n";
323 |                 }
324 |             }
325 |         }
326 |     }
327 |     if($other ne ""){
328 |         #	print "$callDgidb --genes='$other' --source_trust_levels='Expert curated' --output='$otherFile'";
329 |         
330 |         #!system("$callDgidb --genes='$other' --source_trust_levels='Expert curated' --output='$otherFile'") or warn "ERROR: cannot get drugs\n$callDgidb --genes='$other' --source_trust_levels='Expert curated' --output='$otherFile'\n";
331 |         
332 |         for(0..$#other) {
333 | #	    print "$other[$_]\n";
334 |             if (exists $otherDrug{$other[$_]}) {
335 |                 foreach my $key (sort keys %{$otherDrug{$other[$_]}}) {
336 |                     print OTHER "$otherDrug{$other[$_]}{$key}\n";
337 |                 }
338 |             }
339 |         }
340 |         
341 |     }
342 |     
343 |     # create an output file
344 |     my $fdaDrug = $rawInputFile . $prefix. ".fda.drug";
345 |     open(FDADRUG, ">$fdaDrug") or die "ERROR: cannot create $fdaDrug for outputing FDA drugs\n";
346 |     foreach my $gene (sort keys %all_genes){
347 | 	if(exists $fda_target{$gene}){
348 | 	    foreach my $drug (sort keys %{$fda_target{$gene}}){
349 | 		print FDADRUG "$gene\t$drug\n";
350 | 	    }
351 | 	}
352 |     }
353 |     close SUP; close ONC; close OTHER; close FDADRUG;
354 | }
355 | 
356 | sub processDrugs{
357 |     print "NOTICE: start processing drugs from DGIdb\n";
358 |     my ($rawInputFile, $matchFile, $allDrugs, $icagesDrugs, $prefix);
359 |     my (%neighbors, %activity, %icagesDrug, %icagesPrint);
360 |     my ($neighborsRef, $activityRef);
361 |     my ($oncDrugFile, $supDrugFile, $otherDrugFile);
362 |     # fda and clin
363 |     my ($fdaRef, $clinRef);
364 |     my (%fda, %clin);
365 |     # fda drug list
366 |     my $FDADrugFile;
367 | 
368 |     $rawInputFile = shift;
369 |     $neighborsRef = shift;
370 |     $activityRef = shift;
371 |     $prefix = shift;
372 |     $fdaRef = shift;
373 |     $clinRef = shift;
374 |     %fda = %{$fdaRef};
375 |     %clin = %{$clinRef};
376 |     %neighbors = %{$neighborsRef};
377 |     %activity = %{$activityRef};
378 |     $matchFile = $rawInputFile . $prefix . ".*.drug";
379 |     $oncDrugFile = $rawInputFile . $prefix . ".oncogene.drug";
380 |     $supDrugFile = $rawInputFile . $prefix . ".suppressor.drug";
381 |     $otherDrugFile = $rawInputFile . $prefix. ".other.drug";
382 |     $FDADrugFile = $rawInputFile . $prefix. ".fda.drug";
383 |     $allDrugs = $rawInputFile . $prefix . ".drug.all";
384 |     $icagesDrugs = $rawInputFile . $prefix . ".annovar.icagesDrugs.csv";
385 |     if(! -e $oncDrugFile){
386 | 	!system("touch $oncDrugFile") or die "ERROR: cannot create $oncDrugFile\n";
387 |     }
388 |     if(! -e $supDrugFile){
389 | 	!system("touch $supDrugFile") or die "ERROR: cannot create $supDrugFile\n";
390 |     }
391 |     if(! -e $otherDrugFile){
392 | 	!system("touch $otherDrugFile") or die "ERROR: cannot create $otherDrugFile\n";
393 |     }
394 |     if(! -e $FDADrugFile){
395 |         !system("touch $FDADrugFile") or die "ERROR: cannot create $FDADrugFile\n";
396 |     }
397 |     !system("cat $oncDrugFile $supDrugFile $otherDrugFile $FDADrugFile > $allDrugs") or die "ERROR: cannot concatenate drug files into $allDrugs\n";
398 |     open(DRUG, "$allDrugs") or die "ERROR: cannot open drug file $allDrugs\n";
399 |     open(OUT, ">$icagesDrugs") or die "ERROR: cannot open $icagesDrugs\n";
400 |     while(<DRUG>){
401 |         chomp;
402 |         my @line = split("\t", $_);
403 |         next unless defined $line[1];
404 |         my $neighbor = $line[0];
405 |         my $index = 0;
406 |         foreach my $target (sort { $neighbors{$neighbor}{$b}{"product"} <=> $neighbors{$neighbor}{$a}{"product"} } keys %{$neighbors{$neighbor}}){
407 |             last if $index == 1;
408 |             if(exists $icagesDrug{$line[1]}{$neighbor}){
409 |                 if($neighbors{$neighbor}{$target}{"product"} > $icagesDrug{$line[1]}{$neighbor}{$target}{"biosystem"} * $icagesDrug{$line[1]}{$neighbor}{$target}{"icages"}){
410 |                     $icagesDrug{$line[1]}{$neighbor}{$target}{"biosystem"} = $neighbors{$neighbor}{$target}{"biosystem"};
411 |                     $icagesDrug{$line[1]}{$neighbor}{$target}{"icages"} = $neighbors{$neighbor}{$target}{"icages"} ;
412 | 		    if(exists $activity{$line[1]}){
413 |                         $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = $activity{$line[1]};
414 |                     }else{
415 |                         $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 0;
416 |                     }
417 | #		    $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 1 if exists $fda{$line[1]} ;
418 | #		    $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = max(0.5, $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"}) if  exists $clin{$line[1]};
419 |                 }
420 |             }else{
421 |                 $icagesDrug{$line[1]}{$neighbor}{$target}{"biosystem"} = $neighbors{$neighbor}{$target}{"biosystem"};
422 |                 $icagesDrug{$line[1]}{$neighbor}{$target}{"icages"} = $neighbors{$neighbor}{$target}{"icages"} ;
423 |                 if(exists $activity{$line[1]}){
424 |                     $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = $activity{$line[1]};
425 |                 }else{
426 |                     $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 0;
427 |                 }
428 | #		$icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = 1 if exists $fda{$line[1]} ;
429 | #		$icagesDrug{$line[1]}{$neighbor}{$target}{"activity"} = max(0.5, $icagesDrug{$line[1]}{$neighbor}{$target}{"activity"}) if exists $clin{$line[1]};
430 | 		
431 |             }
432 |             $index ++;
433 |         }
434 |     }
435 |     
436 |     ##### count drug
437 |     my $drugCount = 0;
438 |     my $gooddrugCount = 0;
439 |     foreach my $drug (sort keys %icagesDrug){
440 |         foreach my $neighbor (sort keys %{$icagesDrug{$drug}}){
441 |             foreach my $final (sort keys %{$icagesDrug{$drug}{$neighbor}}){
442 |                 my $icagesDrug = $icagesDrug{$drug}{$neighbor}{$final}{"biosystem"} * $icagesDrug{$drug}{$neighbor}{$final}{"icages"} * $icagesDrug{$drug}{$neighbor}{$final}{"activity"};
443 | 		my $tier = 3;
444 | 		if(exists $fda{$drug}){
445 | 		    $tier = 1;
446 | 		}elsif(exists $clin{$drug}){
447 | 		    $tier = 2;
448 | 		}
449 |                 $icagesPrint{$tier}{$drug}{"score"} = $icagesDrug;
450 | 
451 | 		if( $icagesDrug{$drug}{$neighbor}{$final}{"activity"} == 0){
452 | 		    $icagesDrug{$drug}{$neighbor}{$final}{"activity"} = "NA";
453 | 		}
454 | 		if($icagesDrug == 0){
455 | 		    $icagesDrug{$drug}{$neighbor}{$final}{"drug"} ="NA";
456 | 		}
457 |                 $icagesPrint{$tier}{$drug}{"content"} = "$drug,$final,$neighbor,$icagesDrug{$drug}{$neighbor}{$final}{\"icages\"},$icagesDrug{$drug}{$neighbor}{$final}{\"biosystem\"},$icagesDrug{$drug}{$neighbor}{$final}{\"activity\"},$icagesDrug,$tier";
458 |             }
459 |         }
460 |     }
461 |     print OUT "drugName,finalTarget,directTarget,iCAGESGeneScore,maxBioSystemsScore,maxActivityScore,icagesDrugScore,tier,FDA_approvedSubtype,FDA_activeIngredient,CLT_name,CLT_organization,CLT_phase,CLT_url\n";
462 |     foreach my $tier (sort {$a <=> $b} keys %icagesPrint){
463 |     foreach my $drug (sort {$icagesPrint{$tier}{$b}{"score"} <=> $icagesPrint{$tier}{$a}{"score"}} keys %{$icagesPrint{$tier}}){
464 |         $drugCount ++;
465 |         $gooddrugCount ++ if $icagesPrint{$tier}{$drug}{"score"} >= 0.5;
466 | 	# check fda and clinical trial
467 | 	my $printContent = $icagesPrint{$tier}{$drug}{"content"};
468 | 	if(exists $fda{$drug}){
469 | 	    $printContent .= "," . $fda{$drug};
470 | 	}else{
471 | 	    $printContent .= ",NA,NA";
472 | 	}
473 | 	if(exists $clin{$drug}){
474 | 	    $printContent .= "," . $clin{$drug};
475 | 	}else{
476 | 	    $printContent .= ",NA,NA,NA,NA";
477 | 	}
478 |         print OUT "$printContent\n";
479 |     }
480 |     }
481 |     close OUT;
482 |     close DRUG;
483 |     
484 |     my $logFile = $rawInputFile . $prefix . ".annovar.icages.log";
485 |     open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n";
486 |     print LOG "########### iCAGES Drug Summary ###########\n";
487 |     print LOG "## basic information\n";
488 |     print LOG "Total: $drugCount\n";
489 |     print LOG "Good drug (iCAGES drug score >= 0.5): $gooddrugCount\n";
490 | }
491 | 
492 | 
493 | 
494 | 
495 | 
496 | 
497 | 
498 | 

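The drug report above is ordered by tier first and by iCAGES drug score (highest first) within each tier, using two nested loops. The same ordering can be written as a single comparator chained with `or`, a common Perl idiom. This is a minimal sketch: the drug names and scores are made-up illustration data, and the `>= 0.5` cut-off mirrors the wording of the log message rather than a threshold documented elsewhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustration data only: drug => [tier, iCAGES drug score].
# Tiers as in the drug code above: 1 = FDA approved, 2 = clinical trial, 3 = other.
my %drugs = (
    drugA => [3, 0.90],
    drugB => [1, 0.20],
    drugC => [1, 0.75],
    drugD => [2, 0.50],
);

# One comparator: tier ascending, then score descending within a tier.
my @ranked = sort {
    $drugs{$a}[0] <=> $drugs{$b}[0]
        or $drugs{$b}[1] <=> $drugs{$a}[1]
} keys %drugs;

# grep in scalar context returns the number of matching elements.
my $good = grep { $drugs{$_}[1] >= 0.5 } @ranked;

print join(",", @ranked), "\n";    # drugC,drugB,drugD,drugA
print "good: $good\n";             # good: 3
```

Because `or` returns the first non-zero comparison, the score comparison is only consulted when two drugs share a tier.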

--------------------------------------------------------------------------------
/bin/icagesMutation.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | use strict;
  3 | use warnings;
  4 | use List::Util qw(min max);	
  5 | use Pod::Usage;
  6 | use Getopt::Long;
  7 | 
  8 | ######################################################################################################################################
  9 | ######################################################## variable declaration ########################################################
 10 | ######################################################################################################################################
 11 | 
 12 | my ($annovarInputFile,$rawInputFile, $inputDir ,$icagesLocation , $tumor ,$germline, $id ,$prefix, $bed, $hg, $expression);
 13 | my $nowString;
 14 | my (%sup, %onc);
 15 | 
 16 | ######################################################################################################################################
 17 | ########################################################### main  ####################################################################
 18 | ######################################################################################################################################
 19 | $rawInputFile = $ARGV[0];
 20 | $inputDir = $ARGV[1];
 21 | $icagesLocation = $ARGV[2];
 22 | $tumor = $ARGV[3];
 23 | $germline = $ARGV[4];
 24 | $id = $ARGV[5];
 25 | $prefix = $ARGV[6];
 26 | $bed = $ARGV[7];
 27 | $hg = $ARGV[8];
 28 | $expression = $ARGV[9];
 29 | $nowString = localtime();
 30 | $annovarInputFile = &runAnnovar($rawInputFile, $inputDir ,$icagesLocation ,$tumor ,$germline ,$id, $prefix, $bed, $hg , $expression);
 31 | &processAnnovar($annovarInputFile, $hg, $icagesLocation);
 32 | 
 33 | ######################################################################################################################################
 34 | ############################################################# subroutines ############################################################
 35 | ######################################################################################################################################
 36 | 
 37 | sub runAnnovar {
 38 |     print "NOTICE: start running iCAGES package at $nowString\n";
 39 |     my ($rawInputFile, $annovarInputFile);                                             #ANNOVAR input files
 40 |     my ($icagesLocation, $callAnnotateVariation, $DBLocation);                         #ANNOVAR commands
 41 |     my ($tumor, $germline ,$id ,$prefix);                                     #VCF conversion parameters & prefix for output
 42 |     my ($radialSVMDB, $radialSVMIndex, $funseq2DB, $funseq2Index, $refGeneDB, $refGeneIndex, $cnvDB);           #ANNOVAR DB files: iCAGES score (index), refGene (fasta), dbSNP
 43 |     my ($log, $annovarLog);
 44 |     my $nowString = localtime;                                                                                  #SYSTEM local time
 45 |     my $bed;    # location of bed file
 46 |     my $expression; # location of expression log change file
 47 |     $rawInputFile = shift;
 48 |     $inputDir = shift;
 49 |     $icagesLocation = shift;
 50 |     $tumor = shift;
 51 |     $germline = shift;
 52 |     $id = shift;
 53 |     $prefix = shift;
 54 |     $bed = shift;
 55 |     $hg = shift;
 56 |     $expression = shift;
 57 |     $callAnnotateVariation = $icagesLocation . "bin/annovar/annotate_variation.pl";
 58 |     $annovarInputFile = $inputDir . "/" . $prefix . ".annovar";
 59 |     $DBLocation = $icagesLocation . "db/";
 60 |     &formatConvert($rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id , $bed , $expression);
 61 |     &divideMutation($annovarInputFile);
 62 |     &loadDatabase($DBLocation);
 63 |     &annotateMutation($icagesLocation, $annovarInputFile, $hg);
 64 |     return $annovarInputFile;
 65 | }
 66 | 
 67 | 
 68 | sub processAnnovar{
 69 |     print "NOTICE: start processing output from ANNOVAR\n";
 70 |     my $annovarInputFile = shift;
 71 |     my $hg = shift;
 72 |     my $icagesLocation = shift;
 73 |     # still have to load onc gene set and suppressor gene set
 74 |     my $oncDB = $icagesLocation . "/db/oncogene.gene";
 75 |     my $supDB = $icagesLocation . "/db/suppressor.gene";
 76 |     my (%onc, %sup);
 77 |     open(ONCDB, "$oncDB") or die "ERROR: cannot open $oncDB\n";
 78 |     open(SUPDB, "$supDB") or die "ERROR: cannot open $supDB\n";
 79 |     while(<ONCDB>){
 80 | 	chomp;
 81 | 	$onc{$_} =1;
 82 |     }
 83 |     close ONCDB;
 84 |     while(<SUPDB>){
 85 | 	chomp;
 86 | 	$sup{$_} = 1;
 87 |     }
 88 |     close SUPDB;
 89 |     my $annovarVariantFunction = $annovarInputFile . ".variant_function";
 90 |     # create a temp file to store bed file of variant function : chr start end score
 91 |     my $genebed = $annovarInputFile . ".variant_function.bed";
 92 |     open(GENEFORBED, "$annovarVariantFunction") or die "ERROR: cannot open $annovarVariantFunction\n";
 93 |     open(GENEBED, ">$genebed") or die "ERROR: cannot create $genebed file\n";
 94 |     my $lastline = "" ;
 95 |     while(<GENEFORBED>){
 96 | 	chomp;
 97 | 	my @line = split("\t", $_);
 98 | 	my $printout = "$line[2]\t$line[3]\t$line[4]"; 
 99 | 	if($printout eq $lastline){
100 | 	    next;
101 | 	}else{
102 | 	    print GENEBED "$line[2]\t$line[3]\t$line[4]\n";
103 | 	}
104 | 	$lastline = $printout;
105 |     }
106 |     close GENEBED;
107 |     close GENEFORBED;
108 |     my $annovarExonVariantFunction = $annovarInputFile . ".exonic_variant_function";
109 |     my $annovarRadialSVM = $annovarInputFile . ".snp." . $hg . "_iCAGES_dropped";
110 |     my $annovarCNV = $annovarInputFile . ".cnv." . $hg . "_cnv";
111 |     # create a temp file to store bed file of cnv with this format : chr start end score
112 |     my $cnvbed = $annovarInputFile . ".cnv." . $hg . "_cnv.bed";
113 | 
114 |     # create a final file to store the final result of bedtools intersect
115 |     my $cnvfinal = $annovarInputFile . ".cnv.final";
116 | 
117 |     my $annovarFunseq2 = $annovarInputFile . ".snp." . $hg . "_funseq2_dropped";
118 |     my $icagesMutations = $annovarInputFile . ".icagesMutations.csv";
119 |     # add bedtools 
120 |     my $bedtools = $icagesLocation . "/bin/bedtools/bin/bedtools";
121 |     open(GENE, "$annovarVariantFunction") or die "ERROR: cannot open file $annovarVariantFunction\n";
122 |     open(EXON, "$annovarExonVariantFunction") or die "ERROR: cannot open file $annovarExonVariantFunction\n";
123 |     if(!-e $annovarCNV){
124 | 	!system("touch $annovarCNV") or die "ERROR: cannot create file $annovarCNV\n";
125 |     }
126 |     if(!-e $annovarRadialSVM){
127 | 	!system("touch $annovarRadialSVM") or die "ERROR: cannot create file $annovarRadialSVM\n";
128 |     }
129 |     if(!-e $annovarFunseq2){
130 | 	!system("touch $annovarFunseq2") or die "ERROR: cannot create file $annovarFunseq2\n";
131 |     }
132 | 
133 |     open(CNV, "$annovarCNV") or die "ERROR: cannot open file $annovarCNV\n";
134 |     open(RADIAL, "$annovarRadialSVM") or die "ERROR: cannot open file $annovarRadialSVM\n";
135 |     open(FUNSEQ, "$annovarFunseq2") or die "ERROR: cannot open file $annovarFunseq2\n";
136 |     open(OUT, ">$icagesMutations") or die "ERROR: cannot open file $icagesMutations\n";
137 |     my (%radialSVM, %funseq, %cnv, %exon);
138 |     my (%pointcoding);
139 |     my %icagesMutations;
140 |     
141 |     ######## count location information
142 |     my $exonCount = 0;
143 |     my $intronCount = 0;
144 |     my $noncodingRNACount = 0;
145 |     my $intergenicCount = 0;
146 |     my $otherCount = 0;
147 |     
148 |     ######## count annotation information
149 |     my $radialSVMCount = 0;
150 |     my $funseqCount = 0;
151 |     my $cnvCount = 0;
152 |     
153 |     while(<RADIAL>){
154 |         chomp;
155 |         my @line;
156 |         my $key;
157 |         @line = split(/\t/, $_);
158 |         $key = "$line[2],$line[3],$line[4],$line[5],$line[6]";
159 |         $radialSVM{$key} = $line[1];
160 |     }
161 |     while(<FUNSEQ>){
162 |         chomp;
163 |         my @line;
164 |         my $key;
165 |         @line = split(/\t/, $_);
166 |         $key = "$line[2],$line[3],$line[4],$line[5],$line[6]";
167 |         $funseq{$key} = $line[1];
168 |     }
169 | 
170 |     # cnv cannot be processed using key and value
171 |     # create a file to temporarily store
172 | 
173 |     open(CNVBED, ">$cnvbed") or die "ERROR: cannot open $cnvbed for writing\n";
174 |     
175 |     while(<CNV>){
176 |         chomp;
177 |         my @line;
178 |         my $key;
179 |         my $score;
180 |         @line = split(/\t/, $_);
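181 |         # read the score from the "Score=..." field; default to NA so a failed match never reuses a stale $1
182 |         ($score) = $line[1] =~ /Score=([^;]+)/;
183 |         $score = "NA" unless defined $score;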
184 | 	# chr start end score
185 | 	print CNVBED "$line[2]\t$line[3]\t$line[4]\t$score\n";
186 |     }
187 |     close CNVBED;
188 |     
189 | 
190 |     # get intersect
191 |     if(-z $cnvbed ){
192 | 	!system("touch $cnvfinal") or die "ERROR: cannot create file $cnvfinal\n";
193 |     }else{
194 | 	!system("$bedtools intersect -a $cnvbed -b $genebed -wa > $cnvfinal") or die "ERROR: cannot find intersect using bedtools, please check whether or not you have installed bedtools\n";
195 |     }
196 |     
197 |     open(CNVFINAL, "$cnvfinal") or die "ERROR: cannot open $cnvfinal for reading\n";
198 |     while(<CNVFINAL>){
199 | 	chomp;
200 | 	my @line = split("\t", $_);
201 | 	# note that this key is different for snv
202 | 	my $key = "$line[0],$line[1],$line[2]";             
203 | 	$cnv{$key} = $line[3]; 
204 |     }
205 | 
206 |     while(<EXON>){
207 |         chomp;
208 |         my (@line, @syntax, @content);
209 |         my ($key, $mut, $pro);
210 |         @line = split(/\t/, $_);
211 |         @syntax = split(",", $line[2]);
212 |         @content = split(":", $syntax[0]);
213 |         $mut = $content[3];
214 |         $pro = $content[4];
215 |         $key = "$line[3],$line[4],$line[5],$line[6],$line[7]";
216 |         $exon{$key}{"mutationSyntax"} = $mut;
217 |         $exon{$key}{"proteinSyntax"} = $pro;
218 |         if($line[4] == $line[5]){
219 |             $pointcoding{$key} = 1;
220 |         }
221 |     }
222 |     while(<GENE>){
223 |         chomp;
224 |         my @line;
225 |         my ($key, $gene);                                                            #hash key used for fetch radial SVM score from %radialSVM: mutation->radialSVM
226 |         my ($category, $mutationSyntax, $proteinSyntax, $scoreCategory, $score);
227 |         @line = split(/\t/, $_);
228 |         $gene = $line[1];
229 |         next unless defined $gene;
230 |         $key = "$line[2],$line[3],$line[4],$line[5],$line[6]";
231 | 	# note that structural variation key is different !!! chr,start,end;
232 | 	my $structKey = "$line[2],$line[3],$line[4]";
233 |         my %cnvScore ; # we also need this hash to store cnv score for each gene 
234 | 	
235 | 	next unless defined $key;
236 |         next unless defined $structKey;
237 | 	
238 | 
239 |         ####### process gene for noncoding variants
240 |         my @printGene;
241 |         if($gene =~ /(.*?)\(dist=(.*?)\),(.*?)\(dist=(.*?)\)/){
242 |             my $gene1 = $1;
243 |             my $gene2 = $3;
244 |             my $dist1 = $2;
245 |             my $dist2 = $4;
246 | 	    if($dist1 eq "NONE" and $dist2 eq "NONE"){
247 | 		$printGene[0] = $gene1;
248 | 	    }elsif($dist1 eq "NONE"){
249 | 		$printGene[0] = $gene2;
250 | 	    }elsif($dist2 eq "NONE"){
251 | 		$printGene[0] = $gene1;
252 | 	    }elsif($dist1 <= $dist2){
253 |                 $printGene[0] = $gene1;
254 |             }else{
255 |                 $printGene[0] = $gene2;
256 |             }
257 |         }elsif($gene =~ /([A-Z0-9-]+?)\(.*\),([A-Z0-9-]+?)\(.*\)$/){
258 |             $printGene[0] = $1;
259 |             $printGene[1] = $2;    # the pattern has only two capture groups
260 |         }elsif($gene =~ /([A-Z0-9-]+?)\(.*\);([A-Z0-9-]+?)\(.*\)$/){
261 |             $printGene[0] = $1;
262 |             $printGene[1] = $2;
263 |         }elsif($gene =~ /([A-Z0-9-]+?)\(.*\)$/){
264 |             $printGene[0] = $1;
265 |         }elsif($gene =~ /;/ or $gene =~ /,/){
266 |             my @gene = split(/;|,/, $gene);
267 |             for(0..$#gene){
268 |                 $printGene[$_] = $gene[$_];
269 |             }
270 |         }else{
271 |             $printGene[0] = $gene;
272 |         }
273 | 
274 |         
275 |         if($line[0] =~ /^exonic/ || $line[0] =~ /^splicing/ ){
276 |             $exonCount ++;
277 |         }elsif($line[0] =~ /^intron/){
278 |             $intronCount ++;
279 |         }elsif($line[0] =~ /^ncRNA/){
280 |             $noncodingRNACount ++;
281 |         }elsif($line[0] =~ /^intergenic/){
282 |             $intergenicCount ++;
283 |         }else{
284 |             $otherCount ++;
285 |         }
286 |        
287 |         if ($line[3] == $line[4]){
288 |             if(exists $pointcoding{$key}){
289 |                 $radialSVMCount ++;
290 |                 $category = "point coding";
291 |                 if(defined $exon{$key}{"mutationSyntax"}){
292 |                     $mutationSyntax = $exon{$key}{"mutationSyntax"};
293 |                 }else{
294 |                     $mutationSyntax = "NA";
295 |                 }
296 |                 if(defined $exon{$key}{"proteinSyntax"}){
297 |                     $proteinSyntax = $exon{$key}{"proteinSyntax"};
298 |                 }else{
299 |                      $proteinSyntax = "NA";
300 |                 }
301 |                 $scoreCategory = "radial SVM";
302 |                 if(exists $radialSVM{$key}){
303 |                     $score = $radialSVM{$key};
304 |                 }else{
305 |                     $score = "NA";
306 |                 }
307 |             }else{
308 |                 $funseqCount ++;
309 |                 $category = "point noncoding";
310 |                 $mutationSyntax = "NA";
311 |                 $proteinSyntax = "NA";
312 |                 $scoreCategory = "FunSeq2";
313 |                 if(exists $funseq{$key}){
314 |                     $score = $funseq{$key};
315 |                 }else{
316 |                     $score = "NA";
317 |                 }
318 |             }
319 |         }else{
320 |             $cnvCount ++ ;
321 |             $category = "structural variation";
322 |             if(exists $exon{$key}{"mutationSyntax"} and exists $exon{$key}{"proteinSyntax"}){
323 |                 $mutationSyntax = $exon{$key}{"mutationSyntax"};
324 |                 $proteinSyntax = $exon{$key}{"proteinSyntax"};
325 |             }else{
326 |                 $mutationSyntax = "NA";
327 |                 $proteinSyntax = "NA";
328 |             }
329 |             $scoreCategory = "CNV normalized signal";
330 | 	    for(0..$#printGene){
331 | 		if(exists $cnv{$structKey} and (exists $onc{$printGene[$_]} or exists $sup{$printGene[$_]})){
332 | 		    $score = $cnv{$structKey};
333 | 		}else{
334 | 		    $score = "NA";
335 | 		}
336 | 		$cnvScore{$printGene[$_]} = $score;
337 | 	    }
338 |         }
339 | 	
340 |         for(0..$#printGene){
341 |             $icagesMutations{$printGene[$_]}{$key}{"category"} = $category;
342 |             $icagesMutations{$printGene[$_]}{$key}{"mutationSyntax"} = $mutationSyntax;
343 |             $icagesMutations{$printGene[$_]}{$key}{"proteinSyntax"} = $proteinSyntax;
344 |             $icagesMutations{$printGene[$_]}{$key}{"scoreCategory"} = $scoreCategory;
345 | 	    if($category eq "structural variation"){
346 | 		$icagesMutations{$printGene[$_]}{$key}{"score"} = $cnvScore{$printGene[$_]};
347 | 	    }else{
348 | 		$icagesMutations{$printGene[$_]}{$key}{"score"} = $score;
349 | 	    }
350 |         }
351 |     }
352 |     print OUT "geneName,chrmosomeNumber,start,end,reference,alternative,category,mutationSyntax,proteinSyntax,scoreCategory,mutationScore\n";
353 |     foreach my $gene (sort keys %icagesMutations){
354 |         foreach my $mutation (sort keys %{$icagesMutations{$gene}}){
355 |            if (defined $gene and defined $mutation and defined $icagesMutations{$gene}{$mutation}{"category"} and defined $icagesMutations{$gene}{$mutation}{"mutationSyntax"} and defined $icagesMutations{$gene}{$mutation}{"proteinSyntax"} and defined $icagesMutations{$gene}{$mutation}{"scoreCategory"} and defined $icagesMutations{$gene}{$mutation}{"score"}){
356 |                print OUT "$gene,$mutation,$icagesMutations{$gene}{$mutation}{\"category\"},$icagesMutations{$gene}{$mutation}{\"mutationSyntax\"},$icagesMutations{$gene}{$mutation}{\"proteinSyntax\"},$icagesMutations{$gene}{$mutation}{\"scoreCategory\"},$icagesMutations{$gene}{$mutation}{\"score\"}\n" ;
357 |             }else{
358 |                 warn "WARNING: incomplete annotation for gene $gene at $mutation, record not written\n";
359 |             }
360 |         }
361 |     }
362 |     
363 |     my $logFile = $annovarInputFile . ".icages.log";
364 |     open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n";
365 |     
366 |     print LOG "## location information\n";
367 |     print LOG "exonic/splice: $exonCount\n";
368 |     print LOG "intronic: $intronCount\n";
369 |     print LOG "noncoding RNA: $noncodingRNACount\n";
370 |     print LOG "intergenic: $intergenicCount\n";
371 |     print LOG "other: $otherCount\n\n";
372 |     
373 |     print LOG "## annotation information\n";
374 |     print LOG "point coding variants with radialSVM annotation: $radialSVMCount\n";
375 |     print LOG "point noncoding variants with FunSeq2 annotation: $funseqCount\n";
376 |     print LOG "Indels and SVs with CNV signal annotation: $cnvCount\n\n";
377 | }
378 | 
379 | 
380 | sub formatConvert{
381 |     # $rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id, $bed, $expression
382 |     print "NOTICE: start input file format checking and converting format if needed\n";
383 |     my ($rawInputFile, $annovarInputFile, $icagesLocation );
384 |     my ( $tumor, $germline , $id, $bed, $prefix , $expression); # parameters for vcf conversion
385 |     my  $callConvertToAnnovar;
386 |     my $callvcftools;
387 |     my $formatCheckFirstLine;
388 |     my $isbedFormat = 0; # check whether or not this file is in bed format
389 |     $rawInputFile = shift;
390 |     $annovarInputFile = shift;
391 |     $icagesLocation = shift;
392 |     $tumor = shift;
393 |     $germline = shift;
394 |     $id = shift;
395 |     $bed = shift;
396 |     $expression = shift;
397 |     open(IN, "$rawInputFile") or die "ERROR: cannot open $rawInputFile\n";
398 |     $formatCheckFirstLine = <IN>;
399 |     chomp $formatCheckFirstLine;
400 |     my $multipleSampleCheck = 0;    
401 |     if($formatCheckFirstLine =~ /^#/){
402 | 	# scan the header up to #CHROM to see whether this VCF has multiple sample columns
403 | 
404 | 	while(<IN>){
405 | 	    chomp;
406 | 	    my $line = $_;
407 | 	    my @line = split;
408 | 	    if($line[0] =~ /^#CHROM/){
409 | 		if($#line > 9){
410 | 		    $multipleSampleCheck = 1;
411 | 		}
412 | 		last;
413 | 	    }
414 | 	}
415 |     }else{
416 | 	my @line = split(/\t| /, $formatCheckFirstLine);
417 | 	if($#line == 2 ){
418 | 	    $isbedFormat  = 1;
419 | 	}elsif($#line != 2 and defined  $line[3]  and defined $line[4] ){
420 | 	    if($line[3] !~ /^[acgtACGT-]+$/ or $line[4] !~ /^[acgtACGT-]+$/){    # ANNOVAR ref/alt fields contain only bases or '-'
421 | 		$isbedFormat  = 1;
422 | 	    }
423 | 	}
424 |     }
425 |     close IN;
426 |     $callConvertToAnnovar = $icagesLocation . "bin/annovar/convert2annovar.pl";
427 |     $callvcftools = $icagesLocation . "bin/vcftools/bin/vcftools";
428 |     if($formatCheckFirstLine =~ /^##fileformat=VCF/){             #VCF
429 | 	if($multipleSampleCheck and $tumor eq "NA" and $germline eq "NA" and $id eq "NA"){
430 | 	    die "ERROR: your vcf file contains multiple samples please specify a valid sample identifier \n";
431 | 	}
432 | 	if($tumor ne "NA" and $germline ne "NA"){
433 | 	    !system("$callvcftools --recode --vcf $rawInputFile --indv $tumor --out $rawInputFile.$tumor") or die "ERROR: please specify a valid sample identifier for tumor sample\n";
434 | 	    !system("$callvcftools --recode --vcf $rawInputFile --indv $germline --out $rawInputFile.$germline") or die "ERROR: please specify a valid sample identifier for germline variants\n";
435 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$tumor.recode.vcf > $rawInputFile.$tumor.ann") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
436 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$germline.recode.vcf > $rawInputFile.$germline.ann") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
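437 | 	    !system("sort $rawInputFile.$tumor.ann $rawInputFile.$germline.ann $rawInputFile.$germline.ann | uniq -u > $annovarInputFile") or die "ERROR: cannot generate somatic variants input file for iCAGES, please double check the format of your input files\n";    # germline lines occur at least twice after sorting, so uniq -u keeps tumor lines absent from germline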
438 | 	}elsif($id ne "NA"){
439 | 	    !system("$callvcftools --recode --vcf $rawInputFile --indv $id --out $rawInputFile.$id") or die "ERROR: please specify a valid sample identifier for somatic variants of your interest\n";
440 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$id.recode.vcf > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
441 | 	}else{
442 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
443 | 	}
444 |         
445 |     }elsif($isbedFormat){
446 | 	# BED
447 | 	print "iCAGES: your input file is likely to be a bed file and iCAGES is converting it to ANNOVAR input format\n";
448 | #	!system("$callConvertToAnnovar -format bed $rawInputFile >  $annovarInputFile") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n";
449 | 	my @cnv;
450 | 	open(INPUTCNV, "$rawInputFile") or die "ERROR: cannot open $rawInputFile\n";
451 | 	my $checkFiveFields = 0;
452 | 	while(<INPUTCNV>){
453 | 	    chomp;
454 | 	    push @cnv, $_;
455 | 	    my @line  = split;
456 | 	    $checkFiveFields = 1 if $#line == 4;
457 | 	}
458 | 	close INPUTCNV;
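459 | 	open(OUTPUTCNV, ">$annovarInputFile") or die "ERROR: cannot create $annovarInputFile\n";
460 | 	for(0..$#cnv){
461 | 	    # 5-column lines pass through; 3-column BED lines get placeholder ref/alt columns (0, 0)
462 | 	    print OUTPUTCNV $checkFiveFields ? "$cnv[$_]\n" : "$cnv[$_]\t0\t0\n";
463 | 	}
464 | 	# close now so the buffer is flushed before this file is appended to and re-read
465 | 	close OUTPUTCNV;
466 | 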
467 |     }else{                    
468 | 	#ANNOVAR
469 |         !system("cp $rawInputFile $annovarInputFile") or die "ERROR: cannot use input file $rawInputFile\n";
470 |     }
471 |     if($bed ne "NA"){
472 | # there is a bug in annovar convert2annovar.pl for bed
473 | #	!system("$callConvertToAnnovar -format bed $bed >  $bed.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n";
474 | #	!system('awk \'{print $1 "\t" $2 "\t" $3 "\t0\t0"  }\' $bed >  $bed.out') or die "ERROR: cannot convert your BED file input into ANNOVAR input format\n";
475 | 	open(ANNBED,  ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n";
476 | 	open(BED, "$bed") or die "iCAGES: cannot open your input BED file $bed\n";
477 | 	my @bed;
478 | 	while(<BED>){
479 | 	    chomp;
480 | 	    push @bed, $_;
481 | 	}
482 | 	close BED;
483 |         for(0..$#bed){
484 | 	    print ANNBED "$bed[$_]\t0\t0\n";
485 | 	}
486 | 	close ANNBED;
487 |     }
488 |     if($expression ne "NA"){
489 | 	!system("$callConvertToAnnovar -format bed $expression >  $expression.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n";
490 |         open(ANNBED,  ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n";
491 |         open(EXP, "$expression.out") or die "iCAGES: cannot open ANNOVAR input file generated by your input BED file\n";
492 |         my @bed;
493 |         while(<EXP>){
494 |             chomp;
495 |             push @bed, $_;
496 |         }
497 | 	close EXP;
498 |         for(0..$#bed){
499 |             print ANNBED "$bed[$_]\n";
500 |         }
501 |         close ANNBED;
502 |     }
503 | }
504 | 
505 | 
506 | sub divideMutation{
507 |     print "NOTICE: start dividing mutations into SNVs and structural variations\n";
508 |     my ($annovarInputFile, $snpFile, $cnvFile);
509 |     $annovarInputFile = shift;
510 |     $snpFile = $annovarInputFile . ".snp";
511 |     $cnvFile = $annovarInputFile . ".cnv";
512 |     
513 |     open(OUT, "$annovarInputFile") or die "iCAGES: cannot open input file $annovarInputFile\n";
514 |     open(SNP, ">$snpFile") or die "iCAGES: cannot create file $snpFile\n";
515 |     open(CNV, ">$cnvFile") or die "iCAGES: cannot create file $cnvFile\n";
516 |     
517 |     #### add variant information into log file
518 |     my $logFile = $annovarInputFile . ".icages.log";
519 |     open(LOG, ">$logFile") or die "iCAGES: cannot open file $logFile\n";
520 |     my $variantCount = 0;
521 |     my $snvCount = 0;
522 |     my $cnvCount = 0;
523 |     
524 |     while(<OUT>){
525 |         chomp;
526 |         $variantCount ++;
527 |         my $printLine = $_;
528 |         my @line = split(/\t/, $_);
529 |         if ($line[1] == $line[2] and $line[3] ne "-" and $line[4] ne "-"){
530 |             $snvCount ++;
531 |             print SNP "$printLine\n";
532 |         }else{
533 |             $cnvCount ++;
534 |             print CNV "$printLine\n";
535 |         }
536 |     }
537 |     close OUT;
538 |     close SNP;
539 |     close CNV;
540 |     
541 |     print LOG "########### iCAGES Variant Summary ###########\n";
542 |     print LOG "## basic information\n";
543 |     print LOG "Total: $variantCount\n";
544 |     print LOG "SNVs: $snvCount\n";
545 |     print LOG "Indels and Structural variants: $cnvCount\n\n";
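546 |     close LOG;    # flush now; otherwise each child forked in annotateMutation() would re-flush this buffer on exit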
547 | }
548 | 
549 | sub loadDatabase{
550 |     print "NOTICE: start loading databases\n";
551 |     my $DBLocation = shift;
552 |     my $supLocation = $DBLocation . "suppressor.gene";
553 |     my $oncLocation = $DBLocation . "oncogene.gene";
554 |     print "NOTICE: start extracting suppressor genes\n";
555 |     open(SUP, "$supLocation") or die "cannot open $supLocation\n";
556 |     while(<SUP>){
557 |         chomp;
558 |         $sup{$_} = 1;
559 |     }
560 |     close SUP;
561 |     print "NOTICE: start extracting oncogenes\n";
562 |     open(ONC, "$oncLocation") or die "cannot open $oncLocation\n";
563 |     while(<ONC>){
564 |         chomp;
565 |         $onc{$_} = 1;
566 |     }
567 |     close ONC;
568 | }
569 | 
570 | 
571 | 
572 | sub annotateMutation{
573 |     my ($DBLocation, $icagesLocation, $callAnnovar, $annovarInputFile, $snpFile, $cnvFile, $hg);
574 |     $icagesLocation = shift;
575 |     $annovarInputFile = shift;
576 |     $hg = shift;
577 |     $DBLocation = $icagesLocation . "db/";
578 |     $snpFile = $annovarInputFile . ".snp";
579 |     $cnvFile = $annovarInputFile . ".cnv";
580 |     $callAnnovar = $icagesLocation . "bin/annovar/annotate_variation.pl";
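581 |     my @children_pids; $| = 1;    # autoflush STDOUT (this also flushes it now) so forked children do not repeat buffered NOTICE lines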
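582 |     $children_pids[0] = fork() // die "ERROR: cannot fork: $!\n";    # a failed fork returns undef, which would otherwise compare equal to 0 below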
583 |     if($children_pids[0] == 0){
584 |         print "NOTICE: start to run ANNOVAR region annotation to annotate structural variations or variants associated with CNV changes\n";
585 | 	if(-s $cnvFile){
586 | 	    !system("$callAnnovar -regionanno -build $hg -out $cnvFile -dbtype cnv $cnvFile $DBLocation -scorecolumn 4 --colsWanted 0") or die "ERROR: cannot annotate structural variations\n";
587 | 	}else{
588 | 	    print "NOTICE: CNV file has 0 size\n";
589 | 	}
590 |         exit 0;
591 |     }
592 |     $children_pids[1] = fork() // die "ERROR: cannot fork: $!\n";
593 |     if($children_pids[1] == 0){
594 |         print "NOTICE: start to run ANNOVAR index function to fetch radial SVM score for each mutation \n";
595 |         if(-s $snpFile){
596 | 	    !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype iCAGES $snpFile $DBLocation") or die "ERROR: cannot call icages\n";
597 |         }else{
598 | 	    print "NOTICE: SNV file has 0 size\n";
599 | 	}
600 | 	exit 0;
601 |     }
602 |     $children_pids[2] = fork() // die "ERROR: cannot fork: $!\n";
603 |     if($children_pids[2] == 0){
604 |         print "NOTICE: start to run ANNOVAR index function to fetch funseq score for each mutation \n";
605 | 	if(-s $snpFile){
606 | 	    !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype funseq2 $snpFile $DBLocation") or die "ERROR: cannot call funseq2\n";
607 | 	}else{
608 | 	    print "NOTICE: SNV file has 0 size, skip funseq score annotation\n";
609 | 	}
610 |         exit 0;
611 |     }
612 |     $children_pids[3] = fork() // die "ERROR: cannot fork: $!\n";
613 |     if($children_pids[3] == 0){
614 |         print "NOTICE: start annotating each mutation using ANNOVAR\n";
615 |         !system("$callAnnovar -out $annovarInputFile -build $hg $annovarInputFile $DBLocation") or die "ERROR: cannot call annovar\n";
616 |         exit 0;
617 |     }
618 |     for (0.. $#children_pids){
619 |         waitpid($children_pids[$_], 0);
620 |     }
621 | }
622 | 
623 | 
624 | 
625 | 

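For intergenic variants, `processAnnovar` reduces a gene field listing two flanking genes with `(dist=...)` annotations to the nearer gene, treating `NONE` as a missing side. A small, separately testable version of that decision is sketched below. `nearest_gene` is a hypothetical helper, the gene names are placeholders, and the accepted input shape is inferred from the regular expression in the script, not taken from ANNOVAR's documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a field shaped like "A(dist=N),B(dist=M)", return
# the flanking gene with the smaller distance; "NONE" marks a missing side.
# Any other field is returned unchanged.
sub nearest_gene {
    my ($field) = @_;
    return $field unless $field =~ /^(.*?)\(dist=(.*?)\),(.*?)\(dist=(.*?)\)$/;
    my ($gene1, $dist1, $gene2, $dist2) = ($1, $2, $3, $4);
    return $gene1 if $dist2 eq "NONE";    # also covers the case where both are NONE
    return $gene2 if $dist1 eq "NONE";
    return $dist1 <= $dist2 ? $gene1 : $gene2;
}

print nearest_gene("GENEA(dist=1200),GENEB(dist=350)"), "\n";   # GENEB
print nearest_gene("GENEA(dist=NONE),GENEB(dist=900)"), "\n";   # GENEB
print nearest_gene("GENEA(dist=80),GENEB(dist=NONE)"), "\n";    # GENEA
print nearest_gene("GENEC"), "\n";                              # GENEC
```

Keeping the rule in one subroutine makes the `NONE` handling easy to unit-test; because this pattern is anchored at both ends, a field with extra text around the two genes is returned unchanged instead of being partially matched.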

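When both a tumor and a germline sample are named, `formatConvert` tries to build a file of somatic variants from the two converted ANNOVAR files. A set difference keyed on the variant itself, done with a Perl hash, is one way to express "tumor minus germline" without relying on line-level shell tools. This sketch assumes the first five tab-separated columns are chromosome, start, end, reference and alternative allele (the same five fields the script uses as mutation keys) and deliberately ignores any further columns, such as zygosity, that may differ between the two samples; `variant_key` and `somatic_only` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: identify a variant by its first five ANNOVAR columns
# (chr, start, end, ref, alt), ignoring any extra columns such as zygosity.
sub variant_key {
    my @fields = split /\t/, shift;
    return join "\t", map { defined $_ ? $_ : "" } @fields[0 .. 4];
}

# Keep tumor lines whose variant key never occurs among the germline lines.
sub somatic_only {
    my ($tumor, $germline) = @_;    # array refs of ANNOVAR-format lines
    my %inGermline;
    $inGermline{ variant_key($_) } = 1 for @$germline;
    return grep { !$inGermline{ variant_key($_) } } @$tumor;
}

my @tumor    = ("chr1\t100\t100\tA\tG\thet", "chr2\t200\t200\tC\tT\thom");
my @germline = ("chr1\t100\t100\tA\tG\thom");
my @somatic  = somatic_only(\@tumor, \@germline);

print scalar(@somatic), "\n";                              # 1
print join(",", (split /\t/, $somatic[0])[0 .. 4]), "\n";  # chr2,200,200,C,T
```

To apply it to files, read each `.ann` file into an array with `chomp`, pass both array references to `somatic_only`, and print the returned lines joined with newlines.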
--------------------------------------------------------------------------------
/bin/icagesMutationNew.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | use strict;
  3 | use warnings;
  4 | use List::Util qw(min max);	
  5 | use Pod::Usage;
  6 | use Getopt::Long;
  7 | 
  8 | ######################################################################################################################################
  9 | ######################################################## variable declaration ########################################################
 10 | ######################################################################################################################################
 11 | 
 12 | my ($annovarInputFile,$rawInputFile, $inputDir ,$icagesLocation , $tumor ,$germline, $id ,$prefix, $bed, $hg, $expression);
 13 | my $nowString;
 14 | my (%sup, %onc);
 15 | 
 16 | ######################################################################################################################################
 17 | ########################################################### main  ####################################################################
 18 | ######################################################################################################################################
 19 | $rawInputFile = $ARGV[0];
 20 | $inputDir = $ARGV[1];
 21 | $icagesLocation = $ARGV[2];
 22 | $tumor = $ARGV[3];
 23 | $germline = $ARGV[4];
 24 | $id = $ARGV[5];
 25 | $prefix = $ARGV[6];
 26 | $bed = $ARGV[7];
 27 | $hg = $ARGV[8];
 28 | $expression = $ARGV[9];
 29 | $nowString = localtime();
 30 | $annovarInputFile = &runAnnovar($rawInputFile, $inputDir ,$icagesLocation ,$tumor ,$germline ,$id, $prefix, $bed, $hg , $expression);
 31 | &processAnnovar($annovarInputFile, $hg, $icagesLocation);
 32 | 
 33 | ######################################################################################################################################
 34 | ############################################################# subroutines ############################################################
 35 | ######################################################################################################################################
 36 | 
 37 | sub runAnnovar {
 38 |     print "NOTICE: start running iCAGES package at $nowString\n";
 39 |     my ($rawInputFile, $annovarInputFile);                                             #ANNOVAR input files
 40 |     my ($icagesLocation, $callAnnotateVariation, $DBLocation);                         #ANNOVAR commands
 41 |     my ($tumor, $germline ,$id ,$prefix);                                     #VCF conversion parameters & prefix for output
 42 |     my ($radialSVMDB, $radialSVMIndex, $funseq2DB, $funseq2Index, $refGeneDB, $refGeneIndex, $cnvDB);           #ANNOVAR DB files: iCAGES score (index), refGene (fasta), dbSNP
 43 |     my ($log, $annovarLog);
 44 |     my $nowString = localtime;                                                                                  #SYSTEM local time
 45 |     my $bed;    # location of bed file
 46 |     my $expression; # location of expression log change file
 47 |     $rawInputFile = shift;
 48 |     $inputDir = shift;
 49 |     $icagesLocation = shift;
 50 |     $tumor = shift;
 51 |     $germline = shift;
 52 |     $id = shift;
 53 |     $prefix = shift;
 54 |     $bed = shift;
 55 |     $hg = shift;
 56 |     $expression = shift;
 57 |     $callAnnotateVariation = $icagesLocation . "bin/annovar/annotate_variation.pl";
 58 | 
 59 |     # fall back to a system-wide ANNOVAR if it is not bundled under bin/
 60 |     if (! -e $callAnnotateVariation) {
 61 | 	$callAnnotateVariation = "annotate_variation.pl";
 62 | 	print ("WARNING: cannot find ANNOVAR in ./bin/ directory, assuming that you have installed ANNOVAR\n");
 63 |     }
 64 |     $annovarInputFile = $inputDir . "/" . $prefix . ".annovar";
 65 |     $DBLocation = $icagesLocation . "db/";
 66 | #    $DBLocation = "/ssd/icages-humandb/";
 67 |     &formatConvert($rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id , $bed , $expression);
 68 |     &divideMutation($annovarInputFile);
 69 |     &loadDatabase($DBLocation);
 70 |     &annotateMutation($icagesLocation, $annovarInputFile, $hg);
 71 |     return $annovarInputFile;
 72 | }
 73 | 
 74 | 
 75 | sub processAnnovar{
 76 |     print "NOTICE: start processing output from ANNOVAR\n";
 77 |     my $annovarInputFile = shift;
 78 |     my $hg = shift;
 79 |     my $icagesLocation = shift;
 80 |     # still have to load onc gene set and suppressor gene set
 81 |     my $oncDB = $icagesLocation . "/db/oncogene.gene";
 82 |     my $supDB = $icagesLocation . "/db/suppressor.gene";
 83 |     my (%onc, %sup);
 84 |     open(ONCDB, "$oncDB") or die "ERROR: cannot open $oncDB\n";
 85 |     open(SUPDB, "$supDB") or die "ERROR: cannot open $supDB\n";
 86 |     while(<ONCDB>){
 87 | 	chomp;
 88 | 	$onc{$_} =1;
 89 |     }
 90 |     close ONCDB;
 91 |     while(<SUPDB>){
 92 | 	chomp;
 93 | 	$sup{$_} = 1;
 94 |     }
 95 |     close SUPDB;
 96 |     my $annovarVariantFunction = $annovarInputFile . ".variant_function";
 97 |     # create a temp file to store bed file of variant function : chr start end score
 98 |     my $genebed = $annovarInputFile . ".variant_function.bed";
 99 |     open(GENEFORBED, "$annovarVariantFunction") or die "ERROR: cannot open $annovarVariantFunction\n";
100 |     open(GENEBED, ">$genebed") or die "ERROR: cannot create $genebed file\n";
101 |     my $lastline = "" ;
102 |     while(<GENEFORBED>){
103 | 	chomp;
104 | 	my @line = split("\t", $_);
105 | 	my $printout = "$line[2]\t$line[3]\t$line[4]"; 
106 | 	if($printout eq $lastline){
107 | 	    next;
108 | 	}else{
109 | 	    print GENEBED "$line[2]\t$line[3]\t$line[4]\n";
110 | 	}
111 | 	$lastline = $printout;
112 |     }
113 |     close GENEBED;
114 |     close GENEFORBED;
115 |     my $annovarExonVariantFunction = $annovarInputFile . ".exonic_variant_function";
116 |     my $annovarRadialSVM = $annovarInputFile . ".snp." . $hg . "_iCAGES_dropped";
117 |     my $annovarCNV = $annovarInputFile . ".cnv." . $hg . "_cnv";
118 |     # create a temp file to store bed file of cnv with this format : chr start end score
119 |     my $cnvbed = $annovarInputFile . ".cnv." . $hg . "_cnv.bed";
120 | 
121 |     # create a final file to store the final result of bedtools intersect
122 |     my $cnvfinal = $annovarInputFile . ".cnv.final";
123 | 
124 |     my $annovarFunseq2 = $annovarInputFile . ".snp." . $hg . "_funseq2_dropped";
125 |     my $icagesMutations = $annovarInputFile . ".icagesMutations.csv";
126 |     # add bedtools 
127 |     my $bedtools = $icagesLocation . "/bin/bedtools/bin/bedtools";
128 | 
129 |     # fall back to a system-wide bedtools if it is not bundled under bin/
130 |     if (! -e $bedtools) {
131 | 	$bedtools = "bedtools";
132 | 	print ("WARNING: did not find bedtools in ./bin/ directory, assuming you have installed bedtools\n");
133 |     }
134 |     open(GENE, "$annovarVariantFunction") or die "ERROR: cannot open file $annovarVariantFunction\n";
135 |     open(EXON, "$annovarExonVariantFunction") or die "ERROR: cannot open file $annovarExonVariantFunction\n";
136 |     if(!-e $annovarCNV){
137 | 	!system("touch $annovarCNV") or die "ERROR: cannot create file $annovarCNV\n";
138 |     }
139 |     if(!-e $annovarRadialSVM){
140 | 	!system("touch $annovarRadialSVM") or die "ERROR: cannot create file $annovarRadialSVM\n";
141 |     }
142 |     if(!-e $annovarFunseq2){
143 | 	!system("touch $annovarFunseq2") or die "ERROR: cannot create file $annovarFunseq2\n";
144 |     }
145 | 
146 |     open(CNV, "$annovarCNV") or die "ERROR: cannot open file $annovarCNV\n";
147 |     open(RADIAL, "$annovarRadialSVM") or die "ERROR: cannot open file $annovarRadialSVM\n";
148 |     open(FUNSEQ, "$annovarFunseq2") or die "ERROR: cannot open file $annovarFunseq2\n";
149 |     open(OUT, ">$icagesMutations") or die "ERROR: cannot open file $icagesMutations\n";
150 |     my (%radialSVM, %funseq, %cnv, %exon);
151 |     my (%pointcoding);
152 |     my %icagesMutations;
153 |     
154 |     ######## count location information
155 |     my $exonCount = 0;
156 |     my $intronCount = 0;
157 |     my $noncodingRNACount = 0;
158 |     my $intergenicCount = 0;
159 |     my $otherCount = 0;
160 |     
161 |     ######## count annotation information
162 |     my $radialSVMCount = 0;
163 |     my $funseqCount = 0;
164 |     my $cnvCount = 0;
165 |     
166 |     while(<RADIAL>){
167 |         chomp;
168 |         my @line;
169 |         my $key;
170 |         @line = split(/\t/, $_);
171 |         $key = "$line[2],$line[3],$line[4],$line[5],$line[6]";
172 |         $radialSVM{$key} = $line[1];
173 |     }
174 |     while(<FUNSEQ>){
175 |         chomp;
176 |         my @line;
177 |         my $key;
178 |         @line = split(/\t/, $_);
179 |         $key = "$line[2],$line[3],$line[4],$line[5],$line[6]";
180 |         $funseq{$key} = $line[1];
181 |     }
182 | 
183 |     # CNV regions cannot be matched to variants by an exact key,
184 |     # so write them to a temporary BED file and intersect it with the variant BED file
185 | 
186 |     open(CNVBED, ">$cnvbed") or die "ERROR: cannot open $cnvbed for writing\n";
187 |     
188 |     while(<CNV>){
189 |         chomp;
190 |         my @line;
191 |         my $key;
192 |         my $score;
193 |         @line = split(/\t/, $_);
194 |         $score = $line[1];
195 |         # extract the value of the Score= field; use NA if the field is absent
196 |         $score = ($score =~ /Score=([^;]*)/) ? $1 : "NA";
197 | 	# chr start end score
198 | 	print CNVBED "$line[2]\t$line[3]\t$line[4]\t$score\n";
199 |    }
200 |     close CNVBED;
201 |     
202 | 
203 |     # find CNV regions that overlap variant regions; -wa reports the original CNV entry for each overlap
204 |     if(-z $cnvbed ){
205 | #	print "-x\n";
206 | 	!system("touch $cnvfinal") or die "ERROR: cannot create file $cnvfinal\n";
207 |     }else{
208 | #	print "not -x\n";
209 | 	!system("$bedtools intersect -a $cnvbed -b $genebed -wa > $cnvfinal") or die "ERROR: cannot find intersect using bedtools, please check whether or not you have installed bedtools\n";
210 |     }
211 |     
212 |     open(CNVFINAL, "$cnvfinal") or die "ERROR: cannot open $cnvfinal for reading\n";
213 |     while(<CNVFINAL>){
214 | 	chomp;
215 | 	my @line = split("\t", $_);
216 | 	# note that this key is different for snv
217 | 	my $key = "$line[0],$line[1],$line[2]";             
218 | 	$cnv{$key} = $line[3]; 
219 |     }
220 |     close CNVFINAL;
221 |     while(<EXON>){
222 |         chomp;
223 |         my (@line, @syntax, @content);
224 |         my ($key, $mut, $pro);
225 |         @line = split(/\t+/, $_);
226 |         @syntax = split(",", $line[2]);
227 |         @content = split(":", $syntax[0]);
228 |         $mut = $content[3];
229 |         $pro = $content[4];
230 |         $key = "$line[3],$line[4],$line[5],$line[6],$line[7]";
231 |         $exon{$key}{"mutationSyntax"} = $mut;
232 |         $exon{$key}{"proteinSyntax"} = $pro;
233 |         if($line[4] == $line[5]){
234 |             $pointcoding{$key} = 1;
235 |         }
236 |     }
237 |     while(<GENE>){
238 |         chomp;
239 |         my @line;
240 |         my ($key, $gene);                                                            #hash key used to fetch radial SVM score from %radialSVM: mutation->radialSVM
241 |         my ($category, $mutationSyntax, $proteinSyntax, $scoreCategory, $score);
242 |         @line = split(/\t+/, $_);
243 |         $gene = $line[1];
244 |         next unless defined $gene;
245 |         $key = "$line[2],$line[3],$line[4],$line[5],$line[6]";
246 | 	# note that the structural variation key differs from the SNV key: it is chr,start,end only
247 | 	my $structKey = "$line[2],$line[3],$line[4]";
248 |         my %cnvScore ; # we also need this hash to store cnv score for each gene 
249 | 	
250 | 	next unless defined $key;
251 |         next unless defined $structKey;
252 | 	
253 | 
254 |         ####### process gene for noncoding variants
255 |         my @printGene;
256 |         if($gene =~ /(.*?)\(dist=(.*?)\),(.*?)\(dist=(.*?)\)/){
257 |             my $gene1 = $1;
258 |             my $gene2 = $3;
259 |             my $dist1 = $2;
260 |             my $dist2 = $4;
261 | 	    if($dist1 eq "NONE" and $dist2 eq "NONE"){
262 | 		$printGene[0] = $gene1;
263 | 	    }elsif($dist1 eq "NONE"){
264 | 		$printGene[0] = $gene2;
265 | 	    }elsif($dist2 eq "NONE"){
266 | 		$printGene[0] = $gene1;
267 | 	    }elsif($dist1 <= $dist2){
268 |                 $printGene[0] = $gene1;
269 |             }else{
270 |                 $printGene[0] = $gene2;
271 |             }
272 |         }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\),([A-Z|0-9|-]+?)\(.*\)$/){
273 |             $printGene[0] = $1;
274 |             $printGene[1] = $2;
275 |         }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\);([A-Z|0-9|-]+?)\(.*\)$/){
276 |             $printGene[0] = $1;
277 |             $printGene[1] = $2;
278 |         }elsif($gene =~ /([A-Z|0-9|-]+?)\(.*\)$/){
279 |             $printGene[0] = $1;
280 |         }elsif($gene =~ /;/ or $gene =~ /,/){
281 |             my @gene = split(/;|,/, $gene);
282 |             for(0..$#gene){
283 |                 $printGene[$_] = $gene[$_];
284 |             }
285 |         }elsif($gene ne ""){
286 |             $printGene[0] = $gene;
287 |         }
288 | 
289 |         
290 |         if($line[0] =~ /^exonic/ || $line[0] =~ /^splicing/ ){
291 |             $exonCount ++;
292 |         }elsif($line[0] =~ /^intron/){
293 |             $intronCount ++;
294 |         }elsif($line[0] =~ /^ncRNA/){
295 |             $noncodingRNACount ++;
296 |         }elsif($line[0] =~ /^intergenic/){
297 |             $intergenicCount ++;
298 |         }else{
299 |             $otherCount ++;
300 |         }
301 |        
302 |         if ($line[3] == $line[4]){
303 |             if(exists $pointcoding{$key}){
304 |                 $radialSVMCount ++;
305 |                 $category = "point coding";
306 |                 if(defined $exon{$key}{"mutationSyntax"}){
307 |                     $mutationSyntax = $exon{$key}{"mutationSyntax"};
308 |                 }else{
309 |                     $mutationSyntax = "NA";
310 |                 }
311 |                 if(defined $exon{$key}{"proteinSyntax"}){
312 |                     $proteinSyntax = $exon{$key}{"proteinSyntax"};
313 |                 }else{
314 |                      $proteinSyntax = "NA";
315 |                 }
316 |                 $scoreCategory = "radial SVM";
317 |                 if(exists $radialSVM{$key}){
318 |                     $score = $radialSVM{$key}
319 |                 }else{
320 |                     $score = "NA";
321 |                 }
322 |             }else{
323 |                 $funseqCount ++;
324 |                 $category = "point noncoding";
325 |                 $mutationSyntax = "NA";
326 |                 $proteinSyntax = "NA";
327 |                 $scoreCategory = "FunSeq2";
328 |                 if(exists $funseq{$key}){
329 |                     $score = $funseq{$key}
330 |                 }else{
331 |                     $score = "NA";
332 |                 }
333 |             }
334 |         }else{
335 |             $cnvCount ++ ;
336 |             $category = "structural variation";
337 |             if(exists $exon{$key}{"mutationSyntax"} and exists $exon{$key}{"proteinSyntax"}){
338 |                 $mutationSyntax = $exon{$key}{"mutationSyntax"};
339 |                 $proteinSyntax = $exon{$key}{"proteinSyntax"};
340 |             }else{
341 |                 $mutationSyntax = "NA";
342 |                 $proteinSyntax = "NA";
343 |             }
344 |             $scoreCategory = "CNV normalized signal";
345 | 	    for(0..$#printGene){
346 | 		if(exists $cnv{$structKey} and (exists $onc{$printGene[$_]} or exists $sup{$printGene[$_]})){
347 | 		    $score = $cnv{$structKey};
348 | 		}else{
349 | 		    $score = "NA";
350 | 		}
351 | 		$cnvScore{$printGene[$_]} = $score;
352 | 	    }
353 |         }
354 | 	
355 |         for(0..$#printGene){
356 |             $icagesMutations{$printGene[$_]}{$key}{"category"} = $category;
357 |             $icagesMutations{$printGene[$_]}{$key}{"mutationSyntax"} = $mutationSyntax;
358 |             $icagesMutations{$printGene[$_]}{$key}{"proteinSyntax"} = $proteinSyntax;
359 |             $icagesMutations{$printGene[$_]}{$key}{"scoreCategory"} = $scoreCategory;
360 | 	    if($category eq "structural variation"){
361 | 		$icagesMutations{$printGene[$_]}{$key}{"score"} = $cnvScore{$printGene[$_]};
362 | 	    }else{
363 | 		$icagesMutations{$printGene[$_]}{$key}{"score"} = $score;
364 | 	    }
365 |         }
366 |     }
367 |     print OUT "geneName,chrmosomeNumber,start,end,reference,alternative,category,mutationSyntax,proteinSyntax,scoreCategory,mutationScore\n";
368 |     foreach my $gene (sort keys %icagesMutations){
369 |         foreach my $mutation (sort keys %{$icagesMutations{$gene}}){
370 |            if (defined $gene and defined $mutation and defined $icagesMutations{$gene}{$mutation}{"category"} and defined $icagesMutations{$gene}{$mutation}{"mutationSyntax"} and defined $icagesMutations{$gene}{$mutation}{"proteinSyntax"} and defined $icagesMutations{$gene}{$mutation}{"scoreCategory"} and defined $icagesMutations{$gene}{$mutation}{"score"}){
371 |                print OUT "$gene,$mutation,$icagesMutations{$gene}{$mutation}{\"category\"},$icagesMutations{$gene}{$mutation}{\"mutationSyntax\"},$icagesMutations{$gene}{$mutation}{\"proteinSyntax\"},$icagesMutations{$gene}{$mutation}{\"scoreCategory\"},$icagesMutations{$gene}{$mutation}{\"score\"}\n" ;
372 |             }else{
373 |                 print "WARNING: incomplete annotation for gene $gene, mutation $mutation; not written to output\n";
374 |             }
375 |         }
376 |     }
377 |     
378 |     my $logFile = $annovarInputFile . ".icages.log";
379 |     open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n";
380 |     
381 |     print LOG "## location information\n";
382 |     print LOG "exonic/splice: $exonCount\n";
383 |     print LOG "intronic: $intronCount\n";
384 |     print LOG "noncoding RNA: $noncodingRNACount\n";
385 |     print LOG "intergenic: $intergenicCount\n";
386 |     print LOG "other: $otherCount\n\n";
387 |     
388 |     print LOG "## annotation information\n";
389 |     print LOG "point coding variants with radialSVM annotation: $radialSVMCount\n";
390 |     print LOG "point noncoding variants with FunSeq2 annotation: $funseqCount\n";
391 |     print LOG "Indels and SVs with CNV signal annotation: $cnvCount\n\n";
392 |     close OUT;
393 | }
394 | 
395 | 
396 | sub formatConvert{
397 |     # $rawInputFile, $annovarInputFile, $icagesLocation, $tumor, $germline , $id, $bed, $expression
398 |     print "NOTICE: start input file format checking and converting format if needed\n";
399 |     my ($rawInputFile, $annovarInputFile, $icagesLocation );
400 |     my ( $tumor, $germline , $id, $bed, $prefix , $expression); # parameters for vcf conversion
401 |     my  $callConvertToAnnovar;
402 |     my $callvcftools;
403 |     my $formatCheckFirstLine;
404 |     my $isbedFormat = 0; # check whether or not this file is in bed format
405 |     $rawInputFile = shift;
406 |     $annovarInputFile = shift;
407 |     $icagesLocation = shift;
408 |     $tumor = shift;
409 |     $germline = shift;
410 |     $id = shift;
411 |     $bed = shift;
412 |     $expression = shift;
413 |     open(IN, "$rawInputFile") or die "ERROR: cannot open $rawInputFile\n";
414 |     $formatCheckFirstLine = <IN>;
415 |     chomp $formatCheckFirstLine;
416 |     my $multipleSampleCheck = 0;    
417 |     if($formatCheckFirstLine =~ /^#/){
418 | 	# scan the header to check whether this VCF file contains multiple samples
419 | 
420 | 	while(<IN>){
421 | 	    chomp;
422 | 	    my $line = $_;
423 | 	    my @line = split;
424 | 	    if($line[0] =~ /^#CHROM/){
425 | 		if($#line > 9){
426 | 		    $multipleSampleCheck = 1;
427 | 		}
428 | 		last;
429 | 	    }
430 | 	}
431 | 
432 |     }else{
433 | 	my @line = split(/\t| /, $formatCheckFirstLine);
434 | 	if($#line == 2 ){
435 | 	    $isbedFormat  = 1;
436 | 	}elsif($#line != 2 and defined  $line[3]  and defined $line[4] ){
437 | 	    if($line[3] !~ /^[ACGTN-]+$/i or $line[4] !~ /^[ACGTN-]+$/i){
438 | 		$isbedFormat  = 1;
439 | 	    }
440 | 	}
441 | 	if ($isbedFormat == 1) {
442 | 	    print "iCAGES: your file is likely to be in BED format\n";
443 | 
444 | 	}
445 |     }
446 |     close IN;
447 |     $callConvertToAnnovar = $icagesLocation . "bin/annovar/convert2annovar.pl";
448 |     # the user may have installed ANNOVAR system-wide instead
449 |     if (! -e $callConvertToAnnovar) {
450 | 	$callConvertToAnnovar = "convert2annovar.pl";
451 | 	print("WARNING: there is no ANNOVAR installed in the bin directory, assuming that you have already installed ANNOVAR in your system\n");
452 |     }
453 | 
454 | 
455 |     $callvcftools = $icagesLocation . "bin/vcftools/bin/vcftools";
456 |     # the user may have installed vcftools system-wide instead
457 |     if (! -e $callvcftools ){
458 | 	$callvcftools = "vcftools";
459 | 	print("WARNING: there is no vcftools installed in the bin directory, assuming that you have already installed vcftools in your system\n");
460 |     }
461 | 
462 | 
463 |     if($formatCheckFirstLine =~ /^##fileformat=VCF/){             #VCF
464 | 	if($multipleSampleCheck and $tumor eq "NA" and $germline eq "NA" and $id eq "NA"){
465 | 	    die "ERROR: your vcf file contains multiple samples please specify a valid sample identifier \n";
466 | 	}
467 | 	if($tumor ne "NA" and $germline ne "NA"){
468 | 	    !system("$callvcftools --recode --vcf $rawInputFile --indv $tumor --out $rawInputFile.$tumor") or die "ERROR: please specify a valid sample identifier for tumor sample\n";
469 | 	    !system("$callvcftools --recode --vcf $rawInputFile --indv $germline --out $rawInputFile.$germline") or die "ERROR: please specify a valid sample identifier for germline variants\n";
470 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$tumor.recode.vcf > $rawInputFile.$tumor.ann") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
471 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$germline.recode.vcf > $rawInputFile.$germline.ann") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
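472 | 	    !system("sort $rawInputFile.$tumor.ann $rawInputFile.$germline.ann $rawInputFile.$germline.ann | uniq -u > $annovarInputFile") or die "ERROR: cannot generate somatic variants input file for iCAGES, please double check the format of your input files\n"; # germline lines appear at least twice after sorting, so uniq -u keeps only tumor-specific lines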
473 | 	}elsif($id ne "NA"){
474 | 	    !system("$callvcftools --recode --vcf $rawInputFile --indv $id --out $rawInputFile.$id") or die "ERROR: please specify a valid sample identifier for somatic variants of your interest\n";
475 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile.$id.recode.vcf > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
476 | 	}else{
477 | 	    !system("$callConvertToAnnovar -format vcf4 $rawInputFile > $annovarInputFile") or die "ERROR: cannot execute convert2annovar.pl for converting VCF file\n";
478 | 	}
479 |         
480 |     }elsif($isbedFormat){
481 | 	# BED
482 | 	print "iCAGES: your input file is likely to be a bed file and iCAGES is converting it to ANNOVAR input format\n";
483 | #	!system("$callConvertToAnnovar -format bed $rawInputFile >  $annovarInputFile") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n";
484 | 	my @cnv;
485 | 	open(INPUTCNV, "$rawInputFile") or die "ERROR: cannot open $rawInputFile\n";
486 | 	my $checkFiveFields = 0;
487 | 	while(<INPUTCNV>){
488 | 	    chomp;
489 | 	    push @cnv, $_;
490 | 	    my @line  = split(/\t| /, $_);
491 | 	    $checkFiveFields = 1 if $#line == 4;
492 | 	}
493 | 	close INPUTCNV;
494 | 	open(OUTPUTCNV, ">$annovarInputFile") or die "ERROR: cannot create $annovarInputFile\n";
495 | 	for(0..$#cnv){
496 | 	    if( $checkFiveFields == 1){
497 | 		print OUTPUTCNV "$cnv[$_]\n";
498 | 	    }else{
499 | 		print OUTPUTCNV "$cnv[$_]\t0\t0\n";
500 | 	    }
501 | 	}
502 | 	close OUTPUTCNV;
503 |     }else{                    
504 | 	#ANNOVAR
505 |         !system("cp $rawInputFile $annovarInputFile") or die "ERROR: cannot use input file $rawInputFile\n";
506 |     }
507 |     if($bed ne "NA"){
508 | # there is a bug in annovar convert2annovar.pl for bed
509 | #	!system("$callConvertToAnnovar -format bed $bed >  $bed.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n";
510 | #	!system('awk \'{print $1 "\t" $2 "\t" $3 "\t0\t0"  }\' $bed >  $bed.out') or die "ERROR: cannot convert your BED file input into ANNOVAR input format\n";
511 | 	open(ANNBED,  ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n";
512 | 	open(BED, "$bed") or die "iCAGES: cannot open your input BED file $bed\n";
513 | 	my @bed;
514 | 	while(<BED>){
515 | 	    chomp;
516 | 	    push @bed, $_;
517 | 	}
518 | 	close BED;
519 |         for(0..$#bed){
520 | 	    print ANNBED "$bed[$_]\t0\t0\n";
521 | 	}
522 | 	close ANNBED;
523 |     }
524 |     if($expression ne "NA"){
525 | 	!system("$callConvertToAnnovar -format bed $expression >  $expression.out") or die "ERROR: cannot convert your BED file input into ANNOVAR input format, please double check your input file\n";
526 |         open(ANNBED,  ">>", $annovarInputFile) or die "iCAGES: cannot find converted input file for iCAGES in ANNOVAR input format\n";
527 |         open(EXP, "$expression.out") or die "iCAGES: cannot open ANNOVAR input file generated by your input BED file\n";
528 |         my @bed;
529 |         while(<EXP>){
530 |             chomp;
531 |             push @bed, $_;
532 |         }
533 | 	close EXP;
534 |         for(0..$#bed){
535 |             print ANNBED "$bed[$_]\n";
536 |         }
537 |         close ANNBED;
538 |     }
539 | }
540 | 
541 | 
542 | sub divideMutation{
543 |     print "NOTICE: start dividing mutations to SNP and structural variation\n";
544 |     my ($annovarInputFile, $snpFile, $cnvFile);
545 |     $annovarInputFile = shift;
546 |     $snpFile = $annovarInputFile . ".snp";
547 |     $cnvFile = $annovarInputFile . ".cnv";
548 |     
549 |     open(OUT, "<$annovarInputFile") or die "iCAGES: cannot open input file $annovarInputFile\n";
550 |     open(SNP, ">$snpFile") or die "iCAGES: cannot create SNV file $snpFile\n";
551 |     open(CNV, ">$cnvFile") or die "iCAGES: cannot create structural variant file $cnvFile\n";
552 |     
553 |     #### add variant information into log file
554 |     my $logFile = $annovarInputFile . ".icages.log";
555 |     open(LOG, ">>$logFile") or die "iCAGES: cannot open file $logFile\n";
556 |     my $variantCount = 0;
557 |     my $snvCount = 0;
558 |     my $cnvCount = 0;
559 | 
560 |     while(<OUT>){
561 |         chomp;
562 |         $variantCount ++;
563 |         my $printLine = $_;
564 | 	# strip carriage returns (^M) left by Windows line endings
565 | 	if ($printLine =~ /\r/){
566 | 	    $printLine =~ s/\r//g;
567 | 	}
568 | 
569 |         my @line = split(/\t| /, $printLine);
570 | 
571 |         if ($line[3] ne "" and $line[4] ne "" and $line[1] == $line[2] and $line[3] ne "-" and $line[4] ne "-"){
572 |             $snvCount ++;
573 |             print SNP "$printLine\n";
574 |         }else{
575 |             $cnvCount ++;
576 |             print CNV "$printLine\n";
577 |         }
578 |     }
579 | 
580 |     close OUT;
581 |     close SNP;
582 |     close CNV;
583 |     
584 |     print LOG "########### iCAGES Variant Summary ###########\n";
585 |     print LOG "## basic information\n";
586 |     print LOG "Total: $variantCount\n";
587 |     print LOG "SNVs: $snvCount\n";
588 |     print LOG "Indels and Structural variants: $cnvCount\n\n";
589 | 
590 |     close LOG;
591 | }
592 | 
593 | sub loadDatabase{
594 |     print "NOTICE: start loading databases\n";
595 |     my $DBLocation = shift;
596 |     my $supLocation = $DBLocation . "suppressor.gene";
597 |     my $oncLocation = $DBLocation . "oncogene.gene";
598 |     print "NOTICE: start extracting suppressor genes\n";
599 |     open(SUP, "$supLocation") or die "ERROR: cannot open $supLocation\n";
600 |     while(<SUP>){
601 |         chomp;
602 |         $sup{$_} = 1;
603 |     }
604 |     close SUP;
605 |     print "NOTICE: start extracting oncogenes\n";
606 |     open(ONC, "$oncLocation") or die "ERROR: cannot open $oncLocation\n";
607 |     while(<ONC>){
608 |         chomp;
609 |         $onc{$_} = 1;
610 |     }
611 |     close ONC;
612 | }
613 | 
614 | 
615 | 
616 | sub annotateMutation{
617 |     my ($DBLocation, $icagesLocation, $callAnnovar, $annovarInputFile, $snpFile, $cnvFile, $hg);
618 |     $icagesLocation = shift;
619 |     $annovarInputFile = shift;
620 |     $hg = shift;
621 |     $DBLocation = $icagesLocation . "db/";
622 |     $snpFile = $annovarInputFile . ".snp";
623 |     $cnvFile = $annovarInputFile . ".cnv";
624 |     $callAnnovar = $icagesLocation . "bin/annovar/annotate_variation.pl";
625 |     # the user may have installed ANNOVAR system-wide instead of under bin/
626 |     if (! -e $callAnnovar) {
627 | 	$callAnnovar = "annotate_variation.pl";
628 | 	print("WARNING: did not find ANNOVAR in ./bin/ directory, assuming that you have installed ANNOVAR\n");
629 |     }
630 | 
631 |     my @children_pids;
632 |     $children_pids[0] = fork();
633 |     if($children_pids[0] == 0){
634 |         print "NOTICE: start to run ANNOVAR region annotation to annotate structural variations or variants associated with CNV changes\n";
635 | 	if(-s $cnvFile){
636 | 	    !system("$callAnnovar -regionanno -build $hg -out $cnvFile -dbtype cnv $cnvFile $DBLocation -scorecolumn 4 --colsWanted 0") or die "ERROR: cannot annotate structural variations\n";
637 | 	}else{
638 | 	    print "NOTICE: CNV file has 0 size\n";
639 | 	}
640 |         exit 0;
641 |     }
642 |     $children_pids[1] = fork();
643 |     if($children_pids[1] == 0){
644 |         print "NOTICE: start to run genomeLocusFinder.pl to fetch radial SVM score for each mutation\n";
645 |         if(-s $snpFile){
646 | 	    # use yunfei's new query function instead
647 | #	    print ("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_iCAGES.txt $snpFile iCAGES $snpFile." . $hg . "_iCAGES_dropped");
648 | 
649 | #	    !system("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_iCAGES.txt $snpFile iCAGES $snpFile." . $hg . "_iCAGES_dropped") or die "ERROR: cannot call icages\n";
650 | 	    !system("$icagesLocation/genomeLocusFinder.pl find $DBLocation" . $hg . "_iCAGES.txt $snpFile iCAGES $snpFile." . $hg . "_iCAGES_dropped") or die "ERROR: cannot call icages\n";
651 | 
652 | 	    # !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype iCAGES $snpFile $DBLocation") or die "ERROR: cannot call icages\n";
653 |         }else{
654 | 	    print "NOTICE: SNV file has 0 size\n";
655 | 	}
656 | 	exit 0;
657 |     }
658 |     $children_pids[2] = fork();
659 |     if($children_pids[2] == 0){
660 |         print "NOTICE: start to run genomeLocusFinder.pl to fetch FunSeq2 score for each mutation\n";
661 | 	if(-s $snpFile){
662 | 	    # use yunfei's new query function instead
663 | #	    print ("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_funseq2.txt $snpFile funseq2 $snpFile." . $hg . "_funseq2_dropped\n");
664 | #	    !system("/ssd/icages-humandb/genomeLocusFinder.pl find /ssd/icages-humandb/" . $hg . "_funseq2.txt $snpFile funseq2 $snpFile." . $hg . "_funseq2_dropped") or die "ERROR: cannot call funseq2\n";
665 | 	    
666 | 	    !system("$icagesLocation/genomeLocusFinder.pl find $DBLocation" . $hg . "_funseq2.txt $snpFile funseq2 $snpFile." . $hg . "_funseq2_dropped") or die "ERROR: cannot call funseq2\n";
667 | 	    
668 | 	    # !system("$callAnnovar -filter -out $snpFile -build $hg -dbtype funseq2 $snpFile $DBLocation") or die "ERROR: cannot call funseq2\n";
669 | 	}else{
670 | 	    print "NOTICE: SNV file has 0 size, skip funseq score annotation\n";
671 | 	}
672 |         exit 0;
673 |     }
674 |     defined($children_pids[3] = fork()) or die "ERROR: cannot fork: $!\n";
675 |     if($children_pids[3] == 0){
676 |         print "NOTICE: start annotating each mutation using ANNOVAR\n";
677 |         !system("$callAnnovar -out $annovarInputFile -build $hg $annovarInputFile $DBLocation") or die "ERROR: cannot call annovar\n";
678 |         exit 0;
679 |     }
680 |     for (0.. $#children_pids){
681 |         waitpid($children_pids[$_], 0);
682 |     }
683 | }
684 | 
685 | 
686 | 
687 | 
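The block above starts each annotation step in a child process and then waits for all of them. A child that fails calls `die` and exits with a non-zero status, but the `waitpid` loop never inspects `$?`, so the pipeline carries on with missing output. The sketch below shows the same fork-and-wait pattern with the exit codes collected. `run_parallel` is a hypothetical helper, not part of iCAGES, and the two code refs only stand in for the real annotation commands.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of iCAGES): run each code ref in its own
# child process, wait for all of them, and return their exit codes in order.
sub run_parallel {
    my @tasks = @_;
    my @pids;
    for my $task (@tasks) {
        my $pid = fork();
        die "ERROR: cannot fork: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: exit 0 if the task returns normally, 1 if it dies.
            my $ok = eval { $task->(); 1 };
            exit($ok ? 0 : 1);
        }
        push @pids, $pid;    # parent: remember each child's PID
    }
    my @codes;
    for my $pid (@pids) {
        waitpid($pid, 0);
        push @codes, $? >> 8;    # the exit code lives in the high byte of $?
    }
    return @codes;
}

my @codes = run_parallel(
    sub { 1 },                            # stands in for a step that succeeds
    sub { die "annotation failed\n" },    # stands in for a step that dies
);
print "child exit codes: @codes\n";
```

A caller could then stop with an error when any code is non-zero instead of continuing silently.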


--------------------------------------------------------------------------------
/bin/DGIdb/local/lib.pm:
--------------------------------------------------------------------------------
   1 | package local::lib;
   2 | use 5.006;
   3 | use strict;
   4 | use warnings;
   5 | use Config;
   6 | 
   7 | our $VERSION = '2.000014';
   8 | $VERSION = eval $VERSION;
   9 | 
  10 | BEGIN {
  11 |   *_WIN32 = ($^O eq 'MSWin32' || $^O eq 'NetWare' || $^O eq 'symbian')
  12 |     ? sub(){1} : sub(){0};
  13 |   # punt on these systems
  14 |   *_USE_FSPEC = ($^O eq 'MacOS' || $^O eq 'VMS' || $INC{'File/Spec.pm'})
  15 |     ? sub(){1} : sub(){0};
  16 | }
  17 | our $_DIR_JOIN = _WIN32 ? '\\' : '/';
  18 | our $_DIR_SPLIT = (_WIN32 || $^O eq 'cygwin') ? qr{[\\/]}
  19 |                                               : qr{/};
  20 | our $_ROOT = _WIN32 ? do {
  21 |   my $UNC = qr{[\\/]{2}[^\\/]+[\\/][^\\/]+};
  22 |   qr{^(?:$UNC|[A-Za-z]:|)$_DIR_SPLIT};
  23 | } : qr{^/};
  24 | our $_PERL;
  25 | 
  26 | sub _cwd {
  27 |   my $drive = shift;
  28 |   if (!$_PERL) {
  29 |     ($_PERL) = $^X =~ /(.+)/; # $^X is internal how could it be tainted?!
  30 |     if (_is_abs($_PERL)) {
  31 |     }
  32 |     elsif (-x $Config{perlpath}) {
  33 |       $_PERL = $Config{perlpath};
  34 |     }
  35 |     else {
  36 |       ($_PERL) =
  37 |         map { /(.*)/ }
  38 |         grep { -x $_ }
  39 |         map { join($_DIR_JOIN, $_, $_PERL) }
  40 |         split /\Q$Config{path_sep}\E/, $ENV{PATH};
  41 |     }
  42 |   }
  43 |   local @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)};
  44 |   my $cmd = $drive ? "eval { Cwd::getdcwd(q($drive)) }"
  45 |                    : 'getcwd';
  46 |   my $cwd = `"$_PERL" -MCwd -le "print $cmd"`;
  47 |   chomp $cwd;
  48 |   if (!length $cwd && $drive) {
  49 |     $cwd = $drive;
  50 |   }
  51 |   $cwd =~ s/$_DIR_SPLIT?$/$_DIR_JOIN/;
  52 |   $cwd;
  53 | }
  54 | 
  55 | sub _catdir {
  56 |   if (_USE_FSPEC) {
  57 |     require File::Spec;
  58 |     File::Spec->catdir(@_);
  59 |   }
  60 |   else {
  61 |     my $dir = join($_DIR_JOIN, @_);
  62 |     $dir =~ s{($_DIR_SPLIT)(?:\.?$_DIR_SPLIT)+}{$1}g;
  63 |     $dir;
  64 |   }
  65 | }
  66 | 
  67 | sub _is_abs {
  68 |   if (_USE_FSPEC) {
  69 |     require File::Spec;
  70 |     File::Spec->file_name_is_absolute($_[0]);
  71 |   }
  72 |   else {
  73 |     $_[0] =~ $_ROOT;
  74 |   }
  75 | }
  76 | 
  77 | sub _rel2abs {
  78 |   my ($dir, $base) = @_;
  79 |   return $dir
  80 |     if _is_abs($dir);
  81 | 
  82 |   $base = _WIN32 && $dir =~ s/^([A-Za-z]:)// ? _cwd("$1")
  83 |         : $base                              ? $base
  84 |                                              : _cwd;
  85 |   return _catdir($base, $dir);
  86 | }
  87 | 
  88 | sub import {
  89 |   my ($class, @args) = @_;
  90 |   push @args, @ARGV
  91 |     if $0 eq '-';
  92 | 
  93 |   my @steps;
  94 |   my %opts;
  95 |   my $shelltype;
  96 | 
  97 |   while (@args) {
  98 |     my $arg = shift @args;
  99 |     # check for lethal dash first to stop processing before causing problems
 100 |     # the fancy dash is U+2212 or \xE2\x88\x92
 101 |     if ($arg =~ /\xE2\x88\x92/ or $arg =~ /−/) {
 102 |       die <<'DEATH';
 103 | WHOA THERE! It looks like you've got some fancy dashes in your commandline!
 104 | These are *not* the traditional -- dashes that software recognizes. You
 105 | probably got these by copy-pasting from the perldoc for this module as
 106 | rendered by a UTF8-capable formatter. This most typically happens on an OS X
 107 | terminal, but can happen elsewhere too. Please try again after replacing the
 108 | dashes with normal minus signs.
 109 | DEATH
 110 |     }
 111 |     elsif ($arg eq '--self-contained') {
 112 |       die <<'DEATH';
 113 | FATAL: The local::lib --self-contained flag has never worked reliably and the
 114 | original author, Mark Stosberg, was unable or unwilling to maintain it. As
 115 | such, this flag has been removed from the local::lib codebase in order to
 116 | prevent misunderstandings and potentially broken builds. The local::lib authors
 117 | recommend that you look at the lib::core::only module shipped with this
 118 | distribution in order to create a more robust environment that is equivalent to
 119 | what --self-contained provided (although quite possibly not what you originally
 120 | thought it provided due to the poor quality of the documentation, for which we
 121 | apologise).
 122 | DEATH
 123 |     }
 124 |     elsif( $arg =~ /^--deactivate(?:=(.*))?$/ ) {
 125 |       my $path = defined $1 ? $1 : shift @args;
 126 |       push @steps, ['deactivate', $path];
 127 |     }
 128 |     elsif ( $arg eq '--deactivate-all' ) {
 129 |       push @steps, ['deactivate_all'];
 130 |     }
 131 |     elsif ( $arg =~ /^--shelltype(?:=(.*))?$/ ) {
 132 |       $shelltype = defined $1 ? $1 : shift @args;
 133 |     }
 134 |     elsif ( $arg eq '--no-create' ) {
 135 |       $opts{no_create} = 1;
 136 |     }
 137 |     elsif ( $arg =~ /^--/ ) {
 138 |       die "Unknown import argument: $arg";
 139 |     }
 140 |     else {
 141 |       push @steps, ['activate', $arg];
 142 |     }
 143 |   }
 144 |   if (!@steps) {
 145 |     push @steps, ['activate', undef];
 146 |   }
 147 | 
 148 |   my $self = $class->new(%opts);
 149 | 
 150 |   for (@steps) {
 151 |     my ($method, @args) = @$_;
 152 |     $self = $self->$method(@args);
 153 |   }
 154 | 
 155 |   if ($0 eq '-') {
 156 |     print $self->environment_vars_string($shelltype);
 157 |     exit 0;
 158 |   }
 159 |   else {
 160 |     $self->setup_local_lib;
 161 |   }
 162 | }
 163 | 
 164 | sub new {
 165 |   my $class = shift;
 166 |   bless {@_}, $class;
 167 | }
 168 | 
 169 | sub clone {
 170 |   my $self = shift;
 171 |   bless {%$self, @_}, ref $self;
 172 | }
 173 | 
 174 | sub inc { $_[0]->{inc}     ||= \@INC }
 175 | sub libs { $_[0]->{libs}   ||= [ \'PERL5LIB' ] }
 176 | sub bins { $_[0]->{bins}   ||= [ \'PATH' ] }
 177 | sub roots { $_[0]->{roots} ||= [ \'PERL_LOCAL_LIB_ROOT' ] }
 178 | sub extra { $_[0]->{extra} ||= {} }
 179 | sub no_create { $_[0]->{no_create} }
 180 | 
 181 | my $_archname = $Config{archname};
 182 | my $_version  = $Config{version};
 183 | my @_inc_version_list = reverse split / /, $Config{inc_version_list};
 184 | my $_path_sep = $Config{path_sep};
 185 | 
 186 | sub _as_list {
 187 |   my $list = shift;
 188 |   grep length, map {
 189 |     !(ref $_ && ref $_ eq 'SCALAR') ? $_ : (
 190 |       defined $ENV{$$_} ? split(/\Q$_path_sep/, $ENV{$$_})
 191 |                         : ()
 192 |     )
 193 |   } ref $list ? @$list : $list;
 194 | }
 195 | sub _remove_from {
 196 |   my ($list, @remove) = @_;
 197 |   return @$list
 198 |     if !@remove;
 199 |   my %remove = map { $_ => 1 } @remove;
 200 |   grep !$remove{$_}, _as_list($list);
 201 | }
 202 | 
 203 | my @_lib_subdirs = (
 204 |   [$_version, $_archname],
 205 |   [$_version],
 206 |   [$_archname],
 207 |   (@_inc_version_list ? \@_inc_version_list : ()),
 208 |   [],
 209 | );
 210 | 
 211 | sub install_base_bin_path {
 212 |   my ($class, $path) = @_;
 213 |   return _catdir($path, 'bin');
 214 | }
 215 | sub install_base_perl_path {
 216 |   my ($class, $path) = @_;
 217 |   return _catdir($path, 'lib', 'perl5');
 218 | }
 219 | sub install_base_arch_path {
 220 |   my ($class, $path) = @_;
 221 |   _catdir($class->install_base_perl_path($path), $_archname);
 222 | }
 223 | 
 224 | sub lib_paths_for {
 225 |   my ($class, $path) = @_;
 226 |   my $base = $class->install_base_perl_path($path);
 227 |   return map { _catdir($base, @$_) } @_lib_subdirs;
 228 | }
 229 | 
 230 | sub _mm_escape_path {
 231 |   my $path = shift;
 232 |   $path =~ s/\\/\\\\\\\\/g;
 233 |   if ($path =~ s/ /\\ /g) {
 234 |     $path = qq{"\\"$path\\""};
 235 |   }
 236 |   return $path;
 237 | }
 238 | 
 239 | sub _mb_escape_path {
 240 |   my $path = shift;
 241 |   $path =~ s/\\/\\\\/g;
 242 |   return qq{"$path"};
 243 | }
 244 | 
 245 | sub installer_options_for {
 246 |   my ($class, $path) = @_;
 247 |   return (
 248 |     PERL_MM_OPT =>
 249 |       defined $path ? "INSTALL_BASE="._mm_escape_path($path) : undef,
 250 |     PERL_MB_OPT =>
 251 |       defined $path ? "--install_base "._mb_escape_path($path) : undef,
 252 |   );
 253 | }
 254 | 
 255 | sub active_paths {
 256 |   my ($self) = @_;
 257 |   $self = ref $self ? $self : $self->new;
 258 | 
 259 |   return grep {
 260 |     # screen out entries that aren't actually reflected in @INC
 261 |     my $active_ll = $self->install_base_perl_path($_);
 262 |     grep { $_ eq $active_ll } @{$self->inc};
 263 |   } _as_list($self->roots);
 264 | }
 265 | 
 266 | 
 267 | sub deactivate {
 268 |   my ($self, $path) = @_;
 269 |   $self = $self->new unless ref $self;
 270 |   $path = $self->resolve_path($path);
 271 |   $path = $self->normalize_path($path);
 272 | 
 273 |   my @active_lls = $self->active_paths;
 274 | 
 275 |   if (!grep { $_ eq $path } @active_lls) {
 276 |     warn "Tried to deactivate inactive local::lib '$path'\n";
 277 |     return $self;
 278 |   }
 279 | 
 280 |   my %args = (
 281 |     bins  => [ _remove_from($self->bins,
 282 |       $self->install_base_bin_path($path)) ],
 283 |     libs  => [ _remove_from($self->libs,
 284 |       $self->install_base_perl_path($path)) ],
 285 |     inc   => [ _remove_from($self->inc,
 286 |       $self->lib_paths_for($path)) ],
 287 |     roots => [ _remove_from($self->roots, $path) ],
 288 |   );
 289 | 
 290 |   $args{extra} = { $self->installer_options_for($args{roots}[0]) };
 291 | 
 292 |   $self->clone(%args);
 293 | }
 294 | 
 295 | sub deactivate_all {
 296 |   my ($self) = @_;
 297 |   $self = $self->new unless ref $self;
 298 | 
 299 |   my @active_lls = $self->active_paths;
 300 | 
 301 |   my %args;
 302 |   if (@active_lls) {
 303 |     %args = (
 304 |       bins => [ _remove_from($self->bins,
 305 |         map $self->install_base_bin_path($_), @active_lls) ],
 306 |       libs => [ _remove_from($self->libs,
 307 |         map $self->install_base_perl_path($_), @active_lls) ],
 308 |       inc => [ _remove_from($self->inc,
 309 |         map $self->lib_paths_for($_), @active_lls) ],
 310 |       roots => [ _remove_from($self->roots, @active_lls) ],
 311 |     );
 312 |   }
 313 | 
 314 |   $args{extra} = { $self->installer_options_for(undef) };
 315 | 
 316 |   $self->clone(%args);
 317 | }
 318 | 
 319 | sub activate {
 320 |   my ($self, $path) = @_;
 321 |   $self = $self->new unless ref $self;
 322 |   $path = $self->resolve_path($path);
 323 |   $self->ensure_dir_structure_for($path)
 324 |     unless $self->no_create;
 325 | 
 326 |   $path = $self->normalize_path($path);
 327 | 
 328 |   my @active_lls = $self->active_paths;
 329 | 
 330 |   if (grep { $_ eq $path } @active_lls[1 .. $#active_lls]) {
 331 |     $self = $self->deactivate($path);
 332 |   }
 333 | 
 334 |   my %args;
 335 |   if (!@active_lls || $active_lls[0] ne $path) {
 336 |     %args = (
 337 |       bins  => [ $self->install_base_bin_path($path), @{$self->bins} ],
 338 |       libs  => [ $self->install_base_perl_path($path), @{$self->libs} ],
 339 |       inc   => [ $self->lib_paths_for($path), @{$self->inc} ],
 340 |       roots => [ $path, @{$self->roots} ],
 341 |     );
 342 |   }
 343 | 
 344 |   $args{extra} = { $self->installer_options_for($path) };
 345 | 
 346 |   $self->clone(%args);
 347 | }
 348 | 
 349 | sub normalize_path {
 350 |   my ($self, $path) = @_;
 351 |   $path = ( Win32::GetShortPathName($path) || $path )
 352 |     if $^O eq 'MSWin32';
 353 |   return $path;
 354 | }
 355 | 
 356 | sub build_environment_vars_for {
 357 |   my $self = $_[0]->new->activate($_[1]);
 358 |   $self->build_environment_vars;
 359 | }
 360 | sub build_activate_environment_vars_for {
 361 |   my $self = $_[0]->new->activate($_[1]);
 362 |   $self->build_environment_vars;
 363 | }
 364 | sub build_deactivate_environment_vars_for {
 365 |   my $self = $_[0]->new->deactivate($_[1]);
 366 |   $self->build_environment_vars;
 367 | }
 368 | sub build_deact_all_environment_vars_for {
 369 |   my $self = $_[0]->new->deactivate_all;
 370 |   $self->build_environment_vars;
 371 | }
 372 | sub build_environment_vars {
 373 |   my $self = shift;
 374 |   (
 375 |     PATH                => join($_path_sep, _as_list($self->bins)),
 376 |     PERL5LIB            => join($_path_sep, _as_list($self->libs)),
 377 |     PERL_LOCAL_LIB_ROOT => join($_path_sep, _as_list($self->roots)),
 378 |     %{$self->extra},
 379 |   );
 380 | }
 381 | 
 382 | sub setup_local_lib_for {
 383 |   my $self = $_[0]->new->activate($_[1]);
 384 |   $self->setup_local_lib;
 385 | }
 386 | 
 387 | sub setup_local_lib {
 388 |   my $self = shift;
 389 | 
 390 |   # if Carp is already loaded, ensure Carp::Heavy is also loaded, to avoid
 391 |   # $VERSION mismatch errors (Carp::Heavy loads Carp, so we do not need to
 392 |   # check in the other direction)
 393 |   require Carp::Heavy if $INC{'Carp.pm'};
 394 | 
 395 |   $self->setup_env_hash;
 396 |   @INC = @{$self->inc};
 397 | }
 398 | 
 399 | sub setup_env_hash_for {
 400 |   my $self = $_[0]->new->activate($_[1]);
 401 |   $self->setup_env_hash;
 402 | }
 403 | sub setup_env_hash {
 404 |   my $self = shift;
 405 |   my %env = $self->build_environment_vars;
 406 |   for my $key (keys %env) {
 407 |     if (defined $env{$key}) {
 408 |       $ENV{$key} = $env{$key};
 409 |     }
 410 |     else {
 411 |       delete $ENV{$key};
 412 |     }
 413 |   }
 414 | }
 415 | 
 416 | sub print_environment_vars_for {
 417 |   print $_[0]->environment_vars_string_for(@_[1..$#_]);
 418 | }
 419 | 
 420 | sub environment_vars_string_for {
 421 |   my $self = $_[0]->new->activate($_[1]);
 422 |   $self->environment_vars_string;
 423 | }
 424 | sub environment_vars_string {
 425 |   my ($self, $shelltype) = @_;
 426 | 
 427 |   $shelltype ||= $self->guess_shelltype;
 428 | 
 429 |   my $extra = $self->extra;
 430 |   my @envs = (
 431 |     PATH                => $self->bins,
 432 |     PERL5LIB            => $self->libs,
 433 |     PERL_LOCAL_LIB_ROOT => $self->roots,
 434 |     map { $_ => $extra->{$_} } sort keys %$extra,
 435 |   );
 436 |   $self->_build_env_string($shelltype, \@envs);
 437 | }
 438 | 
 439 | sub _build_env_string {
 440 |   my ($self, $shelltype, $envs) = @_;
 441 |   my @envs = @$envs;
 442 | 
 443 |   my $build_method = "build_${shelltype}_env_declaration";
 444 | 
 445 |   my $out = '';
 446 |   while (@envs) {
 447 |     my ($name, $value) = (shift(@envs), shift(@envs));
 448 |     if (
 449 |         ref $value
 450 |         && @$value == 1
 451 |         && ref $value->[0]
 452 |         && ref $value->[0] eq 'SCALAR'
 453 |         && ${$value->[0]} eq $name) {
 454 |       next;
 455 |     }
 456 |     $out .= $self->$build_method($name, $value);
 457 |   }
 458 |   my $wrap_method = "wrap_${shelltype}_output";
 459 |   if ($self->can($wrap_method)) {
 460 |     return $self->$wrap_method($out);
 461 |   }
 462 |   return $out;
 463 | }
 464 | 
 465 | sub build_bourne_env_declaration {
 466 |   my ($class, $name, $args) = @_;
 467 |   my $value = $class->_interpolate($args, '${%s}', qr/["\\\$!`]/, '\\%s');
 468 | 
 469 |   if (!defined $value) {
 470 |     return qq{unset $name;\n};
 471 |   }
 472 | 
 473 |   $value =~ s/(^|\G|$_path_sep)\$\{$name\}$_path_sep/$1\${$name}\${$name+$_path_sep}/g;
 474 |   $value =~ s/$_path_sep\$\{$name\}$/\${$name+$_path_sep}\${$name}/;
 475 | 
 476 |   qq{${name}="$value"; export ${name};\n}
 477 | }
 478 | 
 479 | sub build_csh_env_declaration {
 480 |   my ($class, $name, $args) = @_;
 481 |   my ($value, @vars) = $class->_interpolate($args, '${%s}', '"', '"\\%s"');
 482 |   if (!defined $value) {
 483 |     return qq{unsetenv $name;\n};
 484 |   }
 485 | 
 486 |   my $out = '';
 487 |   for my $var (@vars) {
 488 |     $out .= qq{if ! \$?$name setenv $name '';\n};
 489 |   }
 490 | 
 491 |   my $value_without = $value;
 492 |   if ($value_without =~ s/(?:^|$_path_sep)\$\{$name\}(?:$_path_sep|$)//g) {
 493 |     $out .= qq{if "\${$name}" != '' setenv $name "$value";\n};
 494 |     $out .= qq{if "\${$name}" == '' };
 495 |   }
 496 |   $out .= qq{setenv $name "$value_without";\n};
 497 |   return $out;
 498 | }
 499 | 
 500 | sub build_cmd_env_declaration {
 501 |   my ($class, $name, $args) = @_;
 502 |   my $value = $class->_interpolate($args, '%%%s%%', qr(%), '%s');
 503 |   if (!$value) {
 504 |     return qq{\@set $name=\n};
 505 |   }
 506 | 
 507 |   my $out = '';
 508 |   my $value_without = $value;
 509 |   if ($value_without =~ s/(?:^|$_path_sep)%$name%(?:$_path_sep|$)//g) {
 510 |     $out .= qq{\@if not "%$name%"=="" set "$name=$value"\n};
 511 |     $out .= qq{\@if "%$name%"=="" };
 512 |   }
 513 |   $out .= qq{\@set "$name=$value_without"\n};
 514 |   return $out;
 515 | }
 516 | 
 517 | sub build_powershell_env_declaration {
 518 |   my ($class, $name, $args) = @_;
 519 |   my $value = $class->_interpolate($args, '$env:%s', '"', '`%s');
 520 | 
 521 |   if (!$value) {
 522 |     return qq{Remove-Item -ErrorAction 0 Env:\\$name;\n};
 523 |   }
 524 | 
 525 |   my $maybe_path_sep = qq{\$(if("\$env:$name"-eq""){""}else{"$_path_sep"})};
 526 |   $value =~ s/(^|\G|$_path_sep)\$env:$name$_path_sep/$1\$env:$name"+$maybe_path_sep+"/g;
 527 |   $value =~ s/$_path_sep\$env:$name$/"+$maybe_path_sep+\$env:$name+"/;
 528 | 
 529 |   qq{\$env:$name = \$("$value");\n};
 530 | }
 531 | sub wrap_powershell_output {
 532 |   my ($class, $out) = @_;
 533 |   return $out || " \n";
 534 | }
 535 | 
 536 | sub build_fish_env_declaration {
 537 |   my ($class, $name, $args) = @_;
 538 |   my $value = $class->_interpolate($args, '$%s', qr/[\\"' ]/, '\\%s');
 539 |   if (!defined $value) {
 540 |     return qq{set -e $name;\n};
 541 |   }
 542 |   $value =~ s/$_path_sep/ /g;
 543 |   qq{set -x $name $value;\n};
 544 | }
 545 | 
 546 | sub _interpolate {
 547 |   my ($class, $args, $var_pat, $escape, $escape_pat) = @_;
 548 |   return
 549 |     unless defined $args;
 550 |   my @args = ref $args ? @$args : $args;
 551 |   return
 552 |     unless @args;
 553 |   my @vars = map { $$_ } grep { ref $_ eq 'SCALAR' } @args;
 554 |   my $string = join $_path_sep, map {
 555 |     ref $_ eq 'SCALAR' ? sprintf($var_pat, $$_) : do {
 556 |       s/($escape)/sprintf($escape_pat, $1)/ge; $_;
 557 |     };
 558 |   } @args;
 559 |   return wantarray ? ($string, \@vars) : $string;
 560 | }
 561 | 
 562 | sub pipeline;
 563 | 
 564 | sub pipeline {
 565 |   my @methods = @_;
 566 |   my $last = pop(@methods);
 567 |   if (@methods) {
 568 |     \sub {
 569 |       my ($obj, @args) = @_;
 570 |       $obj->${pipeline @methods}(
 571 |         $obj->$last(@args)
 572 |       );
 573 |     };
 574 |   } else {
 575 |     \sub {
 576 |       shift->$last(@_);
 577 |     };
 578 |   }
 579 | }
 580 | 
 581 | sub resolve_path {
 582 |   my ($class, $path) = @_;
 583 | 
 584 |   $path = $class->${pipeline qw(
 585 |     resolve_relative_path
 586 |     resolve_home_path
 587 |     resolve_empty_path
 588 |   )}($path);
 589 | 
 590 |   $path;
 591 | }
 592 | 
 593 | sub resolve_empty_path {
 594 |   my ($class, $path) = @_;
 595 |   if (defined $path) {
 596 |     $path;
 597 |   } else {
 598 |     '~/perl5';
 599 |   }
 600 | }
 601 | 
 602 | sub resolve_home_path {
 603 |   my ($class, $path) = @_;
 604 |   $path =~ /^~([^\/]*)/ or return $path;
 605 |   my $user = $1;
 606 |   my $homedir = do {
 607 |     if (! length($user) && defined $ENV{HOME}) {
 608 |       $ENV{HOME};
 609 |     }
 610 |     else {
 611 |       require File::Glob;
 612 |       File::Glob::bsd_glob("~$user", File::Glob::GLOB_TILDE());
 613 |     }
 614 |   };
 615 |   unless (defined $homedir) {
 616 |     require Carp; require Carp::Heavy;
 617 |     Carp::croak(
 618 |       "Couldn't resolve homedir for "
 619 |       .(length $user ? $user : 'current user')
 620 |     );
 621 |   }
 622 |   $path =~ s/^~[^\/]*/$homedir/;
 623 |   $path;
 624 | }
 625 | 
 626 | sub resolve_relative_path {
 627 |   my ($class, $path) = @_;
 628 |   _rel2abs($path);
 629 | }
 630 | 
 631 | sub ensure_dir_structure_for {
 632 |   my ($class, $path) = @_;
 633 |   unless (-d $path) {
 634 |     warn "Attempting to create directory ${path}\n";
 635 |   }
 636 |   require File::Basename;
 637 |   my @dirs;
 638 |   while(!-d $path) {
 639 |     push @dirs, $path;
 640 |     $path = File::Basename::dirname($path);
 641 |   }
 642 |   mkdir $_ for reverse @dirs;
 643 |   return;
 644 | }
 645 | 
 646 | sub guess_shelltype {
 647 |   my $shellbin
 648 |     = defined $ENV{SHELL}
 649 |       ? ($ENV{SHELL} =~ /([\w.]+)$/)[-1]
 650 |     : ( $^O eq 'MSWin32' && exists $ENV{'!EXITCODE'} )
 651 |       ? 'bash'
 652 |     : ( $^O eq 'MSWin32' && $ENV{PROMPT} && $ENV{COMSPEC} )
 653 |       ? ($ENV{COMSPEC} =~ /([\w.]+)$/)[-1]
 654 |     : ( $^O eq 'MSWin32' && !$ENV{PROMPT} )
 655 |       ? 'powershell.exe'
 656 |     : 'sh';
 657 | 
 658 |   for ($shellbin) {
 659 |     return
 660 |         /csh$/                   ? 'csh'
 661 |       : /fish/                   ? 'fish'
 662 |       : /command(?:\.com)?$/i    ? 'cmd'
 663 |       : /cmd(?:\.exe)?$/i        ? 'cmd'
 664 |       : /4nt(?:\.exe)?$/i        ? 'cmd'
 665 |       : /powershell(?:\.exe)?$/i ? 'powershell'
 666 |                                  : 'bourne';
 667 |   }
 668 | }
 669 | 
 670 | 1;
 671 | __END__
 672 | 
 673 | =encoding utf8
 674 | 
 675 | =head1 NAME
 676 | 
 677 | local::lib - create and use a local lib/ for perl modules with PERL5LIB
 678 | 
 679 | =head1 SYNOPSIS
 680 | 
 681 | In code -
 682 | 
 683 |   use local::lib; # sets up a local lib at ~/perl5
 684 | 
 685 |   use local::lib '~/foo'; # same, but ~/foo
 686 | 
 687 |   # Or...
 688 |   use FindBin;
 689 |   use local::lib "$FindBin::Bin/../support";  # app-local support library
 690 | 
 691 | From the shell -
 692 | 
 693 |   # Install LWP and its missing dependencies to the '~/perl5' directory
 694 |   perl -MCPAN -Mlocal::lib -e 'CPAN::install(LWP)'
 695 | 
 696 |   # Just print out useful shell commands
 697 |   $ perl -Mlocal::lib
 698 |   PERL_MB_OPT='--install_base /home/username/perl5'; export PERL_MB_OPT;
 699 |   PERL_MM_OPT='INSTALL_BASE=/home/username/perl5'; export PERL_MM_OPT;
 700 |   PERL5LIB="/home/username/perl5/lib/perl5"; export PERL5LIB;
 701 |   PATH="/home/username/perl5/bin:$PATH"; export PATH;
 702 |   PERL_LOCAL_LIB_ROOT="/home/username/perl5:$PERL_LOCAL_LIB_ROOT"; export PERL_LOCAL_LIB_ROOT;
 703 | 
 704 | From a .bashrc file -
 705 | 
 706 |   [ $SHLVL -eq 1 ] && eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"
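The variables printed by `perl -Mlocal::lib` follow a fixed pattern derived from the root directory (compare `install_base_bin_path`, `install_base_perl_path` and `installer_options_for` in the code above). This minimal sketch reproduces that pattern, assuming a Unix-like system. `env_for_root` is a hypothetical helper, and unlike the module it does not quote or escape the paths it writes into `PERL_MB_OPT` and `PERL_MM_OPT`.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: rebuild the variables local::lib reports for $root.
# %current holds existing values of path-list variables; the new directories
# are put in front of them. Paths are not quoted or escaped here.
sub env_for_root {
    my ($root, %current) = @_;
    my $sep   = $Config{path_sep};    # ':' on Unix-like systems
    my $front = sub {
        my ($new, $old) = @_;
        return defined $old && length $old ? join($sep, $new, $old) : $new;
    };
    return (
        PERL_MB_OPT         => "--install_base $root",
        PERL_MM_OPT         => "INSTALL_BASE=$root",
        PERL5LIB            => $front->(File::Spec->catdir($root, 'lib', 'perl5'), $current{PERL5LIB}),
        PATH                => $front->(File::Spec->catdir($root, 'bin'), $current{PATH}),
        PERL_LOCAL_LIB_ROOT => $front->($root, $current{PERL_LOCAL_LIB_ROOT}),
    );
}

my %env = env_for_root('/home/username/perl5', PATH => '/usr/bin:/bin');
print "$_=$env{$_}\n" for sort keys %env;
```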
 707 | 
 708 | =head2 The bootstrapping technique
 709 | 
 710 | A typical way to install local::lib is using what is known as the
 711 | "bootstrapping" technique.  You would do this if your system administrator
 712 | hasn't already installed local::lib.  In this case, you'll need to install
 713 | local::lib in your home directory.
 714 | 
 715 | Even if you do have administrative privileges, you will still want to set up your
 716 | environment variables, as discussed in step 4. Without them, modules would still
 717 | be installed into the system-wide Perl library, and your Perl scripts would not
 718 | use the lib/ path you bootstrapped with local::lib.
 719 | 
 720 | By default local::lib installs itself and the CPAN modules into ~/perl5.
 721 | 
 722 | Windows users must also see L</Differences when using this module under Win32>.
 723 | 
 724 | =over 4
 725 | 
 726 | =item 1.
 727 | 
 728 | Download and unpack the local::lib tarball from CPAN (search for "Download"
 729 | on the CPAN page about local::lib).  Do this as an ordinary user, not as root
 730 | or administrator.  Unpack the file in your home directory or in any other
 731 | convenient location.
 732 | 
 733 | =item 2.
 734 | 
 735 | Run this:
 736 | 
 737 |   perl Makefile.PL --bootstrap
 738 | 
 739 | If the system asks you whether it should automatically configure as much
 740 | as possible, you would typically answer yes.
 741 | 
 742 | In order to install local::lib into a directory other than the default, you need
 743 | to specify the name of the directory when you call bootstrap, as follows:
 744 | 
 745 |   perl Makefile.PL --bootstrap=~/foo
 746 | 
 747 | =item 3.
 748 | 
 749 | Run this (local::lib assumes you have make installed on your system):
 750 | 
 751 |   make test && make install
 752 | 
 753 | =item 4.
 754 | 
 755 | Now we need to set up the appropriate environment variables, so that Perl
 756 | starts using our newly generated lib/ directory. If you are using bash or
 757 | another Bourne-style shell, you can add this to your shell startup script
 758 | as follows:
 759 | 
 760 |   echo '[ $SHLVL -eq 1 ] && eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"' >>~/.bashrc
 761 | 
 762 | If you are using C shell, you can do this as follows:
 763 | 
 764 |   % echo $SHELL
 765 |   /bin/csh
 766 |   % echo 'eval `perl -I$HOME/perl5/lib/perl5 -Mlocal::lib`' >> ~/.cshrc
 768 | 
 769 | If you passed a directory other than the default to bootstrap, you also need to
 770 | pass that directory as an import parameter when loading local::lib, like
 771 | this:
 772 | 
 773 |   echo '[ $SHLVL -eq 1 ] && eval "$(perl -I$HOME/foo/lib/perl5 -Mlocal::lib=$HOME/foo)"' >>~/.bashrc
 774 | 
 775 | After writing your shell configuration file, be sure to re-read it to get the
 776 | changed settings into your current shell's environment. Bourne shells use
 777 | C<. ~/.bashrc> for this, whereas C shells use C<source ~/.cshrc>.
 778 | 
 779 | =back
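Step 4 works because perl places the directories listed in `PERL5LIB` in `@INC`, ahead of its standard library directories, when it starts. The sketch below checks this by starting a child perl with `PERL5LIB` set to a placeholder path and reading back that child's `@INC`. The path is hypothetical and does not need to exist. The sketch assumes a Unix-like system, where the list form of a piped `open` is available.

```perl
use strict;
use warnings;

# Placeholder standing in for ~/perl5/lib/perl5; it does not have to exist.
my $dir = '/tmp/example-local-lib/lib/perl5';

my @inc = do {
    local $ENV{PERL5LIB} = $dir;    # what the eval'd startup line exports
    # List-form pipe open starts a fresh perl (same binary) without a shell.
    open my $fh, '-|', $^X, '-e', 'print "$_\n" for @INC'
        or die "cannot run $^X: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    @lines;
};

my ($pos) = grep { $inc[$_] eq $dir } 0 .. $#inc;
print defined $pos ? "$dir is \@INC entry $pos\n" : "$dir is not in \@INC\n";
```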
 780 | 
 781 | If you're on a slower machine, or are operating under draconian disk space
 782 | limitations, you can disable the automatic generation of manpages from POD when
 783 | installing modules by using the C<--no-manpages> argument when bootstrapping:
 784 | 
 785 |   perl Makefile.PL --bootstrap --no-manpages
 786 | 
 787 | To avoid repeating the bootstrap for each Perl module environment on the
 788 | same account, for example when you deploy several applications
 789 | independently, you can use one bootstrapped local::lib installation to
 790 | install modules into different directories directly, like this:
 791 | 
 792 |   cd ~/mydir1
 793 |   perl -Mlocal::lib=./
 794 |   eval $(perl -Mlocal::lib=./)  ### To set the environment for this shell alone
 795 |   printenv                      ### PERL5LIB now includes ~/mydir1/lib/perl5
 796 |   perl -MCPAN -e install ...    ### whatever modules you want
 797 |   cd ../mydir2
 798 |   ... REPEAT ...
 799 | 
 800 | When used in a C<.bashrc> file, it is recommended that you protect against
 801 | re-activating a directory in a sub-shell.  This can be done by checking the
 802 | C<$SHLVL> variable as shown in the SYNOPSIS.  Without this, sub-shells created by
 803 | the user or other programs will override changes made to the parent shell's
 804 | environment.
 805 | 
 806 | If you are working with several C<local::lib> environments, you may want to
 807 | remove some of them from the current environment without disturbing the others.
 808 | You can deactivate one environment like this (using a Bourne shell):
 809 | 
 810 |   eval $(perl -Mlocal::lib=--deactivate,~/path)
 811 | 
 812 | which will generate and run the commands needed to remove C<~/path> from your
 813 | various search paths. Whichever environment was B<activated most recently> will
 814 | remain the target for module installations. That is, if you activate
 815 | C<~/path_A> and then you activate C<~/path_B>, new modules you install will go
 816 | in C<~/path_B>. If you deactivate C<~/path_B>, modules will be installed
 817 | into C<~/path_A>; but if you deactivate C<~/path_A> instead, they will still be
 818 | installed into C<~/path_B>, because C<~/path_B> was activated later.
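The ordering rule can be modelled as a list of roots with the most recently activated one first. This is how `activate` and `deactivate` in the code above treat `PERL_LOCAL_LIB_ROOT`, and the installer options always point at the first remaining root. The subroutine names below are hypothetical, and the sketch only tracks the list, not the environment variables themselves.

```perl
use strict;
use warnings;

# Hypothetical model of the PERL_LOCAL_LIB_ROOT list: the first element is the
# root that receives new installations (PERL_MM_OPT / PERL_MB_OPT point at it).
my @roots;

sub activate_root {
    my ($path) = @_;
    @roots = ($path, grep { $_ ne $path } @roots);    # move or add to the front
}

sub deactivate_root {
    my ($path) = @_;
    @roots = grep { $_ ne $path } @roots;            # drop it, keep the rest in order
}

sub install_target { return $roots[0] }             # undef when nothing is active

activate_root('~/path_A');
activate_root('~/path_B');
print 'A then B activated:   ', install_target(), "\n";
deactivate_root('~/path_A');
print 'after deactivating A: ', install_target(), "\n";
```

Both lines print `~/path_B`, because deactivating `~/path_A` leaves `~/path_B` at the front of the list.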
 819 | 
 820 | You can also ask C<local::lib> to clean itself completely out of the current
 821 | shell's environment with the C<--deactivate-all> option.
     |
 822 | With separate environments for several apps, you may need a modified version
 823 | of the C<< use FindBin >> instructions in the "In code" sample above.
 824 | If you did something like the above, you have a set of Perl modules at C<<
 825 | ~/mydir1/lib >>. If you have a script at C<< ~/mydir1/scripts/myscript.pl >>,
 826 | you need to tell it where to find the modules you installed for it at C<<
 827 | ~/mydir1/lib >>.
 828 | 
 829 | In C<< ~/mydir1/scripts/myscript.pl >>:
 830 | 
 831 |   use strict;
 832 |   use warnings;
     |   use FindBin;                        ### sets $FindBin::Bin to this script's directory
 833 |   use local::lib "$FindBin::Bin/..";  ### points to ~/mydir1 and local::lib finds lib
 834 |   use lib "$FindBin::Bin/../lib";     ### points to ~/mydir1/lib
 835 | 
 836 | Put this before any BEGIN { ... } blocks that require the modules you installed.
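For a root such as `~/mydir1`, local::lib adds several directories under `lib/perl5` to `@INC`, not just `lib/perl5` itself. The sketch below lists them in the order used by `lib_paths_for` in the code above. `inc_dirs_for_root` is a hypothetical name, and the extra entries derived from `$Config{inc_version_list}` are left out for brevity.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Sketch of the @INC entries added for a local::lib root, following the order
# of lib_paths_for in the module (inc_version_list entries omitted).
sub inc_dirs_for_root {
    my ($root) = @_;
    my $base = File::Spec->catdir($root, 'lib', 'perl5');
    return map { File::Spec->catdir($base, @{$_}) } (
        [ $Config{version}, $Config{archname} ],    # lib/perl5/<version>/<archname>
        [ $Config{version} ],                       # lib/perl5/<version>
        [ $Config{archname} ],                      # lib/perl5/<archname>
        [],                                         # lib/perl5 itself
    );
}

# e.g. the root a script in ~/mydir1/scripts gets from "$FindBin::Bin/.."
print "$_\n" for inc_dirs_for_root('/home/username/mydir1');
```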
 837 | 
 838 | =head2 Differences when using this module under Win32
 839 | 
 840 | To set up the proper environment variables for your current session of
 841 | C<CMD.exe>, you can use this:
 842 | 
 843 |   C:\>perl -Mlocal::lib
 844 |   set PERL_MB_OPT=--install_base C:\DOCUME~1\ADMINI~1\perl5
 845 |   set PERL_MM_OPT=INSTALL_BASE=C:\DOCUME~1\ADMINI~1\perl5
 846 |   set PERL5LIB=C:\DOCUME~1\ADMINI~1\perl5\lib\perl5
 847 |   set PATH=C:\DOCUME~1\ADMINI~1\perl5\bin;%PATH%
 848 | 
 849 |   ### To set the environment for this shell alone
 850 |   C:\>perl -Mlocal::lib > %TEMP%\tmp.bat && %TEMP%\tmp.bat && del %TEMP%\tmp.bat
 851 |   ### instead of $(perl -Mlocal::lib=./)
 852 | 
 853 | If you want the environment entries to persist, you'll need to add them to the
 854 | Control Panel's System applet yourself or use L<App::local::lib::Win32Helper>.
 855 | 
 856 | Unless C<$ENV{HOME}> is set, "~" is translated to the user's profile directory,
 857 | i.e. the directory named for the user under "Documents and Settings" (Windows XP
 858 | or earlier) or under "Users" (Windows Vista or later). After that, the home
 859 | directory is translated to a short name (which means the directory must exist)
 860 | and the subdirectories are created.
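Home-directory expansion follows the same order on every platform. For a bare `~`, `$ENV{HOME}` wins if it is set; otherwise the tilde is expanded with `File::Glob::bsd_glob` and `GLOB_TILDE`, as `resolve_home_path` does in the code above. The sketch mirrors that order. `expand_home` is a hypothetical name, and it skips the Win32 short-name step (`Win32::GetShortPathName`).

```perl
use strict;
use warnings;
use File::Glob ();

# Expand a leading "~" or "~user", preferring $ENV{HOME} for the current user,
# in the spirit of resolve_home_path (no Win32 short-name conversion).
sub expand_home {
    my ($path) = @_;
    my ($user) = $path =~ m{^~([^/]*)} or return $path;    # not a ~ path
    my $home;
    if (!length $user && defined $ENV{HOME}) {
        $home = $ENV{HOME};
    }
    else {
        ($home) = File::Glob::bsd_glob("~$user", File::Glob::GLOB_TILDE());
    }
    die 'cannot resolve home directory for ', (length $user ? $user : 'current user'), "\n"
        unless defined $home;
    $path =~ s{^~[^/]*}{$home};
    return $path;
}

{
    local $ENV{HOME} = '/home/username';    # placeholder home for the demo
    print expand_home('~/perl5'), "\n";     # prints /home/username/perl5
}
```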
 861 | 
 862 | =head3 PowerShell
 863 | 
 864 | local::lib also supports PowerShell, and can be used with the
 865 | C<Invoke-Expression> cmdlet.
 866 | 
 867 |   Invoke-Expression "$(perl -Mlocal::lib)"
 868 | 
 869 | =head1 RATIONALE
 870 | 
 871 | The version of a Perl package on your machine is not always the version you
 872 | need.  Obviously, the best thing to do would be to update to the version you
 873 | need.  However, you might be in a situation where you're prevented from doing
 874 | this.  Perhaps you don't have system administrator privileges; or perhaps you
 875 | are using a package management system such as Debian, and nobody has yet gotten
 876 | around to packaging up the version you need.
 877 | 
 878 | local::lib solves this problem by allowing you to create your own directory of
 879 | Perl packages downloaded from CPAN (in a multi-user system, this would typically
 880 | be within your own home directory).  The existing system Perl installation is
 881 | not affected; you simply invoke Perl with special options so that Perl uses the
 882 | packages in your own local package directory rather than the system packages.
 883 | local::lib arranges things so that your locally installed version of the Perl
 884 | packages takes precedence over the system installation.
 885 | 
 886 | If you are using a package management system (such as Debian), you don't need to
 887 | worry about Debian and CPAN stepping on each other's toes.  Your local version
 888 | of the packages will be written to an entirely separate directory from those
 889 | installed by Debian.
 890 | 
 891 | =head1 DESCRIPTION
 892 | 
 893 | This module provides a quick, convenient way of bootstrapping a user-local Perl
 894 | module library located within the user's home directory. It also constructs and
 895 | prints out for the user the list of environment variables using the syntax
 896 | appropriate for the user's current shell (as specified by the C<SHELL>
 897 | environment variable), suitable for directly adding to one's shell
 898 | configuration file.
 899 | 
 900 | More generally, local::lib allows for the bootstrapping and usage of a
 901 | directory containing Perl modules outside of Perl's C<@INC>. This makes it
 902 | easier to ship an application with an app-specific copy of a Perl module, or
 903 | collection of modules. This is useful, for example, when an upstream maintainer
 904 | hasn't applied a patch to one of their modules that your application needs.
 905 | 
 906 | On import, local::lib sets the following environment variables to appropriate
 907 | values:
 908 | 
 909 | =over 4
 910 | 
 911 | =item PERL_MB_OPT
 912 | 
 913 | =item PERL_MM_OPT
 914 | 
 915 | =item PERL5LIB
 916 | 
 917 | =item PATH
 918 | 
 919 | =item PERL_LOCAL_LIB_ROOT
 920 | 
 921 | =back
 922 | 
 923 | When possible, new entries are added in front of existing values instead of overwriting them.
 924 | 
 925 | These values are then available for reference by any code after import.
 926 | 
 927 | =head1 CREATING A SELF-CONTAINED SET OF MODULES
 928 | 
 929 | See L<lib::core::only> for one way to do this, but note that
 930 | there are a number of caveats; the best approach is always to perform a
 931 | build against a clean perl (i.e. site and vendor as close to empty as possible).
 932 | 
 933 | =head1 IMPORT OPTIONS
 934 | 
 935 | Options are values that can be passed to the C<local::lib> import besides the
 936 | directory to use. They are specified as C<use local::lib '--option'[, path];>
 937 | or C<perl -Mlocal::lib=--option[,path]>.
 938 | 
 939 | =head2 --deactivate
 940 | 
 941 | Remove the chosen path (or the default path) from the module search paths if it
 942 | was added by C<local::lib>, instead of adding it.
 943 | 
 944 | =head2 --deactivate-all
 945 | 
 946 | Remove all directories that were added to search paths by C<local::lib> from the
 947 | search paths.
 948 | 
 949 | =head2 --shelltype
 950 | 
 951 | Specify the shell type to use for output.  By default, the shell will be
 952 | detected based on the environment.  Should be one of: C<bourne>, C<csh>,
 953 | C<cmd>, or C<powershell>.
 954 | 
 955 | =head2 --no-create
 956 | 
 957 | Prevents C<local::lib> from creating directories when activating them.  This is
 958 | likely to cause issues on Win32 systems.
 959 | 
 960 | =head1 CLASS METHODS
 961 | 
 962 | =head2 ensure_dir_structure_for
 963 | 
 964 | =over 4
 965 | 
 966 | =item Arguments: $path
 967 | 
 968 | =item Return value: None
 969 | 
 970 | =back
 971 | 
 972 | Attempts to create the given path, and all required parent directories. Throws
 973 | an exception on failure.
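A minimal usage sketch, assuming local::lib 2.x (the release with the object interface described below) is installed. It loads the module with C<require> so that C<import> does not activate F<~/perl5>, and uses a throwaway L<File::Temp> directory purely so the demo is self-contained; the F<deps/perl5> subpath is an arbitrary example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# require (not use) skips local::lib's import, so ~/perl5 is not activated.
require local::lib;

# Throwaway parent directory, deleted when the program exits.
my $base   = tempdir(CLEANUP => 1);
my $target = File::Spec->catdir($base, 'deps', 'perl5');

# Creates $target plus any missing parents; dies on failure.
# It may also print a notice about the directory to STDERR.
local::lib->ensure_dir_structure_for($target);

my $ok = -d $target;
print $ok ? "created $target\n" : "not created\n";
```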
 974 | 
 975 | =head2 print_environment_vars_for
 976 | 
 977 | =over 4
 978 | 
 979 | =item Arguments: $path
 980 | 
 981 | =item Return value: None
 982 | 
 983 | =back
 984 | 
 985 | Prints to standard output the variables listed above, properly set to use the
 986 | given path as the base directory.
 987 | 
 988 | =head2 build_environment_vars_for
 989 | 
 990 | =over 4
 991 | 
 992 | =item Arguments: $path
 993 | 
 994 | =item Return value: %environment_vars
 995 | 
 996 | =back
 997 | 
 998 | Returns a hash with the variables listed above, properly set to use the
 999 | given path as the base directory.
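A short sketch of inspecting the variables for a directory without exporting them, under the same assumptions (local::lib 2.x installed, a temporary directory standing in for a real library root).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

require local::lib;    # load without activating ~/perl5

my $dir = tempdir(CLEANUP => 1);

# Returns the variables as a plain hash; %ENV itself is not modified.
# As with activation, the directory structure may be created under $dir.
my %vars = local::lib->build_environment_vars_for($dir);

for my $name (sort keys %vars) {
    print "$name=$vars{$name}\n";
}
```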
1000 | 
1001 | =head2 setup_env_hash_for
1002 | 
1003 | =over 4
1004 | 
1005 | =item Arguments: $path
1006 | 
1007 | =item Return value: None
1008 | 
1009 | =back
1010 | 
1011 | Constructs the C<%ENV> keys for the given path, by calling
1012 | L</build_environment_vars_for>.
1013 | 
1014 | =head2 active_paths
1015 | 
1016 | =over 4
1017 | 
1018 | =item Arguments: None
1019 | 
1020 | =item Return value: @paths
1021 | 
1022 | =back
1023 | 
1024 | Returns a list of active C<local::lib> paths, according to the
1025 | C<PERL_LOCAL_LIB_ROOT> environment variable and verified against
1026 | what is really in C<@INC>.
1027 | 
1028 | =head2 install_base_perl_path
1029 | 
1030 | =over 4
1031 | 
1032 | =item Arguments: $path
1033 | 
1034 | =item Return value: $install_base_perl_path
1035 | 
1036 | =back
1037 | 
1038 | Returns a path describing where to install the Perl modules for this local
1039 | library installation. Appends the directories C<lib> and C<perl5> to the given
1040 | path.
1041 | 
1042 | =head2 lib_paths_for
1043 | 
1044 | =over 4
1045 | 
1046 | =item Arguments: $path
1047 | 
1048 | =item Return value: @lib_paths
1049 | 
1050 | =back
1051 | 
1052 | Returns the list of paths perl will search for libraries, given a base path.
1053 | This includes the base path itself, the architecture-specific subdirectory, and
1054 | perl-version-specific subdirectories.  These paths may not all exist.
1055 | 
1056 | =head2 install_base_bin_path
1057 | 
1058 | =over 4
1059 | 
1060 | =item Arguments: $path
1061 | 
1062 | =item Return value: $install_base_bin_path
1063 | 
1064 | =back
1065 | 
1066 | Returns a path describing where to install the executable programs for this
1067 | local library installation. Appends the directory C<bin> to the given path.
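The path helpers above can be compared side by side. This sketch assumes local::lib 2.x and Unix-style paths; F</opt/myapp/perl5> is a hypothetical root. The exact entries returned by C<lib_paths_for> depend on your perl's version and architecture, so only the first two results are fixed.

```perl
use strict;
use warnings;

require local::lib;    # load without activating ~/perl5

# A hypothetical install root; these calls only compute paths.
my $root = '/opt/myapp/perl5';

my $perl_dir = local::lib->install_base_perl_path($root);  # $root/lib/perl5
my $bin_dir  = local::lib->install_base_bin_path($root);   # $root/bin
my @search   = local::lib->lib_paths_for($root);           # may not exist yet

print "modules:  $perl_dir\n";
print "programs: $bin_dir\n";
print "searched: $_\n" for @search;
```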
1068 | 
1069 | =head2 installer_options_for
1070 | 
1071 | =over 4
1072 | 
1073 | =item Arguments: $path
1074 | 
1075 | =item Return value: %installer_env_vars
1076 | 
1077 | =back
1078 | 
1079 | Returns a hash of environment variables that should be set to cause
1080 | installation into the given path.
1081 | 
1082 | =head2 resolve_empty_path
1083 | 
1084 | =over 4
1085 | 
1086 | =item Arguments: $path
1087 | 
1088 | =item Return value: $base_path
1089 | 
1090 | =back
1091 | 
1092 | Builds and returns the base path into which to set up the local module
1093 | installation. Defaults to C<~/perl5>.
1094 | 
1095 | =head2 resolve_home_path
1096 | 
1097 | =over 4
1098 | 
1099 | =item Arguments: $path
1100 | 
1101 | =item Return value: $home_path
1102 | 
1103 | =back
1104 | 
1105 | Attempts to find the user's home directory. If installed, uses C<File::HomeDir>
1106 | for this purpose. If no definite answer is available, throws an exception.
1107 | 
1108 | =head2 resolve_relative_path
1109 | 
1110 | =over 4
1111 | 
1112 | =item Arguments: $path
1113 | 
1114 | =item Return value: $absolute_path
1115 | 
1116 | =back
1117 | 
1118 | Translates the given path into an absolute path.
1119 | 
1120 | =head2 resolve_path
1121 | 
1122 | =over 4
1123 | 
1124 | =item Arguments: $path
1125 | 
1126 | =item Return value: $absolute_path
1127 | 
1128 | =back
1129 | 
1130 | Resolves the given path into the absolute directory in which to configure the
1131 | environment for a local library installation. The path is passed through a
1132 | pipeline, with each step receiving the previous step's result:
1133 | L</resolve_empty_path> first, then L</resolve_home_path>, and finally
1134 | L</resolve_relative_path>. The result of this final call is returned from
1135 | L</resolve_path>.
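A sketch of the pipeline in action, assuming local::lib 2.x on a Unix-like system. C<$ENV{HOME}> is pointed at a temporary directory only to make the results predictable; F<~/vendor> and F</srv/app/perl5> are arbitrary example paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

require local::lib;    # load without activating ~/perl5

# Demo assumption: an existing, known home directory.
my $home = tempdir(CLEANUP => 1);
$ENV{HOME} = $home;

my $default  = local::lib->resolve_path(undef);             # empty -> ~/perl5
my $tilde    = local::lib->resolve_path('~/vendor');        # ~ -> home directory
my $absolute = local::lib->resolve_path('/srv/app/perl5');  # already absolute

print "$default\n$tilde\n$absolute\n";
```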
1138 | 
1139 | =head1 OBJECT INTERFACE
1140 | 
1141 | =head2 new
1142 | 
1143 | =over 4
1144 | 
1145 | =item Arguments: %attributes
1146 | 
1147 | =item Return value: $local_lib
1148 | 
1149 | =back
1150 | 
1151 | Constructs a new C<local::lib> object, representing the current state of
1152 | C<@INC> and the relevant environment variables.
1153 | 
1154 | =head1 ATTRIBUTES
1155 | 
1156 | =head2 roots
1157 | 
1158 | An arrayref representing active C<local::lib> directories.
1159 | 
1160 | =head2 inc
1161 | 
1162 | An arrayref representing C<@INC>.
1163 | 
1164 | =head2 libs
1165 | 
1166 | An arrayref representing the PERL5LIB environment variable.
1167 | 
1168 | =head2 bins
1169 | 
1170 | An arrayref representing the PATH environment variable.
1171 | 
1172 | =head2 extra
1173 | 
1174 | A hashref of extra environment variables (e.g. C<PERL_MM_OPT> and
1175 | C<PERL_MB_OPT>).
1176 | 
1177 | =head2 no_create
1178 | 
1179 | If set, C<local::lib> will not try to create directories when activating them.
1180 | 
1181 | =head1 OBJECT METHODS
1182 | 
1183 | =head2 clone
1184 | 
1185 | =over 4
1186 | 
1187 | =item Arguments: %attributes
1188 | 
1189 | =item Return value: $local_lib
1190 | 
1191 | =back
1192 | 
1193 | Constructs a new C<local::lib> object based on the existing one, overriding the
1194 | specified attributes.
1195 | 
1196 | =head2 activate
1197 | 
1198 | =over 4
1199 | 
1200 | =item Arguments: $path
1201 | 
1202 | =item Return value: $new_local_lib
1203 | 
1204 | =back
1205 | 
1206 | Constructs a new instance with the specified path active.
1207 | 
1208 | =head2 deactivate
1209 | 
1210 | =over 4
1211 | 
1212 | =item Arguments: $path
1213 | 
1214 | =item Return value: $new_local_lib
1215 | 
1216 | =back
1217 | 
1218 | Constructs a new instance with the specified path deactivated.
1219 | 
1220 | =head2 deactivate_all
1221 | 
1222 | =over 4
1223 | 
1224 | =item Arguments: None
1225 | 
1226 | =item Return value: $new_local_lib
1227 | 
1228 | =back
1229 | 
1230 | Constructs a new instance with all C<local::lib> directories deactivated.
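The object interface is non-destructive: each call returns a fresh instance. A minimal sketch, assuming local::lib 2.x and using a temporary directory as the library root; L</active_paths> is used to check which roots each object considers active.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

require local::lib;    # load without activating ~/perl5

my $dir = tempdir(CLEANUP => 1);

# Snapshot of the current @INC and environment; nothing is changed yet.
my $before = local::lib->new;

# activate and deactivate each return a new object; $before is untouched,
# and neither call modifies %ENV or @INC of the running program.
my $active   = $before->activate($dir);
my $inactive = $active->deactivate($dir);

print "active:   $_\n" for $active->active_paths;
print "inactive: $_\n" for $inactive->active_paths;
```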
1231 | 
1232 | =head2 environment_vars_string
1233 | 
1234 | =over 4
1235 | 
1236 | =item Arguments: [ $shelltype ]
1237 | 
1238 | =item Return value: $shell_env_string
1239 | 
1240 | =back
1241 | 
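1242 | Returns a string, meant to be run by a shell, that sets up the C<local::lib> environment using the syntax of the given shell type (detected from the environment if omitted).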
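A sketch of generating shell code for a specific shell, assuming local::lib 2.x; the temporary directory stands in for a real library root and the script name in the comment is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

require local::lib;    # load without activating ~/perl5

my $dir = tempdir(CLEANUP => 1);

# Request Bourne-shell syntax explicitly instead of relying on detection.
my $script = local::lib->new->activate($dir)->environment_vars_string('bourne');

# Printing lets a shell evaluate it, e.g. eval "$(perl print_env.pl)",
# where print_env.pl is a hypothetical name for this script.
print $script;
```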
1243 | 
1244 | =head2 build_environment_vars
1245 | 
1246 | =over 4
1247 | 
1248 | =item Arguments: None
1249 | 
1250 | =item Return value: %environment_vars
1251 | 
1252 | =back
1253 | 
1254 | Returns a hash with the variables listed above, properly set to reflect the
1255 | object's active C<local::lib> directories.
1256 | 
1257 | =head2 setup_env_hash
1258 | 
1259 | =over 4
1260 | 
1261 | =item Arguments: None
1262 | 
1263 | =item Return value: None
1264 | 
1265 | =back
1266 | 
1267 | Sets the C<%ENV> keys from the object's current state, by calling
1268 | L</build_environment_vars>.
1269 | 
1270 | =head2 setup_local_lib
1271 | 
1272 | Constructs the C<%ENV> hash using L</setup_env_hash>, and sets up C<@INC>.
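Unlike the methods above, this one changes the running process. A minimal sketch, assuming local::lib 2.x and a temporary directory as the library root.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

require local::lib;    # load without activating ~/perl5

my $dir = tempdir(CLEANUP => 1);

# Apply the activated state to this process: %ENV is updated and @INC is
# replaced, so modules installed under $dir become loadable from here on.
local::lib->new->activate($dir)->setup_local_lib;

my $perl_dir = local::lib->install_base_perl_path($dir);
my $in_inc   = grep { $_ eq $perl_dir } @INC;
my $status   = $in_inc ? 'is' : 'is not';

print "PERL_LOCAL_LIB_ROOT=$ENV{PERL_LOCAL_LIB_ROOT}\n";
print "$perl_dir $status in \@INC\n";
```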
1273 | 
1274 | =head1 A WARNING ABOUT UNINST=1
1275 | 
1276 | Be careful about using local::lib in combination with "make install UNINST=1".
1277 | The idea of this feature is that it will uninstall an old version of a module
1278 | before installing a new one. However, it lacks a safety check that the old
1279 | version and the new version will go in the same directory. Used in combination
1280 | with local::lib, you can potentially delete a globally accessible version of a
1281 | module while installing the new version in a local place. Only combine "make
1282 | install UNINST=1" and local::lib if you understand these possible consequences.
1283 | 
1284 | =head1 LIMITATIONS
1285 | 
1286 | =over 4
1287 | 
1288 | =item * Directory names with spaces in them are not well supported by the perl
1289 | toolchain and the programs it uses.  Pure-perl distributions should support
1290 | spaces, but problems are more likely with dists that require compilation. One
1291 | workaround is to move your local::lib to a directory with spaces B<after> you
1292 | have installed all modules inside your local::lib bootstrap. Be aware, though,
1293 | that you can't update or install CPAN modules after the move.
1294 | 
1295 | =item * Rather basic shell detection. Right now anything with csh in its name is
1296 | assumed to be a C shell or something compatible, and everything else is assumed
1297 | to be Bourne, except on Win32 systems. If the C<SHELL> environment variable is
1298 | not set, a Bourne-compatible shell is assumed.
1299 | 
1300 | =item * Kills any existing PERL_MM_OPT or PERL_MB_OPT.
1301 | 
1302 | =item * Should probably auto-fixup CPAN config if not already done.
1303 | 
1304 | =item * On VMS and MacOS Classic (pre-OS X), local::lib loads L<File::Spec>.
1305 | This means any L<File::Spec> version installed in the local::lib will be
1306 | ignored by scripts using local::lib.  A workaround for this is using
1307 | C<use lib "$local_lib/lib/perl5";> instead of using C<local::lib> directly.
1308 | 
1309 | =item * Conflicts with L<ExtUtils::MakeMaker>'s C<PREFIX> option.
1310 | C<local::lib> uses the C<INSTALL_BASE> option, as it has more predictable and
1311 | sane behavior.  If something attempts to use the C<PREFIX> option when running
1312 | a F<Makefile.PL>, L<ExtUtils::MakeMaker> will refuse to run, as the two
1313 | options conflict.  This can be worked around by temporarily unsetting the
1314 | C<PERL_MM_OPT> environment variable.
1315 | 
1316 | =item * Conflicts with L<Module::Build>'s C<--prefix> option.  Similar to the
1317 | previous limitation, but any C<--prefix> option specified will be ignored.
1318 | This can be worked around by temporarily unsetting the C<PERL_MB_OPT>
1319 | environment variable.
1320 | 
1321 | =back
1322 | 
1323 | Patches very much welcome for any of the above.
1324 | 
1325 | =over 4
1326 | 
1327 | =item * On Win32 systems, does not have a way to write the created environment
1328 | variables to the registry, so that they can persist through a reboot.
1329 | 
1330 | =back
1331 | 
1332 | =head1 TROUBLESHOOTING
1333 | 
1334 | If you've configured local::lib to install CPAN modules somewhere in your
1335 | home directory, and at some point later you try to install a module with C<cpan
1336 | -i Foo::Bar>, but it fails with an error like: C<Warning: You do not have
1337 | permissions to install into /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux at
1338 | /usr/lib64/perl5/5.8.8/Foo/Bar.pm> and buried within the install log is an
1339 | error saying C<'INSTALL_BASE' is not a known MakeMaker parameter name>, then
1340 | you've somehow lost your updated ExtUtils::MakeMaker module.
1341 | 
1342 | To remedy this situation, rerun the bootstrapping procedure documented above.
1343 | 
1344 | Then, run C<rm -r ~/.cpan/build/Foo-Bar*>.
1345 | 
1346 | Finally, re-run C<cpan -i Foo::Bar> and it should install without problems.
1347 | 
1348 | =head1 ENVIRONMENT
1349 | 
1350 | =over 4
1351 | 
1352 | =item SHELL
1353 | 
1354 | =item COMSPEC
1355 | 
1356 | local::lib looks at the user's C<SHELL> environment variable when printing out
1357 | commands to add to the shell configuration file.
1358 | 
1359 | On Win32 systems, C<COMSPEC> is also examined.
1360 | 
1361 | =back
1362 | 
1363 | =head1 SEE ALSO
1364 | 
1365 | =over 4
1366 | 
1367 | =item * L<Perl Advent article, 2011|http://perladvent.org/2011/2011-12-01.html>
1368 | 
1369 | =back
1370 | 
1371 | =head1 SUPPORT
1372 | 
1373 | IRC:
1374 | 
1375 |     Join #local-lib on irc.perl.org.
1376 | 
1377 | =head1 AUTHOR
1378 | 
1379 | Matt S Trout <mst@shadowcat.co.uk> http://www.shadowcat.co.uk/
1380 | 
1381 | auto_install fixes kindly sponsored by http://www.takkle.com/
1382 | 
1383 | =head1 CONTRIBUTORS
1384 | 
1385 | Patches to correctly output commands for csh style shells, as well as some
1386 | documentation additions, contributed by Christopher Nehren <apeiron@cpan.org>.
1387 | 
1388 | Doc patches for a custom local::lib directory, more cleanups in the English
1389 | documentation and a L<German documentation|POD2::DE::local::lib> contributed by
1390 | Torsten Raudssus <torsten@raudssus.de>.
1391 | 
1392 | Hans Dieter Pearcey <hdp@cpan.org> sent in some additional tests for ensuring
1393 | things will install properly, submitted a fix for the bug causing problems with
1394 | writing Makefiles during bootstrapping, contributed an example program, and
1395 | submitted yet another fix to ensure that local::lib can install and bootstrap
1396 | properly. Many, many thanks!
1397 | 
1398 | pattern of Freenode IRC contributed the beginnings of the Troubleshooting
1399 | section. Many thanks!
1400 | 
1401 | Patch to add Win32 support contributed by Curtis Jewell <csjewell@cpan.org>.
1402 | 
1403 | Warnings for missing PATH/PERL5LIB (as when not running interactively) silenced
1404 | by a patch from Marco Emilio Poleggi.
1405 | 
1406 | Mark Stosberg <mark@summersault.com> provided the code for the now deleted
1407 | '--self-contained' option.
1408 | 
1409 | Documentation patches to make Win32 usage clearer by
1410 | David Mertens <dcmertens.perl@gmail.com> (run4flat).
1411 | 
1412 | Brazilian L<Portuguese translation|POD2::PT_BR::local::lib> and minor doc
1413 | patches contributed by Breno G. de Oliveira <garu@cpan.org>.
1414 | 
1415 | Improvements to stacking multiple local::lib dirs and removing them from the
1416 | environment later on contributed by Andrew Rodland <arodland@cpan.org>.
1417 | 
1418 | Patch for Carp version mismatch contributed by Hakim Cassimally
1419 | <osfameron@cpan.org>.
1420 | 
1421 | Rewrite of internals and numerous bug fixes and added features contributed by
1422 | Graham Knop <haarg@haarg.org>.
1423 | 
1424 | =head1 COPYRIGHT
1425 | 
1426 | Copyright (c) 2007 - 2013 the local::lib L</AUTHOR> and L</CONTRIBUTORS> as
1427 | listed above.
1428 | 
1429 | =head1 LICENSE
1430 | 
1431 | This is free software; you can redistribute it and/or modify it under
1432 | the same terms as the Perl 5 programming language system itself.
1433 | 
1434 | =cut
1435 | 


--------------------------------------------------------------------------------