├── README.md
├── _config.yml
├── bilm-tf
│   ├── LICENSE
│   ├── README.md
│   ├── bilm
│   │   ├── __init__.py
│   │   ├── data.py
│   │   ├── elmo.py
│   │   ├── model.py
│   │   └── training.py
│   └── setup.py
├── cache.py
├── frontend
│   ├── constants
│   │   └── constants.js
│   ├── index.html
│   ├── index_content.html
│   ├── info.js
│   ├── js
│   │   ├── analytics.js
│   │   ├── bootstrap.css
│   │   ├── bootstrap.min.js
│   │   ├── global.css
│   │   ├── jquery.js
│   │   └── logo.png
│   └── loading_icon.gif
├── install.sh
├── main.py
├── mapping
│   ├── README.md
│   ├── bbn.logic.mapping
│   ├── bbn.mapping
│   ├── figer.logic.mapping
│   ├── figer.mapping
│   ├── ontonotes.logic.mapping
│   └── ontonotes.mapping
├── requirements.txt
├── scripts.py
├── server.py
└── zoe_utils.py

/README.md:
--------------------------------------------------------------------------------
 1 | # ZOE (Zero-shot Open Entity Typing)
 2 | A state-of-the-art system for zero-shot fine-grained entity typing with minimal supervision
 3 | 
 4 | ## Introduction
 5 | 
 6 | This is a demo system for our paper "Zero-Shot Open Entity Typing as Type-Compatible Grounding",
 7 | which at the time of publication represented the state of the art in zero-shot entity typing.
 8 | 
 9 | The original experiments that produced all the results in the paper
10 | were done with a package written in Java. This is a re-written package intended solely for
11 | demoing the algorithm and validating key results.
12 | 
13 | The results may differ slightly from the published numbers, due to the randomness in the
14 | iteration order of Java's HashSet and Python's set. The difference should be negligible.
15 | 
16 | This system may take a long time if run on a large number of new sentences, due to ELMo processing.
17 | We have cached ELMo results for the provided experiments.
18 | 
19 | The package also contains an online demo; please refer to the [publication page](http://cogcomp.org/page/publication_view/845)
20 | for more details.
21 | 
22 | ## Usage
23 | 
24 | ### Install the system
25 | 
26 | #### Prerequisites
27 | 
28 | * At least 20 GB of available disk space and 16 GB of memory (strict requirement)
29 | * Python 3.x (mostly tested on 3.5)
30 | * A POSIX OS (Windows is not supported)
31 | * Java JDK and Maven
32 | * `virtualenv` if you are installing with the script
33 | * `wget` if you are installing with the script (use `brew` to install it on macOS)
34 | * `unzip` if you are installing with the script
35 | 
36 | #### Install using a one-line command
37 | 
38 | To make life easier, we provide a simple way to install with `sh install.sh`.
39 | 
40 | This script does everything mentioned in the next section, plus creating a virtualenv. Activate it with `source venv/bin/activate`.
41 | 
42 | #### Install manually
43 | 
44 | See the wiki page [manual-installation](https://github.com/CogComp/zoe/wiki/Manual-Installation)
45 | 
46 | ### Run the system
47 | 
48 | Currently you can do the following without changes to the code:
49 | * Run the experiment on the FIGER test set (randomly sampled as in the paper): `python3 main.py figer`
50 | * Run the experiment on the BBN test set: `python3 main.py bbn`
51 | * Run the experiment on the first 1000 Ontonotes_fine test set instances (due to size issues): `python3 main.py ontonotes`
52 | 
53 | Additionally, you can run the server mode that initializes the online demo with `python3 server.py`.
54 | However, this requires some additional files that are not yet provided for download.
55 | Please contact the authors directly.
56 | 
57 | Running on a large number of new sentences is generally an expensive operation, but you are welcome to do it.
58 | Please refer to `main.py` and [Engineering Details](https://github.com/CogComp/zoe/wiki/Engineering-Details)
59 | to see how you can test on your own data.
60 | 
61 | 
62 | ## Citation
63 | See the following paper:
64 | ```
65 | @inproceedings{ZKTR18,
66 |     author = {Ben Zhou and Daniel Khashabi and Chen-Tse Tsai and Dan Roth},
67 |     title = {Zero-Shot Open Entity Typing as Type-Compatible Grounding},
68 |     booktitle = {EMNLP},
69 |     year = {2018},
70 | }
71 | ```
72 | 
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-minimal
--------------------------------------------------------------------------------
/bilm-tf/LICENSE:
--------------------------------------------------------------------------------
 1 | Apache License
 2 | Version 2.0, January 2004
 3 | http://www.apache.org/licenses/
 4 | 
 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
 6 | 
 7 | 1. Definitions.
 8 | 
 9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 | 
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 | 
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 | 
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 | 
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 | 
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 | 
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 | 
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 | 
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner.
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /bilm-tf/README.md: -------------------------------------------------------------------------------- 1 | # bilm-tf 2 | Tensorflow implementation of the pretrained biLM used to compute ELMo 3 | representations from ["Deep contextualized word representations"](http://arxiv.org/abs/1802.05365). 4 | 5 | This repository supports both training biLMs and using pre-trained models for prediction. 6 | 7 | We also have a pytorch implementation available in [AllenNLP](http://allennlp.org/). 
 8 | 
 9 | You may also find it easier to use the version provided in [Tensorflow Hub](https://www.tensorflow.org/hub/modules/google/elmo/2) if you would just like to make predictions.
10 | 
11 | Citation:
12 | 
13 | ```
14 | @inproceedings{Peters:2018,
15 |   author={Peters, Matthew E. and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke},
16 |   title={Deep contextualized word representations},
17 |   booktitle={Proc. of NAACL},
18 |   year={2018}
19 | }
20 | ```
21 | 
22 | 
23 | ## Installing
24 | Install Python 3.5 or later, TensorFlow 1.2, and h5py:
25 | 
26 | ```
27 | pip install tensorflow-gpu==1.2 h5py
28 | python setup.py install
29 | ```
30 | 
31 | Ensure the tests pass in your environment by running:
32 | ```
33 | python -m unittest discover tests/
34 | ```
35 | 
36 | ## Installing with Docker
37 | 
38 | To run the image, you must use nvidia-docker, because this repository
39 | requires GPUs.
40 | ```
41 | sudo nvidia-docker run -t allennlp/bilm-tf:training-gpu
42 | ```
43 | 
44 | ## Using pre-trained models
45 | 
46 | We have several different English language pre-trained biLMs available for use.
47 | Each model is specified with two separate files, a JSON formatted "options"
48 | file with hyperparameters and an hdf5 formatted file with the model
49 | weights. Links to the pre-trained models are available [here](https://allennlp.org/elmo).
50 | 
51 | 
52 | There are three ways to integrate ELMo representations into a downstream task, depending on your use case.
53 | 
54 | 1. Compute representations on the fly from raw text using character input. This is the most general method and will handle any input text. It is also the most computationally expensive.
55 | 2. Precompute and cache the context independent token representations, then compute context dependent representations using the biLSTMs for input data. This method is less computationally expensive than #1, but is only applicable with a fixed, prescribed vocabulary.
56 | 3. Precompute the representations for your entire dataset and save to a file.
57 | 
58 | We have used all of these methods in the past for various use cases. #1 is necessary for evaluating at test time on unseen data (e.g. public SQuAD leaderboard). #2 is a good compromise for large datasets where the size of the file in #3 is infeasible (SNLI, SQuAD). #3 is a good choice for smaller datasets or in cases where you'd like to use ELMo in other frameworks.
59 | 
60 | In all cases, the process roughly follows the same steps.
61 | First, create a `Batcher` (or `TokenBatcher` for #2) to translate tokenized strings to numpy arrays of character (or token) ids.
62 | Then, load the pretrained ELMo model (class `BidirectionalLanguageModel`).
63 | Finally, for steps #1 and #2 use `weight_layers` to compute the final ELMo representations.
64 | For #3, use `BidirectionalLanguageModel` to write all the intermediate layers to a file. A short end-to-end sketch of this pipeline follows the shape conventions below.
65 | 
66 | #### Shape conventions
67 | Each tokenized sentence is a list of `str`, with a batch of sentences
68 | a list of tokenized sentences (`List[List[str]]`).
69 | 
70 | The `Batcher` packs these into a shape
71 | `(n_sentences, max_sentence_length + 2, 50)` numpy array of character
72 | ids, padding on the right with 0 ids for sentences shorter than the maximum
73 | length. The first and last tokens for each sentence are special
74 | begin and end of sentence ids added by the `Batcher`.
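To make the steps above concrete, here is a minimal character-input sketch in the spirit of `usage_character.py`. The file paths are placeholders for your own vocabulary file and a downloaded pre-trained model:

```python
import tensorflow as tf
from bilm import Batcher, BidirectionalLanguageModel, weight_layers

# Placeholder paths: point these at your vocabulary file and the
# "options"/"weights" files of one of the pre-trained models.
vocab_file = 'vocab.txt'
options_file = 'options.json'
weight_file = 'weights.hdf5'

# 1. The Batcher translates tokenized strings to arrays of character ids.
batcher = Batcher(vocab_file, 50)

# 2. Load the pretrained biLM; the input placeholder is (batch, time, 50).
character_ids = tf.placeholder('int32', shape=(None, None, 50))
bilm = BidirectionalLanguageModel(options_file, weight_file)
embeddings_op = bilm(character_ids)

# 3. Collapse the biLM layers into a single weighted ELMo representation.
elmo_output = weight_layers('input', embeddings_op, l2_coef=0.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    char_ids = batcher.batch_sentences(
        [['The', 'first', 'sentence', '.'], ['Second', '.']])
    # elmo_vecs has shape (n_sentences, max_sentence_length, 1024)
    elmo_vecs = sess.run(elmo_output['weighted_op'],
                         feed_dict={character_ids: char_ids})
```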
 75 | 
 76 | The input character id placeholder can be dimensioned `(None, None, 50)`,
 77 | with both the batch dimension (axis=0) and time dimension (axis=1) determined
 78 | for each batch, up to the maximum batch size specified in the
 79 | `BidirectionalLanguageModel` constructor.
 80 | 
 81 | After running inference with the batch, the returned biLM embeddings are
 82 | a numpy array with shape `(n_sentences, 3, max_sentence_length, 1024)`,
 83 | after removing the special begin/end tokens.
 84 | 
 85 | #### Vocabulary file
 86 | The `Batcher` takes a vocabulary file as input for efficiency. This is a
 87 | text file, with one token per line, separated by newlines (`\n`).
 88 | Each token in the vocabulary is cached as the appropriate 50 character id
 89 | sequence once. Since the model is completely character based, tokens not in
 90 | the vocabulary file are handled appropriately at run time, with a slight
 91 | run time penalty. It is recommended to always include the special
 92 | `<S>` and `</S>` tokens (case sensitive) in the vocabulary file.
 93 | 
 94 | ### ELMo with character input
 95 | 
 96 | See `usage_character.py` for a detailed usage example.
 97 | 
 98 | ### ELMo with pre-computed and cached context independent token representations
 99 | To speed up model inference with a fixed, specified vocabulary, it is
100 | possible to pre-compute the context independent token representations,
101 | write them to a file, and re-use them for inference. Note that we don't
102 | support falling back to character inputs for out-of-vocabulary words,
103 | so this should only be used when the biLM is used to compute embeddings
104 | for input with a fixed, defined vocabulary.
105 | 
106 | To use this option:
107 | 
108 | 1. First create a vocabulary file with all of the unique tokens in your
109 | dataset and add the special `<S>` and `</S>` tokens.
110 | 2. Run `dump_token_embeddings` with the full model to write the token
111 | embeddings to an hdf5 file.
112 | 3. Use `TokenBatcher` (instead of `Batcher`) with your vocabulary file,
113 | and pass `use_token_inputs=False` and the name of the output file from step
114 | 2 to the `BidirectionalLanguageModel` constructor.
115 | 
116 | See `usage_token.py` for a detailed usage example.
117 | 
118 | ### Dumping biLM embeddings for an entire dataset to a single file.
119 | 
120 | To take this option, create a text file with your tokenized dataset. Each line is one tokenized sentence (whitespace separated). Then use `dump_bilm_embeddings`.
121 | 
122 | The output file is `hdf5` format. Each sentence in the input data is stored as a dataset with key `str(sentence_id)` where `sentence_id` is the line number in the dataset file (indexed from 0).
123 | The embeddings for each sentence are a shape (3, n_tokens, 1024) array.
124 | 
125 | See `usage_cached.py` for a detailed example.
126 | 
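The whole dumping flow fits in a few lines — a minimal sketch, with placeholder paths standing in for your dataset and pre-trained model files:

```python
import h5py
from bilm import dump_bilm_embeddings

# Placeholder paths: a vocabulary file, a tokenized dataset
# (one whitespace-separated sentence per line), and the pre-trained
# options/weights files.
dump_bilm_embeddings('vocab.txt', 'dataset.txt',
                     'options.json', 'weights.hdf5',
                     'elmo_embeddings.hdf5')

# Each sentence is stored under str(line_number) with shape
# (3, n_tokens, 1024).
with h5py.File('elmo_embeddings.hdf5', 'r') as fin:
    first_sentence_embeddings = fin['0'][...]
```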
127 | ## Training a biLM on a new corpus
128 | 
129 | Broadly speaking, the process to train and use a new biLM is:
130 | 
131 | 1. Prepare input data and a vocabulary file.
132 | 2. Train the biLM.
133 | 3. Test (compute the perplexity of) the biLM on heldout data.
134 | 4. Write out the weights from the trained biLM to an hdf5 file.
135 | 5. See the instructions above for using the output from Step #4 in downstream models.
136 | 
137 | 
138 | #### 1. Prepare input data and a vocabulary file.
139 | To train and evaluate a biLM, you need to provide:
140 | 
141 | * a vocabulary file
142 | * a set of training files
143 | * a set of heldout files
144 | 
145 | The vocabulary file is a text file with one token per line. It must also include the special tokens `<S>`, `</S>` and `<UNK>` (case sensitive) in the file.
146 | 
147 | IMPORTANT: the vocabulary file should be sorted in descending order by token count in your training data. The first three lines should be the special tokens (`<S>`, `</S>` and `<UNK>`), then the most common token in the training data, ending with the least common token.
148 | 
149 | NOTE: the vocabulary file used in training may differ from the one used for prediction.
150 | 
151 | The training data should be randomly split into many training files,
152 | each containing one slice of the data. Each file contains pre-tokenized and
153 | white space separated text, one sentence per line.
154 | Don't include the `<S>` or `</S>` tokens in your training data.
155 | 
156 | All tokenization/normalization is done before training a model, so both
157 | the vocabulary file and training files should include normalized tokens.
158 | As the default settings use a fully character based token representation, in general we do not recommend any normalization other than tokenization.
159 | 
160 | Finally, reserve a small amount of the training data as heldout data for evaluating the trained biLM.
161 | 
162 | #### 2. Train the biLM.
163 | The hyperparameters used to train the ELMo model can be found in `bin/train_elmo.py`.
164 | 
165 | The ELMo model was trained on 3 GPUs.
166 | To train a new model with the same hyperparameters, first download the training data from the [1 Billion Word Benchmark](http://www.statmt.org/lm-benchmark/).
167 | Then download the [vocabulary file](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/vocab-2016-09-10.txt).
168 | Finally, run:
169 | 
170 | ```
171 | export CUDA_VISIBLE_DEVICES=0,1,2
172 | python bin/train_elmo.py \
173 |     --train_prefix='/path/to/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/*' \
174 |     --vocab_file /path/to/vocab-2016-09-10.txt \
175 |     --save_dir /output_path/to/checkpoint
176 | ```
177 | 
178 | #### 3. Evaluate the trained model.
179 | 
180 | Use `bin/run_test.py` to evaluate a trained model, e.g.
181 | 
182 | ```
183 | export CUDA_VISIBLE_DEVICES=0
184 | python bin/run_test.py \
185 |     --test_prefix='/path/to/1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-000*' \
186 |     --vocab_file /path/to/vocab-2016-09-10.txt \
187 |     --save_dir /output_path/to/checkpoint
188 | ```
189 | 
190 | #### 4. Convert the tensorflow checkpoint to hdf5 for prediction with `bilm` or `allennlp`.
191 | 
192 | First, create an `options.json` file for the newly trained model. To do so,
193 | follow the template in an existing file (e.g. the [original `options.json`](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json)) and modify for your hyperparameters.
194 | 
195 | **Important**: always set `n_characters` to 262 after training (see below).
196 | 
197 | Then run:
198 | 
199 | ```
200 | python bin/dump_weights.py \
201 |     --save_dir /output_path/to/checkpoint \
202 |     --outfile /output_path/to/weights.hdf5
203 | ```
204 | 
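The `n_characters` edit above is easy to forget, so here is a minimal sketch of the change, assuming `options.json` is the copy you created for prediction:

```python
import json

# Load the prediction-time copy of the options file ('options.json' is a
# placeholder for your own path).
with open('options.json') as fin:
    options = json.load(fin)

# 261 character embeddings are used during training; prediction adds a
# special padding id, so the prediction-time value must be 262.
options['char_cnn']['n_characters'] = 262

with open('options.json', 'w') as fout:
    json.dump(options, fout)
```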
205 | ## Frequently asked questions and other warnings
206 | 
207 | #### Can you provide the tensorflow checkpoint from training?
208 | The tensorflow checkpoint is available by downloading these files:
209 | 
210 | * [vocabulary](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/vocab-2016-09-10.txt)
211 | * [checkpoint](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_tf_checkpoint/checkpoint)
212 | * [options](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_tf_checkpoint/options.json)
213 | * [1](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_tf_checkpoint/model.ckpt-935588.data-00000-of-00001)
214 | * [2](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_tf_checkpoint/model.ckpt-935588.index)
215 | * [3](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_tf_checkpoint/model.ckpt-935588.meta)
216 | 
217 | 
218 | #### How do I fine tune a model on additional unlabeled data?
219 | 
220 | First download the checkpoint files above.
221 | Then prepare the dataset as described in the section "Training a biLM on a new corpus", with the exception that we will use the existing vocabulary file instead of creating a new one. Finally, use the script `bin/restart.py` to restart training with the existing checkpoint on the new dataset.
222 | For small datasets (e.g. < 10 million tokens) we only recommend tuning for a small number of epochs and monitoring the perplexity on a heldout set, otherwise the model will overfit the small dataset.
223 | 
224 | #### Are the softmax weights available?
225 | 
226 | They are available in the training checkpoint above.
227 | 
228 | #### Can you provide some more details about how the model was trained?
229 | The script `bin/train_elmo.py` has hyperparameters for training the model.
230 | The original model was trained on 3 GTX 1080 GPUs for 10 epochs, taking about
231 | two weeks.
232 | 
233 | For input processing, we used the raw 1 Billion Word Benchmark dataset
234 | [here](http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz),
235 | and the existing vocabulary of 793471 tokens, including `<S>`, `</S>` and `<UNK>`.
236 | You can find our vocabulary file [here](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/vocab-2016-09-10.txt).
237 | At the model input, all text used the full character based representation,
238 | including tokens outside the vocab.
239 | For the softmax output we replaced OOV tokens with `<UNK>`.
240 | 
241 | The model was trained with a fixed size window of 20 tokens.
242 | The batches were constructed by padding sentences with `<S>` and `</S>`, then packing tokens from one or more sentences into each row to completely fill each batch.
243 | Partial sentences and the LSTM states were carried over from batch to batch so that the language model could use information across batches for context, but backpropagation was broken at each batch boundary.
244 | 
245 | #### Why do I get slightly different embeddings if I run the same text through the pre-trained model twice?
246 | As a result of the training method (see above), the LSTMs are stateful, and carry their state forward from batch to batch.
247 | Consequently, this introduces a small amount of non-determinism, especially
248 | for the first two batches.
249 | 
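For reference, the batch construction described above can be reproduced directly with the data classes in `bilm/data.py` — a minimal sketch, with placeholder paths for the vocabulary file and training shards:

```python
from bilm.data import BidirectionalLMDataset, UnicodeCharsVocabulary

# Placeholder paths: 'vocab.txt' must contain <S>, </S> and <UNK>;
# 'training_shards/*' is a glob over the pre-tokenized training files.
vocab = UnicodeCharsVocabulary('vocab.txt', 50)
data = BidirectionalLMDataset('training_shards/*', vocab,
                              test=False, shuffle_on_load=True)

# Each batch packs (possibly partial) sentences into fixed-size rows; the
# forward stream is under 'token_ids'/'tokens_characters'/'next_token_id'
# and the reverse stream under the corresponding '*_reverse' keys.
for batch in data.iter_batches(batch_size=128, num_steps=20):
    print(batch['token_ids'].shape)  # (128, 20)
    break
```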
250 | #### Why does training seem to take forever even with my small dataset?
251 | The number of gradient updates during training is determined by:
252 | 
253 | * the number of tokens in the training data (`n_train_tokens`)
254 | * the batch size (`batch_size`)
255 | * the number of epochs (`n_epochs`)
256 | 
257 | Be sure to set these values for your particular dataset in `bin/train_elmo.py`.
258 | 
259 | 
260 | #### What's the deal with `n_characters` and padding?
261 | During training, we fill each batch to exactly 20 tokens by adding `<S>` and `</S>` to each sentence, then packing tokens from one or more sentences into each row to completely fill each batch.
262 | As a result, we do not allocate space for a special padding token.
263 | The `UnicodeCharsVocabulary` that converts token strings to lists of character
264 | ids always uses a fixed number of character embeddings of `n_characters=261`, so always
265 | set `n_characters=261` during training.
266 | 
267 | However, for prediction, we ensure each sentence is fully contained in a single batch,
268 | and as a result pad sentences of different lengths with a special padding id.
269 | This occurs in the `Batcher` [see here](https://github.com/allenai/bilm-tf/blob/master/bilm/data.py#L220).
270 | As a result, set `n_characters=262` during prediction in the `options.json`.
271 | 
272 | #### How can I use ELMo to compute sentence representations?
273 | Simple methods like average and max pooling of the word level ELMo representations across sentences work well, often outperforming supervised methods on benchmark datasets.
274 | See "Evaluation of sentence embeddings in downstream and linguistic probing tasks", Perone et al, 2018 [arxiv link](https://arxiv.org/abs/1806.06259).
275 | 
276 | 
277 | 
--------------------------------------------------------------------------------
/bilm-tf/bilm/__init__.py:
--------------------------------------------------------------------------------
1 | 
2 | from .data import Batcher, TokenBatcher
3 | from .elmo import weight_layers
4 | from .model import BidirectionalLanguageModel, dump_token_embeddings, \
5 |     dump_bilm_embeddings, dump_bilm_embeddings_inner, initialize_sess
6 | 
--------------------------------------------------------------------------------
/bilm-tf/bilm/data.py:
--------------------------------------------------------------------------------
 1 | # originally based on https://github.com/tensorflow/models/tree/master/lm_1b
 2 | import glob
 3 | import random
 4 | from typing import List
 5 | 
 6 | import numpy as np
 7 | 
 8 | 
 9 | class Vocabulary(object):
10 |     '''
11 |     A token vocabulary. Holds a map from token to ids and provides
12 |     a method for encoding text to a sequence of ids.
13 |     '''
14 |     def __init__(self, filename, validate_file=False):
15 |         '''
16 |         filename = the vocabulary file. It is a flat text file with one
17 |         (normalized) token per line. In addition, the file should also
18 |         contain the special tokens <S>, </S>, <UNK> (case sensitive).
 19 |         '''
 20 |         self._id_to_word = []
 21 |         self._word_to_id = {}
 22 |         self._unk = -1
 23 |         self._bos = -1
 24 |         self._eos = -1
 25 | 
 26 |         with open(filename) as f:
 27 |             idx = 0
 28 |             for line in f:
 29 |                 word_name = line.strip()
 30 |                 if word_name == '<S>':
 31 |                     self._bos = idx
 32 |                 elif word_name == '</S>':
 33 |                     self._eos = idx
 34 |                 elif word_name == '<UNK>':
 35 |                     self._unk = idx
 36 |                 if word_name == '!!!MAXTERMID':
 37 |                     continue
 38 | 
 39 |                 self._id_to_word.append(word_name)
 40 |                 self._word_to_id[word_name] = idx
 41 |                 idx += 1
 42 | 
 43 |         # check to ensure file has special tokens
 44 |         if validate_file:
 45 |             if self._bos == -1 or self._eos == -1 or self._unk == -1:
 46 |                 raise ValueError("Ensure the vocabulary file has "
 47 |                                  "<S>, </S>, <UNK> tokens")
 48 | 
 49 |     @property
 50 |     def bos(self):
 51 |         return self._bos
 52 | 
 53 |     @property
 54 |     def eos(self):
 55 |         return self._eos
 56 | 
 57 |     @property
 58 |     def unk(self):
 59 |         return self._unk
 60 | 
 61 |     @property
 62 |     def size(self):
 63 |         return len(self._id_to_word)
 64 | 
 65 |     def word_to_id(self, word):
 66 |         if word in self._word_to_id:
 67 |             return self._word_to_id[word]
 68 |         return self.unk
 69 | 
 70 |     def id_to_word(self, cur_id):
 71 |         return self._id_to_word[cur_id]
 72 | 
 73 |     def decode(self, cur_ids):
 74 |         """Convert a list of ids to a sentence, with space inserted."""
 75 |         return ' '.join([self.id_to_word(cur_id) for cur_id in cur_ids])
 76 | 
 77 |     def encode(self, sentence, reverse=False, split=True):
 78 |         """Convert a sentence to a list of ids, with special tokens added.
 79 |         Sentence is a single string with tokens separated by whitespace.
 80 | 
 81 |         If reverse, then the sentence is assumed to be reversed, and
 82 |         this method will swap the BOS/EOS tokens appropriately."""
 83 | 
 84 |         if split:
 85 |             word_ids = [
 86 |                 self.word_to_id(cur_word) for cur_word in sentence.split()
 87 |             ]
 88 |         else:
 89 |             word_ids = [self.word_to_id(cur_word) for cur_word in sentence]
 90 | 
 91 |         if reverse:
 92 |             return np.array([self.eos] + word_ids + [self.bos], dtype=np.int32)
 93 |         else:
 94 |             return np.array([self.bos] + word_ids + [self.eos], dtype=np.int32)
 95 | 
 96 | 
 97 | class UnicodeCharsVocabulary(Vocabulary):
 98 |     """Vocabulary containing character-level and word level information.
 99 | 
100 |     Has a word vocabulary that is used to lookup word ids and
101 |     a character id that is used to map words to arrays of character ids.
102 | 
103 |     The character ids are defined by ord(c) for c in word.encode('utf-8')
104 |     This limits the total number of possible char ids to 256.
105 |     To this we add 5 additional special ids: begin sentence, end sentence,
106 |     begin word, end word and padding.
107 | 
108 |     WARNING: for prediction, we add +1 to the output ids from this
109 |     class to create a special padding id (=0). As a result, we suggest
110 |     you use the `Batcher`, `TokenBatcher`, and `LMDataset` classes instead
111 |     of this lower level class. If you are using this lower level class,
112 |     then be sure to add the +1 appropriately, otherwise embeddings computed
113 |     from the pre-trained model will be useless.
114 | """ 115 | def __init__(self, filename, max_word_length, **kwargs): 116 | super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs) 117 | self._max_word_length = max_word_length 118 | 119 | # char ids 0-255 come from utf-8 encoding bytes 120 | # assign 256-300 to special chars 121 | self.bos_char = 256 # 122 | self.eos_char = 257 # 123 | self.bow_char = 258 # 124 | self.eow_char = 259 # 125 | self.pad_char = 260 # 126 | 127 | num_words = len(self._id_to_word) 128 | 129 | self._word_char_ids = np.zeros([num_words, max_word_length], 130 | dtype=np.int32) 131 | 132 | # the charcter representation of the begin/end of sentence characters 133 | def _make_bos_eos(c): 134 | r = np.zeros([self.max_word_length], dtype=np.int32) 135 | r[:] = self.pad_char 136 | r[0] = self.bow_char 137 | r[1] = c 138 | r[2] = self.eow_char 139 | return r 140 | self.bos_chars = _make_bos_eos(self.bos_char) 141 | self.eos_chars = _make_bos_eos(self.eos_char) 142 | 143 | for i, word in enumerate(self._id_to_word): 144 | self._word_char_ids[i] = self._convert_word_to_char_ids(word) 145 | 146 | self._word_char_ids[self.bos] = self.bos_chars 147 | self._word_char_ids[self.eos] = self.eos_chars 148 | # TODO: properly handle 149 | 150 | @property 151 | def word_char_ids(self): 152 | return self._word_char_ids 153 | 154 | @property 155 | def max_word_length(self): 156 | return self._max_word_length 157 | 158 | def _convert_word_to_char_ids(self, word): 159 | code = np.zeros([self.max_word_length], dtype=np.int32) 160 | code[:] = self.pad_char 161 | 162 | word_encoded = word.encode('utf-8', 'ignore')[:(self.max_word_length-2)] 163 | code[0] = self.bow_char 164 | for k, chr_id in enumerate(word_encoded, start=1): 165 | code[k] = chr_id 166 | code[k + 1] = self.eow_char 167 | 168 | return code 169 | 170 | def word_to_char_ids(self, word): 171 | if word in self._word_to_id: 172 | return self._word_char_ids[self._word_to_id[word]] 173 | else: 174 | return self._convert_word_to_char_ids(word) 175 | 176 | def encode_chars(self, sentence, reverse=False, split=True): 177 | ''' 178 | Encode the sentence as a white space delimited string of tokens. 179 | ''' 180 | if split: 181 | chars_ids = [self.word_to_char_ids(cur_word) 182 | for cur_word in sentence.split()] 183 | else: 184 | chars_ids = [self.word_to_char_ids(cur_word) 185 | for cur_word in sentence] 186 | if reverse: 187 | return np.vstack([self.eos_chars] + chars_ids + [self.bos_chars]) 188 | else: 189 | return np.vstack([self.bos_chars] + chars_ids + [self.eos_chars]) 190 | 191 | 192 | class Batcher(object): 193 | ''' 194 | Batch sentences of tokenized text into character id matrices. 195 | ''' 196 | def __init__(self, lm_vocab_file: str, max_token_length: int): 197 | ''' 198 | lm_vocab_file = the language model vocabulary file (one line per 199 | token) 200 | max_token_length = the maximum number of characters in each token 201 | ''' 202 | self._lm_vocab = UnicodeCharsVocabulary( 203 | lm_vocab_file, max_token_length 204 | ) 205 | self._max_token_length = max_token_length 206 | 207 | def batch_sentences(self, sentences: List[List[str]]): 208 | ''' 209 | Batch the sentences as character ids 210 | Each sentence is a list of tokens without or , e.g. 
211 |         [['The', 'first', 'sentence', '.'], ['Second', '.']]
212 |         '''
213 |         n_sentences = len(sentences)
214 |         max_length = max(len(sentence) for sentence in sentences) + 2
215 | 
216 |         X_char_ids = np.zeros(
217 |             (n_sentences, max_length, self._max_token_length),
218 |             dtype=np.int64
219 |         )
220 | 
221 |         for k, sent in enumerate(sentences):
222 |             length = len(sent) + 2
223 |             char_ids_without_mask = self._lm_vocab.encode_chars(
224 |                 sent, split=False)
225 |             # add one so that 0 is the mask value
226 |             X_char_ids[k, :length, :] = char_ids_without_mask + 1
227 | 
228 |         return X_char_ids
229 | 
230 | 
231 | class TokenBatcher(object):
232 |     '''
233 |     Batch sentences of tokenized text into token id matrices.
234 |     '''
235 |     def __init__(self, lm_vocab_file: str):
236 |         '''
237 |         lm_vocab_file = the language model vocabulary file (one line per
238 |         token)
239 |         '''
240 |         self._lm_vocab = Vocabulary(lm_vocab_file)
241 | 
242 |     def batch_sentences(self, sentences: List[List[str]]):
243 |         '''
244 |         Batch the sentences as token ids
245 |         Each sentence is a list of tokens without <S> or </S>, e.g.
246 |         [['The', 'first', 'sentence', '.'], ['Second', '.']]
247 |         '''
248 |         n_sentences = len(sentences)
249 |         max_length = max(len(sentence) for sentence in sentences) + 2
250 | 
251 |         X_ids = np.zeros((n_sentences, max_length), dtype=np.int64)
252 | 
253 |         for k, sent in enumerate(sentences):
254 |             length = len(sent) + 2
255 |             ids_without_mask = self._lm_vocab.encode(sent, split=False)
256 |             # add one so that 0 is the mask value
257 |             X_ids[k, :length] = ids_without_mask + 1
258 | 
259 |         return X_ids
260 | 
261 | 
262 | ##### for training
263 | def _get_batch(generator, batch_size, num_steps, max_word_length):
264 |     """Read batches of input."""
265 |     cur_stream = [None] * batch_size
266 | 
267 |     no_more_data = False
268 |     while True:
269 |         inputs = np.zeros([batch_size, num_steps], np.int32)
270 |         if max_word_length is not None:
271 |             char_inputs = np.zeros([batch_size, num_steps, max_word_length],
272 |                                    np.int32)
273 |         else:
274 |             char_inputs = None
275 |         targets = np.zeros([batch_size, num_steps], np.int32)
276 | 
277 |         for i in range(batch_size):
278 |             cur_pos = 0
279 | 
280 |             while cur_pos < num_steps:
281 |                 if cur_stream[i] is None or len(cur_stream[i][0]) <= 1:
282 |                     try:
283 |                         cur_stream[i] = list(next(generator))
284 |                     except StopIteration:
285 |                         # No more data, exhaust current streams and quit
286 |                         no_more_data = True
287 |                         break
288 | 
289 |                 how_many = min(len(cur_stream[i][0]) - 1, num_steps - cur_pos)
290 |                 next_pos = cur_pos + how_many
291 | 
292 |                 inputs[i, cur_pos:next_pos] = cur_stream[i][0][:how_many]
293 |                 if max_word_length is not None:
294 |                     char_inputs[i, cur_pos:next_pos] = cur_stream[i][1][
295 |                                                                     :how_many]
296 |                 targets[i, cur_pos:next_pos] = cur_stream[i][0][1:how_many+1]
297 | 
298 |                 cur_pos = next_pos
299 | 
300 |                 cur_stream[i][0] = cur_stream[i][0][how_many:]
301 |                 if max_word_length is not None:
302 |                     cur_stream[i][1] = cur_stream[i][1][how_many:]
303 | 
304 |         if no_more_data:
305 |             # There is no more data. Note: this will not return data
306 |             # for the incomplete batch
307 |             break
308 | 
309 |         X = {'token_ids': inputs, 'tokens_characters': char_inputs,
310 |              'next_token_id': targets}
311 | 
312 |         yield X
313 | 
314 | class LMDataset(object):
315 |     """
316 |     Hold a language model dataset.
317 | 
318 |     A dataset is a list of tokenized files. Each file contains one sentence
319 |     per line. Each sentence is pre-tokenized and white space joined.
320 | """ 321 | def __init__(self, filepattern, vocab, reverse=False, test=False, 322 | shuffle_on_load=False): 323 | ''' 324 | filepattern = a glob string that specifies the list of files. 325 | vocab = an instance of Vocabulary or UnicodeCharsVocabulary 326 | reverse = if True, then iterate over tokens in each sentence in reverse 327 | test = if True, then iterate through all data once then stop. 328 | Otherwise, iterate forever. 329 | shuffle_on_load = if True, then shuffle the sentences after loading. 330 | ''' 331 | self._vocab = vocab 332 | self._all_shards = glob.glob(filepattern) 333 | print('Found %d shards at %s' % (len(self._all_shards), filepattern)) 334 | self._shards_to_choose = [] 335 | 336 | self._reverse = reverse 337 | self._test = test 338 | self._shuffle_on_load = shuffle_on_load 339 | self._use_char_inputs = hasattr(vocab, 'encode_chars') 340 | 341 | self._ids = self._load_random_shard() 342 | 343 | def _choose_random_shard(self): 344 | if len(self._shards_to_choose) == 0: 345 | self._shards_to_choose = list(self._all_shards) 346 | random.shuffle(self._shards_to_choose) 347 | shard_name = self._shards_to_choose.pop() 348 | return shard_name 349 | 350 | def _load_random_shard(self): 351 | """Randomly select a file and read it.""" 352 | if self._test: 353 | if len(self._all_shards) == 0: 354 | # we've loaded all the data 355 | # this will propogate up to the generator in get_batch 356 | # and stop iterating 357 | raise StopIteration 358 | else: 359 | shard_name = self._all_shards.pop() 360 | else: 361 | # just pick a random shard 362 | shard_name = self._choose_random_shard() 363 | 364 | ids = self._load_shard(shard_name) 365 | self._i = 0 366 | self._nids = len(ids) 367 | return ids 368 | 369 | def _load_shard(self, shard_name): 370 | """Read one file and convert to ids. 371 | 372 | Args: 373 | shard_name: file path. 374 | 375 | Returns: 376 | list of (id, char_id) tuples. 377 | """ 378 | print('Loading data from: %s' % shard_name) 379 | with open(shard_name) as f: 380 | sentences_raw = f.readlines() 381 | 382 | if self._reverse: 383 | sentences = [] 384 | for sentence in sentences_raw: 385 | splitted = sentence.split() 386 | splitted.reverse() 387 | sentences.append(' '.join(splitted)) 388 | else: 389 | sentences = sentences_raw 390 | 391 | if self._shuffle_on_load: 392 | random.shuffle(sentences) 393 | 394 | ids = [self.vocab.encode(sentence, self._reverse) 395 | for sentence in sentences] 396 | if self._use_char_inputs: 397 | chars_ids = [self.vocab.encode_chars(sentence, self._reverse) 398 | for sentence in sentences] 399 | else: 400 | chars_ids = [None] * len(ids) 401 | 402 | print('Loaded %d sentences.' 
% len(ids)) 403 | print('Finished loading') 404 | return list(zip(ids, chars_ids)) 405 | 406 | def get_sentence(self): 407 | while True: 408 | if self._i == self._nids: 409 | self._ids = self._load_random_shard() 410 | ret = self._ids[self._i] 411 | self._i += 1 412 | yield ret 413 | 414 | @property 415 | def max_word_length(self): 416 | if self._use_char_inputs: 417 | return self._vocab.max_word_length 418 | else: 419 | return None 420 | 421 | def iter_batches(self, batch_size, num_steps): 422 | for X in _get_batch(self.get_sentence(), batch_size, num_steps, 423 | self.max_word_length): 424 | 425 | # token_ids = (batch_size, num_steps) 426 | # char_inputs = (batch_size, num_steps, 50) of character ids 427 | # targets = word ID of next word (batch_size, num_steps) 428 | yield X 429 | 430 | @property 431 | def vocab(self): 432 | return self._vocab 433 | 434 | class BidirectionalLMDataset(object): 435 | def __init__(self, filepattern, vocab, test=False, shuffle_on_load=False): 436 | ''' 437 | bidirectional version of LMDataset 438 | ''' 439 | self._data_forward = LMDataset( 440 | filepattern, vocab, reverse=False, test=test, 441 | shuffle_on_load=shuffle_on_load) 442 | self._data_reverse = LMDataset( 443 | filepattern, vocab, reverse=True, test=test, 444 | shuffle_on_load=shuffle_on_load) 445 | 446 | def iter_batches(self, batch_size, num_steps): 447 | max_word_length = self._data_forward.max_word_length 448 | 449 | for X, Xr in zip( 450 | _get_batch(self._data_forward.get_sentence(), batch_size, 451 | num_steps, max_word_length), 452 | _get_batch(self._data_reverse.get_sentence(), batch_size, 453 | num_steps, max_word_length) 454 | ): 455 | 456 | for k, v in Xr.items(): 457 | X[k + '_reverse'] = v 458 | 459 | yield X 460 | 461 | 462 | class InvalidNumberOfCharacters(Exception): 463 | pass 464 | 465 | -------------------------------------------------------------------------------- /bilm-tf/bilm/elmo.py: -------------------------------------------------------------------------------- 1 | 2 | import tensorflow as tf 3 | 4 | def weight_layers(name, bilm_ops, l2_coef=None, 5 | use_top_only=False, do_layer_norm=False): 6 | ''' 7 | Weight the layers of a biLM with trainable scalar weights to 8 | compute ELMo representations. 9 | 10 | For each output layer, this returns two ops. The first computes 11 | a layer specific weighted average of the biLM layers, and 12 | the second the l2 regularizer loss term. 13 | The regularization terms are also add to tf.GraphKeys.REGULARIZATION_LOSSES 14 | 15 | Input: 16 | name = a string prefix used for the trainable variable names 17 | bilm_ops = the tensorflow ops returned to compute internal 18 | representations from a biLM. This is the return value 19 | from BidirectionalLanguageModel(...)(ids_placeholder) 20 | l2_coef: the l2 regularization coefficient $\lambda$. 21 | Pass None or 0.0 for no regularization. 22 | use_top_only: if True, then only use the top layer. 
23 | do_layer_norm: if True, then apply layer normalization to each biLM 24 | layer before normalizing 25 | 26 | Output: 27 | { 28 | 'weighted_op': op to compute weighted average for output, 29 | 'regularization_op': op to compute regularization term 30 | } 31 | ''' 32 | def _l2_regularizer(weights): 33 | if l2_coef is not None: 34 | return l2_coef * tf.reduce_sum(tf.square(weights)) 35 | else: 36 | return 0.0 37 | 38 | # Get ops for computing LM embeddings and mask 39 | lm_embeddings = bilm_ops['lm_embeddings'] 40 | mask = bilm_ops['mask'] 41 | 42 | n_lm_layers = int(lm_embeddings.get_shape()[1]) 43 | lm_dim = int(lm_embeddings.get_shape()[3]) 44 | 45 | with tf.control_dependencies([lm_embeddings, mask]): 46 | # Cast the mask and broadcast for layer use. 47 | mask_float = tf.cast(mask, 'float32') 48 | broadcast_mask = tf.expand_dims(mask_float, axis=-1) 49 | 50 | def _do_ln(x): 51 | # do layer normalization excluding the mask 52 | x_masked = x * broadcast_mask 53 | N = tf.reduce_sum(mask_float) * lm_dim 54 | mean = tf.reduce_sum(x_masked) / N 55 | variance = tf.reduce_sum(((x_masked - mean) * broadcast_mask)**2 56 | ) / N 57 | return tf.nn.batch_normalization( 58 | x, mean, variance, None, None, 1E-12 59 | ) 60 | 61 | if use_top_only: 62 | layers = tf.split(lm_embeddings, n_lm_layers, axis=1) 63 | # just the top layer 64 | sum_pieces = tf.squeeze(layers[-1], squeeze_dims=1) 65 | # no regularization 66 | reg = 0.0 67 | else: 68 | W = tf.get_variable( 69 | '{}_ELMo_W'.format(name), 70 | shape=(n_lm_layers, ), 71 | initializer=tf.zeros_initializer, 72 | regularizer=_l2_regularizer, 73 | trainable=True, 74 | ) 75 | 76 | # normalize the weights 77 | normed_weights = tf.split( 78 | tf.nn.softmax(W + 1.0 / n_lm_layers), n_lm_layers 79 | ) 80 | # split LM layers 81 | layers = tf.split(lm_embeddings, n_lm_layers, axis=1) 82 | 83 | # compute the weighted, normalized LM activations 84 | pieces = [] 85 | for w, t in zip(normed_weights, layers): 86 | if do_layer_norm: 87 | pieces.append(w * _do_ln(tf.squeeze(t, squeeze_dims=1))) 88 | else: 89 | pieces.append(w * tf.squeeze(t, squeeze_dims=1)) 90 | sum_pieces = tf.add_n(pieces) 91 | 92 | # get the regularizer 93 | reg = [ 94 | r for r in tf.get_collection( 95 | tf.GraphKeys.REGULARIZATION_LOSSES) 96 | if r.name.find('{}_ELMo_W/'.format(name)) >= 0 97 | ] 98 | if len(reg) != 1: 99 | raise ValueError 100 | 101 | # scale the weighted sum by gamma 102 | gamma = tf.get_variable( 103 | '{}_ELMo_gamma'.format(name), 104 | shape=(1, ), 105 | initializer=tf.ones_initializer, 106 | regularizer=None, 107 | trainable=True, 108 | ) 109 | weighted_lm_layers = sum_pieces * gamma 110 | 111 | ret = {'weighted_op': weighted_lm_layers, 'regularization_op': reg} 112 | 113 | return ret 114 | 115 | -------------------------------------------------------------------------------- /bilm-tf/bilm/model.py: -------------------------------------------------------------------------------- 1 | 2 | import json 3 | 4 | import h5py 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from .data import UnicodeCharsVocabulary, Batcher 9 | 10 | DTYPE = 'float32' 11 | DTYPE_INT = 'int64' 12 | 13 | 14 | class BidirectionalLanguageModel(object): 15 | def __init__( 16 | self, 17 | options_file: str, 18 | weight_file: str, 19 | use_character_inputs=True, 20 | embedding_weight_file=None, 21 | max_batch_size=128, 22 | ): 23 | ''' 24 | Creates the language model computational graph and loads weights 25 | 26 | Two options for input type: 27 | (1) To use character inputs (paired with Batcher) 
28 | pass use_character_inputs=True, and ids_placeholder 29 | of shape (None, None, max_characters_per_token) 30 | to __call__ 31 | (2) To use token ids as input (paired with TokenBatcher), 32 | pass use_character_inputs=False and ids_placeholder 33 | of shape (None, None) to __call__. 34 | In this case, embedding_weight_file is also required input 35 | 36 | options_file: location of the json formatted file with 37 | LM hyperparameters 38 | weight_file: location of the hdf5 file with LM weights 39 | use_character_inputs: if True, then use character ids as input, 40 | otherwise use token ids 41 | max_batch_size: the maximum allowable batch size 42 | ''' 43 | with open(options_file, 'r') as fin: 44 | options = json.load(fin) 45 | 46 | if not use_character_inputs: 47 | if embedding_weight_file is None: 48 | raise ValueError( 49 | "embedding_weight_file is required input with " 50 | "not use_character_inputs" 51 | ) 52 | 53 | self._options = options 54 | self._weight_file = weight_file 55 | self._embedding_weight_file = embedding_weight_file 56 | self._use_character_inputs = use_character_inputs 57 | self._max_batch_size = max_batch_size 58 | 59 | self._ops = {} 60 | self._graphs = {} 61 | 62 | def __call__(self, ids_placeholder): 63 | ''' 64 | Given the input character ids (or token ids), returns a dictionary 65 | with tensorflow ops: 66 | 67 | {'lm_embeddings': embedding_op, 68 | 'lengths': sequence_lengths_op, 69 | 'mask': op to compute mask} 70 | 71 | embedding_op computes the LM embeddings and is shape 72 | (None, 3, None, 1024) 73 | lengths_op computes the sequence lengths and is shape (None, ) 74 | mask computes the sequence mask and is shape (None, None) 75 | 76 | ids_placeholder: a tf.placeholder of type int32. 77 | If use_character_inputs=True, it is shape 78 | (None, None, max_characters_per_token) and holds the input 79 | character ids for a batch 80 | If use_character_input=False, it is shape (None, None) and 81 | holds the input token ids for a batch 82 | ''' 83 | if ids_placeholder in self._ops: 84 | # have already created ops for this placeholder, just return them 85 | ret = self._ops[ids_placeholder] 86 | 87 | else: 88 | # need to create the graph 89 | if len(self._ops) == 0: 90 | # first time creating the graph, don't reuse variables 91 | lm_graph = BidirectionalLanguageModelGraph( 92 | self._options, 93 | self._weight_file, 94 | ids_placeholder, 95 | embedding_weight_file=self._embedding_weight_file, 96 | use_character_inputs=self._use_character_inputs, 97 | max_batch_size=self._max_batch_size) 98 | else: 99 | with tf.variable_scope('', reuse=True): 100 | lm_graph = BidirectionalLanguageModelGraph( 101 | self._options, 102 | self._weight_file, 103 | ids_placeholder, 104 | embedding_weight_file=self._embedding_weight_file, 105 | use_character_inputs=self._use_character_inputs, 106 | max_batch_size=self._max_batch_size) 107 | 108 | ops = self._build_ops(lm_graph) 109 | self._ops[ids_placeholder] = ops 110 | self._graphs[ids_placeholder] = lm_graph 111 | ret = ops 112 | 113 | return ret 114 | 115 | def _build_ops(self, lm_graph): 116 | with tf.control_dependencies([lm_graph.update_state_op]): 117 | # get the LM embeddings 118 | token_embeddings = lm_graph.embedding 119 | layers = [ 120 | tf.concat([token_embeddings, token_embeddings], axis=2) 121 | ] 122 | 123 | n_lm_layers = len(lm_graph.lstm_outputs['forward']) 124 | for i in range(n_lm_layers): 125 | layers.append( 126 | tf.concat( 127 | [lm_graph.lstm_outputs['forward'][i], 128 | lm_graph.lstm_outputs['backward'][i]], 129 | 
axis=-1 130 | ) 131 | ) 132 | 133 | # The layers include the BOS/EOS tokens. Remove them 134 | sequence_length_wo_bos_eos = lm_graph.sequence_lengths - 2 135 | layers_without_bos_eos = [] 136 | for layer in layers: 137 | layer_wo_bos_eos = layer[:, 1:, :] 138 | layer_wo_bos_eos = tf.reverse_sequence( 139 | layer_wo_bos_eos, 140 | lm_graph.sequence_lengths - 1, 141 | seq_axis=1, 142 | batch_axis=0, 143 | ) 144 | layer_wo_bos_eos = layer_wo_bos_eos[:, 1:, :] 145 | layer_wo_bos_eos = tf.reverse_sequence( 146 | layer_wo_bos_eos, 147 | sequence_length_wo_bos_eos, 148 | seq_axis=1, 149 | batch_axis=0, 150 | ) 151 | layers_without_bos_eos.append(layer_wo_bos_eos) 152 | 153 | # concatenate the layers 154 | lm_embeddings = tf.concat( 155 | [tf.expand_dims(t, axis=1) for t in layers_without_bos_eos], 156 | axis=1 157 | ) 158 | 159 | # get the mask op without bos/eos. 160 | # tf doesn't support reversing boolean tensors, so cast 161 | # to int then back 162 | mask_wo_bos_eos = tf.cast(lm_graph.mask[:, 1:], 'int32') 163 | mask_wo_bos_eos = tf.reverse_sequence( 164 | mask_wo_bos_eos, 165 | lm_graph.sequence_lengths - 1, 166 | seq_axis=1, 167 | batch_axis=0, 168 | ) 169 | mask_wo_bos_eos = mask_wo_bos_eos[:, 1:] 170 | mask_wo_bos_eos = tf.reverse_sequence( 171 | mask_wo_bos_eos, 172 | sequence_length_wo_bos_eos, 173 | seq_axis=1, 174 | batch_axis=0, 175 | ) 176 | mask_wo_bos_eos = tf.cast(mask_wo_bos_eos, 'bool') 177 | 178 | return { 179 | 'lm_embeddings': lm_embeddings, 180 | 'lengths': sequence_length_wo_bos_eos, 181 | 'token_embeddings': lm_graph.embedding, 182 | 'mask': mask_wo_bos_eos, 183 | } 184 | 185 | 186 | def _pretrained_initializer(varname, weight_file, embedding_weight_file=None): 187 | ''' 188 | We'll stub out all the initializers in the pretrained LM with 189 | a function that loads the weights from the file 190 | ''' 191 | weight_name_map = {} 192 | for i in range(2): 193 | for j in range(8): # if we decide to add more layers 194 | root = 'RNN_{}/RNN/MultiRNNCell/Cell{}'.format(i, j) 195 | weight_name_map[root + '/rnn/lstm_cell/kernel'] = \ 196 | root + '/LSTMCell/W_0' 197 | weight_name_map[root + '/rnn/lstm_cell/bias'] = \ 198 | root + '/LSTMCell/B' 199 | weight_name_map[root + '/rnn/lstm_cell/projection/kernel'] = \ 200 | root + '/LSTMCell/W_P_0' 201 | 202 | # convert the graph name to that in the checkpoint 203 | varname_in_file = varname[5:] 204 | if varname_in_file.startswith('RNN'): 205 | varname_in_file = weight_name_map[varname_in_file] 206 | 207 | if varname_in_file == 'embedding': 208 | with h5py.File(embedding_weight_file, 'r') as fin: 209 | # Have added a special 0 index for padding not present 210 | # in the original model. 211 | embed_weights = fin[varname_in_file][...] 212 | weights = np.zeros( 213 | (embed_weights.shape[0] + 1, embed_weights.shape[1]), 214 | dtype=DTYPE 215 | ) 216 | weights[1:, :] = embed_weights 217 | else: 218 | with h5py.File(weight_file, 'r') as fin: 219 | if varname_in_file == 'char_embed': 220 | # Have added a special 0 index for padding not present 221 | # in the original model. 222 | char_embed_weights = fin[varname_in_file][...] 223 | weights = np.zeros( 224 | (char_embed_weights.shape[0] + 1, 225 | char_embed_weights.shape[1]), 226 | dtype=DTYPE 227 | ) 228 | weights[1:, :] = char_embed_weights 229 | else: 230 | weights = fin[varname_in_file][...] 
231 | 232 | # Tensorflow initializers are callables that accept a shape parameter 233 | # and some optional kwargs 234 | def ret(shape, **kwargs): 235 | if list(shape) != list(weights.shape): 236 | raise ValueError( 237 | "Invalid shape initializing {0}, got {1}, expected {2}".format( 238 | varname_in_file, shape, weights.shape) 239 | ) 240 | return weights 241 | 242 | return ret 243 | 244 | 245 | class BidirectionalLanguageModelGraph(object): 246 | ''' 247 | Creates the computational graph and holds the ops necessary for running 248 | a bidirectional language model 249 | ''' 250 | def __init__(self, options, weight_file, ids_placeholder, 251 | use_character_inputs=True, embedding_weight_file=None, 252 | max_batch_size=128): 253 | 254 | self.options = options 255 | self._max_batch_size = max_batch_size 256 | self.ids_placeholder = ids_placeholder 257 | self.use_character_inputs = use_character_inputs 258 | 259 | # this custom_getter will make all variables not trainable and 260 | # override the default initializer 261 | def custom_getter(getter, name, *args, **kwargs): 262 | kwargs['trainable'] = False 263 | for i in range(0, 3): 264 | try: 265 | kwargs['initializer'] = _pretrained_initializer( 266 | name, weight_file, embedding_weight_file 267 | ) 268 | except: 269 | continue 270 | else: 271 | break 272 | 273 | return getter(name, *args, **kwargs) 274 | 275 | if embedding_weight_file is not None: 276 | # get the vocab size 277 | with h5py.File(embedding_weight_file, 'r') as fin: 278 | # +1 for padding 279 | self._n_tokens_vocab = fin['embedding'].shape[0] + 1 280 | else: 281 | self._n_tokens_vocab = None 282 | 283 | with tf.variable_scope('bilm', custom_getter=custom_getter): 284 | self._build() 285 | 286 | def _build(self): 287 | if self.use_character_inputs: 288 | self._build_word_char_embeddings() 289 | else: 290 | self._build_word_embeddings() 291 | self._build_lstms() 292 | 293 | def _build_word_char_embeddings(self): 294 | ''' 295 | options contains key 'char_cnn': { 296 | 297 | 'n_characters': 262, 298 | 299 | # includes the start / end characters 300 | 'max_characters_per_token': 50, 301 | 302 | 'filters': [ 303 | [1, 32], 304 | [2, 32], 305 | [3, 64], 306 | [4, 128], 307 | [5, 256], 308 | [6, 512], 309 | [7, 512] 310 | ], 311 | 'activation': 'tanh', 312 | 313 | # for the character embedding 314 | 'embedding': {'dim': 16} 315 | 316 | # for highway layers 317 | # if omitted, then no highway layers 318 | 'n_highway': 2, 319 | } 320 | ''' 321 | projection_dim = self.options['lstm']['projection_dim'] 322 | 323 | cnn_options = self.options['char_cnn'] 324 | filters = cnn_options['filters'] 325 | n_filters = sum(f[1] for f in filters) 326 | max_chars = cnn_options['max_characters_per_token'] 327 | char_embed_dim = cnn_options['embedding']['dim'] 328 | n_chars = cnn_options['n_characters'] 329 | if n_chars != 262: 330 | raise InvalidNumberOfCharacters( 331 | "Set n_characters=262 after training; see the README.md" 332 | ) 333 | if cnn_options['activation'] == 'tanh': 334 | activation = tf.nn.tanh 335 | elif cnn_options['activation'] == 'relu': 336 | activation = tf.nn.relu 337 | 338 | # the character embeddings 339 | with tf.device("/gpu:0"): 340 | self.embedding_weights = tf.get_variable( 341 | "char_embed", [n_chars, char_embed_dim], 342 | dtype=DTYPE, 343 | initializer=tf.random_uniform_initializer(-1.0, 1.0) 344 | ) 345 | # shape (batch_size, unroll_steps, max_chars, embed_dim) 346 | self.char_embedding = tf.nn.embedding_lookup(self.embedding_weights, 347 | self.ids_placeholder) 348 |
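The `custom_getter` in `__init__` above is the standard TF 1.x hook for intercepting every `tf.get_variable` call inside a scope. A toy, self-contained sketch of the same pattern (hypothetical names; zeros stand in for the real pretrained weights):

```
import numpy as np
import tensorflow as tf

def frozen_getter(getter, name, *args, **kwargs):
    # force every variable in the scope to be non-trainable and to come from
    # our own initializer callable (shape -> ndarray), mirroring how
    # _pretrained_initializer is wired in above
    kwargs['trainable'] = False
    kwargs['initializer'] = lambda shape, **unused: np.zeros(shape, dtype=np.float32)
    return getter(name, *args, **kwargs)

with tf.variable_scope('demo', custom_getter=frozen_getter):
    w = tf.get_variable('w', shape=[2, 3], dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(w))               # all zeros, from the custom initializer
    print(tf.trainable_variables())  # 'demo/w' is absent
```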
349 | # the convolutions 350 | def make_convolutions(inp): 351 | with tf.variable_scope('CNN') as scope: 352 | convolutions = [] 353 | for i, (width, num) in enumerate(filters): 354 | if cnn_options['activation'] == 'relu': 355 | # He initialization for ReLU activation 356 | # with char embeddings init between -1 and 1 357 | #w_init = tf.random_normal_initializer( 358 | # mean=0.0, 359 | # stddev=np.sqrt(2.0 / (width * char_embed_dim)) 360 | #) 361 | 362 | # Kim et al 2015, +/- 0.05 363 | w_init = tf.random_uniform_initializer( 364 | minval=-0.05, maxval=0.05) 365 | elif cnn_options['activation'] == 'tanh': 366 | # glorot init 367 | w_init = tf.random_normal_initializer( 368 | mean=0.0, 369 | stddev=np.sqrt(1.0 / (width * char_embed_dim)) 370 | ) 371 | w = tf.get_variable( 372 | "W_cnn_%s" % i, 373 | [1, width, char_embed_dim, num], 374 | initializer=w_init, 375 | dtype=DTYPE) 376 | b = tf.get_variable( 377 | "b_cnn_%s" % i, [num], dtype=DTYPE, 378 | initializer=tf.constant_initializer(0.0)) 379 | 380 | conv = tf.nn.conv2d( 381 | inp, w, 382 | strides=[1, 1, 1, 1], 383 | padding="VALID") + b 384 | # now max pool 385 | conv = tf.nn.max_pool( 386 | conv, [1, 1, max_chars-width+1, 1], 387 | [1, 1, 1, 1], 'VALID') 388 | 389 | # activation 390 | conv = activation(conv) 391 | conv = tf.squeeze(conv, squeeze_dims=[2]) 392 | 393 | convolutions.append(conv) 394 | 395 | return tf.concat(convolutions, 2) 396 | 397 | embedding = make_convolutions(self.char_embedding) 398 | 399 | # for highway and projection layers 400 | n_highway = cnn_options.get('n_highway') 401 | use_highway = n_highway is not None and n_highway > 0 402 | use_proj = n_filters != projection_dim 403 | 404 | if use_highway or use_proj: 405 | # reshape from (batch_size, n_tokens, dim) to (-1, dim) 406 | batch_size_n_tokens = tf.shape(embedding)[0:2] 407 | embedding = tf.reshape(embedding, [-1, n_filters]) 408 | 409 | # set up weights for projection 410 | if use_proj: 411 | assert n_filters > projection_dim 412 | with tf.variable_scope('CNN_proj') as scope: 413 | W_proj_cnn = tf.get_variable( 414 | "W_proj", [n_filters, projection_dim], 415 | initializer=tf.random_normal_initializer( 416 | mean=0.0, stddev=np.sqrt(1.0 / n_filters)), 417 | dtype=DTYPE) 418 | b_proj_cnn = tf.get_variable( 419 | "b_proj", [projection_dim], 420 | initializer=tf.constant_initializer(0.0), 421 | dtype=DTYPE) 422 | 423 | # apply highway layers 424 | def high(x, ww_carry, bb_carry, ww_tr, bb_tr): 425 | carry_gate = tf.nn.sigmoid(tf.matmul(x, ww_carry) + bb_carry) 426 | transform_gate = tf.nn.relu(tf.matmul(x, ww_tr) + bb_tr) 427 | return carry_gate * transform_gate + (1.0 - carry_gate) * x 428 | 429 | if use_highway: 430 | highway_dim = n_filters 431 | 432 | for i in range(n_highway): 433 | with tf.variable_scope('CNN_high_%s' % i) as scope: 434 | W_carry = tf.get_variable( 435 | 'W_carry', [highway_dim, highway_dim], 436 | # glorot init 437 | initializer=tf.random_normal_initializer( 438 | mean=0.0, stddev=np.sqrt(1.0 / highway_dim)), 439 | dtype=DTYPE) 440 | b_carry = tf.get_variable( 441 | 'b_carry', [highway_dim], 442 | initializer=tf.constant_initializer(-2.0), 443 | dtype=DTYPE) 444 | W_transform = tf.get_variable( 445 | 'W_transform', [highway_dim, highway_dim], 446 | initializer=tf.random_normal_initializer( 447 | mean=0.0, stddev=np.sqrt(1.0 / highway_dim)), 448 | dtype=DTYPE) 449 | b_transform = tf.get_variable( 450 | 'b_transform', [highway_dim], 451 | initializer=tf.constant_initializer(0.0), 452 | dtype=DTYPE) 453 | 454 | embedding =
high(embedding, W_carry, b_carry, 455 | W_transform, b_transform) 456 | 457 | # finally project down if needed 458 | if use_proj: 459 | embedding = tf.matmul(embedding, W_proj_cnn) + b_proj_cnn 460 | 461 | # reshape back to (batch_size, tokens, dim) 462 | if use_highway or use_proj: 463 | shp = tf.concat([batch_size_n_tokens, [projection_dim]], axis=0) 464 | embedding = tf.reshape(embedding, shp) 465 | 466 | # at last assign attributes for remainder of the model 467 | self.embedding = embedding 468 | 469 | 470 | def _build_word_embeddings(self): 471 | projection_dim = self.options['lstm']['projection_dim'] 472 | 473 | # the word embeddings 474 | with tf.device("/gpu:0"): 475 | self.embedding_weights = tf.get_variable( 476 | "embedding", [self._n_tokens_vocab, projection_dim], 477 | dtype=DTYPE, 478 | ) 479 | self.embedding = tf.nn.embedding_lookup(self.embedding_weights, 480 | self.ids_placeholder) 481 | 482 | 483 | def _build_lstms(self): 484 | # now the LSTMs 485 | # these will collect the initial states for the forward 486 | # (and reverse LSTMs if we are doing bidirectional) 487 | 488 | # parse the options 489 | lstm_dim = self.options['lstm']['dim'] 490 | projection_dim = self.options['lstm']['projection_dim'] 491 | n_lstm_layers = self.options['lstm'].get('n_layers', 1) 492 | cell_clip = self.options['lstm'].get('cell_clip') 493 | proj_clip = self.options['lstm'].get('proj_clip') 494 | use_skip_connections = self.options['lstm']['use_skip_connections'] 495 | if use_skip_connections: 496 | print("USING SKIP CONNECTIONS") 497 | else: 498 | print("NOT USING SKIP CONNECTIONS") 499 | 500 | # the sequence lengths from input mask 501 | if self.use_character_inputs: 502 | mask = tf.reduce_any(self.ids_placeholder > 0, axis=2) 503 | else: 504 | mask = self.ids_placeholder > 0 505 | sequence_lengths = tf.reduce_sum(tf.cast(mask, tf.int32), axis=1) 506 | batch_size = tf.shape(sequence_lengths)[0] 507 | 508 | # for each direction, we'll store tensors for each layer 509 | self.lstm_outputs = {'forward': [], 'backward': []} 510 | self.lstm_state_sizes = {'forward': [], 'backward': []} 511 | self.lstm_init_states = {'forward': [], 'backward': []} 512 | self.lstm_final_states = {'forward': [], 'backward': []} 513 | 514 | update_ops = [] 515 | for direction in ['forward', 'backward']: 516 | if direction == 'forward': 517 | layer_input = self.embedding 518 | else: 519 | layer_input = tf.reverse_sequence( 520 | self.embedding, 521 | sequence_lengths, 522 | seq_axis=1, 523 | batch_axis=0 524 | ) 525 | 526 | for i in range(n_lstm_layers): 527 | if projection_dim < lstm_dim: 528 | # are projecting down output 529 | lstm_cell = tf.nn.rnn_cell.LSTMCell( 530 | lstm_dim, num_proj=projection_dim, 531 | cell_clip=cell_clip, proj_clip=proj_clip) 532 | else: 533 | lstm_cell = tf.nn.rnn_cell.LSTMCell( 534 | lstm_dim, 535 | cell_clip=cell_clip, proj_clip=proj_clip) 536 | 537 | if use_skip_connections: 538 | # ResidualWrapper adds inputs to outputs 539 | if i == 0: 540 | # don't add skip connection from token embedding to 541 | # 1st layer output 542 | pass 543 | else: 544 | # add a skip connection 545 | lstm_cell = tf.nn.rnn_cell.ResidualWrapper(lstm_cell) 546 | 547 | # collect the input state, run the dynamic rnn, collect 548 | # the output 549 | state_size = lstm_cell.state_size 550 | # the LSTMs are stateful. 
To support multiple batch sizes, 551 | # we'll allocate size for states up to max_batch_size, 552 | # then use the first batch_size entries for each batch 553 | init_states = [ 554 | tf.Variable( 555 | tf.zeros([self._max_batch_size, dim]), 556 | trainable=False 557 | ) 558 | for dim in lstm_cell.state_size 559 | ] 560 | batch_init_states = [ 561 | state[:batch_size, :] for state in init_states 562 | ] 563 | 564 | if direction == 'forward': 565 | i_direction = 0 566 | else: 567 | i_direction = 1 568 | variable_scope_name = 'RNN_{0}/RNN/MultiRNNCell/Cell{1}'.format( 569 | i_direction, i) 570 | with tf.variable_scope(variable_scope_name): 571 | layer_output, final_state = tf.nn.dynamic_rnn( 572 | lstm_cell, 573 | layer_input, 574 | sequence_length=sequence_lengths, 575 | initial_state=tf.nn.rnn_cell.LSTMStateTuple( 576 | *batch_init_states), 577 | ) 578 | 579 | self.lstm_state_sizes[direction].append(lstm_cell.state_size) 580 | self.lstm_init_states[direction].append(init_states) 581 | self.lstm_final_states[direction].append(final_state) 582 | if direction == 'forward': 583 | self.lstm_outputs[direction].append(layer_output) 584 | else: 585 | self.lstm_outputs[direction].append( 586 | tf.reverse_sequence( 587 | layer_output, 588 | sequence_lengths, 589 | seq_axis=1, 590 | batch_axis=0 591 | ) 592 | ) 593 | 594 | with tf.control_dependencies([layer_output]): 595 | # update the initial states 596 | for i in range(2): 597 | new_state = tf.concat( 598 | [final_state[i][:batch_size, :], 599 | init_states[i][batch_size:, :]], axis=0) 600 | state_update_op = tf.assign(init_states[i], new_state) 601 | update_ops.append(state_update_op) 602 | 603 | layer_input = layer_output 604 | 605 | self.mask = mask 606 | self.sequence_lengths = sequence_lengths 607 | self.update_state_op = tf.group(*update_ops) 608 | 609 | 610 | def dump_token_embeddings(vocab_file, options_file, weight_file, outfile): 611 | ''' 612 | Given an input vocabulary file, dump all the token embeddings to the 613 | outfile. The result can be used as the embedding_weight_file when 614 | constructing a BidirectionalLanguageModel. 
615 | ''' 616 | with open(options_file, 'r') as fin: 617 | options = json.load(fin) 618 | max_word_length = options['char_cnn']['max_characters_per_token'] 619 | 620 | vocab = UnicodeCharsVocabulary(vocab_file, max_word_length) 621 | batcher = Batcher(vocab_file, max_word_length) 622 | 623 | ids_placeholder = tf.placeholder('int32', 624 | shape=(None, None, max_word_length) 625 | ) 626 | model = BidirectionalLanguageModel(options_file, weight_file) 627 | embedding_op = model(ids_placeholder)['token_embeddings'] 628 | 629 | n_tokens = vocab.size 630 | embed_dim = int(embedding_op.shape[2]) 631 | 632 | embeddings = np.zeros((n_tokens, embed_dim), dtype=DTYPE) 633 | 634 | config = tf.ConfigProto(allow_soft_placement=True) 635 | with tf.Session(config=config) as sess: 636 | sess.run(tf.global_variables_initializer()) 637 | for k in range(n_tokens): 638 | token = vocab.id_to_word(k) 639 | char_ids = batcher.batch_sentences([[token]])[0, 1, :].reshape( 640 | 1, 1, -1) 641 | embeddings[k, :] = sess.run( 642 | embedding_op, feed_dict={ids_placeholder: char_ids} 643 | ) 644 | 645 | with h5py.File(outfile, 'w') as fout: 646 | ds = fout.create_dataset( 647 | 'embedding', embeddings.shape, dtype='float32', data=embeddings 648 | ) 649 | 650 | 651 | def dump_bilm_embeddings(vocab_file, sentences, options_file, 652 | weight_file): 653 | with open(options_file, 'r') as fin: 654 | options = json.load(fin) 655 | max_word_length = options['char_cnn']['max_characters_per_token'] 656 | 657 | batcher = Batcher(vocab_file, max_word_length) 658 | 659 | ids_placeholder = tf.placeholder('int32', 660 | shape=(None, None, max_word_length) 661 | ) 662 | model = BidirectionalLanguageModel(options_file, weight_file) 663 | ops = model(ids_placeholder) 664 | config = tf.ConfigProto(allow_soft_placement=True) 665 | config.gpu_options.allow_growth = True 666 | ret_map = {} 667 | with tf.Session(config=config) as sess: 668 | sess.run(tf.global_variables_initializer()) 669 | sentence_id = 0 670 | for sentence in sentences: 671 | tokens = sentence.strip().split() 672 | char_ids = batcher.batch_sentences([tokens]) 673 | embeddings = sess.run( 674 | ops['lm_embeddings'], feed_dict={ids_placeholder: char_ids} 675 | ) 676 | ret_map[sentence_id] = embeddings[0] 677 | sentence_id += 1 678 | return ret_map 679 | 680 | 681 | def dump_bilm_embeddings_inner(vocab_file, line, options_file, 682 | weight_file): 683 | with open(options_file, 'r') as fin: 684 | options = json.load(fin) 685 | max_word_length = options['char_cnn']['max_characters_per_token'] 686 | 687 | batcher = Batcher(vocab_file, max_word_length) 688 | 689 | ids_placeholder = tf.placeholder('int32', 690 | shape=(None, None, max_word_length) 691 | ) 692 | model = BidirectionalLanguageModel(options_file, weight_file) 693 | ops = model(ids_placeholder) 694 | config = tf.ConfigProto(allow_soft_placement=True) 695 | with tf.Session(config=config) as sess: 696 | sess.run(tf.global_variables_initializer()) 697 | sentence = line.strip().split() 698 | char_ids = batcher.batch_sentences([sentence]) 699 | embeddings = sess.run( 700 | ops['lm_embeddings'], feed_dict={ids_placeholder: char_ids} 701 | ) 702 | return embeddings[0] 703 | 704 | 705 | def initialize_sess(vocab_file, options_file, weight_file): 706 | with open(options_file, 'r') as fin: 707 | options = json.load(fin) 708 | max_word_length = options['char_cnn']['max_characters_per_token'] 709 | batcher = Batcher(vocab_file, max_word_length) 710 | ids_placeholder = tf.placeholder('int32', 711 | shape=(None, None, 
max_word_length) 712 | ) 713 | model = BidirectionalLanguageModel(options_file, weight_file) 714 | ops = model(ids_placeholder) 715 | config = tf.ConfigProto(allow_soft_placement=True) 716 | config.gpu_options.allow_growth = True 717 | sess = tf.Session(config=config) 718 | sess.run(tf.global_variables_initializer()) 719 | return batcher, ids_placeholder, ops, sess 720 | -------------------------------------------------------------------------------- /bilm-tf/setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | import setuptools 3 | 4 | setuptools.setup( 5 | name='bilm', 6 | version='0.1', 7 | url='http://github.com/allenai/bilm-tf', 8 | packages=setuptools.find_packages(), 9 | tests_require=[], 10 | zip_safe=False, 11 | entry_points='', 12 | ) 13 | 14 | -------------------------------------------------------------------------------- /cache.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | import pickle 3 | import sqlite3 4 | import time 5 | 6 | from flask import g 7 | 8 | 9 | class ServerCache: 10 | 11 | CLEANUP_THRESHOLD = 10000 12 | 13 | def __init__(self): 14 | self.added_count = 0 15 | self.initialized = False 16 | 17 | @staticmethod 18 | def compute_sig(sentence): 19 | key_val = str(sentence.get_sent_str() + "|||" + sentence.get_mention_surface() + "|||" + sentence.inference_signature).encode('utf-8') 20 | return hashlib.sha224(key_val).hexdigest() 21 | 22 | @staticmethod 23 | def get_mem_db(): 24 | if 'mem_db' not in g: 25 | g.mem_db = sqlite3.connect("./shared_cache.db") 26 | return g.mem_db 27 | 28 | def initialize_cache(self): 29 | db = ServerCache.get_mem_db() 30 | cursor = db.cursor() 31 | cursor.execute("DROP TABLE IF EXISTS memcache") 32 | cursor.execute("CREATE TABLE IF NOT EXISTS memcache (key TEXT PRIMARY KEY, value BLOB, time INTEGER)") 33 | db.commit() 34 | self.added_count = 0 35 | self.initialized = True 36 | 37 | def query_cache(self, sentence): 38 | if not self.initialized: 39 | self.initialize_cache() 40 | db = ServerCache.get_mem_db() 41 | cursor = db.cursor() 42 | key = ServerCache.compute_sig(sentence) 43 | cursor.execute("SELECT value FROM memcache WHERE key=?", [key]) 44 | data = cursor.fetchone() 45 | if data is None: 46 | return None 47 | else: 48 | result_binary = data[0] 49 | return pickle.loads(result_binary) 50 | 51 | def insert_cache(self, sentence): 52 | if not self.initialized: 53 | self.initialize_cache() 54 | db = ServerCache.get_mem_db() 55 | cursor = db.cursor() 56 | key = ServerCache.compute_sig(sentence) 57 | current_timestamp = int(time.time()) 58 | data = pickle.dumps(sentence) 59 | cursor.execute("INSERT INTO memcache VALUES (?, ?, ?)", [key, data, current_timestamp]) 60 | db.commit() 61 | self.added_count += 1 62 | if self.added_count > self.CLEANUP_THRESHOLD: 63 | self.initialize_cache() 64 | 65 | 66 | class SurfaceCache: 67 | def __init__(self, cache_file, server_mode=True): 68 | self.cache_file = cache_file 69 | self.server_mode = server_mode 70 | if not self.server_mode: 71 | self.surface_db = sqlite3.connect(self.cache_file) 72 | 73 | def get_surface_db(self): 74 | if self.server_mode: 75 | if 'surface_db' not in g: 76 | g.surface_db = sqlite3.connect(self.cache_file) 77 | return g.surface_db 78 | else: 79 | return self.surface_db 80 | 81 | def initialize_cache(self): 82 | db = self.get_surface_db() 83 | cursor = db.cursor() 84 | cursor.execute("CREATE TABLE IF NOT EXISTS cache (surface TEXT PRIMARY KEY, types BLOB)") 
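        # the cache table maps a lowercased mention surface string to a pickled
        # {type: count} dict; query_cache below sorts that dict by count and
        # returns the top `limit` type strings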
85 | db.commit() 86 | 87 | def query_cache(self, surface, limit=5): 88 | self.initialize_cache() 89 | surface = str(surface).lower() 90 | db = self.get_surface_db() 91 | cursor = db.cursor() 92 | cursor.execute("SELECT types FROM cache WHERE surface=?", [surface]) 93 | data = cursor.fetchone() 94 | if data is None: 95 | return None 96 | else: 97 | result_binary = data[0] 98 | cache_dict = sorted((pickle.loads(result_binary)).items(), key=lambda x: x[1], reverse=True) 99 | ret = [] 100 | for i in range(0, min(limit, len(cache_dict))): 101 | ret.append(cache_dict[i][0]) 102 | return ret 103 | 104 | def insert_cache(self, sentence): 105 | self.initialize_cache() 106 | surface = sentence.get_mention_surface().lower() 107 | db = self.get_surface_db() 108 | cursor = db.cursor() 109 | cursor.execute("SELECT types FROM cache WHERE surface=?", [surface]) 110 | data = cursor.fetchone() 111 | if data is None: 112 | to_insert_cache = {} 113 | for t in sentence.predicted_types: 114 | to_insert_cache[t] = 1 115 | cursor.execute("INSERT INTO cache VALUES (?, ?)", [surface, pickle.dumps(to_insert_cache)]) 116 | db.commit() 117 | else: 118 | previous_cache = pickle.loads(data[0]) 119 | for t in sentence.predicted_types: 120 | if t in previous_cache: 121 | previous_cache[t] += 1 122 | else: 123 | previous_cache[t] = 1 124 | cursor.execute("UPDATE cache SET types=? WHERE surface=?", [pickle.dumps(previous_cache), surface]) 125 | db.commit() 126 | -------------------------------------------------------------------------------- /frontend/constants/constants.js: -------------------------------------------------------------------------------- 1 | const SERVER_API = "http://127.0.0.1:5000/"; -------------------------------------------------------------------------------- /frontend/index_content.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 11 | 12 | 13 | Zoe Online Demo 14 | 19 | 24 | 25 | 26 | 27 |
28 |
29 |
30 |
31 | Taxonomy 32 |
33 |
34 | 37 | 40 |
41 | 46 |
47 | 52 |
53 |
54 |
55 | 56 | Use this option when you want to type with one of the three preset taxonomies. 57 | Select the desired taxonomy from the drop-down. 58 |
59 |
60 |
61 | 63 | 66 | 67 | 72 | 73 |
74 |
75 |
76 | 77 | Select this option if you want to define your own taxonomy. 78 | To make things easier, instead of letting you choose from FreeBase types, 79 | we ask you to find a few examples that belong to the custom type you want. 80 | Enter a valid Wikipedia URL in the left input box, and the custom type name in the right input box. 81 | In theory, the more examples you provide, the more precise the mapping will be. 82 |
83 |
84 | 86 | 103 |
104 |
105 |
106 |
107 |
108 | Sentence 109 | 114 |
115 | 116 | 129 |
130 | 132 |
133 |
134 |
135 | 136 |
137 |
138 | 139 | Enter a sentence here. Click "Parse!" for the next step. 140 |
141 |
142 | 143 |
144 |
145 |
146 |
147 | 148 |
149 |
150 | 151 | Click on the tokens that constitute the mention you want to type. 152 | Note that overlapping or consecutive mention selections are not supported. 153 | You can always un-select and re-select. 154 | Hit "Annotate" when you have finished your selection. 155 |
156 |
157 | 158 |
159 |
160 |
161 |
162 |
163 | 164 | 165 |
166 | 167 | 168 | 169 | 170 |
171 | 172 | 173 |
174 | 175 | 176 | 177 | 178 | 181 | 184 | 187 | 188 | 189 | 568 | 641 | 686 | 687 | -------------------------------------------------------------------------------- /frontend/info.js: -------------------------------------------------------------------------------- 1 | var demoname = "Zoe Demo"; 2 | var demoexplanation = "This is an online demo of our recent paper Zero-Shot Open Entity Typing as Type-Compatible Grounding. Please use the question buttons when you are looking for instructions. If none of them solves your problem, please create an issue on our Github repo."; 3 | var citations = { 4 | "http://cogcomp.org/page/publication_view/845" : "Zero-Shot Open Entity Typing as Type-Compatible Grounding", 5 | }; 6 | var contact = "xzhou45@illinois.edu"; 7 | 8 | function initial_load() { 9 | document.getElementById("demo-name").innerHTML = demoname; 10 | document.getElementById("demo-explanation").innerHTML = "

" + demoexplanation + "

"; 11 | if (citations.length != 0) { 12 | citation_content = "If you wish to cite this work, please cite the following publication(s):"; 13 | var cid = 1; 14 | for (var key in citations) { 15 | citation_content += 16 | "" + 17 | "(" + cid.toString() + ")" + 18 | "" + citations[key] + "," + 19 | ""; 20 | cid ++; 21 | } 22 | document.getElementById("demo-citations").innerHTML = citation_content; 23 | document.getElementById("demo-contact").href = "mailto:" + contact; 24 | document.getElementById("demo-contact").innerHTML = contact; 25 | } 26 | } 27 | 28 | initial_load(); -------------------------------------------------------------------------------- /frontend/js/bootstrap.min.js: -------------------------------------------------------------------------------- 1 | /*! 2 | * Bootstrap v3.1.1 (http://getbootstrap.com) 3 | * Copyright 2011-2014 Twitter, Inc. 4 | * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE) 5 | */ 6 | if("undefined"==typeof jQuery)throw new Error("Bootstrap's JavaScript requires jQuery");+function(a){"use strict";function b(){var a=document.createElement("bootstrap"),b={WebkitTransition:"webkitTransitionEnd",MozTransition:"transitionend",OTransition:"oTransitionEnd otransitionend",transition:"transitionend"};for(var c in b)if(void 0!==a.style[c])return{end:b[c]};return!1}a.fn.emulateTransitionEnd=function(b){var c=!1,d=this;a(this).one(a.support.transition.end,function(){c=!0});var e=function(){c||a(d).trigger(a.support.transition.end)};return setTimeout(e,b),this},a(function(){a.support.transition=b()})}(jQuery),+function(a){"use strict";var b='[data-dismiss="alert"]',c=function(c){a(c).on("click",b,this.close)};c.prototype.close=function(b){function c(){f.trigger("closed.bs.alert").remove()}var d=a(this),e=d.attr("data-target");e||(e=d.attr("href"),e=e&&e.replace(/.*(?=#[^\s]*$)/,""));var f=a(e);b&&b.preventDefault(),f.length||(f=d.hasClass("alert")?d:d.parent()),f.trigger(b=a.Event("close.bs.alert")),b.isDefaultPrevented()||(f.removeClass("in"),a.support.transition&&f.hasClass("fade")?f.one(a.support.transition.end,c).emulateTransitionEnd(150):c())};var d=a.fn.alert;a.fn.alert=function(b){return this.each(function(){var d=a(this),e=d.data("bs.alert");e||d.data("bs.alert",e=new c(this)),"string"==typeof b&&e[b].call(d)})},a.fn.alert.Constructor=c,a.fn.alert.noConflict=function(){return a.fn.alert=d,this},a(document).on("click.bs.alert.data-api",b,c.prototype.close)}(jQuery),+function(a){"use strict";var b=function(c,d){this.$element=a(c),this.options=a.extend({},b.DEFAULTS,d),this.isLoading=!1};b.DEFAULTS={loadingText:"loading..."},b.prototype.setState=function(b){var c="disabled",d=this.$element,e=d.is("input")?"val":"html",f=d.data();b+="Text",f.resetText||d.data("resetText",d[e]()),d[e](f[b]||this.options[b]),setTimeout(a.proxy(function(){"loadingText"==b?(this.isLoading=!0,d.addClass(c).attr(c,c)):this.isLoading&&(this.isLoading=!1,d.removeClass(c).removeAttr(c))},this),0)},b.prototype.toggle=function(){var a=!0,b=this.$element.closest('[data-toggle="buttons"]');if(b.length){var c=this.$element.find("input");"radio"==c.prop("type")&&(c.prop("checked")&&this.$element.hasClass("active")?a=!1:b.find(".active").removeClass("active")),a&&c.prop("checked",!this.$element.hasClass("active")).trigger("change")}a&&this.$element.toggleClass("active")};var c=a.fn.button;a.fn.button=function(c){return this.each(function(){var d=a(this),e=d.data("bs.button"),f="object"==typeof c&&c;e||d.data("bs.button",e=new 
b(this,f)),"toggle"==c?e.toggle():c&&e.setState(c)})},a.fn.button.Constructor=b,a.fn.button.noConflict=function(){return a.fn.button=c,this},a(document).on("click.bs.button.data-api","[data-toggle^=button]",function(b){var c=a(b.target);c.hasClass("btn")||(c=c.closest(".btn")),c.button("toggle"),b.preventDefault()})}(jQuery),+function(a){"use strict";var b=function(b,c){this.$element=a(b),this.$indicators=this.$element.find(".carousel-indicators"),this.options=c,this.paused=this.sliding=this.interval=this.$active=this.$items=null,"hover"==this.options.pause&&this.$element.on("mouseenter",a.proxy(this.pause,this)).on("mouseleave",a.proxy(this.cycle,this))};b.DEFAULTS={interval:5e3,pause:"hover",wrap:!0},b.prototype.cycle=function(b){return b||(this.paused=!1),this.interval&&clearInterval(this.interval),this.options.interval&&!this.paused&&(this.interval=setInterval(a.proxy(this.next,this),this.options.interval)),this},b.prototype.getActiveIndex=function(){return this.$active=this.$element.find(".item.active"),this.$items=this.$active.parent().children(),this.$items.index(this.$active)},b.prototype.to=function(b){var c=this,d=this.getActiveIndex();return b>this.$items.length-1||0>b?void 0:this.sliding?this.$element.one("slid.bs.carousel",function(){c.to(b)}):d==b?this.pause().cycle():this.slide(b>d?"next":"prev",a(this.$items[b]))},b.prototype.pause=function(b){return b||(this.paused=!0),this.$element.find(".next, .prev").length&&a.support.transition&&(this.$element.trigger(a.support.transition.end),this.cycle(!0)),this.interval=clearInterval(this.interval),this},b.prototype.next=function(){return this.sliding?void 0:this.slide("next")},b.prototype.prev=function(){return this.sliding?void 0:this.slide("prev")},b.prototype.slide=function(b,c){var d=this.$element.find(".item.active"),e=c||d[b](),f=this.interval,g="next"==b?"left":"right",h="next"==b?"first":"last",i=this;if(!e.length){if(!this.options.wrap)return;e=this.$element.find(".item")[h]()}if(e.hasClass("active"))return this.sliding=!1;var j=a.Event("slide.bs.carousel",{relatedTarget:e[0],direction:g});return this.$element.trigger(j),j.isDefaultPrevented()?void 0:(this.sliding=!0,f&&this.pause(),this.$indicators.length&&(this.$indicators.find(".active").removeClass("active"),this.$element.one("slid.bs.carousel",function(){var b=a(i.$indicators.children()[i.getActiveIndex()]);b&&b.addClass("active")})),a.support.transition&&this.$element.hasClass("slide")?(e.addClass(b),e[0].offsetWidth,d.addClass(g),e.addClass(g),d.one(a.support.transition.end,function(){e.removeClass([b,g].join(" ")).addClass("active"),d.removeClass(["active",g].join(" ")),i.sliding=!1,setTimeout(function(){i.$element.trigger("slid.bs.carousel")},0)}).emulateTransitionEnd(1e3*d.css("transition-duration").slice(0,-1))):(d.removeClass("active"),e.addClass("active"),this.sliding=!1,this.$element.trigger("slid.bs.carousel")),f&&this.cycle(),this)};var c=a.fn.carousel;a.fn.carousel=function(c){return this.each(function(){var d=a(this),e=d.data("bs.carousel"),f=a.extend({},b.DEFAULTS,d.data(),"object"==typeof c&&c),g="string"==typeof c?c:f.slide;e||d.data("bs.carousel",e=new b(this,f)),"number"==typeof c?e.to(c):g?e[g]():f.interval&&e.pause().cycle()})},a.fn.carousel.Constructor=b,a.fn.carousel.noConflict=function(){return a.fn.carousel=c,this},a(document).on("click.bs.carousel.data-api","[data-slide], [data-slide-to]",function(b){var 
c,d=a(this),e=a(d.attr("data-target")||(c=d.attr("href"))&&c.replace(/.*(?=#[^\s]+$)/,"")),f=a.extend({},e.data(),d.data()),g=d.attr("data-slide-to");g&&(f.interval=!1),e.carousel(f),(g=d.attr("data-slide-to"))&&e.data("bs.carousel").to(g),b.preventDefault()}),a(window).on("load",function(){a('[data-ride="carousel"]').each(function(){var b=a(this);b.carousel(b.data())})})}(jQuery),+function(a){"use strict";var b=function(c,d){this.$element=a(c),this.options=a.extend({},b.DEFAULTS,d),this.transitioning=null,this.options.parent&&(this.$parent=a(this.options.parent)),this.options.toggle&&this.toggle()};b.DEFAULTS={toggle:!0},b.prototype.dimension=function(){var a=this.$element.hasClass("width");return a?"width":"height"},b.prototype.show=function(){if(!this.transitioning&&!this.$element.hasClass("in")){var b=a.Event("show.bs.collapse");if(this.$element.trigger(b),!b.isDefaultPrevented()){var c=this.$parent&&this.$parent.find("> .panel > .in");if(c&&c.length){var d=c.data("bs.collapse");if(d&&d.transitioning)return;c.collapse("hide"),d||c.data("bs.collapse",null)}var e=this.dimension();this.$element.removeClass("collapse").addClass("collapsing")[e](0),this.transitioning=1;var f=function(){this.$element.removeClass("collapsing").addClass("collapse in")[e]("auto"),this.transitioning=0,this.$element.trigger("shown.bs.collapse")};if(!a.support.transition)return f.call(this);var g=a.camelCase(["scroll",e].join("-"));this.$element.one(a.support.transition.end,a.proxy(f,this)).emulateTransitionEnd(350)[e](this.$element[0][g])}}},b.prototype.hide=function(){if(!this.transitioning&&this.$element.hasClass("in")){var b=a.Event("hide.bs.collapse");if(this.$element.trigger(b),!b.isDefaultPrevented()){var c=this.dimension();this.$element[c](this.$element[c]())[0].offsetHeight,this.$element.addClass("collapsing").removeClass("collapse").removeClass("in"),this.transitioning=1;var d=function(){this.transitioning=0,this.$element.trigger("hidden.bs.collapse").removeClass("collapsing").addClass("collapse")};return a.support.transition?void this.$element[c](0).one(a.support.transition.end,a.proxy(d,this)).emulateTransitionEnd(350):d.call(this)}}},b.prototype.toggle=function(){this[this.$element.hasClass("in")?"hide":"show"]()};var c=a.fn.collapse;a.fn.collapse=function(c){return this.each(function(){var d=a(this),e=d.data("bs.collapse"),f=a.extend({},b.DEFAULTS,d.data(),"object"==typeof c&&c);!e&&f.toggle&&"show"==c&&(c=!c),e||d.data("bs.collapse",e=new b(this,f)),"string"==typeof c&&e[c]()})},a.fn.collapse.Constructor=b,a.fn.collapse.noConflict=function(){return a.fn.collapse=c,this},a(document).on("click.bs.collapse.data-api","[data-toggle=collapse]",function(b){var c,d=a(this),e=d.attr("data-target")||b.preventDefault()||(c=d.attr("href"))&&c.replace(/.*(?=#[^\s]+$)/,""),f=a(e),g=f.data("bs.collapse"),h=g?"toggle":d.data(),i=d.attr("data-parent"),j=i&&a(i);g&&g.transitioning||(j&&j.find('[data-toggle=collapse][data-parent="'+i+'"]').not(d).addClass("collapsed"),d[f.hasClass("in")?"addClass":"removeClass"]("collapsed")),f.collapse(h)})}(jQuery),+function(a){"use strict";function b(b){a(d).remove(),a(e).each(function(){var d=c(a(this)),e={relatedTarget:this};d.hasClass("open")&&(d.trigger(b=a.Event("hide.bs.dropdown",e)),b.isDefaultPrevented()||d.removeClass("open").trigger("hidden.bs.dropdown",e))})}function c(b){var c=b.attr("data-target");c||(c=b.attr("href"),c=c&&/#[A-Za-z]/.test(c)&&c.replace(/.*(?=#[^\s]*$)/,""));var d=c&&a(c);return d&&d.length?d:b.parent()}var 
d=".dropdown-backdrop",e="[data-toggle=dropdown]",f=function(b){a(b).on("click.bs.dropdown",this.toggle)};f.prototype.toggle=function(d){var e=a(this);if(!e.is(".disabled, :disabled")){var f=c(e),g=f.hasClass("open");if(b(),!g){"ontouchstart"in document.documentElement&&!f.closest(".navbar-nav").length&&a(''}),b.prototype=a.extend({},a.fn.tooltip.Constructor.prototype),b.prototype.constructor=b,b.prototype.getDefaults=function(){return b.DEFAULTS},b.prototype.setContent=function(){var a=this.tip(),b=this.getTitle(),c=this.getContent();a.find(".popover-title")[this.options.html?"html":"text"](b),a.find(".popover-content")[this.options.html?"string"==typeof c?"html":"append":"text"](c),a.removeClass("fade top bottom left right in"),a.find(".popover-title").html()||a.find(".popover-title").hide()},b.prototype.hasContent=function(){return this.getTitle()||this.getContent()},b.prototype.getContent=function(){var a=this.$element,b=this.options;return a.attr("data-content")||("function"==typeof b.content?b.content.call(a[0]):b.content)},b.prototype.arrow=function(){return this.$arrow=this.$arrow||this.tip().find(".arrow")},b.prototype.tip=function(){return this.$tip||(this.$tip=a(this.options.template)),this.$tip};var c=a.fn.popover;a.fn.popover=function(c){return this.each(function(){var d=a(this),e=d.data("bs.popover"),f="object"==typeof c&&c;(e||"destroy"!=c)&&(e||d.data("bs.popover",e=new b(this,f)),"string"==typeof c&&e[c]())})},a.fn.popover.Constructor=b,a.fn.popover.noConflict=function(){return a.fn.popover=c,this}}(jQuery),+function(a){"use strict";function b(c,d){var e,f=a.proxy(this.process,this);this.$element=a(a(c).is("body")?window:c),this.$body=a("body"),this.$scrollElement=this.$element.on("scroll.bs.scroll-spy.data-api",f),this.options=a.extend({},b.DEFAULTS,d),this.selector=(this.options.target||(e=a(c).attr("href"))&&e.replace(/.*(?=#[^\s]+$)/,"")||"")+" .nav li > a",this.offsets=a([]),this.targets=a([]),this.activeTarget=null,this.refresh(),this.process()}b.DEFAULTS={offset:10},b.prototype.refresh=function(){var b=this.$element[0]==window?"offset":"position";this.offsets=a([]),this.targets=a([]);{var c=this;this.$body.find(this.selector).map(function(){var d=a(this),e=d.data("target")||d.attr("href"),f=/^#./.test(e)&&a(e);return f&&f.length&&f.is(":visible")&&[[f[b]().top+(!a.isWindow(c.$scrollElement.get(0))&&c.$scrollElement.scrollTop()),e]]||null}).sort(function(a,b){return a[0]-b[0]}).each(function(){c.offsets.push(this[0]),c.targets.push(this[1])})}},b.prototype.process=function(){var a,b=this.$scrollElement.scrollTop()+this.options.offset,c=this.$scrollElement[0].scrollHeight||this.$body[0].scrollHeight,d=c-this.$scrollElement.height(),e=this.offsets,f=this.targets,g=this.activeTarget;if(b>=d)return g!=(a=f.last()[0])&&this.activate(a);if(g&&b<=e[0])return g!=(a=f[0])&&this.activate(a);for(a=e.length;a--;)g!=f[a]&&b>=e[a]&&(!e[a+1]||b<=e[a+1])&&this.activate(f[a])},b.prototype.activate=function(b){this.activeTarget=b,a(this.selector).parentsUntil(this.options.target,".active").removeClass("active");var c=this.selector+'[data-target="'+b+'"],'+this.selector+'[href="'+b+'"]',d=a(c).parents("li").addClass("active");d.parent(".dropdown-menu").length&&(d=d.closest("li.dropdown").addClass("active")),d.trigger("activate.bs.scrollspy")};var c=a.fn.scrollspy;a.fn.scrollspy=function(c){return this.each(function(){var d=a(this),e=d.data("bs.scrollspy"),f="object"==typeof c&&c;e||d.data("bs.scrollspy",e=new b(this,f)),"string"==typeof 
c&&e[c]()})},a.fn.scrollspy.Constructor=b,a.fn.scrollspy.noConflict=function(){return a.fn.scrollspy=c,this},a(window).on("load",function(){a('[data-spy="scroll"]').each(function(){var b=a(this);b.scrollspy(b.data())})})}(jQuery),+function(a){"use strict";var b=function(b){this.element=a(b)};b.prototype.show=function(){var b=this.element,c=b.closest("ul:not(.dropdown-menu)"),d=b.data("target");if(d||(d=b.attr("href"),d=d&&d.replace(/.*(?=#[^\s]*$)/,"")),!b.parent("li").hasClass("active")){var e=c.find(".active:last a")[0],f=a.Event("show.bs.tab",{relatedTarget:e});if(b.trigger(f),!f.isDefaultPrevented()){var g=a(d);this.activate(b.parent("li"),c),this.activate(g,g.parent(),function(){b.trigger({type:"shown.bs.tab",relatedTarget:e})})}}},b.prototype.activate=function(b,c,d){function e(){f.removeClass("active").find("> .dropdown-menu > .active").removeClass("active"),b.addClass("active"),g?(b[0].offsetWidth,b.addClass("in")):b.removeClass("fade"),b.parent(".dropdown-menu")&&b.closest("li.dropdown").addClass("active"),d&&d()}var f=c.find("> .active"),g=d&&a.support.transition&&f.hasClass("fade");g?f.one(a.support.transition.end,e).emulateTransitionEnd(150):e(),f.removeClass("in")};var c=a.fn.tab;a.fn.tab=function(c){return this.each(function(){var d=a(this),e=d.data("bs.tab");e||d.data("bs.tab",e=new b(this)),"string"==typeof c&&e[c]()})},a.fn.tab.Constructor=b,a.fn.tab.noConflict=function(){return a.fn.tab=c,this},a(document).on("click.bs.tab.data-api",'[data-toggle="tab"], [data-toggle="pill"]',function(b){b.preventDefault(),a(this).tab("show")})}(jQuery),+function(a){"use strict";var b=function(c,d){this.options=a.extend({},b.DEFAULTS,d),this.$window=a(window).on("scroll.bs.affix.data-api",a.proxy(this.checkPosition,this)).on("click.bs.affix.data-api",a.proxy(this.checkPositionWithEventLoop,this)),this.$element=a(c),this.affixed=this.unpin=this.pinnedOffset=null,this.checkPosition()};b.RESET="affix affix-top affix-bottom",b.DEFAULTS={offset:0},b.prototype.getPinnedOffset=function(){if(this.pinnedOffset)return this.pinnedOffset;this.$element.removeClass(b.RESET).addClass("affix");var a=this.$window.scrollTop(),c=this.$element.offset();return this.pinnedOffset=c.top-a},b.prototype.checkPositionWithEventLoop=function(){setTimeout(a.proxy(this.checkPosition,this),1)},b.prototype.checkPosition=function(){if(this.$element.is(":visible")){var c=a(document).height(),d=this.$window.scrollTop(),e=this.$element.offset(),f=this.options.offset,g=f.top,h=f.bottom;"top"==this.affixed&&(e.top+=d),"object"!=typeof f&&(h=g=f),"function"==typeof g&&(g=f.top(this.$element)),"function"==typeof h&&(h=f.bottom(this.$element));var i=null!=this.unpin&&d+this.unpin<=e.top?!1:null!=h&&e.top+this.$element.height()>=c-h?"bottom":null!=g&&g>=d?"top":!1;if(this.affixed!==i){this.unpin&&this.$element.css("top","");var j="affix"+(i?"-"+i:""),k=a.Event(j+".bs.affix");this.$element.trigger(k),k.isDefaultPrevented()||(this.affixed=i,this.unpin="bottom"==i?this.getPinnedOffset():null,this.$element.removeClass(b.RESET).addClass(j).trigger(a.Event(j.replace("affix","affixed"))),"bottom"==i&&this.$element.offset({top:c-h-this.$element.height()}))}}};var c=a.fn.affix;a.fn.affix=function(c){return this.each(function(){var d=a(this),e=d.data("bs.affix"),f="object"==typeof c&&c;e||d.data("bs.affix",e=new b(this,f)),"string"==typeof c&&e[c]()})},a.fn.affix.Constructor=b,a.fn.affix.noConflict=function(){return a.fn.affix=c,this},a(window).on("load",function(){a('[data-spy="affix"]').each(function(){var 
b=a(this),c=b.data();c.offset=c.offset||{},c.offsetBottom&&(c.offset.bottom=c.offsetBottom),c.offsetTop&&(c.offset.top=c.offsetTop),b.affix(c)})})}(jQuery); -------------------------------------------------------------------------------- /frontend/js/global.css: -------------------------------------------------------------------------------- 1 | p { 2 | font-size: 16px; 3 | } 4 | .breadcrumb { 5 | border: 1px solid lightgrey; 6 | clear: both; 7 | } 8 | 9 | hr { 10 | border-color: lightgrey; 11 | opacity: 100%; 12 | clear: both; 13 | } 14 | h2 { 15 | color: white !important; 16 | background: lightgray; 17 | background: -webkit-linear-gradient(left, #444 0%, #eee 100%); 18 | padding: 5px; 19 | padding-left: 15px; 20 | margin-top: 0px; 21 | margin-bottom: 20px; 22 | font-weight: normal; 23 | border-left: 5px solid #428bca; 24 | } 25 | h2 small { 26 | color: lightgrey; 27 | font-weight: 300; 28 | } 29 | .CCG, .CCG h1, .CCG h1 a { 30 | font-variant: small-caps; 31 | font-size: 1.44em; 32 | color: #2a6496; 33 | } 34 | .CCG h1 a { 35 | line-height: 0.1em; 36 | } 37 | .CCG { 38 | padding-left: 15px; 39 | padding-right: 10px; 40 | } 41 | .CCG small { 42 | font-size: 0.45em; 43 | text-transform: uppercase; 44 | color: #ddb482; 45 | font-weight: lighter; 46 | letter-spacing: 7.8px; 47 | display: block; 48 | line-height: 0.7em; 49 | } 50 | .CCG img { 51 | position: absolute; 52 | top: 0px; 53 | right: 0px; 54 | height:100px; 55 | } 56 | 57 | .popover{ 58 | max-width:800px; 59 | } 60 | 61 | #nav li .ell{ 62 | margin:0px; 63 | padding:0px; 64 | padding-top: 2px; 65 | padding-bottom:2px; 66 | padding-left:4px; 67 | padding-right:4px; 68 | margin:4px; 69 | } 70 | 71 | #nav{ 72 | margin-top:-10px; 73 | margin-left:-25px; 74 | } 75 | 76 | .lead{ 77 | padding:20px; 78 | } 79 | 80 | .ccg-sidebar{ 81 | width:110px; 82 | } 83 | 84 | #illinois-logo{ 85 | margin: 10px; 86 | padding: 10px; 87 | } 88 | 89 | .navigation-container{ 90 | margin-left: 50px; 91 | margin-right: 50px; 92 | } 93 | 94 | /* cloaking directive */ 95 | 96 | [ng\:cloak], [ng-cloak], [data-ng-cloak], [x-ng-cloak], .ng-cloak, .x-ng-cloak { 97 | display: none !important; 98 | } 99 | #problems { 100 | font-size: 0.8em; 101 | position: fixed; 102 | bottom: 0px; 103 | left: 17px; 104 | } 105 | #rollover { 106 | font-size: 0.8em; 107 | position: fixed; 108 | top: 17px; 109 | right: 17px; 110 | width: 200px; 111 | } 112 | .licensing { 113 | overflow-y: scroll; 114 | height: 300px; 115 | width: 100%; 116 | border: 1px solid #DDD; 117 | padding: 10px; 118 | background-color: white; 119 | } 120 | .holder { 121 | padding: 0 5%; 122 | } 123 | .header-container{ 124 | background-color: #FFFFFF; 125 | border-bottom: 20px solid #2a6496; 126 | } 127 | /* Navigation */ 128 | .nav-pills > li a { 129 | color: black; 130 | border-radius: 0px; 131 | } 132 | .nav-pills > li a:hover{ 133 | background-color: #2a6496; 134 | color: white; 135 | -webkit-transition: all 0.2s; 136 | transition: all 0.2s; 137 | } 138 | 139 | .research-project-card { 140 | min-height: 200px; 141 | } 142 | 143 | .research-project-card a h5{ 144 | background-color: #7F5425; 145 | padding: 1em 0.2em; 146 | color: white; 147 | } 148 | 149 | .inline { 150 | display: inline-block; 151 | vertical-align: middle; 152 | } 153 | 154 | /*.licensing{ 155 | overflow-y: scroll; 156 | height: 300px; 157 | width: 100%; 158 | border: 1px solid #DDD; 159 | padding: 10px; 160 | background-color: white; 161 | }*/ 162 | .list-group-item.past-project { 163 | min-height: 150px; 164 | } 165 | 
-------------------------------------------------------------------------------- /frontend/js/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CogComp/zoe/75030cae103743e76420c2a0d74f52bab039f0a0/frontend/js/logo.png -------------------------------------------------------------------------------- /frontend/loading_icon.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CogComp/zoe/75030cae103743e76420c2a0d74f52bab039f0a0/frontend/loading_icon.gif -------------------------------------------------------------------------------- /install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if ! [ -x "$(command -v java)" ]; then 4 | echo 'Error: Java is not installed.' 5 | exit 1 6 | fi 7 | if ! [ -x "$(command -v mvn)" ]; then 8 | echo 'Error: Maven is not installed.' 9 | exit 1 10 | fi 11 | if ! [ -x "$(command -v python3)" ]; then 12 | echo 'Error: Python 3.x is not installed.' 13 | exit 1 14 | fi 15 | if ! [ -x "$(command -v virtualenv)" ]; then 16 | echo 'Error: virtualenv is not installed.' 17 | exit 1 18 | fi 19 | if ! [ -x "$(command -v wget)" ]; then 20 | echo 'Error: wget is not found. Either install it or find a replacement and modify this script.' 21 | exit 1 22 | fi 23 | if ! [ -x "$(command -v unzip)" ]; then 24 | echo 'Error: unzip is not found. Either install it or find a replacement and modify this script.' 25 | exit 1 26 | fi 27 | echo 'All dependencies satisfied. Moving on...' 28 | 29 | virtualenv -p python3 venv 30 | cd ./bilm-tf 31 | ../venv/bin/python3 setup.py install 32 | wget http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/model.zip 33 | unzip model.zip 34 | rm model.zip 35 | cd ../ 36 | venv/bin/pip3 install Cython 37 | venv/bin/pip3 install -r requirements.txt 38 | wget http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/data.zip 39 | unzip -n data.zip 40 | rm data.zip 41 | venv/bin/python3 -m ccg_nlpy download -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pickle 3 | import sys 4 | 5 | from zoe_utils import DataReader 6 | from zoe_utils import ElmoProcessor 7 | from zoe_utils import EsaProcessor 8 | from zoe_utils import Evaluator 9 | from zoe_utils import InferenceProcessor 10 | 11 | 12 | class ZoeRunner: 13 | 14 | """ 15 | @allow_tensorflow sets whether the system will do run-time ELMo processing. 16 | It is set to False in the experiments because ELMo results are cached, 17 | but please leave it at the default (True) when running on new sentences.
18 | """ 19 | def __init__(self, allow_tensorflow=True): 20 | self.elmo_processor = ElmoProcessor(allow_tensorflow) 21 | self.esa_processor = EsaProcessor() 22 | self.inference_processor = InferenceProcessor("figer") 23 | self.evaluator = Evaluator() 24 | self.evaluated = [] 25 | 26 | """ 27 | Process a single sentence 28 | @sentence: a sentence in zoe_utils.Sentence structure 29 | @return: a sentence in zoe_utils that has predicted types set 30 | """ 31 | def process_sentence(self, sentence, inference_processor=None): 32 | esa_candidates = self.esa_processor.get_candidates(sentence) 33 | elmo_candidates = self.elmo_processor.rank_candidates(sentence, esa_candidates) 34 | if len(elmo_candidates) > 0 and elmo_candidates[0][0] == self.elmo_processor.stop_sign: 35 | return -1 36 | if inference_processor is None: 37 | inference_processor = self.inference_processor 38 | inference_processor.inference(sentence, elmo_candidates, esa_candidates) 39 | return sentence 40 | 41 | def process_sentence_vec(self, sentence, inference_processor=None): 42 | esa_candidates = self.esa_processor.get_candidates(sentence) 43 | elmo_candidates = self.elmo_processor.rank_candidates_vec(sentence, esa_candidates) 44 | if len(elmo_candidates) > 0 and elmo_candidates[0][0] == self.elmo_processor.stop_sign: 45 | return -1 46 | if inference_processor is None: 47 | inference_processor = self.inference_processor 48 | inference_processor.inference(sentence, elmo_candidates, esa_candidates) 49 | return sentence 50 | 51 | """ 52 | Helper function to evaluate on a dataset that has multiple sentences 53 | @file_name: A string indicating the data file. 54 | Note the format needs to be the common json format, see examples 55 | @mode: A string indicating the mode. This adjusts the inference mode, and set caches etc. 56 | @return: None 57 | """ 58 | def evaluate_dataset(self, file_name, mode, do_inference=True, use_prior=True, use_context=True, size=-1): 59 | if not os.path.isfile(file_name): 60 | print("[ERROR] Invalid input data file.") 61 | return 62 | self.inference_processor = InferenceProcessor(mode, do_inference, use_prior, use_context) 63 | dataset = DataReader(file_name, size) 64 | for sentence in dataset.sentences: 65 | processed = self.process_sentence(sentence) 66 | if processed == -1: 67 | continue 68 | self.evaluated.append(processed) 69 | processed.print_self() 70 | evaluator = Evaluator() 71 | evaluator.print_performance(self.evaluated) 72 | 73 | """ 74 | Helper function that saves the predicted sentences list to a file. 75 | @file_name: A string indicating the target file path. 
76 | Note that it will overwrite any existing content 77 | @return: None 78 | """ 79 | def save(self, file_name): 80 | with open(file_name, "wb") as handle: 81 | pickle.dump(self.evaluated, handle, pickle.HIGHEST_PROTOCOL) 82 | 83 | @staticmethod 84 | def evaluate_saved_runlog(log_name): 85 | with open(log_name, "rb") as handle: 86 | sentences = pickle.load(handle) 87 | evaluator = Evaluator() 88 | evaluator.print_performance(sentences) 89 | 90 | 91 | if __name__ == "__main__": 92 | if len(sys.argv) < 2: 93 | print("[ERROR] choose from 'figer', 'bbn', 'ontonotes' or 'eval'") 94 | exit(0) 95 | if sys.argv[1] == "figer": 96 | runner = ZoeRunner(allow_tensorflow=False) 97 | runner.elmo_processor.load_cached_embeddings("data/FIGER/target.min.embedding.pickle", "data/FIGER/wikilinks.min.embedding.pickle") 98 | runner.evaluate_dataset("data/FIGER/test_sampled.json", "figer") 99 | runner.save("data/log/runlog_figer.pickle") 100 | if sys.argv[1] == "bbn": 101 | runner = ZoeRunner(allow_tensorflow=False) 102 | runner.elmo_processor.load_cached_embeddings("data/BBN/target.min.embedding.pickle", "data/BBN/wikilinks.min.embedding.pickle") 103 | runner.evaluate_dataset("data/BBN/test.json", "bbn") 104 | runner.save("data/log/runlog_bbn.pickle") 105 | if sys.argv[1] == "ontonotes": 106 | runner = ZoeRunner(allow_tensorflow=False) 107 | runner.elmo_processor.load_cached_embeddings("data/ONTONOTES/target.min.embedding.pickle", "data/ONTONOTES/wikilinks.min.embedding.pickle") 108 | runner.evaluate_dataset("data/ONTONOTES/test.json", "ontonotes", size=1000) 109 | runner.save("data/log/runlog_ontonotes.pickle") 110 | if sys.argv[1] == "eval" and len(sys.argv) > 2: 111 | ZoeRunner.evaluate_saved_runlog(sys.argv[2]) 112 | -------------------------------------------------------------------------------- /mapping/README.md: -------------------------------------------------------------------------------- 1 | ## Mappings 2 | 3 | ### What are these 4 | 5 | Here we provide three mappings that map FreeBase types to different taxonomies. 6 | 7 | ### How to modify or create your own 8 | 9 | Each mapping is composed of two files: *.mapping and *.logic.mapping 10 | 11 | Each line in a *.mapping file contains a tab-separated pair, where the left is a FreeBase type, and the right is a target type. 12 | Note that each target type is automatically dissected into hierarchies by splitting on "/". This file works with 13 | "OR" logic; that is, a target type is assigned if *any* of the FreeBase mapping sources is found in an entity's FreeBase types. 14 | 15 | *.logic.mapping serves as a supplementary mapping for logic beyond "OR". 16 | Each line follows one of three patterns (a worked sketch follows this list): 17 | - "+\tA\tB" means if type A appears in the mapped types, type B will be added 18 | - "-\tA\tB" means if type A appears in the mapped types, type B (if present) will be removed 19 | - "=\tA\tB" means type A and type B are treated as equivalent for evaluation purposes.
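As a concrete illustration, here is a minimal Python sketch of how the two files combine (a hypothetical helper for illustration only, not the actual zoe_utils implementation; the "=" evaluation-equivalence rule is omitted):

```
def map_freebase_types(freebase_types, mapping_pairs, logic_rules):
    # "OR" logic: any matching FreeBase source contributes its target type
    mapped = {target for source, target in mapping_pairs
              if source in freebase_types}
    # supplementary "+"/"-" rules post-process the mapped set
    for op, a, b in logic_rules:
        if op == '+' and a in mapped:
            mapped.add(b)
        elif op == '-' and a in mapped:
            mapped.discard(b)
    return mapped

# toy rows in the spirit of bbn.mapping / bbn.logic.mapping
pairs = [('/people/person', '/PERSON'),
         ('/government/government', '/ORGANIZATION/GOVERNMENT'),
         ('/business/employer', '/ORGANIZATION/CORPORATION')]
rules = [('-', '/ORGANIZATION/GOVERNMENT', '/ORGANIZATION/CORPORATION')]

print(map_freebase_types({'/people/person'}, pairs, rules))
# {'/PERSON'}
print(map_freebase_types({'/government/government', '/business/employer'}, pairs, rules))
# {'/ORGANIZATION/GOVERNMENT'} -- CORPORATION removed by the '-' rule
```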
20 | 21 | -------------------------------------------------------------------------------- /mapping/bbn.logic.mapping: -------------------------------------------------------------------------------- 1 | - /ORGANIZATION /LAW 2 | - /LOCATION/RIVER /LOCATION/LAKE_SEA_OCEAN 3 | - /ORGANIZATION/GOVERNMENT /ORGANIZATION/CORPORATION 4 | - /PERSON ALL_OTHER 5 | - /GPE ALL_OTHER 6 | - /LOCATION/CONTINENT /LOCATION/REGION -------------------------------------------------------------------------------- /mapping/bbn.mapping: -------------------------------------------------------------------------------- 1 | /business/employer /ORGANIZATION/CORPORATION 2 | /organization/organization /ORGANIZATION 3 | /people/person /PERSON 4 | /location/continent /LOCATION/CONTINENT 5 | /base/locations/continents /LOCATION/CONTINENT 6 | /location/location /LOCATION 7 | /aviation/airport /FAC/AIRPORT 8 | /location/country /GPE/COUNTRY 9 | /location/citytown /GPE/CITY 10 | /location/location /LOCATION 11 | /aviation/aircraft_model /PRODUCT/VEHICLE 12 | /geography/river /LOCATION/RIVER 13 | /geography/body_of_water /LOCATION/LAKE_SEA_OCEAN 14 | /location/statistical_region /LOCATION/REGION 15 | /base/plants/plant /PLANT 16 | /architecture/building /BUILDING 17 | /travel/hotel /ORGANIZATION/HOTEL 18 | /time/event /EVENT 19 | /transportation/road /FAC/HIGHWAY_STREET 20 | /book/written_work /WORK_OF_ART/BOOK 21 | /medicine/disease /DISEASE 22 | /language/human_language /LANGUAGE 23 | /base/locations/states_and_provences /GPE/STATE_PROVINCE 24 | /location/cn_province /GPE/STATE_PROVINCE 25 | /music/composition /WORK_OF_ART/SONG 26 | /transportation/bridge /FAC/BRIDGE 27 | /military/war /EVENT/WAR 28 | /cvg/computer_videogame /GAME 29 | /automotive/model /PRODUCT/VEHICLE 30 | /visual_art/artwork /WORK_OF_ART/PAINTING 31 | /food/food /SUBSTANCE/FOOD 32 | /medicine/hospital /ORGANIZATION/HOSPITAL 33 | /government/political_party /ORGANIZATION/POLITICAL 34 | /law /LAW 35 | /theater/play /WORK_OF_ART/PLAY 36 | /government/government /ORGANIZATION/GOVERNMENT 37 | /meteorology/tropical_cyclone /EVENT/HURRICANE 38 | /biology/animal /ANIMAL 39 | /religion/religion /ORGANIZATION/RELIGIOUS 40 | /medicine/drug /SUBSTANCE/DRUG 41 | /chemistry/chemical_compound /SUBSTANCE/CHEMICAL 42 | /education/academic_institution /ORGANIZATION/EDUCATIONAL 43 | /education/educational_institution /ORGANIZATION/EDUCATIONAL 44 | /law/invention /PRODUCT/WEAPON 45 | /government/government_agency /ORGANIZATION/GOVERNMENT 46 | /government/governmental_body /ORGANIZATION/GOVERNMENT -------------------------------------------------------------------------------- /mapping/figer.logic.mapping: -------------------------------------------------------------------------------- 1 | + /building /location 2 | + /news_agency /organization 3 | + /news_agency /organization/company 4 | - /news_agency /written_work 5 | + /written_work /art 6 | - /organization/educational_institution /organization/company 7 | - /organization/sports_league /organization/company 8 | + /transportation/road /location 9 | + /livingthing /living_thing 10 | + /living_thing /livingthing 11 | = /living_thing /livingthing -------------------------------------------------------------------------------- /mapping/figer.mapping: -------------------------------------------------------------------------------- 1 | /base/terrorism/terrorist /person/terrorist 2 | /base/terrorism/terrorist_attack /event/terrorist_attack 3 | /base/terrorism/terrorist_organization /organization/terrorist_organization 4 | 
/people/person /person 5 | /location/location /location 6 | /location/citytown /location/city 7 | /sports/pro_athlete /person/athlete 8 | /biology/organism_classification /living_thing 9 | /organization/organization /organization 10 | /music/album /music 11 | /music/artist /person/artist 12 | /soccer/football_player /person/athlete 13 | /government/politician /person/politician 14 | /book/author /person/author 15 | /architecture/structure /building 16 | /film/film /art/film 17 | /time/event /event 18 | /business/business_operation /organization/company 19 | /geography/geographical_feature /location 20 | /film/actor /person/actor 21 | /book/written_work /written_work 22 | /tv/tv_actor /person/actor 23 | /education/educational_institution /organization/educational_institution 24 | /music/composition /music 25 | /architecture/building /building 26 | /geography/body_of_water /location/body_of_water 27 | /book/book /written_work 28 | /music/musical_group /person/musician 29 | /tv/tv_program /broadcast_program 30 | /sports/sports_team /organization/sports_team 31 | /geography/river /location/body_of_water 32 | /location/administrative_division /location 33 | /boats/ship /product/ship 34 | /education/school /organization/educational_institution 35 | /visual_art/visual_artist /person/artist 36 | /astronomy/celestial_object /astral_body 37 | /baseball/baseball_player /person/athlete 38 | /american_football/football_player /person/athlete 39 | /transportation/road /transportation/road 40 | /cvg/computer_videogame /game 41 | /astronomy/orbital_relationship /astral_body 42 | /book/periodical /written_work 43 | /education/university /organization/educational_institution 44 | /film/director /person/director 45 | /soccer/football_team /organization/sports_team 46 | /music/composer /person/musician 47 | /cricket/cricket_player /person/athlete 48 | /government/u_s_congressperson /person/politician 49 | /film/writer /person/author 50 | /astronomy/asteroid /astral_body 51 | /broadcast/artist /person/artist 52 | /aviation/airport /building/airport 53 | /military/military_conflict /event/military_conflict 54 | /government/political_party /government/political_party 55 | /geography/mountain /geography/mountain 56 | /geography/lake /location/body_of_water 57 | /military/military_unit /military 58 | /chemistry/chemical_compound /chemistry 59 | /time/recurring_event /event 60 | /geography/island /geography/island 61 | /media_common/adaptation /art/film 62 | /basketball/basketball_player /person/athlete 63 | /architecture/museum /building 64 | /tv/tv_personality /person/actor 65 | /location/uk_civil_parish /location 66 | /film/producer /person/artist 67 | /user/robert/data_nursery/railway_station /building 68 | /ice_hockey/hockey_player /person/athlete 69 | /music/producer /person/artist 70 | /computer/software /software 71 | /book/magazine /written_work 72 | /people/ethnicity /people/ethnicity 73 | /aviation/aircraft_model /product/airplane 74 | /government/election /event/election 75 | /location/census_designated_place /location 76 | /sports/tournament_event_competition /event/sports_event 77 | /music/record_label /organization/company 78 | /medicine/disease /disease 79 | /architecture/architect /person/architect 80 | /sports/sports_facility /building/sports_facility 81 | /base/crime/lawyer /person 82 | /language/human_language /language 83 | /location/neighborhood /location 84 | /music/guitarist /person/musician 85 | /sports/boxer /person/athlete 86 | /base/popstra/celebrity /person 87 | /theater/play /play 88 | 
/base/rugby/rugby_player /person/athlete 89 | /automotive/model /product/car 90 | /book/journal /written_work 91 | /aviation/airline /organization/airline 92 | /transportation/bridge /location/bridge 93 | /theater/theater_actor /person/actor 94 | /architecture/skyscraper /building 95 | /user/sprocketonline/economics/legislation /law 96 | /medicine/drug /medicine/drug 97 | /olympics/olympic_event_competition /event/sports_event 98 | /book/newspaper /written_work 99 | /award/award /award 100 | /sports/cyclist /person/athlete 101 | /royalty/noble_title /title 102 | /tv/tv_producer /person/artist 103 | /government/government_agency /government_agency 104 | /government/government_body /government_agency 105 | /government/general_election /event/election 106 | /tv/tv_writer /person/author 107 | /location/us_county /location/county 108 | /tennis/tennis_player /person/athlete 109 | /medicine/medical_treatment /medicine/medical_treatment 110 | /internet/website /internet/website 111 | /visual_art/artwork /art 112 | /royalty/monarch /person/monarch 113 | /tv/tv_director /person/director 114 | /location/australian_suburb /location 115 | /organization/non_profit_organization /organization 116 | /music/lyricist /person/author 117 | /medicine/anatomical_structure /body_part 118 | /sports/sports_league /organization/sports_league 119 | /medicine/hospital /building/hospital 120 | /religion/religious_leader /person/religious_leader 121 | /base/givennames/given_name /person 122 | /people/place_of_interment /location/cemetery 123 | /boats/ship_class /product/ship 124 | /user/akatenev/weapons/weapon /product/weapon 125 | /tv/tv_program_creator /person/artist 126 | /sports/australian_rules_footballer /person/athlete 127 | /travel/tourist_attraction /location 128 | /american_football/football_coach /person/coach 129 | /business/shopping_center /location 130 | /event/disaster /event/natural_disaster 131 | /food/dish /food 132 | /geography/mountain_range /geography/mountain 133 | /astronomy/star /astral_body 134 | /sports/golfer /person/athlete 135 | /film/cinematographer /person/artist 136 | /book/short_story /written_work 137 | /government/government_office_or_title /title 138 | /medicine/physician /person/doctor 139 | /music/conductor /person/musician 140 | /user/joshuamclark/default_domain/bird /livingthing/animal 141 | /architecture/house /building 142 | /games/game /game 143 | /people/family_name /person 144 | /film/editor /person/artist 145 | /basketball/basketball_coach /person/coach 146 | /user/patrick/default_domain/submarine /product/ship 147 | /rail/locomotive_class /train 148 | /music/songwriter /person/author 149 | /law/judge /person 150 | /cvg/cvg_developer /person/engineer 151 | /comic_books/comic_book_series /written_work 152 | /base/switzerland/ch_city /location/city 153 | /celebrities/celebrity /person 154 | /chess/chess_player /person/athlete 155 | /location/cemetery /location/cemetery 156 | /tv/tv_network /broadcast_network 157 | /opera/opera /play 158 | /dining/restaurant /building/restaurant 159 | /geography/glacier /geography/glacier 160 | /military/armed_force /military 161 | /cricket/cricket_bowler /person/athlete 162 | /rail/railway /rail/railway 163 | /basketball/basketball_team /organization/sports_team 164 | /martial_arts/martial_artist /person/artist 165 | /book/publishing_company /organization/company 166 | /music/instrument /product/instrument 167 | /food/ingredient /food 168 | /people/profession /title 169 | /metropolitan_transit/transit_line /metropolitan_transit/transit_line 
170 | /base/handball/handball_player /person/athlete 171 | /architecture/lighthouse /building 172 | /ice_hockey/hockey_team /organization/sports_team 173 | /book/poem /written_work 174 | /book/literary_series /written_work 175 | /location/cn_county /location/county 176 | /government/governmental_body /government/government 177 | /user/skud/legal/treaty /law 178 | /base/americancomedy/comedian /person/actor 179 | /business/consumer_product /product 180 | /theater/theater /building/theater 181 | /computer/programming_language /computer/programming_language 182 | /meteorology/tropical_cyclone /event/natural_disaster 183 | /medicine/icd_9_cm_classification /disease 184 | /base/infrastructure/power_station /building/power_station 185 | /base/crime/law_enforcement_authority /military 186 | /religion/deity /god 187 | /base/disaster2/attack /event/attack 188 | /cvg/cvg_publisher /organization/company 189 | /baseball/baseball_team /organization/sports_team 190 | /base/morelaw/canadian_lawyer /person 191 | /medicine/symptom /medicine/symptom 192 | /base/hotels/hotel /building/hotel 193 | /music/concert_tour /event 194 | /military/military_commander /person/soldier 195 | /business/job_title /title 196 | /food/food /food 197 | /tennis/tennis_tournament /event/sports_event 198 | /base/disaster2/death_causing_event /event/attack 199 | /base/foodrecipes/recipe_ingredient /food 200 | /sports/professional_sports_team /organization/sports_team 201 | /base/formula1/formula_1_grand_prix /event/sports_event 202 | /travel/accommodation /building/hotel 203 | /base/nascar/nascar_driver /person/athlete 204 | /geography/mountain_pass /geography/mountain 205 | /base/scubadiving/marine_creature /livingthing/animal 206 | /location/jp_city_town /location/city 207 | /astronomy/astronomer /person 208 | /cvg/cvg_designer /person/artist 209 | /film/film_festival /event 210 | /computer/computer_scientist /person 211 | /base/morelaw/canadian_judge /person 212 | /base/prison/prison /government_agency 213 | /user/robert/mobile_phones/mobile_phone /product/mobile_phone 214 | /spaceflight/spacecraft /product/spacecraft 215 | /base/fashionmodels/fashion_model /person 216 | /biology/protein /biology 217 | /user/tsegaran/random/formula_one_race /event/sports_event 218 | /conferences/conference_series /event 219 | /user/robert/data_nursery/aircraft_engine /product/engine_device 220 | /zoos/zoo /building 221 | /base/formula1/formula_1_driver /person/athlete 222 | /finance/currency /finance/currency 223 | /location/australian_local_government_area /location 224 | /engineering/engine /product/engine_device 225 | /aviation/airliner_accident /event/natural_disaster 226 | /music/festival /event 227 | /time/day_of_year /time 228 | /base/sportssandbox/sports_event /event/sports_event 229 | /geography/waterfall /location/body_of_water 230 | /film/production_company /organization/company 231 | /user/skud/boats/submarine /product/ship 232 | /location/in_district /location/county 233 | /user/skud/boats/vessel_class /product/ship 234 | /music/bassist /person/musician 235 | /user/robert/default_domain/given_name /person 236 | /base/athletics/track_and_field_athlete /person/athlete 237 | /location/de_city /location/city 238 | /cvg/musical_game_song /music 239 | /amusement_parks/park /park 240 | /location/country /location/country 241 | /location/jp_district /location/county 242 | /base/newsevents/news_reporting_organisation /news_agency 243 | /education/student_radio_station /broadcast_network 244 | /user/robert/earthquakes/earthquake 
/event/natural_disaster 245 | /astronomy/galaxy /astral_body 246 | /base/rugby/rugby_club /organization/sports_team 247 | /time/holiday /time 248 | /music/engineer /person/artist 249 | /religion/religion /religion/religion 250 | /film/film_production_designer /person/artist 251 | /user/robert/data_nursery/galaxy /astral_body 252 | /education/fraternity_sorority /organization/fraternity_sorority 253 | /education/department /education/department 254 | /wine/grape_variety /food 255 | /aviation/aircraft_manufacturer /organization/company 256 | /games/playing_card_game /game 257 | /opera/librettist /person/musician 258 | /base/rugby/rugby_coach /person/coach 259 | /venture_capital/venture_investor /organization/company 260 | /computer/software_developer /person/engineer 261 | /education/school_newspaper /newspaper 262 | /dining/chef /person 263 | /user/tsegaran/legal/act_of_congress /law 264 | /base/sails/sailing_ship_class /product/ship 265 | /geography/mountaineer /person 266 | /base/popstra/company /organization/company 267 | /location/nl_municipality /location/county 268 | /user/tsegaran/computer/algorithm /computer/algorithm 269 | /base/americancivilwar/battle /event/military_conflict 270 | /food/cheese /food 271 | /fashion/fashion_designer /person/artist 272 | /user/tsegaran/random/locomotive /train 273 | /music/orchestra /person/musician 274 | /soccer/football_league /organization/sports_league 275 | /user/robert/military/military_person /person/soldier 276 | /film/film_art_director /person/director 277 | /religion/religious_organization /organization 278 | /computer/computer /product/computer 279 | /religion/religious_leadership_title /title 280 | /film/film_company /organization/company 281 | /base/ovguide/country_musical_groups /person/musician 282 | /law/courthouse /building 283 | /baseball/baseball_manager /person/coach 284 | /base/ports/port_of_call /location 285 | /base/engineering/canal /location/body_of_water 286 | /law/court /government_agency 287 | /soccer/football_team_manager /person/coach 288 | /skiing/ski_area /location 289 | /user/skud/boats/warship_class /product/ship 290 | /boats/ship_type /product/ship 291 | /base/engineering/dam /building/dam 292 | /user/lindenb/default_domain/scientist /person 293 | /metropolitan_transit/transit_system /transit 294 | /geography/island_group /geography/island 295 | /base/ovguide/bollywood_films /art/film 296 | /astronomy/astronomical_observatory /building 297 | /education/educational_degree /education/educational_degree 298 | /book/illustrator /person/artist 299 | /base/usnationalparks/us_national_park /park 300 | /sports/school_sports_team /organization/sports_team 301 | /comic_strips/comic_strip_creator /person/artist 302 | /medicine/surgeon /person/doctor 303 | /cvg/cvg_platform /software 304 | /music/opera_singer /person/musician 305 | /location/ar_department /location/county 306 | /music/guitar /product/instrument 307 | /book/periodical_publisher /organization/company 308 | /user/robert/us_congress/us_senator /person/politician 309 | /baseball/baseball_coach /person/coach 310 | /base/wrestling/professional_wrestler /person/athlete 311 | /base/handball/handball_team /organization/sports_team 312 | /cvg/game_series /game 313 | /location/de_rural_district /location 314 | /cvg/game_voice_actor /person/artist 315 | /base/fires/explosion /event/natural_disaster 316 | /base/exoplanetology/exoplanet /astral_body 317 | /base/aptamer/chemical_compound /chemistry 318 | /user/skud/embassies_and_consulates/embassy /government_agency 319 | 
/base/casinos/casino /building 320 | /government/national_anthem /music 321 | /user/kconragan/graphic_design/graphic_designer /person/artist 322 | /user/carmenmfenn1/ballet/ballet /play 323 | /location/cn_prefecture_level_city /location/city 324 | /film/film_costumer_designer /person/artist 325 | /travel/transport_terminus /building 326 | /visual_art/color /visual_art/color 327 | /language/language_dialect /language 328 | /architecture/architecture_firm /organization/company 329 | /location/us_cbsa /location 330 | /user/skud/boats/cruise_ship /product/ship 331 | /digicams/digital_camera /product/camera 332 | /base/aptamer/nucleic_acid /biology 333 | /base/crime/police_department /government_agency 334 | /location/ca_census_division /location 335 | /opera/opera_company /organization/company 336 | /religion/religious_text /written_work 337 | /book/short_non_fiction /written_work 338 | /location/region /location 339 | /base/fight/protest /event/protest 340 | /location/cn_county_level_city /location/city 341 | /base/activism/organization /organization 342 | /base/popstra/location /location 343 | /music/drummer /person/musician 344 | /book/translated_work /written_work 345 | /library/public_library_system /building/library 346 | /base/fires/fires /event/natural_disaster 347 | /base/greece/gr_city /location/city 348 | /film/film_distributor /organization/company 349 | /user/arielb/israel/israeli_settlement /location 350 | /location/uk_non_metropolitan_district /location 351 | /base/disaster2/infectious_disease /disease 352 | /royalty/order_of_chivalry /title 353 | /medicine/infectious_disease /disease 354 | /base/morelaw/court /building 355 | /base/column/column_author /person/author 356 | /business/trade_union /organization 357 | /base/popstra/organization /organization 358 | /library/public_library /building/library 359 | /government/government /government/government 360 | /comic_books/comic_book_creator /person/artist 361 | /biology/animal /livingthing/animal 362 | /finance/stock_exchange /finance/stock_exchange 363 | /baseball/baseball_league /organization/sports_league 364 | /broadcast/tv_channel /broadcast/tv_channel 365 | /base/bioventurist/science_or_technology_company /organization/company 366 | /user/anandology/default_domain/railway_station /building 367 | /base/fairytales/fairy_tale /written_work 368 | /film/film_series /art/film 369 | /user/hangy/default_domain/at_municipality /location/city 370 | /user/maxim75/default_domain/transit_stop_connection /building 371 | /film/film_featured_song /music 372 | /base/fashion/fashion_designer /person/artist 373 | /film/film_festival_event /event 374 | /games/game_designer /person/artist 375 | /user/skud/boats/submarine_class /product/ship 376 | /user/carmenmfenn1/greco_roman_mythology/greek_deity /god 377 | /ice_hockey/hockey_coach /person/coach 378 | /cvg/musical_game /game 379 | /base/train/multiple_unit /train 380 | /base/magic/magician /person 381 | /user/carmenmfenn1/ballet/ballet_dancer /person/artist 382 | /rail/electric_locomotive_class /train 383 | /base/americancomedy/movie /art/film 384 | /rail/locomotive /train 385 | /base/train/electric_locomotive /train 386 | /base/tallships/tall_ship /product/ship 387 | /user/skud/boats/sailing_vessel /product/ship 388 | /base/classiccars/classic_car /product/car 389 | /user/carmenmfenn1/greco_roman_mythology/roman_deity /god 390 | /base/volleyball/beach_volleyball_player /person/athlete 391 | /cricket/cricket_team /organization/sports_team 392 | /american_football/football_team 
/organization/sports_team 393 | /base/bookstores/bookstore /building 394 | /architecture/tower /building 395 | /base/disaster2/tornado /event/natural_disaster 396 | /user/iubookgirl/default_domain/academic_library /building/library 397 | /base/crime/law_firm /organization/company 398 | /base/movietheatres/movie_theatre /building/theater 399 | /base/marchmadness/ncaa_basketball_team /organization/sports_team 400 | /base/weapons/weapon /product/weapon 401 | /book/publication /written_work 402 | /astronomy/comet /astral_body 403 | /base/peleton/cyclist /person/athlete 404 | /base/americancomedy/tv_show /broadcast_program 405 | /automotive/company /organization/company 406 | /base/surfing/surfer /person/athlete 407 | /base/disaster2/rail_accident /event/natural_disaster 408 | /user/robert/data_nursery/rail_accident /event/natural_disaster 409 | /base/fires/fire_department /government_agency 410 | /base/disaster2/flood /event/natural_disaster 411 | /user/robert/roman_empire/roman_emperor /person/monarch 412 | /user/techgnostic/default_domain/tv_series_serial /broadcast_program 413 | /base/infrastructure/nuclear_power_plant /building 414 | /base/crime/appellate_court /building 415 | /user/alecf/recreation/park /park 416 | /music/track /music 417 | /opera/opera_house /building 418 | /base/switzerland/ch_district /location 419 | /music/concert_film /art/film 420 | /location/us_indian_reservation /location 421 | /chemistry/chemical_element /chemistry 422 | /user/robert/data_nursery/chinese_emperor /person/monarch 423 | /base/sportssandbox/sports_recurring_event /event/sports_event 424 | /base/filmcameras/camera /product/camera 425 | /base/backpacking1/wilderness_area /location 426 | /base/athletics/athletics_marathon /event/sports_event 427 | /base/backpacking1/national_forest /location 428 | /education/student_organization /organization 429 | /award/award_ceremony /event 430 | /base/skateboarding/skateboarder /person/athlete 431 | /cricket/cricket_coach /person/coach 432 | /business/oil_field /location 433 | /base/peleton/road_bicycle_racing_event /event/sports_event 434 | /conferences/conference /event 435 | /base/charities/charity /organization 436 | /base/jsbach/bach_composition /music 437 | /tv/tv_theme_song /music 438 | /location/de_urban_district /location 439 | /base/snowboard/snowboarder /person/athlete 440 | /location/it_province /location/province 441 | /location/it_frazione /location 442 | /astronomy/nebula /astral_body 443 | /user/patrick/default_domain/submarine_class /product/ship 444 | /architecture/building_complex /building 445 | /base/bioventurist/organization /organization 446 | /base/smarthistory/visual_artist /person/artist 447 | /base/pgschools/school /organization/educational_institution 448 | /base/fight/political_rebellion_or_revolution /event/protest 449 | /base/engineering/mine /location 450 | /location/fr_department /location/county 451 | /food/beer_country_region /location 452 | /base/wrestling/championship_title /title 453 | /astronomy/star_system /astral_body 454 | /aviation/aircraft /product/airplane 455 | /radio/radio_program /broadcast_program 456 | /location/id_city /location/city 457 | /astronomy/constellation /astral_body 458 | /food/tea /food 459 | /food/beverage /food 460 | /location/pr_municipality /location/city 461 | /theater/theater_director /person/director 462 | /soccer/fifa /organization 463 | /location/kp_city /location/city 464 | /cricket/cricket_stadium /building/sports_facility 465 | /astronomy/supernova /astral_body 466 | /location/es_comarca 
/location/county 467 | /government/election_campaign /event/election 468 | /rail/diesel_locomotive_class /train 469 | /comic_books/comic_book_writer /person/artist 470 | /olympics/olympic_games /event/sports_event 471 | /language/conlang /language 472 | /location/uk_unitary_authority /location 473 | /location/vn_province /location/province 474 | /location/in_division /location/county 475 | /automotive/automotive_class /product/car 476 | /exhibitions/exhibition /event 477 | /automotive/make /organization/company 478 | /opera/opera_director /person/director 479 | /location/es_province /location/province 480 | /wine/wine_sub_region /location 481 | /location/us_state /location/province 482 | /film/film_casting_director /person/director 483 | /book/reviewed_work /written_work 484 | /food/beer /food 485 | /comic_books/comic_book_story /written_work 486 | /location/jp_prefecture /written_work 487 | /government/us_vice_president /person/politician 488 | /american_football/super_bowl /event/sports_event 489 | /location/id_subdistrict /location 490 | /location/in_city /location/city 491 | /government/us_president /person/politician 492 | /organization/australian_organization /organization 493 | /location/uk_metropolitan_borough /location 494 | /location/uk_non_metropolitan_county /location/county 495 | /location/id_province /location/province 496 | /martial_arts/martial_arts_organization /organization 497 | /location/cn_autonomous_county /location/county 498 | /education/academic_post_title /title 499 | /music/concert /event 500 | /location/uk_council_area /location 501 | /location/mx_state /location/province 502 | /location/de_regierungsbezirk /location/county 503 | /location/cn_autonomous_prefecture /location/county 504 | /law/constitutional_amendment /law 505 | /digicams/digital_camera_manufacturer /organization/company 506 | /wine/wine /food 507 | /soccer/football_world_cup /event/sports_event 508 | /location/in_state /location/province 509 | /location/tw_district /location/county 510 | /location/br_state /location/province 511 | /sports/golf_course /location 512 | /location/fr_region /location/province 513 | /location/ca_indian_reserve /location 514 | /royalty/chivalric_title /title 515 | /location/ua_oblast /location/province 516 | /location/jp_special_ward /location 517 | /location/cn_province /location/province 518 | /location/ar_province /location/province 519 | /book/book_binding /location/city 520 | /location/vn_provincial_city /location/city 521 | /book/translation /written_work 522 | /music/music_video_director /person/director 523 | /location/it_region /location 524 | /cricket/cricket_tournament_event /event/sports_event 525 | /biology/amino_acid /biology 526 | /location/de_borough /location 527 | /location/ru_republic /location/province 528 | /location/hk_district /location 529 | /education/school_magazine /written_work 530 | /location/jp_designated_city /location/city 531 | /location/us_territory /location 532 | /location/my_division /location/county 533 | /location/de_state /location/province 534 | /location/uk_overseas_territory /location 535 | /location/my_state /location/province 536 | /cricket/cricket_tournament /event/sports_event 537 | /location/nl_province /location/province 538 | /automotive/model_year /product/car 539 | /music/single /music 540 | /wine/vineyard /location 541 | /organization/organization_committee_title /title 542 | /location/province /location/province 543 | /location/mx_municipality /location 544 | /location/uk_region /location 545 | /location/kp_province 
/location/province 546 | /location/jp_subprefecture /location 547 | /location/cn_prefecture /location 548 | /music/live_album /music 549 | /location/ru_krai /location/province 550 | /law/judicial_title /title 551 | /cricket/cricket_league /organization/sports_league 552 | /astronomy/planet /location/province 553 | /location/in_union_territory /location/province 554 | /location/continent /location 555 | /book/short_non_fiction_variety /written_work 556 | /book/serialized_work /written_work 557 | /astronomy/galactic_super_cluster /astral_body 558 | /astronomy/galactic_group /astral_body 559 | /astronomy/galactic_cluster /astral_body 560 | /astronomy/celestial_object_with_coordinate_system /astral_body 561 | /location/vn_township /location/city 562 | /location/uk_metropolitan_county /location/county 563 | /location/ru_autonomous_okrug /location/province 564 | /location/kp_metropolitan_city /location/city 565 | /location/australian_state /location/province 566 | /biology/chromosome /biology -------------------------------------------------------------------------------- /mapping/ontonotes.logic.mapping: -------------------------------------------------------------------------------- 1 | + EMPTY /other -------------------------------------------------------------------------------- /mapping/ontonotes.mapping: -------------------------------------------------------------------------------- 1 | /people/person /person 2 | /book/author /person/artist/author 3 | /film/actor /person/artist/actor 4 | /music/artist /person/artist/music 5 | /sports/pro_athlete /person/athlete 6 | /medicine/physician /person/doctor 7 | /government/politician /person/political_figure 8 | /government/government_office_or_title /person/political_figure 9 | /base/crime/criminal_defence_attorney /person/legal 10 | /base/crime/lawyer_type /person/legal 11 | /law/judge /person/legal 12 | /fictional_universe/fictional_job_title /person/title 13 | /business/job_title /person/title 14 | /government/government_office_category /person/title 15 | /aviation/airport /location/structure/airport 16 | /architecture/building /location/structure 17 | /travel/hotel /location/structure/hotel 18 | /sports/sports_facility /location/structure/sports_facility 19 | /geography/body_of_water /location/geography/body_of_water 20 | /geography/mountain /location/geography/mountain 21 | /geography/* /location/geography 22 | /transportation/bridge /location/transit/bridge 23 | /rail/railway /location/transit/railway 24 | /transportation/road /location/transit/road 25 | /location/citytown /location/city 26 | /location/country /location/country 27 | /amusement_parks/park /location/park 28 | /base/usnationalparks/us_national_park /location/park 29 | /location/location /location 30 | /organization/organization_type /organization 31 | /organization/organization /organization 32 | /base/newsevents/news_reporting_organisation /organization/company/news 33 | /book/publishing_company /organization/company/news 34 | /broadcast/producer /organization/company/broadcast 35 | /business/employer /organization/company 36 | /education/academic_institution /organization/education 37 | /education/educational_institution /organization/education 38 | /government/government_agency /organization/government 39 | /government/government /organization/government 40 | /government/governmental_body /organization/government 41 | /base/crime/type_of_law_enforcement_agency /organization/government 42 | /military/military_unit /organization/military 43 | /government/political_party 
/organization/political_party 44 | /sports/sports_team /organization/sports_team 45 | /finance/stock_exchange /organization/stock_exchange 46 | /tv/tv_program /other/art/broadcast 47 | /film/film /other/art/film 48 | /music/album /other/art/music 49 | /music/composition /other/art/music 50 | /theater/play /other/art/stage 51 | /opera/opera /other/art/stage 52 | /book/written_work /other/art/writing 53 | /book/short_story /other/art/writing 54 | /book/poem /other/art/writing 55 | /book/literary_series /other/art/writing 56 | /book/publication /other/art/writing 57 | /time/event /other/event 58 | /time/holiday /other/event/holiday 59 | /military/military_conflict /other/event/violent_conflict 60 | /medicine/medical_treatment /other/health/treatment 61 | /award/award /other/award 62 | /medicine/anatomical_structure /other/body_part 63 | /finance/currency /other/currency 64 | /biology/animal /other/living_thing/animal 65 | /base/plants/plant /other/living_thing 66 | /law/invention /other/product/weapon 67 | /automotive/model /other/product/vehicle 68 | /aviation/aircraft_model /other/product/vehicle 69 | /computer/* /other/product/computer 70 | /computer/software /other/product/software 71 | /food/food /other/food 72 | /religion/religion /other/religion 73 | /people/ethnicity /other/heritage 74 | /base/morelaw/type_of_legal_subject /other/legal 75 | /user/sprocketonline/economics/legislation /other/legal 76 | /user/tsegaran/legal/act_of_congress /other/legal 77 | /user/skud/legal/treaty /other/legal 78 | /law/constitutional_amendment /other/legal 79 | EMPTY /other -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | h5py 2 | tensorflow 3 | numpy 4 | scipy 5 | regex 6 | Flask 7 | flask-cors 8 | ccg_nlpy 9 | gensim 10 | requests -------------------------------------------------------------------------------- /scripts.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pickle 3 | import sqlite3 4 | import sys 5 | 6 | from ccg_nlpy import local_pipeline 7 | 8 | from cache import SurfaceCache 9 | from main import ZoeRunner 10 | from zoe_utils import DataReader 11 | from zoe_utils import ElmoProcessor 12 | from zoe_utils import Sentence 13 | 14 | 15 | def convert_esa_map(esa_file_name, freq_file_name, invcount_file_name): 16 | esa_map = {} 17 | with open(esa_file_name) as f: 18 | for line in f: 19 | line = line.strip() 20 | if len(line.split("\t")) <= 1: 21 | continue 22 | key = line.split("\t")[0] 23 | value = line.split("\t")[1] 24 | esa_map[key] = value 25 | with open('data/esa/esa.pickle', 'wb') as handle: 26 | pickle.dump(esa_map, handle, protocol=pickle.HIGHEST_PROTOCOL) 27 | 28 | freq_map = {} 29 | with open(freq_file_name) as f: 30 | for line in f: 31 | line = line.strip() 32 | if len(line.split("\t")) <= 1: 33 | continue 34 | key = line.split("\t")[0] 35 | value = int(line.split("\t")[1]) 36 | freq_map[key] = value 37 | with open('data/esa/freq.pickle', 'wb') as handle: 38 | pickle.dump(freq_map, handle, protocol=pickle.HIGHEST_PROTOCOL) 39 | 40 | invcount_map = {} 41 | with open(invcount_file_name) as f: 42 | for line in f: 43 | line = line.strip() 44 | if len(line.split("\t")) <= 1: 45 | continue 46 | key = line.split("\t")[0] 47 | value = int(line.split("\t")[1]) 48 | invcount_map[key] = value 49 | with open('data/esa/invcount.pickle', 'wb') as handle: 50 | pickle.dump(invcount_map, handle, 
protocol=pickle.HIGHEST_PROTOCOL) 51 | 52 | 53 | def convert_wikilinks_sent_examples(sent_example_file_name): 54 | sent_example_map = {} 55 | max_bytes = 2 ** 31 - 1 56 | with open(sent_example_file_name) as f: 57 | for line in f: 58 | line = line.strip() 59 | if len(line.split("\t")) <= 1: 60 | continue 61 | key = line.split("\t")[0] 62 | value = line.split("\t")[1] 63 | sent_example_map[key] = value 64 | bytes_out = pickle.dumps(sent_example_map, protocol=pickle.HIGHEST_PROTOCOL) 65 | with open('data/sent_example.pickle', 'wb') as handle: 66 | for idx in range(0, len(bytes_out), max_bytes): 67 | handle.write(bytes_out[idx:idx + max_bytes]) 68 | 69 | 70 | def convert_freebase(freebase_file_name, freebase_sup_file_name): 71 | ret_map = {} 72 | with open(freebase_file_name) as f: 73 | for line in f: 74 | line = line.strip() 75 | if len(line.split("\t")) <= 1: 76 | continue 77 | key = line.split("\t")[0] 78 | val = line.split("\t")[1] 79 | ret_map[key] = val 80 | with open(freebase_sup_file_name) as f: 81 | for line in f: 82 | line = line.strip() 83 | if len(line.split("\t")) <= 1: 84 | continue 85 | key = line.split("\t")[0] 86 | val = line.split("\t")[1] 87 | if key not in ret_map: 88 | ret_map[key] = val 89 | with open('data/title2freebase.pickle', 'wb') as handle: 90 | pickle.dump(ret_map, handle, protocol=pickle.HIGHEST_PROTOCOL) 91 | 92 | 93 | def convert_prob(prob_file_name, n2c_fil2_name): 94 | prob_map = {} 95 | n2c_map = {} 96 | with open(n2c_fil2_name) as f: 97 | for line in f: 98 | line = line.strip() 99 | n2c_map[line.split("\t")[0]] = line.split("\t")[1] 100 | with open(prob_file_name) as f: 101 | for line in f: 102 | line = line.strip() 103 | key = line.split("\t")[0] 104 | val = float(line.split("\t")[1]) 105 | surface = key.split("|")[0] 106 | title = key.split("|")[1] 107 | if title in n2c_map: 108 | title = n2c_map[title] 109 | if surface in prob_map: 110 | cur_highest = prob_map[surface][1] 111 | if val > cur_highest: 112 | prob_map[surface] = (title, val) 113 | else: 114 | prob_map[surface] = (title, val) 115 | with open('data/prior_prob.pickle', 'wb') as handle: 116 | pickle.dump(prob_map, handle, protocol=pickle.HIGHEST_PROTOCOL) 117 | 118 | 119 | def convert_cached_embeddings(raw_file_name, output_file_name): 120 | ret_map = {} 121 | max_bytes = 2 ** 31 - 1 122 | with open(raw_file_name) as f: 123 | for line in f: 124 | line = line.strip() 125 | token = line.split("\t")[0] 126 | if line.split("\t")[1] == "null": 127 | continue 128 | vals = line.split("\t")[1].split(",") 129 | vec = [] 130 | for val in vals: 131 | vec.append(float(val)) 132 | ret_map[token] = vec 133 | bytes_out = pickle.dumps(ret_map, protocol=pickle.HIGHEST_PROTOCOL) 134 | with open(output_file_name, 'wb') as handle: 135 | for idx in range(0, len(bytes_out), max_bytes): 136 | handle.write(bytes_out[idx:idx + max_bytes]) 137 | 138 | 139 | def reduce_cache_file_size(cache_pickle_file_name, title_file_name, out_file_name): 140 | with open(cache_pickle_file_name, "rb") as handle: 141 | cache_map = pickle.load(handle) 142 | title_set = set() 143 | with open(title_file_name, "r") as f: 144 | for line in f: 145 | line = line.strip() 146 | title_set.add(line) 147 | ret_map = {} 148 | for key in cache_map: 149 | if key in title_set: 150 | ret_map[key] = cache_map[key] 151 | with open(out_file_name, "wb") as handle: 152 | pickle.dump(ret_map, handle, protocol=pickle.HIGHEST_PROTOCOL) 153 | 154 | 155 | def check_data_file_integrity(mode=""): 156 | file_list = [ 157 | 'data/esa/esa.pickle', 158 | 
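# Shared resources required by every mode; mode-specific pickles are added via corpus_supplements below.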
'data/esa/freq.pickle', 159 | 'data/esa/invcount.pickle', 160 | 'data/prior_prob.pickle', 161 | 'data/sent_example.pickle', 162 | 'data/title2freebase.pickle', 163 | ] 164 | corpus_supplements = [] 165 | if mode == "figer": 166 | corpus_supplements = [ 167 | 'data/FIGER/target.embedding.pickle', 168 | 'data/FIGER/wikilinks.embedding.pickle', 169 | 'mapping/figer.mapping', 170 | 'mapping/figer.logic.mapping' 171 | ] 172 | passed = True 173 | for file in file_list + corpus_supplements: 174 | if not os.path.isfile(file): 175 | print("[ERROR]: Missing " + file) 176 | passed = False 177 | if not passed: 178 | print("You have one or more files missing. Please refer to the README for a solution.") 179 | else: 180 | print("All required or suggested files are here. Go ahead and run experiments!") 181 | 182 | 183 | def compare_runlogs(runlog_file_a, runlog_file_b): 184 | if not os.path.isfile(runlog_file_a) or not os.path.isfile(runlog_file_b): 185 | print("Invalid input file names"); return  # bail out instead of crashing on open() below 186 | with open(runlog_file_a, "rb") as handle: 187 | log_a = pickle.load(handle) 188 | with open(runlog_file_b, "rb") as handle: 189 | log_b = pickle.load(handle) 190 | for sentence in log_a: 191 | for compare_sentence in log_b: 192 | if sentence.get_sent_str() == compare_sentence.get_sent_str(): 193 | if sentence.get_mention_surface() == compare_sentence.get_mention_surface(): 194 | if sentence.predicted_types != compare_sentence.predicted_types: 195 | print(sentence.get_sent_str()) 196 | print(sentence.get_mention_surface()) 197 | print(sentence.gold_types) 198 | print("Log A prediction: " + str(sentence.predicted_types)) 199 | print("Log B prediction: " + str(compare_sentence.predicted_types)) 200 | 201 | 202 | def produce_cache(): 203 | elmo_processor = ElmoProcessor(allow_tensorflow=True) 204 | to_process = [] 205 | to_process_concepts = [] 206 | sorted_pairs = sorted(elmo_processor.sent_example_map.items()) 207 | cur_processing_file_num = ord(sorted_pairs[0][0][0]) 208 | sub_map_index = 0 209 | max_bytes = 2 ** 31 - 1 210 | for pair in sorted_pairs: 211 | concept = pair[0] 212 | file_num = ord(concept[0]) 213 | if file_num != cur_processing_file_num or len(to_process) > 10000: 214 | new_start = False 215 | if file_num != cur_processing_file_num: 216 | new_start = True 217 | out_file_name = "data/cache/batch_" + str(cur_processing_file_num) + "_" + str(sub_map_index) + ".pickle" 218 | if new_start: 219 | out_file_name = "data/cache/batch_" + str(cur_processing_file_num) + ".pickle" 220 | if not os.path.isfile(out_file_name) and cur_processing_file_num >= 65: 221 | print("Prepared to run ELMo on " + chr(cur_processing_file_num)) 222 | print("This batch contains " + str(len(to_process_concepts)) + " concepts, and " + str(len(to_process)) + " sentences.") 223 | elmo_map = elmo_processor.process_batch(to_process) 224 | batch_map = {} 225 | for processed_concept in to_process_concepts: 226 | if processed_concept in elmo_map: 227 | batch_map[processed_concept] = elmo_map[processed_concept] 228 | bytes_out = pickle.dumps(batch_map, protocol=pickle.HIGHEST_PROTOCOL) 229 | with open(out_file_name, "wb") as handle: 230 | for idx in range(0, len(bytes_out), max_bytes): 231 | handle.write(bytes_out[idx:idx + max_bytes]) 232 | print("Processed all concepts starting with " + chr(cur_processing_file_num)) 233 | print() 234 | to_process = [] 235 | to_process_concepts = [] 236 | if new_start: 237 | cur_processing_file_num = file_num 238 | sub_map_index = 0 239 | else: 240 | sub_map_index += 1 241 | example_sentences_str =
elmo_processor.sent_example_map[concept] 242 | example_sentences = example_sentences_str.split("|||") 243 | for i in range(0, min(len(example_sentences), 10)): 244 | to_process.append(example_sentences[i]) 245 | to_process_concepts.append(concept) 246 | 247 | 248 | def progress_bar(value, endvalue, bar_length=20): 249 | percent = float(value) / endvalue 250 | arrow = '-' * int(round(percent * bar_length) - 1) + '>' 251 | spaces = ' ' * (bar_length - len(arrow)) 252 | sys.stdout.write("\rProgress: [{0}] {1}%".format(arrow + spaces, round(percent * 100, 3))) 253 | sys.stdout.flush() 254 | 255 | 256 | def produce_surface_cache(db_name, cache_name): 257 | pipeline = local_pipeline.LocalPipeline() 258 | cache = SurfaceCache(db_name, server_mode=False) 259 | runner = ZoeRunner() 260 | runner.elmo_processor.load_sqlite_db(cache_name, server_mode=False) 261 | dataset = DataReader("data/large_text.json", size=-1, unique=True) 262 | counter = 0 263 | total = len(dataset.sentences) 264 | for sentence in dataset.sentences: 265 | ta = pipeline.doc([sentence.tokens], pretokenized=True) 266 | for chunk in ta.get_shallow_parse: 267 | new_sentence = Sentence(sentence.tokens, chunk['start'], chunk['end']) 268 | runner.process_sentence(new_sentence) 269 | cache.insert_cache(new_sentence) 270 | counter += 1; progress_bar(counter, total)  # one tick per source sentence 271 | 272 | 273 | def produce_magnitude_vec_file(db_name, out_file): 274 | conn = sqlite3.connect(db_name) 275 | cursor = conn.cursor() 276 | cursor.execute("SELECT * FROM data") 277 | with open(out_file, "w") as w: 278 | for row in cursor: 279 | key = row[0] 280 | val = row[1] 281 | val = val[1:-1].replace(",", "") 282 | w.write(key + " " + val + "\n") 283 | conn.close() 284 | 285 | if __name__ == '__main__': 286 | if len(sys.argv) < 2: 287 | print("[ERROR]: No command given.") 288 | exit(1) 289 | if sys.argv[1] == "CHECKFILE": 290 | if len(sys.argv) == 2: 291 | check_data_file_integrity() 292 | else: 293 | check_data_file_integrity(sys.argv[2]) 294 | if sys.argv[1] == "COMPARE": 295 | if len(sys.argv) < 4: 296 | print("Need two files for comparison."); exit(1) 297 | compare_runlogs(sys.argv[2], sys.argv[3]) 298 | if sys.argv[1] == "CACHE": 299 | produce_cache() 300 | if sys.argv[1] == "SURFACECACHE": 301 | produce_surface_cache("data/surface_cache.db", "/Volumes/Storage/Resources/wikilinks/elmo_cache_correct.db") 302 | if sys.argv[1] == "PRODUCE_VEC": 303 | produce_magnitude_vec_file("/Volumes/External/elmo_cache_correct.db", "/Volumes/External/elmo_cache.vec") 304 | -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | import json 2 | import signal 3 | import time 4 | import traceback 5 | 6 | import requests 7 | from ccg_nlpy import local_pipeline 8 | from flask import Flask 9 | from flask import request 10 | from flask import send_from_directory 11 | from flask_cors import CORS 12 | 13 | from cache import ServerCache 14 | from cache import SurfaceCache 15 | from main import ZoeRunner 16 | from zoe_utils import InferenceProcessor 17 | from zoe_utils import Sentence 18 | 19 | 20 | class CogCompLoggerClient: 21 | def __init__(self, demo_name, base_url="http://127.0.0.1:5000"): 22 | self.demo_name = demo_name 23 | self.base_url = base_url 24 | if self.base_url.endswith("/"): 25 | self.url = self.base_url + "log" 26 | else: 27 | self.url = self.base_url + "/log" 28 | 29 | def log(self, content=""): 30 | params = { 31 | 'entry_name': self.demo_name, 32 | 'content': content 33 | } 34 | result =
requests.post(url=self.url, params=params).json() 35 | if result['result'] == 'SUCCESS': 36 | return True 37 | return False 38 | 39 | def log_dict(self, d=None): 40 | if d is None: 41 | return self.log() 42 | else: 43 | return self.log(content=json.dumps(d)) 44 | 45 | 46 | class Server: 47 | 48 | """ 49 | Initialize the server with needed resources 50 | @sql_db_path: The path pointing to the ELMo caches sqlite file 51 | """ 52 | def __init__(self, sql_db_path, surface_cache_path): 53 | self.app = Flask(__name__) 54 | CORS(self.app) 55 | self.mem_cache = ServerCache() 56 | self.surface_cache = SurfaceCache(surface_cache_path) 57 | self.pipeline = local_pipeline.LocalPipeline() 58 | self.pipeline_initialize_helper(['.']) 59 | self.logger = CogCompLoggerClient("zoe", base_url="http://macniece.seas.upenn.edu:4005") 60 | self.runner = ZoeRunner(allow_tensorflow=True) 61 | status = self.runner.elmo_processor.load_sqlite_db(sql_db_path, server_mode=True) 62 | if not status: 63 | print("ELMo cache file is not found. Server mode is prohibited without it.") 64 | print("Please contact the author for this cache, or modify this code if you know what you are doing.") 65 | exit(1) 66 | self.runner.elmo_processor.rank_candidates_vec() 67 | signal.signal(signal.SIGINT, self.grace_end) 68 | 69 | @staticmethod 70 | def handle_root(path): 71 | return send_from_directory('./frontend', path) 72 | 73 | @staticmethod 74 | def handle_redirection(): 75 | return Server.handle_root("index.html") 76 | 77 | def parse_custom_rules(self, rules): 78 | type_to_titles = {} 79 | freebase_freq_total = {} 80 | for rule in rules: 81 | title = rule.split("|||")[0] 82 | freebase_types = [] 83 | if title in self.runner.inference_processor.freebase_map: 84 | freebase_types = self.runner.inference_processor.freebase_map[title].split(",") 85 | for ft in freebase_types: 86 | if ft in freebase_freq_total: 87 | freebase_freq_total[ft] += 1 88 | else: 89 | freebase_freq_total[ft] = 1 90 | custom_type = rule.split("|||")[1] 91 | if custom_type in type_to_titles: 92 | type_to_titles[custom_type].append(title) 93 | else: 94 | type_to_titles[custom_type] = [title] 95 | counter = 0 96 | ret = {} 97 | for custom_type in type_to_titles: 98 | freebase_freq = {} 99 | for title in type_to_titles[custom_type]: 100 | freebase_types = [] 101 | if title in self.runner.inference_processor.freebase_map: 102 | freebase_types = self.runner.inference_processor.freebase_map[title].split(",") 103 | counter += 1 104 | for freebase_type in freebase_types: 105 | if freebase_type in freebase_freq: 106 | freebase_freq[freebase_type] += 1 107 | else: 108 | freebase_freq[freebase_type] = 1 109 | for ft in freebase_freq: 110 | if float(freebase_freq[ft]) > float(counter) * 0.5 and freebase_freq[ft] == freebase_freq_total[ft]: 111 | ft = "/" + ft.replace(".", "/") 112 | ret[ft] = custom_type 113 | return ret 114 | 115 | """ 116 | Main request handler 117 | It requires the request to contain required information like tokens/mentions 118 | in the format of a json string 119 | 120 | @param_override: override API input with a pre-defined dictionary 121 | """ 122 | def handle_input(self, param_override=None): 123 | start_time = time.time() 124 | ret = {} 125 | r = request.get_json() 126 | if param_override is not None: 127 | r = param_override 128 | if "tokens" not in r or "mention_starts" not in r or "mention_ends" not in r or "index" not in r: 129 | ret["type"] = [["INVALID_INPUT"]] 130 | ret["index"] = -1 131 | ret["mentions"] = [] 132 | ret["candidates"] = [[]] 133 | 
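# Malformed request: one of tokens / mention_starts / mention_ends / index is missing, so return the sentinel payload built above without processing.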
return json.dumps(ret) 134 | sentences = [] 135 | for i in range(0, len(r["mention_starts"])): 136 | sentence = Sentence(r["tokens"], int(r["mention_starts"][i]), int(r["mention_ends"][i]), "") 137 | sentences.append(sentence) 138 | mode = r.get("mode", "figer")  # fall back to the preloaded FIGER processor instead of raising KeyError 139 | predicted_types = [] 140 | predicted_candidates = [] 141 | other_possible_types = [] 142 | selected_candidates = [] 143 | mentions = [] 144 | if mode != "figer": 145 | if mode != "custom": 146 | selected_inference_processor = InferenceProcessor(mode, resource_loader=self.runner.inference_processor) 147 | else: 148 | rules = r["taxonomy"] 149 | mappings = self.parse_custom_rules(rules) 150 | selected_inference_processor = InferenceProcessor(mode, custom_mapping=mappings) 151 | else: 152 | selected_inference_processor = self.runner.inference_processor 153 | 154 | for sentence in sentences: 155 | sentence.set_signature(selected_inference_processor.signature()) 156 | cached = self.mem_cache.query_cache(sentence) 157 | if cached is not None: 158 | sentence = cached 159 | else: 160 | self.runner.process_sentence(sentence, selected_inference_processor) 161 | try: 162 | self.mem_cache.insert_cache(sentence) 163 | self.surface_cache.insert_cache(sentence) 164 | except Exception: 165 | print("Cache insertion exception. Ignored.") 166 | predicted_types.append(list(sentence.predicted_types)) 167 | predicted_candidates.append(sentence.elmo_candidate_titles) 168 | mentions.append(sentence.get_mention_surface_raw()) 169 | selected_candidates.append(sentence.selected_title) 170 | other_possible_types.append(sentence.could_also_be_types) 171 | 172 | elapsed_time = time.time() - start_time 173 | print("Processed mention " + str([x.get_mention_surface() for x in sentences]) + " in mode " + mode + ". TIME: " + str(elapsed_time) + " seconds.") 174 | 175 | # Post logging request to Cogcomp Logger 176 | self.logger.log_dict(r) 177 | 178 | ret["type"] = predicted_types 179 | ret["candidates"] = predicted_candidates 180 | ret["mentions"] = mentions 181 | ret["index"] = r["index"] 182 | ret["selected_candidates"] = selected_candidates 183 | ret["other_possible_type"] = other_possible_types 184 | return json.dumps(ret) 185 | 186 | def pipeline_initialize_helper(self, tokens): 187 | doc = self.pipeline.doc([tokens], pretokenized=True) 188 | doc.get_shallow_parse 189 | doc.get_ner_conll 190 | doc.get_ner_ontonotes 191 | doc.get_view("MENTION") 192 | 193 | def handle_tokenizer_input(self): 194 | r = request.get_json() 195 | ret = {"tokens": []} 196 | if "sentence" not in r: 197 | return json.dumps(ret) 198 | doc = self.pipeline.doc(r["sentence"]) 199 | token_view = doc.get_tokens 200 | for cons in token_view: 201 | ret["tokens"].append(str(cons)) 202 | return json.dumps(ret) 203 | 204 | """ 205 | Handles requests for mention filling 206 | """ 207 | def handle_mention_input(self): 208 | r = request.get_json() 209 | ret = {'mention_spans': []} 210 | if "tokens" not in r: 211 | return json.dumps(ret) 212 | tokens = r["tokens"] 213 | doc = self.pipeline.doc([tokens], pretokenized=True) 214 | shallow_parse_view = doc.get_shallow_parse 215 | ner_conll_view = doc.get_ner_conll 216 | ner_ontonotes_view = doc.get_ner_ontonotes 217 | md_view = doc.get_view("MENTION") 218 | ret_set = set() 219 | ret_list = [] 220 | additions_views = [] 221 | if ner_ontonotes_view.cons_list is not None: 222 | additions_views.append(ner_ontonotes_view) 223 | if md_view.cons_list is not None: 224 | additions_views.append(md_view) 225 | if shallow_parse_view.cons_list is not None: 226 |
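# NER CoNLL spans are claimed first (in the try block below); the remaining views run in priority order (NER OntoNotes, mention detection, shallow-parse NP chunks) and only add spans that do not overlap tokens already claimed.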
additions_views.append(shallow_parse_view) 227 | try: 228 | if ner_conll_view.cons_list is not None: 229 | for ner_conll in ner_conll_view: 230 | for i in range(ner_conll['start'], ner_conll['end']): 231 | ret_set.add(i) 232 | ret_list.append((ner_conll['start'], ner_conll['end'])) 233 | for additions_view in additions_views: 234 | for cons in additions_view: 235 | add_to_list = True 236 | if additions_view.view_name != "MENTION": 237 | if additions_view.view_name == "SHALLOW_PARSE" and cons['label'] != "NP": 238 | continue 239 | start = int(cons['start']) 240 | end = int(cons['end']) 241 | else: 242 | start = int(cons['properties']['EntityHeadStartSpan']) 243 | end = int(cons['properties']['EntityHeadEndSpan']) 244 | for i in range(max(start - 1, 0), min(len(tokens), end + 1)): 245 | if i in ret_set: 246 | add_to_list = False 247 | break 248 | if add_to_list: 249 | for i in range(start, end): 250 | ret_set.add(i) 251 | ret_list.append((start, end)) 252 | except Exception as e: 253 | traceback.print_exc() 254 | print(e) 255 | ret['mention_spans'] = ret_list 256 | return json.dumps(ret) 257 | 258 | """ 259 | Handles surface form cached requests 260 | This is expected to return sooner than actual processing 261 | """ 262 | def handle_simple_input(self): 263 | ret = {} 264 | r = request.get_json() 265 | if "tokens" not in r or "mention_starts" not in r or "mention_ends" not in r or "index" not in r: 266 | ret["type"] = [["INVALID_INPUT"]] 267 | return json.dumps(ret) 268 | sentences = [] 269 | for i in range(0, len(r["mention_starts"])): 270 | sentence = Sentence(r["tokens"], int(r["mention_starts"][i]), int(r["mention_ends"][i]), "") 271 | sentences.append(sentence) 272 | types = [] 273 | for sentence in sentences: 274 | surface = sentence.get_mention_surface() 275 | cached_types = self.surface_cache.query_cache(surface) 276 | if cached_types is not None: 277 | distinct = set() 278 | for t in cached_types: 279 | distinct.add("/" + t.split("/")[1]) 280 | types.append(list(distinct)) 281 | else: 282 | types.append([]) 283 | ret["type"] = types 284 | ret["index"] = r["index"] 285 | return json.dumps(ret) 286 | 287 | def handle_word2vec_input(self): 288 | ret = {} 289 | r = request.get_json() 290 | if "tokens" not in r or "mention_starts" not in r or "mention_ends" not in r or "index" not in r: 291 | ret["type"] = [["INVALID_INPUT"]] 292 | return json.dumps(ret) 293 | sentences = [] 294 | for i in range(0, len(r["mention_starts"])): 295 | sentence = Sentence(r["tokens"], int(r["mention_starts"][i]), int(r["mention_ends"][i]), "") 296 | sentences.append(sentence) 297 | predicted_types = [] 298 | for sentence in sentences: 299 | self.runner.process_sentence_vec(sentence) 300 | predicted_types.append(list(sentence.predicted_types)) 301 | ret["type"] = predicted_types 302 | ret["index"] = r["index"] 303 | return json.dumps(ret) 304 | 305 | def handle_elmo_input(self): 306 | ret = {} 307 | results = [] 308 | r = request.get_json() 309 | if "sentence" not in r: 310 | ret["vectors"] = [] 311 | return json.dumps(ret) 312 | elmo_map = self.runner.elmo_processor.process_single_continuous(r["sentence"]) 313 | for token in r["sentence"].split(): 314 | results.append((token, str(elmo_map[token]))) 315 | ret["vectors"] = results 316 | return json.dumps(ret) 317 | 318 | def handle_logger_test(self): 319 | params = { 320 | "tokens": ["Iced", "Earth", "\\u2019", "s", "musical", "style", "is", "influenced", "by", "many", "traditional", "heavy", "metal", "groups", "such", "as", "Black", "Sabbath", "."], 321 | 
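# Fixed sample payload: the mention "Iced Earth" (tokens[0:2]) typed in FIGER mode, mirroring the /annotate request schema.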
"index": 0, 322 | "mention_starts": [0], 323 | "mention_ends": [2], 324 | "mode": "figer", 325 | "taxonomy": [], 326 | } 327 | self.handle_input(param_override=params) 328 | return "finished" 329 | 330 | """ 331 | Handler to start the Flask app 332 | @localhost: Whether the server lives only in localhost 333 | @port: A port number, default to 80 (Web) 334 | """ 335 | def start(self, localhost=False, port=80): 336 | self.app.add_url_rule("/", "", self.handle_redirection) 337 | self.app.add_url_rule("/", "", self.handle_root) 338 | self.app.add_url_rule("/annotate", "annotate", self.handle_input, methods=['POST']) 339 | self.app.add_url_rule("/annotate_token", "annotate_token", self.handle_tokenizer_input, methods=['POST']) 340 | self.app.add_url_rule("/annotate_mention", "annotate_mention", self.handle_mention_input, methods=['POST']) 341 | self.app.add_url_rule("/annotate_cache", "annotate_cache", self.handle_simple_input, methods=['POST']) 342 | self.app.add_url_rule("/annotate_vec", "annotate_vec", self.handle_word2vec_input, methods=['POST']) 343 | self.app.add_url_rule("/annotate_elmo", "annotate_elmo", self.handle_elmo_input, methods=['POST']) 344 | # Specifically saved for logger test 345 | self.app.add_url_rule("/test", "test", self.handle_logger_test, methods=['POST', 'GET']) 346 | if localhost: 347 | self.app.run() 348 | else: 349 | self.app.run(host='0.0.0.0', port=port) 350 | 351 | def grace_end(self, signum, frame): 352 | print("Gracefully Existing...") 353 | if self.runner.elmo_processor.db_loaded: 354 | self.runner.elmo_processor.db_conn.close() 355 | print("Resource Released. Existing.") 356 | exit(0) 357 | 358 | 359 | if __name__ == '__main__': 360 | # First argument is a placeholder. Please ask for the actual file. 361 | server = Server("elmo_cache_correct.db", "./data/surface_cache_new.db") 362 | server.start(localhost=True) 363 | 364 | --------------------------------------------------------------------------------