├── LICENSE ├── README.md ├── changes in torchtext.txt ├── config ├── README.md ├── __init__.py ├── config.json └── config.logger ├── crf ├── CRF.py └── __init__.py ├── datahelper ├── __init__.py ├── dataset_reader.py ├── embedding_helper.py └── preprocessor.py ├── dropout_models ├── __init__.py ├── dropout.py ├── gaussian_dropout.py └── variational_dropout.py ├── evaluation ├── __init__.py ├── evaluator.py ├── interactive_evaluator.py ├── multiple_model_evaluator.py ├── single_model_evaluator.py └── single_model_ner_evaluator.py ├── main.py ├── models ├── CNN.py ├── GRU.py ├── LSTM.py ├── Transformer.py ├── Transformer_OpenAI.py ├── Util_CNN.py └── __init__.py ├── optimizer └── custom_optimizer.py ├── requirements.txt ├── scorer ├── __init__.py ├── accuracy_scorer.py └── ner_scorer.py ├── training ├── __init__.py ├── multiple_model_trainer.py ├── single_model_ner_trainer.py ├── single_model_trainer.py └── trainer.py └── utils ├── __init__.py └── utils.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Hikmet Bahadir Sahin 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # README 2 | 3 | ## Update 29-05-2019 4 | 5 | - Yay another push, another update! 6 | - As I said in my previous update yesterday, I continue to develop a specific OpenAI version of the Transformer. 7 | - In today's update, I added the LanguageModelHead definition w.r.t. original Tensorflow implementation + huggingface's PyTorch implementation. 8 | - In addition, I added "Block" layer which contains (1) an attention operation followed by (2) a layer normalization followed by (3) an MLP and finally followed by (4) a layer normalization. 9 | - In overall, the implementation is again similar to the reference codes; however, I added an extention to the "Block" and "Attention" part. 10 | - OpenAI version uses Conv1D for its attention calculations while the "All you need is attention" version uses Linear layer (Nope Conv1D does not equal to a Linear layer: [Source](https://stackoverflow.com/questions/55576314/conv1d-with-kernel-size-1-interpretation)) 11 | - Since I want to use built-in nn.MultiHeadAttention function, just for fun, I put a conditional into the "Block" definition. 
If you trigger it, you will use the attention with linear. Otherwise, your model will call OpenAI version. 12 | - Lastly, the new model is still under construction. I am highly confident that it has many bugs even now =) 13 | 14 | 15 | # Table Of Contents 16 | 17 | - [Introduction](#introduction) 18 | - [Library Dependencies](#library-dependencies) 19 | - [Project Skeleton](#project-skeleton) 20 | - [Project Details](#project-details) 21 | - [To-do](#to-do) 22 | - [How-to-run](#how-to-run) 23 | - [Important Note Before Start](#important-note-before-start) 24 | - [Configuration JSON Format](#configuration-json-format) 25 | - [How to Run Main](#how-to-run-main) 26 | - [Training from Scratch-Training from Checkpoint-Interactive Evaluation](#training-from-scratch-training-from-checkpoint-interactive-evaluation) 27 | - [Results](#results) 28 | - [Test Results for TextCNN](#test-results-for-textcnn) 29 | - [Previous Updates](#previous-updates) 30 | - [January 2019 - Wiki Link](https://github.com/hbahadirsahin/nlp-experiments-in-pytorch/wiki/Previous-Updates-(January-2019)) 31 | - [February 2019 - Wiki Link](https://github.com/hbahadirsahin/nlp-experiments-in-pytorch/wiki/Previous-Updates-(February-2019)) 32 | - [References for Code Development](#references-for-code-development) 33 | 34 | ## Introduction 35 | 36 | This is my personal, pet project which I apply machine learning and natural language processing stuffs by using PyTorch. I stopped working with Tensorflow after some hellish times that I could not do some basic extentions (such fasttext based oov embeddings, details are below). Also, Tensorflow's updates and functionality deprecation rate is annoying for me. 37 | 38 | In this repository, I implement popular learning models and extend them with different minor adjustments (like variational dropouts). Even though it is really slow, I execute experiments by using these models on a dataset which me and my old colleagues in Huawei constructed (details are below, again) and try to announce experiment results. 39 | 40 | ## Library Dependencies 41 | 42 | Before diving into details, the python and library versions are as follows: 43 | 44 | - python 3.6 (works well with 3.7, too) 45 | - torch 1.0.1 46 | - torchtext 0.3.1 47 | - numpy 1.15.4 (due to PyTorch 1.0) 48 | - setuptools 40.8.0 (Hell no idea why pipreqs put this into requirements.txt) 49 | - spacy 2.0.16 (for interactive evaluation only) 50 | - gensim 3.6.0 (for fasttext embeddings, as well as OOV Embedding generation.) 51 | - adabound 0.0.5 52 | 53 | ## Project Skeleton 54 | 55 | I try to keep every part of the project clean and easy to follow. Even though the folders are self explanatory for me, let me explain them for those who may have hard time to understand. 56 | 57 | - `./crf/CRF.py` contains the conditional random field implementation (not finished yet). 58 | - `./datahelper/dataset_reader.py` contains the "DatasetLoader" object that reads a text dataset, splits it into 3 subsets (train/vali/test), creates vocabulary and iterators. It is a little bit hard-coded for the dataset I am using now. However, it is easy to make changes to use it for your own dataset. 59 | - `./datahelper/embedding_helper.py` is a helper class to generate OOV word embeddings. To use Fasttext-based OOV embedding generation, it leverages Gensim! 60 | - `./datahelper/preprocessor.py` contains the "Preprocessor" object and actions to apply on sentences. 61 | - `./dropout_models/gaussian_dropout.py` contains the Gaussian Dropout object. 
62 | - `./dropout_models/variational_dropout.py` contains the Variational Dropout object. 63 | - `./dropout_models/dropout.py` contains the Dropout object which you can select your dropout type among Bernoulli (basic), Gaussian and Variational dropout types. 64 | - `./evaluation/evaluator.py` is the factory for evaluation objects that are used in model trainings as well as interactive evaluation. 65 | - `./evaluation/xyz_evaluator.py` methods are the evaluator functions for specified models. 66 | - `./model/xyz.py` contains network objects. 67 | - `./model/Util_xyz.py` contains custom-defined objects that are used in `xyz`. 68 | - `./optimizer/custom_optimizer.py` contains custom-defined optimizer objects. 69 | - `./scorer/accuracy_scorer.py` contains classification accuracy metric calculations. 70 | - `./scorer/ner_scorer.py` contains NER-task related metric calculations. 71 | - `./training/trainer.py` is a class that returns the necessary trainer for the user's selected learning model 72 | - `./training/xyz_trainer.py` methods are the trainer functions for specified models. 73 | - `./utils/utils.py` contains both utility and common methods that are being used in several places in the project. 74 | - `./main.py` is the main code. To execute this project, one needs to provide a valid `config.json` file which contains the necessary configuration properties. 75 | - `./config/config.json` is the configuration file. 76 | 77 | ## Project Details 78 | 79 | - As the other Tensorflow-based repository, I will use the dataset that me and my old colleagues constructed 3 years ago. "English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset" is publicly available: https://data.mendeley.com/datasets/cdcztymf4k/1 80 | - Text CNN, CharCNN, VDCNN, Conv-Deconv CNN, basic LSTM/GRU and Transformer (Google version) models are currently available to train and evaluate in the repository. More models will be added. 81 | - Fasttext embeddings are used (by default but it can be changed). Eventually, one can use Torchtext to download the "pre-defined" embedding files. However, since Turkish embeddings were not included in, I manually edit the Torchtext backend codes (please check the "changes in the torchtext.txt" file). Also note that, everytime you update Torchtext, you need to re-add those changes again. 82 | - Embeddings (whether random or pretrained) can be "static", "nonstatic", or "multichannel". 83 | - For OOV words, OOVEmbeddingCreator is developed (under datahelper/embedding_helper). There are 5 different basic approaches defined to generate OOV embeddings: (1) zeros vector, (2) ones vector, (3) random vector (between 0, 1), (4) (r1, r2) ranged uniformly random vector, (5) Fasttext CharNgram-based vectors. 84 | - Even though I am focusing on Turkish versions of the dataset, I believe "Preprocessor" can work for English dataset, too. In future, I may add more language specific methods. 85 | - Main code loads properties from config.json (inside config folder). 86 | - I tested all training, evaluation, model/vocabulary saving/loading aspects of the code for several epochs without any problem (except out of memory errors =)). 87 | 88 | ## To-do 89 | 90 | - [x] ~~Better configuration/property reading, handling, instead of hard-coded dictionaries~~ (Update: 11-Jan-2019) 91 | - [x] ~~Variational Dropout. Update: Variational and Gaussian dropout methods are added. 
Reference: [Variational Dropout and 92 | the Local Reparameterization Trick](https://arxiv.org/pdf/1506.02557.pdf)~~ 93 | - [x] ~~Extend main flow and learning models with respect to new dropout models.~~ 94 | - [x] ~~Add character-level data preprocessing.~~ 95 | - [x] ~~Add character-level data loading.~~ 96 | - [ ] Run the current piece of code for the aforementioned datasets and define a text categorization baseline (for both Turkish and English). 97 | - [ ] Variational Dropout related extensions (current version is from 2015 paper but obviously more recent versions are out there for me to implement =)) + bayes by backprop for CNN (a.k.a. Bayesian CNN) 98 | - [ ] Attention. 99 | - [ ] Different learning algorithms (DeepCNN, LSTM, GRU, any-kind-of-hybrid versions of those algorithms, transformers). 100 | - [x] TextCNN 101 | - [x] GRU 102 | - [x] LSTM 103 | - [x] ~~Multilayer CNN~~ (I removed this model and decided to continue with CharCNN and VDCNN instead). 104 | - [x] CharCNN 105 | - [x] VDCNN (Very Deep CNN) 106 | - [x] Transformer (*Attention is All You Need* version) (**Modified for Text Classification/NER!**) 107 | - [ ] Transformer (*Improving Language Understanding by Generative Pre-Training* version) 108 | - [ ] Transformer-XL (*Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context* version) 109 | - [x] Conv-Deconv CNN 110 | - [ ] Encoder-Decoder GRU 111 | - [ ] Encoder-Decoder LSTM 112 | - [ ] Hybrid stuff (Like CNN+LSTM/GRU) 113 | - [x] ~~CRF layer to be able to do NER experiments~~. 114 | - [ ] Add new models that will use CRF as their last layer (such as LSTMCRF, GRUCRF, CNNCRF, etc.) 115 | - [ ] Develop NER-related performance metrics and update training/evaluation flows to use these metrics. 116 | - [ ] For Turkish, I plan to add morphological disambiguation (https://github.com/erayyildiz/Neural-Morphological-Disambiguation-for-Turkish). 117 | - [ ] Different language models. 118 | - [ ] ELMO (pretrained Turkish/English embeddings) 119 | - [ ] BERT (pretrained Turkish/English embeddings) 120 | - [ ] Document length categorization/NER support (Conv-Deconv CNN implementation supports document-length tasks, but more support will come with ELMO and BERT update). 121 | 122 | ## How-to-run 123 | 124 | ### Important Note Before Start 125 | 126 | I had to make some changes in the torchtext backend codes to be able to do several stuffs: 127 | 128 | - I don't know why, torchtext does not split a dataset into 3 subsets (train/val/test) even if there is a function for it. I changed it to fix that issue. Hopefully, one day torchtext will fix it offically =) 129 | - To be able to work with Turkish Fasttext embeddings, I added its respective alias. 130 | - To be able to apply Fasttext's CharNGram to OOV words to generate OOV embeddings, a minor change has been made to Vector object. 131 | - To be able to read any dataset without any problem, a minor change has been made to torchtext's utils.py. 132 | 133 | ### Configuration JSON Format 134 | 135 | To be able to run the main code, you need to provide a valid JSON file which contains 4 main properties. These are `dataset_properties`, `model_properties`, `training_properties`, and `evaluation_properties`: 136 | 137 | - `dataset_properties` contains dataset-related information such as path, embedding, batch information. 138 | - `model_properties` contains model-related parameters. Inside this property, 139 | - `common_model_properties` contains common properties for all models like embeddings, vocabulary size, etc. 
140 | - `model_name` (like text_cnn, char_cnn, etc.) contains model-specific properties. 141 | - `training_properties` contains training-related properties. 142 | - `evaluation_properties` contains evaluation-related properties. 143 | 144 | Details of the `config.json` can be found in "/config/README.md" folder. 145 | 146 | ### How to Run Main 147 | 148 | If you make the necessary changes described in "changes in torchtext.txt" and prepare "config.json", you have two ways to run the code. 149 | 150 | - If you are using an IDE, copy/paste your "config.json" file's path as an argument and press run button. 151 | - If you are an old-school command window lover, type `python main.py --config /path/to/config.json`. 152 | 153 | ### Training from Scratch-Training from Checkpoint-Interactive Evaluation 154 | 155 | You can train your model from 0th epoch until max_epoch, and/or continue your training from xth epoch to the end. You do not need to do anything extra for the first case; however, to be able to continue your training you need to make necessary changes in "config.json": 156 | 157 | - If `dataset_properties/checkpoint_path` is empty, the code will start a new training process. If you type your saved PyTorch model, the main flow will automatically load it and continue from where it left. 158 | - Additionally, you can provide saved vocabulary files for words (`dataset_properties/saved_sentence_vocab` (don't ask why it is sentence)) and labels (`dataset_properties/saved_category_vocab`). 159 | 160 | To be able to activate interactive evaluation, you need to make necessary changes in "config.json": 161 | 162 | - Change `model_properties/common_model_properties/run_mode`'s value to "eval_interactive". 163 | - Provide your model's path to be evaluated and your saved vocabulary files' path by using `evaluation_properties`. 164 | 165 | ## Results 166 | 167 | This section presents the Top-1 and Top-5 test accuracies for **text categorization task** of my experiments. Due to computational resource limit, I cannot test every single parameter/hyperparameter. In general, I hold algorithm parameters same for all experiments; however, I change embedding related parameters. I assume the result table is self-explanatory. As a final note, I *won't* share my best models and I *won't* guarantee reproducibility. Dataset splits (training/validation/test) are deterministic for all experiments, but anything else that needs random initialization is non-deterministic. 168 | 169 | Note: Epoch is set to 20 for all experiments, until further notice (last update: 31-10-2018). However, if I believe that results may improve, I let the experiment run for 10 more epochs (at most 30 epoch per experiments). 170 | 171 | Note 2 (**Update: 22-01-2019**): Most of the English-language experiments are executed in Google Cloud (by using 300$ initial credit). Since, I want to finish as many experiments as possible, I cannot increase the max_epoch from 20 to 30. In this experiments, I saw that validation loss and accuracies were improving in every epoch until the 20th, and I am pretty sure models can improve further. Unfortunately, I chose the maximum number of experiment runs instead of best results for each experiment in this trade-off. 
172 | 173 | ### Test Results for TextCNN 174 | 175 | |#| Language | # Of Categories | Pre-trained Embedding | OOV Embedding | Embedding Training | Top-1 Test Accuracy | Top-5 Test Accuracy | 176 | |-|:--------:|:-----------------------------:|-----------------------|---------------|--------------------|:-------------------:|:-------------------:| 177 | |1|Turkish|25| Fasttext | zeros | static | 49.4565 | 76.2760 | 178 | |2|Turkish|25| Fasttext | zeros | nonstatic | 62.6054 | 86.3384 | 179 | |3|Turkish|25| Fasttext | Fasttext | static | 49.6810 | 75.2684 | 180 | |4|Turkish|25| Fasttext | Fasttext | nonstatic | 63.9391 | 87.9597 | 181 | |5|Turkish|49| Fasttext | zeros | static | 43.5519 | 68.4336 | 182 | |6|Turkish|49| Fasttext | zeros | nonstatic | 56.0081 | 79.8634 | 183 | |7|Turkish|49| Fasttext | Fasttext | static | 43.8025 | 68.8641 | 184 | |8|Turkish|49| Fasttext | Fasttext | nonstatic | 60.4009 | 82.7879 | 185 | |9|English|25| Fasttext | zeros | static | 56.2290 | 83.2425 | 186 | |10|English|25| Fasttext | zeros | nonstatic | 64.2642 | 89.2115 | 187 | |11|English|25| Fasttext | Fasttext | static | 56.5313 | 83.9873 | 188 | |12|English|25| Fasttext | Fasttext | nonstatic | 65.9558 | 91.1536 | 189 | |13|English|49| Fasttext | zeros | static | 51.3862 | 78.7806 | 190 | |14|English|49| Fasttext | zeros | nonstatic | 59.2086* | 84.8054 | 191 | |15|English|49| Fasttext | Fasttext | static | 51.7878 | 79.9472 | 192 | |16|English|49| Fasttext | Fasttext | nonstatic | 55.3833* | 80.4958 | 193 | 194 | * Note that the experiment 14 resulted with a better score than 16, unlike other similar setups. The main reason is, I changed the "learning_rate" of the optimizer to a smaller value for the experiment 16 (well, for the sake of the experiment =)), and it appears that smaller learning rate made the learning process a bit slower (in terms of number of epochs). If I can find a chance to run this experiment again in Google Cloud (a.k.a. have enough credit to run it one more time), I will update the learning rate properly. 195 | 196 | ## Previous Updates 197 | 198 | In this title, I will save the previous updates for me and the visitors to keep track. 199 | 200 | ## May 2019 201 | 202 | ### Update 28-05-2019 203 | 204 | - I've started to work on *Improving Language Understanding by Generative Pre-Training* version of Transformer architecture. 205 | - The project will contain only languange model and classifier heads, and I will also add NER head. Rest of the original heads in the paper can be found in huggingface's github =) 206 | - And yes, I know huggingface has a PyTorch implementation which I wrote to my [References](#references-for-code-development) ages ago. Obviously, I won't c/p the already implemented code since it does not fit to my architectural flow as well as its PyTorch version is pretty old. 207 | - The end product would be provide same results (if you be a good human being and pray to the randomization gods), however, the implementation will differ due to my architectural concerns and capabilities of the latest PyTorch. 208 | - I've also updated my library versions (using PyTorch 1.1 now) but I forgot updating requirement.txt (It will be updated in next commit). 209 | - As a side note, in the last version of PyTorch, MultiHeadedAttention is developed under "torch.nn". For testing and comparison purposes, I tried to use it; however, I could not make it work in my TransformerGoogle code (you can find the commented-out line in model file). 
The return value of this new method does not fit the rest of the methods in the model (it is solvable, but I won't spend time on it for now). 210 | 211 | ## April 2019 212 | 213 | ### Update 30-04-2019 214 | 215 | - I know I said two months ago that I would push more updates, but I failed to deliver since I am a filthy lazy dude =) 216 | - Hopefully, I will push some new stuff in 2-3 weeks. Next week, I have a US conference trip, and I will buy myself a brand new monster laptop for both gaming and faster training times. After that I will: 217 | - Develop the last 2 transformer papers. 218 | - Add BERT and ELMo embedding support to the flow (not the algorithms themselves, but loading/inferring the pretrained models). Maybe I can include GPT-2 models from the latest OpenAI paper, but I am not sure about it for now (since that model is mainly used for text generation). 219 | - Semi-relevant update: After I got several e-mails from researchers around the world (even from the European Commission =)), I decided to improve my [dataset](https://data.mendeley.com/datasets/cdcztymf4k/1). I came up with a plan which I will execute after I push the updates mentioned above. So, if you have any ideas and/or requests for the new dataset, just let me know. 220 | 221 | ## March 2019 222 | 223 | ### Update 02-03-2019 224 | 225 | - Recently, I read the "Adaptive Gradient Methods with Dynamic Bound of Learning Rate" paper ([paper](https://openreview.net/forum?id=Bkg3g2R9FX) - [github](https://github.com/Luolc/AdaBound)) and decided to add it to my project. The presented results are promising, but I have not tested this new optimizer in my own experiments yet. 226 | - Since AdaBound can be installed via `pip install`, I updated requirements.txt. 227 | - `config.json` is also updated; two new parameters related to AdaBound are added. 228 | - Finally, personal issues are settling down (new job, better state of mind =)). 229 | - I plan to add more models for text categorization, starting with the new Transformer codes (check the To-do list for details). 230 | - After the two new Transformer codes, I plan to add ELMo and BERT support (I won't train my own models; I will use pretrained Turkish/English models).
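Regarding the AdaBound bullets in the 02-03-2019 update above: the sketch below shows how an AdaBound optimizer is typically constructed, assuming the `adabound` pip package's documented constructor. In this repository the optimizer is actually selected via the `optimizer` field in `training_properties` together with `learning_rate`, `final_learning_rate`, and `amsbound`, so treat this purely as an illustration (the stand-in model is hypothetical):

```python
import adabound
import torch.nn as nn

model = nn.Linear(300, 25)  # hypothetical stand-in model, for illustration only
optimizer = adabound.AdaBound(
    model.parameters(),
    lr=0.001,        # corresponds to "learning_rate" in training_properties
    final_lr=0.1,    # corresponds to "final_learning_rate" (the SGD-side learning rate)
    amsbound=False,  # corresponds to "amsbound"
)
```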
231 | 232 | 233 | ## References for Code Development 234 | 235 | Below repositories really helped me to write a decent and working code: 236 | - https://github.com/bamtercelboo/cnn-lstm-bilstm-deepcnn-clstm-in-pytorch 237 | - https://github.com/bentrevett/pytorch-sentiment-analysis 238 | - https://github.com/j-min/Dropouts/blob/master/Gaussian_Variational_Dropout.ipynb 239 | - https://github.com/felix-laumann/Bayesian_CNN/ 240 | - https://github.com/kefirski/variational_dropout/ 241 | - https://github.com/dreamgonfly/deep-text-classification-pytorch/ 242 | - https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html 243 | - https://github.com/allenai/allennlp/blob/master/allennlp/modules/conditional_random_field.py 244 | - https://github.com/threelittlemonkeys/lstm-crf-pytorch/ 245 | - https://github.com/ymym3412/textcnn-conv-deconv-pytorch/blob/master/model.py 246 | - http://nlp.seas.harvard.edu/2018/04/03/attention.html 247 | - https://github.com/huggingface/pytorch-openai-transformer-lm 248 | -------------------------------------------------------------------------------- /changes in torchtext.txt: -------------------------------------------------------------------------------- 1 | ##################################################################################### 2 | ################################# dataset.py ######################################## 3 | ##################################################################################### 4 | def check_split_ratio(split_ratio): 5 | """Check that the split ratio argument is not malformed""" 6 | valid_ratio = 0. 7 | if isinstance(split_ratio, float): 8 | # Only the train set relative ratio is provided 9 | # Assert in bounds, validation size is zero 10 | assert 0. < split_ratio < 1., ( 11 | "Split ratio {} not between 0 and 1".format(split_ratio)) 12 | 13 | test_ratio = (1. - split_ratio) / 2 14 | valid_ratio = (1. - split_ratio) / 2 15 | return (split_ratio, test_ratio, valid_ratio) 16 | elif isinstance(split_ratio, list): 17 | # A list of relative ratios is provided 18 | length = len(split_ratio) 19 | assert length == 2 or length == 3, ( 20 | "Length of split ratio list should be 2 or 3, got {}".format(split_ratio)) 21 | 22 | # Normalize if necessary 23 | ratio_sum = sum(split_ratio) 24 | if not ratio_sum == 1.: 25 | split_ratio = [float(ratio) / ratio_sum for ratio in split_ratio] 26 | 27 | if length == 2: 28 | return tuple(split_ratio + [valid_ratio]) 29 | return tuple(split_ratio) 30 | else: 31 | raise ValueError('Split ratio must be float or a list, got {}' 32 | .format(type(split_ratio))) 33 | 34 | ##################################################################################### 35 | ################################# vocab.py ########################################## 36 | ##################################################################################### 37 | class Vectors(object): 38 | .... 39 | def __getitem__(self, token): 40 | if token in self.stoi: 41 | return self.vectors[self.stoi[token]] 42 | else: 43 | return self.unk_init(torch.Tensor(self.dim), token) 44 | ..... 
45 | 46 | ##################################################################################### 47 | ################################# vocab.py ########################################## 48 | ##################################################################################### 49 | pretrained_aliases = { 50 | "charngram.100d": partial(CharNGram), 51 | "fasttext.en.300d": partial(FastText, language="en"), 52 | "fasttext.tr.300d": partial(FastText, language="tr"), # Added Turkish Fasttext 53 | "fasttext.simple.300d": partial(FastText, language="simple"), 54 | "glove.42B.300d": partial(GloVe, name="42B", dim="300"), 55 | "glove.840B.300d": partial(GloVe, name="840B", dim="300"), 56 | "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), 57 | "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), 58 | "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), 59 | "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), 60 | "glove.6B.50d": partial(GloVe, name="6B", dim="50"), 61 | "glove.6B.100d": partial(GloVe, name="6B", dim="100"), 62 | "glove.6B.200d": partial(GloVe, name="6B", dim="200"), 63 | "glove.6B.300d": partial(GloVe, name="6B", dim="300") 64 | } 65 | 66 | ##################################################################################### 67 | ################################# utils.py ########################################## 68 | ##################################################################################### 69 | def unicode_csv_reader(unicode_csv_data, **kwargs): 70 | """Since the standard csv library does not handle unicode in Python 2, we need a wrapper. 71 | Borrowed and slightly modified from the Python docs: 72 | https://docs.python.org/2/library/csv.html#csv-examples""" 73 | if six.PY2: 74 | # csv.py doesn't do Unicode; encode temporarily as UTF-8: 75 | csv_reader = csv.reader(utf_8_encoder(unicode_csv_data), **kwargs) 76 | for row in csv_reader: 77 | # decode UTF-8 back to Unicode, cell by cell: 78 | yield [cell.decode('utf-8') for cell in row] 79 | else: 80 | for line in csv.reader(unicode_csv_data, quotechar=None, **kwargs): 81 | yield line -------------------------------------------------------------------------------- /config/README.md: -------------------------------------------------------------------------------- 1 | # README 2 | 3 | This folder only holds a configuration JSON file. This file is used to define all properties and parameters that are 4 | needed to be execute this project. 5 | 6 | ## Configuration Details 7 | 8 | `config.json` has 4 main property dictionaries: 9 | 10 | - `dataset_properties` defines the dataset reading/loading/preparing related properties. 11 | - `training_properties` defines all training-related properties and parameters algorithm/optimizer type, learning rate, 12 | decay rate, and so on. 13 | - `evaluation_properties` defines the necessary model/vocabulary paths for evaluation purposes. 14 | - `model_properties` defines anything that is needed to create a model. 15 | 16 | ### Dataset Properties 17 | 18 | There are 10 properties under this `dataset_properties`: 19 | 20 | - `data_path`: The original dataset file path (The current version of the code splits a full dataset to train/validation/test sets. But does not allow you to use an already splitted version). 21 | - `stop_word_path`: The stop word file path. 22 | - `embedding_vector`: Embedding alias that torchtext needs/uses while building vocabulary (predefined aliases can be found torchtext's vocab.py file). 
For instance "fasttext.en.300d", "fasttext.tr.300d", etc. 23 | - `vector_cache`: The embedding file path that torchtext creates. To prevent it to download same file over and over again, you need to provide its path. 24 | - `pretrained_embedding_path`: This is the original, Gensim readable, embedding files' path (note that only use case for this is Fasttext-based OOV word generation). 25 | - `checkpoint_path`: The path for saved model file that you want to continue your training. 26 | - `oov_embedding_type`: The property to define how to handle out-of-vocabulary word embeddings. It takes a string and can be "zeros", "ones", "random", "uniform" or "fasttext_oov". 27 | - `batch_size`: Self-explanatory. 28 | - `fixed_length`: Integer parameter to define the maximum length of an input string (lets say for a sentence, it limits the number of words). For values lower than or equal to 0, the dataset loader uses dynamic input size. It is for reducing the memory consumption. 29 | - `min_freq`: Integer parameter to define the minimum frequency condition on words to be appear in vocabulary. For values lower than or equal to 0, the dataset loader will assign every seen word (min_freq=1) in vocabulary. It is for reducing the memory consumption. 30 | 31 | ### Training Properties 32 | 33 | There are 14 properties under this `training_properties` which determines the learning algorithm, optimizer, optimizer's 34 | parameters and printing/saving/evaluating related stuff: 35 | 36 | - `learner`: String parameter to choose which learning algorithm to use. It can be "text_cnn", "gru", "lstm", "char_cnn", 37 | "vdcnn", "conv_deconv_cnn", "transformer_google" and "lstmcrf" (Last Update: 31-01-2019) 38 | - `task`: String parameter to choose which task the learner will be trained on. It can be "classification" or "ner". 39 | - `optimizer`: String parameter to choose which optimizer to use. It can be "Adam", "SGD", "OpenAIAdam", "Noam", "Padam" and "adabound". 40 | - `learning_rate`: Self-explanatory. Takes float value. 41 | - `final_learning_rate`: This float-valued parameter is used in Adabound optimizer which is the learning rate for SGD inside Adabound method. 42 | - `scheduler_type`: String parameter to choose a scheduler for OpenAIAdam optimizer (it has no usage for others). It can 43 | be "cos", "constant" or "linear". 44 | - `amsgrad`: Boolean parameter to choose whether to use amsgrad or not in Padam optimizer. 45 | - `amsbound`: Boolean parameter to choose whether to use amsbound or not in Adabound optimizer. 46 | - `partial_adam`: Float parameter to define the "partial" parameter in Padam. It can take values between (0, 0.5] 47 | - `weight_decay`: Float parameter for L2 normalization term. *Note that for my test cases, any value bigger than 0, 48 | literally fucked my performance.* 49 | - `momentum`: Self-explanatory (it is only for "SGD"). Takes float value. 50 | - `norm_ratio`: Gradient clipping ratio. Takes float value. 51 | - `topk`: Tuple value for top-k accuracy calculations (Default: (1, 5)). It is tuple because I c/p related code from Pytorch's 52 | imagenet example without modifying it. It does not have any effects on training, it is for logging/monitoring purposes. 53 | - `print_every_batch_step`: Print loss and accuracy at every x step. 54 | - `save_every_epoch`: Save the model at every epoch. 55 | - `eval_every`: Run the trained model for validation set at every epoch. 
56 | 57 | ### Evaluation Properties 58 | 59 | There are 3 properties under this `evaluation_properties`: 60 | 61 | - `model_path`: The path for the model file that you want to evaluate. 62 | - `sentence_vocab`: Saved vocabulary (for words) file path. 63 | - `category_vocab`: Saved vocabulary (for labels) file path. 64 | 65 | ### Model Properties 66 | 67 | This is the biggest and longest set of properties. Obviously, it tends to get bigger as long as I add new models. The 68 | root `model_properties` contains several inner dictionaries. The first inner dictionary is `common_model_properties` which 69 | defines the common things that are not change w.r.t. selected learning algorithm. The rest of the inner dictionaries are 70 | the learning algorithms that are developed in this repository. 71 | 72 | - `common_model_properties`: 73 | - `run_mode`: String parameter to define the main executing job. It can be either "train" or "eval_interactive". 74 | - `use_pretrained_embed`: Boolean parameter to define whether the learning algorithm uses pretrained embeddings or not. 75 | - `embed_train_type`: String parameter to define whether the embedding layer is trainable or not. It can be "static", 76 | "nonstatic" or "multichannel". Except Text CNN model, "multichannel" embeddings are not used! 77 | - `use_batch_norm`: Boolean parameter to determine the batch normalization usage. 78 | - `batch_norm_momentum`: Float parameter to define batch normalization momentum parameter. 79 | - `batch_norm_affine`: Boolean parameter to define whether batch normalization uses affine or not. 80 | - `text_cnn`: 81 | - `use_padded_conv`: Boolean parameter to define whether convolution layer pads the input or not. 82 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 83 | "variational" 84 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 85 | - `filter_count`: Integer parameter to define the filter count in the convolutional layer. 86 | - `filter_sizes`: List of integers parameter to define the filter sizes in the convolutional layer. Default value is 87 | [3, 4, 5]. Size of the list is not constant/limited/pre-determined! 88 | - `char_cnn`: 89 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 90 | "variational" 91 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 92 | - `max_sequence_length`: Integer parameter to define the maximum sequence length (in terms of characters). Default value 93 | is 1014 (as it is defined in CharCNN article). 94 | - `feature_size`: String parameter to define network size. It can be "large" (conv_filter_count=1024, linear_unit_count=2048), 95 | "small" (conv_filter_count=256, linear_unit_count=1024) or ""(empty string) 96 | - `filter_count`: Integer parameter to define filter count in the convolutional layer. If feature_size is empty, all 97 | convolution layers will use this parameter as the filter_count value. 98 | - `filter_sizes`: : List of integers parameter to define the filter sizes in the convolutional layers. Default value 99 | is [7, 7, 3, 3, 3, 3]. Size of the list is limited to 6. 100 | - `max_pool_kernels`: List of integers parameter to define the kernel sizes in the maxpooling layers. Default value 101 | is [3, 3, 3]. Size of the list is limited to 3. 102 | - `linear_unit_count`: Integer parameter to define the number of hidden units in the fully connected layer. 
If feature_size 103 | is empty, fully connected layer will use this parameter as the linear_unit_count value. 104 | - `vdcnn`: 105 | - `keep_prob`: Float parameter to define the dropout's keeping probability. This model only use "bernoulli" dropout. 106 | - `depth`: Integer parameter to define the depth of the network. It can be 9, 17, 29, or 49. 107 | - `filter_counts`: List of integers parameter to define the filter counts in the convolutional layers. Default value 108 | is [64, 128, 256, 512]. Size of the list is limited to 4. 109 | - `filter_size`: Integer parameter to define the filter size for convolutional layers. All layers use the same size. 110 | - `use_shortcut`: Boolean parameter to determine shortcut usage in VDCNN model. 111 | - `downsampling_type`: String parameter to define downsampling method. It can be "resnet", "vgg" or kmax". 112 | - `maxpool_filter_size`: Integer parameter that defines kernel size for all maxpooling operations. 113 | - `kmax`: An integer parameter that defines "k" value for KMaxPooling operation. 114 | - `conv_deconv_cnn`: 115 | - `keep_prob`: Float parameter to define the dropout's keeping probability. This model only use "bernoulli" dropout. 116 | - `filter_counts`: List of integers parameter to define the filter counts in the encoder convolutional layers. For 117 | deconvonvolution part, this parameter is reversed. Default value is [300, 600, 500]. Size of the list is limited to 3. 118 | - `filter_sizes`: List of integers parameter to define the filter sizes in the encoder convolutional layers. For 119 | deconvonvolution part, this parameter is reversed. Default value is [5, 5, 12]. Size of the list is limited to 3. 120 | - `strides`: List of integers parameter to define the strides in the encoder convolutional layers. For 121 | deconvonvolution part, this parameter is reversed. Default value is [2, 2, 1]. Size of the list is limited to 3. 122 | - `temperature`: Float parameter to define temperature parameter of the Deconvolution stage. 123 | - `hidden_layer_size`: Integer parameter to define the number of hidden units in the Classifier stage . 124 | - `gru`: 125 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 126 | "variational" 127 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 128 | - `hidden_dim`: Integer parameter to define the hidden dimension. 129 | - `num_layers`: Integer parameter to define the number of GRU layers. 130 | - `bidirectional`: Boolean parameter to define bidirectionality. 131 | - `bias`: Boolean parameter to define the usage of bias. 132 | - `lstm`: 133 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 134 | "variational" 135 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 136 | - `hidden_dim`: Integer parameter to define the hidden dimension. 137 | - `num_layers`: Integer parameter to define the number of GRU layers. 138 | - `bidirectional`: Boolean parameter to define bidirectionality. 139 | - `bias`: Boolean parameter to define the usage of bias. 140 | - `transformer_google`: 141 | - `use_embed_sqrt_mul`: Boolean parameter to initialize embeddings by multiplying it with the square root of the 142 | model size. Initially, its value is False. 143 | - `keep_prob_encoder`: Float parameter to define the dropout's keeping probability in encoder. This model only use 144 | "bernoulli" dropout. 
145 | - `keep_prob_pe`: Float parameter to define the dropout's keeping probability in positional embeddings. This model 146 | only use "bernoulli" dropout. 147 | - `keep_prob_pff`": Float parameter to define the dropout's keeping probability in positional feed-forward. This model 148 | only use "bernoulli" dropout. 149 | - `keep_prob_attn`: Float parameter to define the dropout's keeping probability in attention. This model only use 150 | "bernoulli" dropout. 151 | - `keep_prob_clf`: Float parameter to define the dropout's keeping probability in classifier. This model only use 152 | "bernoulli" dropout. 153 | - `transformer_type`: String parameter to define the job of the transformer model. Currently, it can only take "classifier" 154 | value. 155 | - `heads`: Integer parameter to define the number of parallel attention layers. 156 | - `num_encoder_layers`: Integer parameter to define the number of encoder layers. 157 | - `num_hidden_pos_ff`: Integer parameter to define number of hidden units in position-wise feed-forward network. 158 | - `max_length`: Integer parameter to define the maximum length of the input. Default value is 5000. 159 | -------------------------------------------------------------------------------- /config/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/config/__init__.py -------------------------------------------------------------------------------- /config/config.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset_properties": { 3 | "data_path": "./data/turkish_test.DUMP", 4 | "batch_size": 32, 5 | "stop_word_path": "./data/stopwords/turkish", 6 | "embedding_vector": "fasttext.tr.300d", 7 | "vector_cache": "./data/fasttext", 8 | "pretrained_embedding_path": "./data/fasttext/wiki.tr", 9 | "checkpoint_path": "", 10 | "oov_embedding_type": "zeros", 11 | "fixed_length": 0, 12 | "min_freq": 0 13 | }, 14 | "model_properties": { 15 | "common_model_properties": { 16 | "run_mode": "train", 17 | "use_pretrained_embed": true, 18 | "embed_train_type": "static", 19 | "use_batch_norm": true, 20 | "batch_norm_momentum": 0.1, 21 | "batch_norm_affine": false 22 | }, 23 | "text_cnn": { 24 | "use_padded_conv": true, 25 | "dropout_type": "bernoulli", 26 | "keep_prob": 0.5, 27 | "filter_count": 64, 28 | "filter_sizes": [ 29 | 3, 30 | 4, 31 | 5 32 | ] 33 | }, 34 | "char_cnn": { 35 | "dropout_type": "bernoulli", 36 | "keep_prob": 0.5, 37 | "max_sequence_length": 1014, 38 | "feature_size": "large", 39 | "filter_count": 1024, 40 | "filter_sizes": [ 41 | 7, 42 | 7, 43 | 3, 44 | 3, 45 | 3, 46 | 3 47 | ], 48 | "max_pool_kernels": [ 49 | 3, 50 | 3, 51 | 3 52 | ], 53 | "linear_unit_count": 2048 54 | }, 55 | "vdcnn": { 56 | "keep_prob": 0.5, 57 | "depth": 9, 58 | "filter_counts": [ 59 | 64, 60 | 128, 61 | 256, 62 | 512 63 | ], 64 | "filter_size": 3, 65 | "use_shortcut": true, 66 | "downsampling_type": "resnet", 67 | "maxpool_filter_size": 3, 68 | "kmax": 8 69 | }, 70 | "conv_deconv_cnn": { 71 | "keep_prob": 0.5, 72 | "filter_counts": [ 73 | 300, 74 | 600, 75 | 500 76 | ], 77 | "filter_sizes": [ 78 | 5, 79 | 5, 80 | 12 81 | ], 82 | "strides": [ 83 | 2, 84 | 2, 85 | 1 86 | ], 87 | "temperature": 0.01, 88 | "hidden_layer_size": 500 89 | }, 90 | "gru": { 91 | "dropout_type": "bernoulli", 92 | "keep_prob": 0.5, 93 | "hidden_dim": 300, 94 | "num_layers": 1, 95 | "bidirectional": false, 96 
| "bias": true 97 | }, 98 | "lstm": { 99 | "dropout_type": "bernoulli", 100 | "keep_prob": 0.5, 101 | "hidden_dim": 100, 102 | "num_layers": 2, 103 | "bidirectional": true, 104 | "bias": true 105 | }, 106 | "transformer_google": { 107 | "use_embed_sqrt_mul": false, 108 | "keep_prob_encoder": 0.1, 109 | "keep_prob_pe": 0.1, 110 | "keep_prob_pff": 0.1, 111 | "keep_prob_attn": 0.1, 112 | "keep_prob_clf": 0.5, 113 | "transformer_type": "classifier", 114 | "heads": 6, 115 | "num_encoder_layers": 6, 116 | "num_hidden_pos_ff": 2048, 117 | "max_length": 5000 118 | } 119 | }, 120 | "training_properties": { 121 | "learner": "transformer_google", 122 | "task": "classification", 123 | "optimizer": "SGD", 124 | "learning_rate": 0.05, 125 | "final_learning_rate": 0.1, 126 | "scheduler_type": "cos", 127 | "amsgrad": true, 128 | "amsbound": false, 129 | "partial_adam": 0.25, 130 | "weight_decay": 0, 131 | "momentum": 0.9, 132 | "norm_ratio": 5, 133 | "epoch": 10, 134 | "print_every_batch_step": 250, 135 | "save_every_epoch": 1, 136 | "topk": [ 137 | 5, 138 | 1 139 | ], 140 | "eval_every": 1 141 | }, 142 | "evaluation_properties": { 143 | "model_path": "D:/PyTorchNLP/saved/2018-12-06/", 144 | "sentence_vocab": "D:/PyTorchNLP/saved/vocab/sentence_vocab.dat", 145 | "category_vocab": "D:/PyTorchNLP/saved/vocab/category_vocab.dat" 146 | } 147 | } -------------------------------------------------------------------------------- /config/config.logger: -------------------------------------------------------------------------------- 1 | [loggers] 2 | keys=root 3 | 4 | [handlers] 5 | keys=consoleHandler 6 | 7 | [formatters] 8 | keys=sampleFormatter 9 | 10 | [logger_root] 11 | level=INFO 12 | handlers=consoleHandler 13 | 14 | [handler_consoleHandler] 15 | class=StreamHandler 16 | level=INFO 17 | formatter=sampleFormatter 18 | args=(sys.stdout,) 19 | 20 | [formatter_sampleFormatter] 21 | format=%(asctime)s - %(name)s - %(levelname)s - %(message)s -------------------------------------------------------------------------------- /crf/CRF.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from utils.utils import log_sum_exp 5 | 6 | 7 | class ConditionalRandomField(nn.Module): 8 | def __init__(self, args): 9 | super(ConditionalRandomField, self).__init__() 10 | 11 | self.args_common = args["common_model_properties"] 12 | 13 | self.tag_size = self.args_common["num_tags"] 14 | self.start_id = self.args_common["start_id"] 15 | self.end_id = self.args_common["end_id"] 16 | self.pad_id = self.args_common["padding_id"] 17 | 18 | # Device 19 | self.device = self.args_common["device"] 20 | 21 | # Matrix of transition parameters. Entry i,j is the score of transitioning *to* i *from* j 22 | self.transition = nn.Parameter(torch.Tensor(self.tag_size, self.tag_size)).to(self.device) 23 | 24 | self.transition.data[self.start_id, :] = -10000. # no transition to start 25 | self.transition.data[:, self.end_id] = -10000. # no transition from end except to pad 26 | self.transition.data[:, self.pad_id] = -10000. # no transition from pad except to pad 27 | self.transition.data[self.pad_id, :] = -10000. # no transition to pad except from end 28 | self.transition.data[self.pad_id, self.end_id] = 0. 29 | self.transition.data[self.pad_id, self.pad_id] = 0. 
30 | 31 | torch.nn.init.xavier_normal_(self.transition) 32 | 33 | def _forward(self, x, mask): 34 | # initialize forward variables in log space 35 | batch_size, seq_length, tag_size = x.size() 36 | 37 | # Size of init_alphas = [Batch_size, Tag_size] 38 | alpha = torch.full((batch_size, self.tag_size), -10000.).to(self.device) 39 | alpha[:, self.start_id] = 0. 40 | 41 | # Size of transition_scores = [1, Tag_size, Tag_size] 42 | transition_scores = self.transition.unsqueeze(0) 43 | 44 | for i in range(seq_length): 45 | mask_broadcast = mask[:, i].unsqueeze(1) 46 | 47 | # Size of emition_scores = [Batch_size, Tag_Size, 1] 48 | emition_scores = x[:, i].unsqueeze(2) 49 | 50 | # Size of alpha_broadcast: [Batch_Size, Tag_Size, Tag_Size] 51 | alpha_broadcast = log_sum_exp(alpha.unsqueeze(1) + emition_scores + transition_scores) 52 | 53 | # Size of alpha: [Batch_size, Tag_Size] 54 | alpha = alpha_broadcast * mask_broadcast + alpha * (1 - mask_broadcast) 55 | return log_sum_exp(alpha + self.transition[self.end_id]) 56 | 57 | def _score(self, x, tags, mask): 58 | batch_size, seq_length, tag_size = x.size() 59 | 60 | score = torch.zeros(batch_size).to(self.device) 61 | 62 | x = x.unsqueeze(3) 63 | trans = self.transition.unsqueeze(2) 64 | 65 | for t in range(seq_length - 1): # recursion through the sequence 66 | mask_broadcast = mask[:, t] 67 | emition_scores = torch.cat([x[t, tag[t + 1]] for x, tag in zip(x, tags)]) 68 | transition_scores = torch.cat([trans[tag[t + 1], tag[t]] for tag in tags]) 69 | score += (emition_scores + transition_scores) * mask_broadcast 70 | 71 | last_tag = tags.gather(1, mask.sum(1).long().unsqueeze(1)).squeeze(1) 72 | score += self.transition[self.end_id, last_tag] 73 | return score 74 | 75 | def forward(self, input, tags, mask, reduce=True): 76 | forward_score = self._forward(input, mask) 77 | gold_score = self._score(input, tags, mask) 78 | nll = forward_score - gold_score 79 | return nll if not reduce else torch.mean(nll) 80 | 81 | def viterbi_decode(self, x, mask): 82 | batch_size, seq_length, tag_size = x.size() 83 | 84 | backpointers = torch.LongTensor().to(self.device) 85 | # Initialize the viterbi variables in log space 86 | path_score = torch.full((batch_size, self.tag_size), -10000.).to(self.device) 87 | path_score[:, self.start_id] = 0. 
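# NOTE: despite its name, `next_tag` in the loop below indexes sequence positions (time steps), not tags.
# At each step, for every current tag we keep the best-scoring previous tag (the backpointer) and its
# running score; padded positions carry their previous scores over unchanged via the mask.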
88 | 89 | for next_tag in range(seq_length): 90 | mask_broadcast = mask[:, next_tag].unsqueeze(1) 91 | path_score_broadcast = path_score.unsqueeze(1) + self.transition 92 | path_score_broadcast, backpointer_broadcast = torch.max(path_score_broadcast, 2) 93 | path_score_broadcast += x[:, next_tag] 94 | path_score = path_score_broadcast * mask_broadcast + path_score * (1 - mask_broadcast) 95 | backpointers = torch.cat((backpointers, backpointer_broadcast.unsqueeze(1)), 1) 96 | 97 | path_score += self.transition[self.end_id] 98 | best_path_scores, best_tag_ids = torch.max(path_score, 1) 99 | 100 | backpointers = backpointers.tolist() 101 | best_paths = [[tag_id] for tag_id in best_tag_ids.tolist()] 102 | 103 | for batch in range(batch_size): 104 | best_tag = best_tag_ids[batch] 105 | idx = int(mask[batch].sum().item()) 106 | for bptr_t in reversed(backpointers[batch][:idx]): 107 | best_tag = bptr_t[best_tag] 108 | best_paths[batch].append(best_tag) 109 | best_paths[batch].pop() 110 | best_paths[batch].reverse() 111 | 112 | return best_path_scores, best_paths 113 | 114 | 115 | -------------------------------------------------------------------------------- /crf/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/crf/__init__.py -------------------------------------------------------------------------------- /datahelper/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/datahelper/__init__.py -------------------------------------------------------------------------------- /datahelper/dataset_reader.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import logging.config 4 | import random 5 | 6 | from torchtext import data 7 | 8 | from datahelper.preprocessor import Preprocessor 9 | from embedding_helper import OOVEmbeddingCreator 10 | 11 | SEED = 1234 12 | 13 | logging.config.fileConfig(fname='D:/PyTorchNLP/config/config.logger', disable_existing_loggers=False) 14 | logger = logging.getLogger("Dataset Loader") 15 | 16 | 17 | class DatasetLoader(object): 18 | def __init__(self, data_path, vector, task, fix_length=0, min_freq=0, level="word", unk_init=None, 19 | preprocessor=None, 20 | vector_cache=None): 21 | assert data_path is not None and vector is not None 22 | 23 | self.data_path = data_path 24 | self.vector = vector 25 | self.preprocessor = preprocessor 26 | self.vector_cache = vector_cache 27 | self.unk_init = unk_init 28 | self.level = level 29 | self.task = task 30 | 31 | self.sentence_vocab = None 32 | self.category_vocab = None 33 | self.ner_vocab = None 34 | 35 | self.sentence_vocab_vectors = None 36 | 37 | self.train_iter = None 38 | self.val_iter = None 39 | self.test_iter = None 40 | 41 | self.min_freq = min_freq 42 | self.fix_length = fix_length 43 | if min_freq < 0: 44 | self.min_freq = 0 45 | if fix_length <= 0: 46 | self.fix_length = None 47 | 48 | ''' 49 | This method is for character-level stuff. 
50 | Since, torchtext do not let me do preprocess before tokenization in its normal flow (it always applies tokenization 51 | first, then does preprocess), I listened an advice from stackoverflow and wrote my own tokenizer which does 52 | preprocess first, then applies tokenize preprocessed sentence into characters. Not happy with it, but it works =) 53 | :param sentence: 54 | :return: tokenized_sentence 55 | ''' 56 | 57 | def evil_workaround_tokenizer(self, sentence): 58 | preprocessed_sentence = self.preprocessor(sentence) 59 | tokenized_sentence = list(preprocessed_sentence) 60 | return tokenized_sentence 61 | 62 | def create_fields(self, seq_input=True, seq_ner=True, seq_cat=False): 63 | if self.level == "word": 64 | sentence_field = data.Field(sequential=seq_input, preprocessing=self.preprocessor, fix_length=self.fix_length, 65 | init_token="", eos_token="") 66 | elif self.level == "char": 67 | sentence_field = data.Field(sequential=seq_input, tokenize=self.evil_workaround_tokenizer, fix_length=1014) 68 | # sentence_field = data.NestedField(nested_field) 69 | else: 70 | raise KeyError("Sentence_field is undefined!") 71 | 72 | ner_label_field = data.Field(sequential=seq_ner, init_token="", eos_token="", unk_token=None) 73 | category_label_field = data.LabelField(sequential=seq_cat) 74 | return sentence_field, ner_label_field, category_label_field 75 | 76 | def read_dataset(self, batch_size=128, split_ratio=0.7, format="tsv"): 77 | sf, nlf, clf = self.create_fields() 78 | if self.task == "classification": 79 | dataset = data.TabularDataset(path=self.data_path, 80 | format=format, 81 | skip_header=True, 82 | fields=[("category_labels", clf), 83 | ("ner_labels", None), 84 | ("sentence", sf)]) 85 | elif self.task == "ner": 86 | dataset = data.TabularDataset(path=self.data_path, 87 | format=format, 88 | skip_header=True, 89 | fields=[("category_labels", None), 90 | ("ner_labels", nlf), 91 | ("sentence", sf)]) 92 | else: 93 | raise ValueError("Training task is not defined! 
It can be 'classification' or 'ner'") 94 | 95 | logger.info("Splitting dataset into train/dev/test") 96 | train, val, test = self.create_splits(dataset, split_ratio) 97 | logger.info("Splitting done!") 98 | logger.info("Creating vocabulary") 99 | self.create_vocabs(dataset, sf, clf, nlf) 100 | logger.info("Vocabulary created!") 101 | logger.info("Creating iterators") 102 | self.create_iterator(train, val, test, batch_size) 103 | return train, val, test 104 | 105 | @staticmethod 106 | def create_splits(dataset, split_ratio): 107 | return dataset.split(split_ratio=split_ratio, random_state=random.seed(SEED)) 108 | 109 | def create_vocabs(self, train, sentence_field, category_label_field, ner_label_field): 110 | if self.level == "word": 111 | sentence_field.build_vocab(train, vectors=self.vector, vectors_cache=self.vector_cache, 112 | unk_init=self.unk_init, min_freq=self.min_freq) 113 | else: 114 | sentence_field.build_vocab(train) 115 | 116 | self.sentence_vocab = sentence_field.vocab 117 | self.sentence_vocab_vectors = sentence_field.vocab.vectors 118 | 119 | if self.task == "classification": 120 | category_label_field.build_vocab(train) 121 | self.category_vocab = category_label_field.vocab 122 | else: 123 | ner_label_field.build_vocab(train) 124 | self.ner_vocab = ner_label_field.vocab 125 | 126 | def create_iterator(self, train, val, test, batch_size): 127 | self.train_iter, self.val_iter, self.test_iter = data.BucketIterator.splits(datasets=(train, val, test), 128 | batch_sizes=( 129 | batch_size, batch_size, 130 | batch_size), 131 | sort=True, 132 | sort_key=lambda x: -len(x.sentence), 133 | sort_within_batch=False, 134 | repeat=False) 135 | 136 | 137 | if __name__ == '__main__': 138 | stop_word_path = "D:/Anaconda3/nltk_data/corpora/stopwords/turkish" 139 | data_path = "D:/PyTorchNLP/data/turkish_test.DUMP" 140 | vector_cache = "D:/PyTorchNLP/data/fasttext" 141 | level = "word" 142 | is_char_level = False 143 | task = "ner" 144 | 145 | if task == "ner": 146 | stop_word_path = None 147 | 148 | preprocessor = Preprocessor(stop_word_path, 149 | is_remove_digit=False, 150 | is_remove_punctuations=False, 151 | is_char_level=is_char_level) 152 | 153 | unkembedding = OOVEmbeddingCreator(type="zeros", 154 | fasttext_model_path="D:/PyTorchNLP/data/fasttext/wiki.tr") 155 | 156 | dataset_helper = DatasetLoader(data_path=data_path, 157 | vector="fasttext.tr.300d", 158 | preprocessor=preprocessor.preprocess, 159 | level=level, 160 | vector_cache=vector_cache, 161 | unk_init=unkembedding.create_oov_embedding, 162 | min_freq=0, 163 | fix_length=0, 164 | task=task 165 | ) 166 | 167 | print("Reading dataset") 168 | train, val, test = dataset_helper.read_dataset(batch_size=32) 169 | print(len(train), "-", len(val), "-", len(test)) 170 | sentence_vocab = dataset_helper.sentence_vocab 171 | category_vocab = dataset_helper.category_vocab 172 | ner_vocab = dataset_helper.ner_vocab 173 | 174 | print("Vocab:", len(sentence_vocab)) 175 | # print("Vocab:", len(category_vocab)) 176 | print("Most freq:", sentence_vocab.freqs.most_common(20)) 177 | # print("Most freq:", category_vocab.freqs.most_common(20)) 178 | print("Itos:", sentence_vocab.itos[:50]) 179 | print("Itos:", ner_vocab.itos) 180 | print("Stoi:", ner_vocab.stoi) 181 | # print("Stoi:", category_vocab.stoi) 182 | 183 | train_iter = dataset_helper.train_iter 184 | val_iter = dataset_helper.val_iter 185 | test_iter = dataset_helper.test_iter 186 | 187 | print("Train iter size:", len(train_iter)) 188 | print("Val iter size:", len(val_iter)) 189 | 
print("Test iter size:", len(test_iter)) 190 | 191 | for idx, batch in enumerate(val_iter): 192 | batch_x = batch.sentence 193 | # batch_category = batch.category_labels 194 | batch_ner = batch.ner_labels 195 | batch_x = batch_x.permute(1, 0) 196 | batch_ner = batch_ner.permute(1, 0) 197 | print(idx, ":", batch_x.size()) 198 | print(idx, ":", batch_ner.size()) 199 | # print(batch_x) 200 | # print(batch_category) 201 | # print(batch_ner) 202 | # batch_x = torch.reshape(batch_x, (batch_x.size(0), batch_x.size(1)*batch_x.size(2))) 203 | for idx2, x in enumerate(batch_x): 204 | if dataset_helper.level == "word": 205 | s = [sentence_vocab.itos[i] for i in x] 206 | n = [ner_vocab.itos[i] for i in batch_ner[idx2]] 207 | else: 208 | # s = [sentence_vocab.itos[char] for sentence in batch_x for word in sentence for char in word] 209 | s = [sentence_vocab.itos[idx] for idx in x] 210 | 211 | print(idx2, "(", len(x), ")-", x) 212 | print(idx2, "(", len(s), ")-", s) 213 | print(idx2, "(", len(batch_ner[idx2]), ")-", batch_ner[idx2]) 214 | print(idx2, "(", len(n), ")-", n) 215 | print("") 216 | 217 | 218 | -------------------------------------------------------------------------------- /datahelper/embedding_helper.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | from gensim.models import FastText 5 | 6 | logging.config.fileConfig(fname="D:/PyTorchNLP/config/config.logger", disable_existing_loggers=False) 7 | logger = logging.getLogger("Embedding Helper") 8 | 9 | class OOVEmbeddingCreator(object): 10 | def __init__(self, type="zeros", range=(-0.25, 0.25), fasttext_model_path="None"): 11 | assert type == "zeros" or type == "ones" or type == "random" or type == "uniform" or type == "fasttext_oov" 12 | self.type = type 13 | self.range = range 14 | self.fasttext_model_path = fasttext_model_path 15 | self.random_emb = None 16 | self.uniform_emb = None 17 | logger.info("> OOV Embedding mode: %s", self.type) 18 | if self.type == "fasttext_oov": 19 | assert self.fasttext_model_path is not None 20 | logger.info(">> Fasttext model will be loaded and embeddings for OOV words will be calculated by using it!") 21 | logger.info(">> Beware that the process may take a while due to this process!") 22 | self.model = FastText.load_fasttext_format(self.fasttext_model_path) 23 | 24 | def create_oov_embedding(self, vector, word=None): 25 | if self.type == "zeros": 26 | return torch.zeros(vector.size()) 27 | elif self.type == "ones": 28 | return torch.ones(vector.size()) 29 | elif self.type == "random": 30 | if self.random_emb is None: 31 | self.random_emb = torch.randn(vector.size()) 32 | return self.random_emb 33 | elif self.type == "uniform": 34 | if self.uniform_emb is None: 35 | self.uniform_emb = torch.FloatTensor(vector.size()).uniform_(self.range[0], self.range[1]) 36 | return self.uniform_emb 37 | elif self.type == "fasttext_oov": 38 | try: 39 | res = torch.from_numpy(self.model.wv.word_vec(word)) 40 | except KeyError: 41 | res = torch.randn(vector.size()) 42 | return res 43 | -------------------------------------------------------------------------------- /datahelper/preprocessor.py: -------------------------------------------------------------------------------- 1 | import re 2 | import string 3 | 4 | all_letters = string.ascii_letters + ".,;" 5 | 6 | 7 | class Preprocessor(object): 8 | def __init__(self, stop_word_path=None, is_remove_digit=True, is_remove_punctuations=True, is_char_level=False): 9 | self.stop_word_path = 
stop_word_path 10 | if stop_word_path is not None: 11 | self.stop_words = self.load_stop_words(stop_word_path) 12 | self.is_remove_digit = is_remove_digit 13 | self.is_remove_punctuations = is_remove_punctuations 14 | self.is_char_level = is_char_level 15 | 16 | @staticmethod 17 | def load_stop_words(path): 18 | return set(line.strip() for line in open(path)) 19 | 20 | @staticmethod 21 | def remove_line_breaks(sentence): 22 | return sentence.replace("\r", "").replace("\n", "") 23 | 24 | @staticmethod 25 | def remove_punctuations(sentence): 26 | sentence = sentence.replace('"', '') 27 | sentence = sentence.replace("'", "") 28 | return "".join([ch for ch in sentence if ch not in string.punctuation]) 29 | 30 | @staticmethod 31 | def remove_multiple_white_spaces(sentence): 32 | return " ".join(sentence.split()) 33 | 34 | def remove_stop_words(self, sentence): 35 | return " ".join([word for word in sentence.split() if word not in self.stop_words]) 36 | 37 | @staticmethod 38 | def to_lowercase(sentence): 39 | return sentence.lower() 40 | 41 | @staticmethod 42 | def remove_digits(sentence): 43 | return "".join([word for word in sentence if not word.isdigit()]) 44 | 45 | @staticmethod 46 | def replace_digits(sentence): 47 | return re.sub("\d+", "", sentence) 48 | 49 | @staticmethod 50 | def replace_digits_in_char_level(sentence): 51 | return [re.sub("\d+", "", i) for i in sentence] 52 | 53 | @staticmethod 54 | def change_space_to_special_token(sentence): 55 | return ["" if i == " " else i for i in sentence] 56 | 57 | @staticmethod 58 | def remove_alphanumeric(sentence): 59 | return "".join([word for word in sentence if not word.isalnum()]) 60 | 61 | @staticmethod 62 | def remove_non_utf8(sentence): 63 | return bytes(sentence, "utf-8").decode("utf-8", "ignore") 64 | 65 | @staticmethod 66 | def change_currency_characters(sentence): 67 | return sentence.replace('$', 'dolar').replace('£', 'sterlin').replace('€', 'euro') 68 | 69 | @staticmethod 70 | def change_space_to_special_token(sentence): 71 | return ["" if i == " " else i for i in sentence] 72 | 73 | def preprocess(self, sentence): 74 | # TorchText returns a list of words instead of a normal sentence. 75 | # First, create the sentence again. Then, do preprocess. 
Finally, return the preprocessed sentence as a list 76 | # of words 77 | if self.is_char_level: 78 | x = sentence 79 | else: 80 | x = " ".join(sentence) 81 | x = self.to_lowercase(x) 82 | x = self.change_currency_characters(x) 83 | 84 | if self.is_remove_punctuations: 85 | x = self.remove_punctuations(x) 86 | 87 | if self.stop_word_path is not None: 88 | x = self.remove_stop_words(x) 89 | 90 | if self.is_remove_digit: 91 | x = self.remove_digits(x) 92 | elif self.is_remove_digit is False and self.is_char_level is False: 93 | x = self.replace_digits(x) 94 | 95 | x = self.remove_line_breaks(x) 96 | x = self.remove_multiple_white_spaces(x) 97 | 98 | if self.is_char_level: 99 | x = list((x.strip())) 100 | x = self.change_space_to_special_token(x) 101 | x = self.replace_digits_in_char_level(x) 102 | return x 103 | else: 104 | return (x.strip()).split() 105 | 106 | 107 | if __name__ == '__main__': 108 | stop_word_path = "D:/Anaconda3/nltk_data/corpora/stopwords/turkish" 109 | dataset_path = "D:/PyTorchNLP/data/twnertc_basic_tr.DUMP" 110 | 111 | preprocessor = Preprocessor(stop_word_path, 112 | is_remove_digit=False, 113 | is_remove_punctuations=False) 114 | 115 | with open(dataset_path, encoding="utf-8") as dataset: 116 | for counter, line in enumerate(dataset): 117 | line_tokens = line.split('\t') 118 | sentence = line_tokens[2] 119 | print(" > ", sentence) 120 | print(" = ", preprocessor.preprocess(sentence)) 121 | if counter == 10: 122 | break 123 | -------------------------------------------------------------------------------- /dropout_models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/dropout_models/__init__.py -------------------------------------------------------------------------------- /dropout_models/dropout.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from dropout_models.gaussian_dropout import GaussianDropout 3 | from dropout_models.variational_dropout import VariationalDropout 4 | 5 | 6 | class Dropout(object): 7 | def __init__(self, keep_prob=0.5, dimension=None, dropout_type="bernoulli"): 8 | self.keep_prob = keep_prob 9 | self.dimension = dimension 10 | self.dropout_type = dropout_type 11 | self.dropout = self.create_dropout() 12 | 13 | def create_dropout(self): 14 | if self.dropout_type == "bernoulli": 15 | return nn.Dropout(self.keep_prob) 16 | elif self.dropout_type == "gaussian": 17 | return GaussianDropout(prob=self.keep_prob) 18 | elif self.dropout_type == "variational": 19 | return VariationalDropout(prob=self.keep_prob, dimension=self.dimension) 20 | -------------------------------------------------------------------------------- /dropout_models/gaussian_dropout.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class GaussianDropout(nn.Module): 6 | def __init__(self, prob): 7 | super(GaussianDropout, self).__init__() 8 | if 0 < prob <= 0.5: 9 | self.alpha = torch.Tensor([prob / (1.0 - prob)]) 10 | else: 11 | prob = 0.5 12 | self.alpha = torch.Tensor([prob / (1.0 - prob)]) 13 | 14 | def forward(self, x): 15 | # Note: use the "training" flag here; calling self.train() would switch the module to train mode and always evaluate truthy. 16 | if self.training: 17 | # Epsilon ~ N(1, alpha) 18 | epsilon = torch.randn(x.size()) * self.alpha + 1 19 | if x.is_cuda: 20 | epsilon = epsilon.cuda() 21 | # Local reparametrization trick: x_i = ^x_i * epsilon_i 22 | return x * epsilon 23 | else: 24
 | return x 25 | -------------------------------------------------------------------------------- /dropout_models/variational_dropout.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | from torch.autograd import Variable 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Dropout") 9 | 10 | class VariationalDropout(nn.Module): 11 | def __init__(self, prob, dimension=None): 12 | super(VariationalDropout, self).__init__() 13 | 14 | self.dimension = dimension 15 | 16 | alpha = 1.0 17 | if prob <= 0.5: 18 | alpha = prob / (1 - prob) 19 | else: 20 | logger.warning("Caution! With the current probability value ({}), you may get trapped in a local optimum!".format(prob)) 21 | logger.warning("It is suggested that the probability value should be <= 0.5") 22 | alpha = prob / (1 - 0.49) 23 | self.max_alpha = alpha 24 | 25 | log_alpha = torch.log(torch.ones(dimension) * alpha) 26 | self.log_alpha = nn.Parameter(log_alpha) 27 | 28 | self.c = [1.16145124, -1.50204118, 0.58629921] 29 | 30 | def kl(self): 31 | alpha = torch.exp(self.log_alpha) 32 | 33 | kl = -(0.5 * self.log_alpha + self.c[0] * alpha + self.c[1] * alpha ** 2 + self.c[2] * alpha ** 3) 34 | 35 | return torch.mean(kl) 36 | 37 | def forward(self, x): 38 | self.log_alpha.data = torch.clamp(self.log_alpha.data, max=self.max_alpha) 39 | kld = self.kl() 40 | 41 | if self.training: 42 | # Epsilon ~ N(0, 1) 43 | epsilon = Variable(torch.randn(x.size())) 44 | if x.is_cuda: 45 | epsilon = epsilon.cuda() 46 | 47 | alpha = torch.exp(self.log_alpha) 48 | 49 | # Epsilon ~ N(1, alpha) 50 | epsilon = epsilon * alpha 51 | 52 | return x * epsilon, kld 53 | else: 54 | return x, kld 55 | -------------------------------------------------------------------------------- /evaluation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/evaluation/__init__.py -------------------------------------------------------------------------------- /evaluation/evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | from evaluation.interactive_evaluator import InteractiveEvaluator 4 | from evaluation.single_model_ner_evaluator import SingleModelNerEvaluator 5 | from evaluation.single_model_evaluator import SingleModelEvaluator 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Evaluator") 9 | 10 | 11 | class Evaluator(object): 12 | @staticmethod 13 | def evaluator_factory(type, device): 14 | if type == "single_model_evaluator": 15 | logger.info("Evaluator type is %s", type) 16 | dev_evaluator = SingleModelEvaluator(device, is_vali=True) 17 | test_evaluator = SingleModelEvaluator(device, is_vali=False) 18 | return dev_evaluator, test_evaluator 19 | elif type == "single_model_ner_evaluator": 20 | logger.info("Evaluator type is %s", type) 21 | dev_evaluator = SingleModelNerEvaluator(device, is_vali=True) 22 | test_evaluator = SingleModelNerEvaluator(device, is_vali=False) 23 | return dev_evaluator, test_evaluator 24 | elif type == "interactive_evaluator": 25 | return InteractiveEvaluator(device) 26 | else: 27 | raise ValueError("Unrecognized evaluator type") 28 |
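The non-interactive factory branches above return a (dev, test) evaluator pair whose evaluate_iter signature is defined in single_model_evaluator.py. Below is a minimal usage sketch for the classification case; model, criterion, scorer, save_dir and datasetloader are assumed to come from the training setup in main.py and are not created here, so treat the snippet as illustrative rather than as part of the codebase.

import torch

from evaluation.evaluator import Evaluator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the validation/test evaluator pair (classification flavor).
dev_evaluator, test_evaluator = Evaluator.evaluator_factory("single_model_evaluator", device)

# Validation pass: is_vali=True, so the in-memory model is evaluated as-is.
# model, criterion, scorer, save_dir and datasetloader are assumed to exist (see main.py).
val_loss, val_acc, val_acc_topk = dev_evaluator.evaluate_iter(model=model,
                                                              input=datasetloader.val_iter,
                                                              criterion=criterion,
                                                              save_path=save_dir,
                                                              scorer=scorer)

# Test pass: is_vali=False, so the evaluator reloads the best saved model from save_path first.
test_loss, test_acc, test_acc_topk = test_evaluator.evaluate_iter(model=model,
                                                                  input=datasetloader.test_iter,
                                                                  criterion=criterion,
                                                                  save_path=save_dir,
                                                                  scorer=scorer)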
-------------------------------------------------------------------------------- /evaluation/interactive_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import pkg_resources 4 | 5 | try: 6 | pkg_resources.get_distribution("spacy") 7 | except pkg_resources.DistributionNotFound: 8 | print("Spacy has not been found! As sentence tokenizer .split() will be used!") 9 | HAS_SPACY = False 10 | else: 11 | import spacy 12 | 13 | HAS_SPACY = True 14 | import torch 15 | import torch.nn.functional as F 16 | 17 | from utils.utils import load_best_model, load_vocabulary 18 | 19 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 20 | logger = logging.getLogger("Evaluator") 21 | 22 | 23 | class InteractiveEvaluator(object): 24 | def __init__(self, device="cpu"): 25 | self.device = device 26 | 27 | def evaluate_interactive(self, model_path, sentence_vocab_path, category_vocab_path, preprocessor, topk): 28 | sentence_vocab = load_vocabulary(sentence_vocab_path) 29 | category_vocab = load_vocabulary(category_vocab_path) 30 | 31 | model = load_best_model(model_path) 32 | model = model.to(self.device) 33 | model.eval() 34 | 35 | with torch.no_grad(): 36 | while True: 37 | try: 38 | sentence = input("Enter a test sentence (Type q or quit to exit!):") 39 | except ValueError: 40 | logger.error("Invalid input. Try again! (Type q or quit to exit!)") 41 | continue 42 | 43 | if sentence.lower() != "q" and sentence.lower() != "quit": 44 | # Below 5 lines of weird looking code is for tokenizing a test input, correctly. 45 | # Obviously, sentence.split() does not work if the sentence has punctuations to tokenize. 46 | # Example: "a, b c." sentence should be tokenized as "a , b c . ". 47 | # That's why I added spacy tokenizer. And, lucky me it works for Turkish, too =) 48 | # Note that I added this tokenization to preprocessor; however, it takes too much time to prepare a 49 | # whole dataset in training process. 
Since the dataset I am using is already tokenized as it should be, 50 | # I wrote the below code to only evaluation process which is less shorter than my comment to explain 51 | # this situation =) 52 | if HAS_SPACY: 53 | nlp_tokenizer = spacy.load("en") 54 | doc = nlp_tokenizer(sentence.lower()) 55 | tokenized_sentence = [token.text for token in doc] 56 | preprocessed_sentence = preprocessor(tokenized_sentence) 57 | temp = nlp_tokenizer(" ".join(preprocessed_sentence)) 58 | preprocessed_sentence = [token.text for token in temp] 59 | else: 60 | preprocessed_sentence = preprocessor(sentence.lower().split()) 61 | 62 | indexed_test_sentence = [sentence_vocab.stoi[token] for token in preprocessed_sentence] 63 | 64 | tensored_test_sentence = torch.LongTensor(indexed_test_sentence).to(self.device) 65 | 66 | tensored_test_sentence = tensored_test_sentence.unsqueeze(1) 67 | 68 | logit = model(tensored_test_sentence) 69 | probs = F.softmax(logit, dim=1) 70 | 71 | predicted_category_probs, predicted_category_ids = probs.topk(topk, 1, True, True) 72 | 73 | predicted_category_ids = predicted_category_ids.t() 74 | 75 | predicted_labels = [] 76 | for idx in predicted_category_ids: 77 | predicted_labels.append(category_vocab.itos[idx]) 78 | 79 | if topk == 1: 80 | logger.info("Predicted category is {} with probability {}".format(predicted_labels[0], 81 | predicted_category_probs[0][ 82 | 0].item())) 83 | else: 84 | logger.info("Top-{} predicted labels are as follows in order:".format(topk)) 85 | for idx, label in enumerate(predicted_labels): 86 | logger.info("> {} - Predicted category is {} with probability {:.4f}".format(idx + 1, 87 | label, 88 | predicted_category_probs[ 89 | 0][ 90 | idx].item())) 91 | else: 92 | logger.info("Interactive evaluation ends!") 93 | break 94 | -------------------------------------------------------------------------------- /evaluation/multiple_model_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | 5 | from utils.utils import calculate_accuracy, calculate_topk_accuracy, load_best_model 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Evaluator") 9 | 10 | class MultipleModelEvaluator(object): 11 | def __init__(self, device, is_vali): 12 | self.device = device 13 | self.is_vali = is_vali 14 | 15 | def evaluate_iter(self, encoder, decoder, classifier, input, reconst_criterion, supervised_criterion, save_path, 16 | topk): 17 | total_loss = 0 18 | total_acc = 0 19 | total_acc_topk = 0 20 | 21 | if not self.is_vali: 22 | logger.info("Test mode!") 23 | encoder = load_best_model(save_path, filename="saved_best_encoder") 24 | decoder = load_best_model(save_path, filename="saved_best_decoder") 25 | classifier = load_best_model(save_path, filename="saved_best_classifier") 26 | else: 27 | logger.info("Validation mode!") 28 | 29 | encoder.eval() 30 | decoder.eval() 31 | classifier.eval() 32 | 33 | with torch.no_grad(): 34 | for batch in input: 35 | batch_x = batch.sentence.to(self.device) 36 | batch_y = batch.category_labels.to(self.device) 37 | 38 | hidden = encoder(batch_x) 39 | reconstruction_probs = decoder(hidden) 40 | supervised_predictions = classifier(hidden.squeeze()) 41 | 42 | reconstruction_loss = reconst_criterion(reconstruction_probs, batch_x) 43 | supervised_loss = supervised_criterion(supervised_predictions, batch_y) 44 | 45 | accuracy = calculate_accuracy(supervised_predictions, batch_y) 46 | 
accuracy_topk = calculate_topk_accuracy(supervised_predictions, batch_y, topk=topk) 47 | 48 | total_loss = total_loss + reconstruction_loss.item() + supervised_loss.item() 49 | total_acc += accuracy 50 | total_acc_topk += accuracy_topk[0].item() 51 | 52 | torch.cuda.empty_cache() 53 | 54 | current_loss = total_loss / len(input) 55 | current_acc = total_acc / len(input) 56 | current_acc_topk = total_acc_topk / len(input) 57 | 58 | return current_loss, current_acc, current_acc_topk 59 | -------------------------------------------------------------------------------- /evaluation/single_model_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | 5 | from utils.utils import load_best_model 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Evaluator") 9 | 10 | class SingleModelEvaluator(object): 11 | def __init__(self, device, is_vali): 12 | self.device = device 13 | self.is_vali = is_vali 14 | 15 | def evaluate_iter(self, model, input, criterion, save_path, scorer): 16 | total_loss = 0 17 | total_acc = 0 18 | total_acc_topk = 0 19 | 20 | if not self.is_vali: 21 | logger.info("Test mode!") 22 | model = load_best_model(save_path) 23 | else: 24 | logger.info("Validation mode!") 25 | model.eval() 26 | 27 | with torch.no_grad(): 28 | for batch in input: 29 | batch_x = batch.sentence.to(self.device) 30 | batch_y = batch.category_labels.to(self.device) 31 | 32 | predictions, _ = model(batch_x) 33 | 34 | loss = criterion(predictions, batch_y) 35 | accuracy = scorer.calculate_accuracy(predictions, batch_y) 36 | accuracy_topk = scorer.calculate_topk_accuracy(predictions, batch_y) 37 | 38 | total_loss += loss.item() 39 | total_acc += accuracy 40 | total_acc_topk += accuracy_topk[0].item() 41 | 42 | torch.cuda.empty_cache() 43 | 44 | current_loss = total_loss / len(input) 45 | current_acc = total_acc / len(input) 46 | current_acc_topk = total_acc_topk / len(input) 47 | 48 | return current_loss, current_acc, current_acc_topk 49 | -------------------------------------------------------------------------------- /evaluation/single_model_ner_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | 5 | from models.GRU import GRU 6 | from models.LSTM import LSTMBase 7 | from utils.utils import load_best_model 8 | 9 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 10 | logger = logging.getLogger("Evaluator") 11 | 12 | 13 | class SingleModelNerEvaluator(object): 14 | def __init__(self, device, is_vali): 15 | self.device = device 16 | self.is_vali = is_vali 17 | 18 | def evaluate_iter(self, model, input, save_path, scorer, detailed_ner_log=True): 19 | total_loss = 0 20 | macro_f1 = 0 21 | macro_precision = 0 22 | macro_recall = 0 23 | total_token_acc = 0 24 | 25 | if not self.is_vali: 26 | logger.info("Test mode!") 27 | model = load_best_model(save_path) 28 | else: 29 | logger.info("Validation mode!") 30 | model.eval() 31 | 32 | full_ground_truth_list = list() 33 | full_prediction_list = list() 34 | 35 | with torch.no_grad(): 36 | for batch in input: 37 | batch_x = batch.sentence.to(self.device) 38 | batch_y = batch.ner_labels.to(self.device) 39 | 40 | if isinstance(model, GRU) or isinstance(model, LSTMBase): 41 | model.hidden = model.init_hidden(batch_x.size(1)) 42 | 43 | pred_scores, predictions = 
model.decode(batch_x) 44 | 45 | batch_y = batch_y.permute(1, 0) 46 | 47 | scorer.token_level_accuracy(predictions, batch_y) 48 | 49 | full_ground_truth_list.extend(batch_y.tolist()) 50 | full_prediction_list.extend(predictions) 51 | 52 | token_level_accuracy = scorer.token_accuracy 53 | 54 | total_token_acc += token_level_accuracy 55 | 56 | torch.cuda.empty_cache() 57 | 58 | scorer.f1_score(full_prediction_list, full_ground_truth_list) 59 | macro_f1 = scorer.avg_macro_f1 60 | macro_precision = scorer.avg_macro_precision 61 | macro_recall = scorer.avg_macro_recall 62 | current_token_acc = total_token_acc / len(input) 63 | 64 | if detailed_ner_log: 65 | scorer.print_detailed_score_log() 66 | 67 | return macro_f1, macro_precision, macro_recall, current_token_acc 68 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import argparse 4 | import datetime 5 | import json 6 | import logging.config 7 | import os 8 | 9 | import torch 10 | 11 | from datahelper.dataset_reader import DatasetLoader 12 | from datahelper.embedding_helper import OOVEmbeddingCreator 13 | from datahelper.preprocessor import Preprocessor 14 | from evaluation.evaluator import Evaluator 15 | from models.CNN import TextCnn, CharCNN, VDCNN, ConvDeconvCNN 16 | from models.GRU import GRU 17 | from models.LSTM import LSTM, LSTMCRF 18 | from models.Transformer import TransformerGoogle 19 | from training.trainer import Trainer 20 | from utils.utils import save_vocabulary 21 | 22 | os.environ['CUDA_LAUNCH_BLOCKING'] = "1" 23 | 24 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 25 | logger = logging.getLogger("Main") 26 | 27 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 28 | 29 | 30 | def initialize_model_and_trainer(model_properties, training_properties, datasetloader, device): 31 | logger.info("Model type is %s", training_properties["learner"]) 32 | if training_properties["learner"] == "text_cnn": 33 | model = TextCnn(model_properties).to(device) 34 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 35 | elif training_properties["learner"] == "gru": 36 | model = GRU(model_properties).to(device) 37 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 38 | elif training_properties["learner"] == "lstm": 39 | model = LSTM(model_properties).to(device) 40 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 41 | elif training_properties["learner"] == "char_cnn": 42 | model = CharCNN(model_properties).to(device) 43 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 44 | elif training_properties["learner"] == "vdcnn": 45 | model = VDCNN(model_properties).to(device) 46 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 47 | elif training_properties["learner"] == "conv_deconv_cnn": 48 | model = ConvDeconvCNN(model_properties) 49 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 50 | elif training_properties["learner"] == "transformer_google": 51 | model = TransformerGoogle(model_properties).model.to(device) 52 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, 
datasetloader, device) 53 | elif training_properties["learner"] == "lstmcrf": 54 | assert training_properties["task"] == "ner" 55 | model = LSTMCRF(model_properties).to(device) 56 | trainer = Trainer.trainer_factory("single_model_ner_trainer", training_properties, datasetloader, device) 57 | else: 58 | raise ValueError("Model is not defined! Available learner values are : 'text_cnn', 'char_cnn', 'vdcnn', 'gru', " 59 | "'lstm', 'conv_deconv_cnn' and 'transformer_google'") 60 | 61 | return model, trainer 62 | 63 | 64 | if __name__ == '__main__': 65 | parser = argparse.ArgumentParser() 66 | 67 | parser.add_argument("-c", "--config", default="D:/PyTorchNLP/config/config.json", type=str, 68 | help="config.json path. Caution! Default path is hard-coded, local path.") 69 | 70 | args = parser.parse_args() 71 | 72 | config = json.load(open(args.config)) 73 | 74 | dataset_properties = config["dataset_properties"] 75 | model_properties = config["model_properties"] 76 | training_properties = config["training_properties"] 77 | evaluation_properties = config["evaluation_properties"] 78 | 79 | assert model_properties["common_model_properties"]["run_mode"] == "train" or \ 80 | model_properties["common_model_properties"]["run_mode"] == "eval_interactive" 81 | 82 | assert training_properties["task"] == "classification" or training_properties["task"] == "ner" 83 | 84 | logger.info("Initial device is %s", device) 85 | if "cuda" == device: 86 | torch.backends.cudnn.benchmark = True 87 | torch.backends.cudnn.fastest = True 88 | else: 89 | torch.set_num_threads(8) 90 | torch.backends.cudnn.enabled = False 91 | 92 | stop_word_path = dataset_properties["stop_word_path"] 93 | data_path = dataset_properties["data_path"] 94 | vector_cache = dataset_properties["vector_cache"] 95 | fasttext_model_path = dataset_properties["pretrained_embedding_path"] 96 | 97 | oov_embedding_type = dataset_properties["oov_embedding_type"] 98 | batch_size = dataset_properties["batch_size"] 99 | min_freq = dataset_properties["min_freq"] 100 | fix_length = dataset_properties["fixed_length"] 101 | 102 | embedding_vector = dataset_properties["embedding_vector"] 103 | 104 | training_task = training_properties["task"] 105 | 106 | save_dir = os.path.abspath(os.path.join(os.curdir, "saved", datetime.datetime.today().strftime('%Y-%m-%d'))) 107 | save_dir_vocab = os.path.abspath(os.path.join(os.curdir, "saved", "vocab")) 108 | if not os.path.isdir(save_dir): 109 | os.makedirs(save_dir) 110 | if not os.path.isdir(save_dir_vocab): 111 | os.makedirs(save_dir_vocab) 112 | logger.info("Saving directory for models is %s", save_dir) 113 | logger.info("Saving directory for vocabulary files is %s", save_dir_vocab) 114 | training_properties["save_path"] = save_dir 115 | 116 | level = "word" 117 | is_char_level = False 118 | if training_properties["learner"] == "charcnn" or training_properties["learner"] == "vdcnn": 119 | logger.info("Caution: Due to selected learning model, everything will be executed in character-level!") 120 | level = "char" 121 | is_char_level = True 122 | 123 | if training_task == "ner": 124 | stop_word_path = None 125 | 126 | logger.info("Initialize Preprocessor") 127 | preprocessor = Preprocessor(stop_word_path, 128 | is_remove_digit=True, 129 | is_remove_punctuations=False, 130 | is_char_level=is_char_level) 131 | 132 | if model_properties["common_model_properties"]["run_mode"] == "train": 133 | logger.info("Initialize OOVEmbeddingCreator") 134 | unkembedding = OOVEmbeddingCreator(type=oov_embedding_type, 135 | 
fasttext_model_path=fasttext_model_path) 136 | 137 | logger.info("Initialize DatasetLoader") 138 | datasetloader = DatasetLoader(data_path=data_path, 139 | vector=embedding_vector, 140 | preprocessor=preprocessor.preprocess, 141 | level=level, 142 | vector_cache=vector_cache, 143 | unk_init=unkembedding.create_oov_embedding, 144 | min_freq=min_freq, 145 | fix_length=fix_length, 146 | task=training_task 147 | ) 148 | 149 | logger.info("Loading train, validation and test sets") 150 | train, val, test = datasetloader.read_dataset(batch_size=batch_size) 151 | logger.info("Loading vocabularies") 152 | sentence_vocab = datasetloader.sentence_vocab 153 | category_vocab = datasetloader.category_vocab 154 | ner_vocab = datasetloader.ner_vocab 155 | logger.info("Loading embeddings") 156 | pretrained_embeddings = datasetloader.sentence_vocab_vectors 157 | logger.info("Updating properties") 158 | model_properties["common_model_properties"]["device"] = device 159 | 160 | if training_properties["learner"] == "charcnn": 161 | model_properties["common_model_properties"]["vocab_size"] = len(sentence_vocab) 162 | model_properties["common_model_properties"]["embed_dim"] = len(sentence_vocab) - 1 163 | elif training_properties["learner"] == "vdcnn": 164 | model_properties["common_model_properties"]["vocab_size"] = len(sentence_vocab) 165 | model_properties["common_model_properties"]["embed_dim"] = 16 166 | else: 167 | model_properties["common_model_properties"]["vocab_size"] = pretrained_embeddings.size()[0] 168 | model_properties["common_model_properties"]["embed_dim"] = pretrained_embeddings.size()[1] 169 | 170 | if category_vocab is not None: 171 | model_properties["common_model_properties"]["num_class"] = len(category_vocab) 172 | if ner_vocab is not None: 173 | model_properties["common_model_properties"]["ner_vocab"] = ner_vocab 174 | model_properties["common_model_properties"]["num_tags"] = len(ner_vocab) 175 | model_properties["common_model_properties"]["start_id"] = ner_vocab.stoi[""] 176 | model_properties["common_model_properties"]["end_id"] = ner_vocab.stoi[""] 177 | 178 | model_properties["common_model_properties"]["vocab"] = sentence_vocab 179 | model_properties["common_model_properties"]["padding_id"] = sentence_vocab.stoi[""] 180 | model_properties["common_model_properties"]["pretrained_weights"] = pretrained_embeddings 181 | model_properties["common_model_properties"]["batch_size"] = dataset_properties["batch_size"] 182 | 183 | logger.info("Saving vocabulary files") 184 | save_vocabulary(sentence_vocab, os.path.abspath(os.path.join(save_dir_vocab, "sentence_vocab.dat"))) 185 | if category_vocab is not None: 186 | save_vocabulary(category_vocab, os.path.abspath(os.path.join(save_dir_vocab, "category_vocab.dat"))) 187 | if ner_vocab is not None: 188 | save_vocabulary(ner_vocab, os.path.abspath(os.path.join(save_dir_vocab, "ner_vocab.dat"))) 189 | 190 | logger.info("Initialize model and trainer") 191 | model, trainer = initialize_model_and_trainer(model_properties, training_properties, datasetloader, device) 192 | 193 | if dataset_properties["checkpoint_path"] is None or dataset_properties["checkpoint_path"] == "": 194 | logger.info("Train process is starting from scratch!") 195 | trainer.train_iters(model) 196 | else: 197 | checkpoint = torch.load(dataset_properties["checkpoint_path"]) 198 | logger.info("Train process is reloading from epoch {}".format(checkpoint["epoch"])) 199 | trainer.train_iters(model, checkpoint) 200 | 201 | elif model_properties["common_model_properties"]["run_mode"] == 
"eval_interactive": 202 | interactive_evaluator = Evaluator.evaluator_factory("interactive_evaluator", "cpu") 203 | 204 | model_path = evaluation_properties["model_path"] 205 | sentence_vocab_path = evaluation_properties["sentence_vocab"] 206 | category_vocab_path = evaluation_properties["category_vocab"] 207 | 208 | logger.info("Interactive evaluation mode for model {}:".format(model_path)) 209 | 210 | interactive_evaluator.evaluate_interactive(model_path=model_path, 211 | sentence_vocab_path=sentence_vocab_path, 212 | category_vocab_path=category_vocab_path, 213 | preprocessor=preprocessor.preprocess, 214 | topk=training_properties["topk"]) 215 | logger.info("Done!") 216 | -------------------------------------------------------------------------------- /models/CNN.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | 7 | from Util_CNN import KMaxPooling, LayerBlock, ConvolutionEncoder, DeconvolutionDecoder, FullyConnectedClassifier 8 | from dropout_models.dropout import Dropout 9 | 10 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 11 | logger = logging.getLogger("CNN") 12 | 13 | 14 | class TextCnn(nn.Module): 15 | def __init__(self, args): 16 | super(TextCnn, self).__init__() 17 | self.args_common = args["common_model_properties"] 18 | self.args_specific = args["text_cnn"] 19 | 20 | self.vocab = self.args_common["vocab"] 21 | 22 | # Device 23 | self.device = self.args_common["device"] 24 | 25 | # Input/Output dimensions 26 | self.embed_num = self.args_common["vocab_size"] 27 | self.embed_dim = self.args_common["embed_dim"] 28 | self.num_class = self.args_common["num_class"] 29 | 30 | # Embedding parameters 31 | self.padding_id = self.args_common["padding_id"] 32 | 33 | # Condition parameters 34 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 35 | self.embed_train_type = self.args_common["embed_train_type"] 36 | self.use_padded_conv = self.args_specific["use_padded_conv"] 37 | self.use_batch_norm = self.args_common["use_batch_norm"] 38 | 39 | # Pretrained embedding weights 40 | self.pretrained_weights = self.args_common["pretrained_weights"] 41 | 42 | # Dropout type 43 | self.dropout_type = self.args_specific["dropout_type"] 44 | 45 | # Dropout probabilities 46 | keep_prob = self.args_specific["keep_prob"] 47 | 48 | # Batch normalization parameters 49 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 50 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 51 | 52 | # Convolution parameters 53 | self.input_channel = 1 54 | self.filter_count = self.args_specific["filter_count"] 55 | self.filter_sizes = self.args_specific["filter_sizes"] 56 | 57 | # Embedding Layer Initialization 58 | if self.embed_train_type == "multichannel": 59 | self.embed, self.embed_static = self.initialize_embeddings() 60 | self.embed = self.embed.cpu() 61 | self.embed_static = self.embed_static.cpu() 62 | else: 63 | self.embed, _ = self.initialize_embeddings() 64 | self.embed = self.embed.cpu() 65 | 66 | # Convolution Initialization 67 | self.convs = self.initialize_conv_layer() 68 | 69 | # Initialize convolution weights 70 | self.initialize_weights() 71 | 72 | # Flatten conv layers' output 73 | num_flatten_feature = len(self.filter_sizes) * self.filter_count 74 | 75 | # Batch Normalization initialization 76 | if self.use_batch_norm: 77 | logger.info("> Batch Normalization") 78 
| self.initialize_batch_normalization(num_flatten_feature) 79 | 80 | # Dropout initialization 81 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 82 | logger.info("> Dropout - %s", self.dropout_type) 83 | self.dropout = Dropout(keep_prob=keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 84 | elif self.dropout_type == "variational": 85 | logger.info("> Dropout - %s", self.dropout_type) 86 | self.dropout_before_flatten = Dropout(keep_prob=0.2, dimension=num_flatten_feature, 87 | dropout_type=self.dropout_type).dropout 88 | self.dropout_fc1 = Dropout(keep_prob=keep_prob, dimension=num_flatten_feature // 2, 89 | dropout_type=self.dropout_type).dropout 90 | else: 91 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 92 | self.dropout = Dropout(keep_prob=keep_prob, dimension=None, dropout_type="bernoulli").dropout 93 | 94 | # Fully Connected Layer 1 initialization 95 | self.fc1 = nn.Linear(in_features=num_flatten_feature, 96 | out_features=num_flatten_feature // 2, 97 | bias=True) 98 | 99 | # Fully Connected Layer 2 initialization 100 | self.fc2 = nn.Linear(in_features=num_flatten_feature // 2, 101 | out_features=self.num_class, 102 | bias=True) 103 | 104 | def initialize_embeddings(self): 105 | logger.info("> Embeddings") 106 | embed = nn.Embedding(num_embeddings=self.embed_num, 107 | embedding_dim=self.embed_dim, 108 | padding_idx=self.padding_id) 109 | 110 | embed_static = None 111 | # Create 2nd embedding layer for multichannel purpose 112 | if self.embed_train_type == "multichannel": 113 | embed_static = nn.Embedding(num_embeddings=self.embed_num, 114 | embedding_dim=self.embed_dim, 115 | padding_idx=self.padding_id) 116 | 117 | if self.use_pretrained_embed: 118 | logger.info("> Pre-trained Embeddings") 119 | embed.from_pretrained(self.pretrained_weights) 120 | if self.embed_train_type == "multichannel": 121 | embed_static.from_pretrained(self.pretrained_weights) 122 | else: 123 | logger.info("> Random Embeddings") 124 | random_embedding_weights = torch.rand(self.embed_num, self.embed_dim) 125 | embed.from_pretrained(random_embedding_weights) 126 | if self.embed_train_type == "multichannel": 127 | embed_static.from_pretrained(random_embedding_weights) 128 | 129 | if self.embed_train_type == "static": 130 | logger.info("> Static Embeddings") 131 | embed.weight.requires_grad = False 132 | elif self.embed_train_type == "nonstatic": 133 | logger.info("> Non-Static Embeddings") 134 | embed.weight.requires_grad = True 135 | elif self.embed_train_type == "multichannel": 136 | embed.weight.requires_grad = True 137 | embed_static.weight.requires_grad = False 138 | else: 139 | raise ValueError("Embedding train type can be (1) static, (2) nonstatic or (3) multichannel") 140 | return embed, embed_static 141 | 142 | def initialize_conv_layer(self): 143 | if self.use_padded_conv: 144 | logger.info("> Padded convolution") 145 | return nn.ModuleList([nn.Conv2d(in_channels=self.input_channel, 146 | out_channels=self.filter_count, 147 | kernel_size=(filter_size, self.embed_dim), 148 | stride=(1, 1), 149 | padding=(filter_size // 2, 0), 150 | bias=True) for filter_size in self.filter_sizes]) 151 | else: 152 | logger.info("> Without-pad convolution") 153 | return nn.ModuleList([nn.Conv2d(in_channels=self.input_channel, 154 | out_channels=self.filter_count, 155 | kernel_size=(filter_size, self.embed_dim), 156 | bias=True) for filter_size in self.filter_sizes]) 157 | 158 | def initialize_weights(self): 159 | for conv in self.convs: 160 | 
nn.init.xavier_normal_(conv.weight) 161 | conv.bias.data.fill_(0.01) 162 | 163 | def initialize_batch_normalization(self, num_flatten_feature): 164 | self.convs_bn = nn.BatchNorm2d(num_features=self.filter_count, 165 | momentum=self.batch_norm_momentum, 166 | affine=self.batch_norm_affine) 167 | self.fc1_bn = nn.BatchNorm1d(num_features=num_flatten_feature // 2, 168 | momentum=self.batch_norm_momentum, 169 | affine=self.batch_norm_affine) 170 | self.fc2_bn = nn.BatchNorm1d(num_features=self.num_class, 171 | momentum=self.batch_norm_momentum, 172 | affine=self.batch_norm_affine) 173 | 174 | def forward(self, x): 175 | kl_loss = torch.Tensor([0.0]) 176 | # Input shape: [sentence_length, batch_size] 177 | x = x.permute(1, 0) 178 | # X shape: [batch_size, sentence_length] 179 | x = self.embed(x) 180 | if self.embed_train_type == "multichannel": 181 | x_static = self.embed_static(x) 182 | x = torch.stack[(x_static, x), 1] 183 | if "cuda" in str(self.device): 184 | x = x.cuda() 185 | kl_loss = kl_loss.cuda() 186 | # X shape: [batch_size, sentence_length, embedding_dim] 187 | x = x.unsqueeze(1) 188 | # X shape: [batch_size, 1, sentence_length, embedding_dim] 189 | if self.use_batch_norm: 190 | x = [self.convs_bn(F.relu(conv(x))).squeeze(3) for conv in self.convs] 191 | else: 192 | x = [F.relu(conv(x)).squeeze(3) for conv in self.convs] 193 | # X[i] shape: [batch_size, filter_count, sentence_length - filter_size[i]] 194 | x = [F.max_pool1d(conv, conv.size(2)).squeeze(2) for conv in x] 195 | # X[i] shape: [batch_size, filter_count] 196 | if self.dropout_type == "variational": 197 | x, kld = self.dropout_before_flatten(torch.cat(x, dim=1)) 198 | kl_loss += kld.sum() 199 | else: 200 | x = self.dropout(torch.cat(x, dim=1)) 201 | # Fully Connected Layers 202 | if self.use_batch_norm: 203 | if self.dropout_type == "variational": 204 | x, kld = self.dropout_fc1(self.fc1_bn(F.relu(self.fc1(x)))) 205 | kl_loss += kld.sum() 206 | else: 207 | x = self.dropout(self.fc1_bn(F.relu(self.fc1(x)))) 208 | x = self.fc2_bn(self.fc2(x)) 209 | else: 210 | if self.dropout_type == "variational": 211 | x, kld = self.dropout_fc1(F.relu(self.fc1(x))) 212 | kl_loss += kld.sum() 213 | else: 214 | x = self.dropout(F.relu(self.fc1(x))) 215 | x = self.fc2(x) 216 | return x, kl_loss 217 | 218 | 219 | class CharCNN(nn.Module): 220 | def __init__(self, args): 221 | super(CharCNN, self).__init__() 222 | 223 | self.args_common = args["common_model_properties"] 224 | self.args_specific = args["char_cnn"] 225 | 226 | # Device 227 | self.device = self.args_common["device"] 228 | 229 | # Input/Output dimensions 230 | self.vocab_size = self.args_common["vocab_size"] 231 | self.embed_dim = self.args_common["embed_dim"] 232 | self.num_class = self.args_common["num_class"] 233 | 234 | # Embedding parameters 235 | self.padding_id = self.args_common["padding_id"] 236 | 237 | # Dropout type 238 | self.dropout_type = self.args_specific["dropout_type"] 239 | 240 | # Dropout probabilities 241 | self.keep_prob = self.args_specific["keep_prob"] 242 | 243 | # CharCNN specific parameters 244 | self.max_sequence_length = self.args_specific["max_sequence_length"] 245 | 246 | if self.args_specific["feature_size"] == "large": 247 | self.filter_count = 1024 248 | self.linear_unit_count = 2048 249 | elif self.args_specific["feature_size"] == "small": 250 | self.filter_count = 256 251 | self.linear_unit_count = 1024 252 | else: 253 | self.filter_count = self.args_specific["filter_count"] 254 | self.linear_unit_count = self.args_specific["linear_unit_count"] 
255 | 256 | self.filter_sizes = self.args_specific["filter_sizes"] 257 | self.max_pool_kernels = self.args_specific["max_pool_kernels"] 258 | 259 | # Embedding initialization 260 | # As the original CharCNN paper, I initialized char embeddings as one-hot vector. 261 | self.embedding = nn.Embedding(self.vocab_size, self.embed_dim, padding_idx=self.padding_id).cpu() 262 | self.embedding.weight.data = torch.eye(self.vocab_size, self.embed_dim) 263 | self.embedding.weight.reqiures_grad = False 264 | 265 | # Convolution Layer 1 266 | self.conv1 = nn.Conv1d(in_channels=self.embed_dim, out_channels=self.filter_count, 267 | kernel_size=self.filter_sizes[0]) 268 | self.pool1 = nn.MaxPool1d(kernel_size=self.max_pool_kernels[0]) 269 | 270 | # Convolution Layer 2 271 | self.conv2 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 272 | kernel_size=self.filter_sizes[1]) 273 | self.pool2 = nn.MaxPool1d(kernel_size=self.max_pool_kernels[1]) 274 | 275 | # Convolution Layer 3 276 | self.conv3 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 277 | kernel_size=self.filter_sizes[2]) 278 | 279 | # Convolution Layer 4 280 | self.conv4 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 281 | kernel_size=self.filter_sizes[3]) 282 | 283 | # Convolution Layer 5 284 | self.conv5 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 285 | kernel_size=self.filter_sizes[4]) 286 | 287 | # Convolution Layer 6 288 | self.conv6 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 289 | kernel_size=self.filter_sizes[5]) 290 | self.pool3 = nn.MaxPool1d(kernel_size=self.max_pool_kernels[2]) 291 | 292 | # Activation 293 | self.relu = nn.ReLU() 294 | 295 | # Number of features after convolution blocks 296 | num_features = (self.max_sequence_length - 96) // 27 * self.filter_count 297 | 298 | self.initialize_dropout(num_features) 299 | 300 | # Linear Block 1 301 | self.linear1 = nn.Linear(num_features, self.linear_unit_count) 302 | 303 | # Linear Block 2 304 | self.linear2 = nn.Linear(self.linear_unit_count, self.linear_unit_count) 305 | 306 | # Linear Block 3 307 | self.linear3 = nn.Linear(self.linear_unit_count, self.num_class) 308 | 309 | def initialize_dropout(self, num_features): 310 | # Dropout initialization 311 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 312 | logger.info("> Dropout - %s", self.dropout_type) 313 | self.dropout = Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 314 | elif self.dropout_type == "variational": 315 | logger.info("> Dropout - %s", self.dropout_type) 316 | self.dropout = Dropout(keep_prob=self.keep_prob, dimension=num_features, 317 | dropout_type=self.dropout_type).dropout 318 | else: 319 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 320 | self.dropout = Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type="bernoulli").dropout 321 | 322 | def forward(self, batch): 323 | kl_loss = torch.Tensor([0.0]) 324 | # Get batch size to beginning 325 | x = batch.permute(1, 0) 326 | # Embedding magic 327 | x = self.embedding(x) 328 | x = x.permute(0, 2, 1) 329 | if "cuda" in str(self.device): 330 | x = x.cuda() 331 | kl_loss = kl_loss.cuda() 332 | # To Convolution-Pooling 333 | x = self.pool1(self.relu(self.conv1(x))) 334 | x = self.pool2(self.relu(self.conv2(x))) 335 | x = self.relu(self.conv3(x)) 336 | x = self.relu(self.conv4(x)) 337 | x = self.relu(self.conv5(x)) 338 | x = 
self.pool3(self.relu(self.conv6(x))) 339 | # Flatten 340 | x = x.view(x.size(0), -1) 341 | # To Linear 342 | if self.dropout_type == "variational": 343 | x, kld = self.dropout(self.relu(self.linear1(x))) 344 | kl_loss += kld.sum() 345 | else: 346 | x = self.dropout(self.relu(self.linear1(x))) 347 | if self.dropout_type == "variational": 348 | x, kld = self.dropout(self.relu(self.linear2(x))) 349 | kl_loss += kld.sum() 350 | else: 351 | x = self.dropout(self.relu(self.linear2(x))) 352 | x = self.linear3(x) 353 | 354 | return x, kl_loss 355 | 356 | 357 | class VDCNN(nn.Module): 358 | def __init__(self, args): 359 | super(VDCNN, self).__init__() 360 | 361 | self.args_common = args["common_model_properties"] 362 | self.args_specific = args["vdcnn"] 363 | 364 | # Device 365 | self.device = self.args_common["device"] 366 | 367 | # Input/Output dimensions 368 | self.vocab_size = self.args_common["vocab_size"] 369 | self.embed_dim = self.args_common["embed_dim"] 370 | self.num_class = self.args_common["num_class"] 371 | 372 | # Embedding parameters 373 | self.padding_id = self.args_common["padding_id"] 374 | 375 | # Condition parameters 376 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 377 | self.use_shortcut = self.args_specific["use_shortcut"] 378 | 379 | # Pretrained embedding weights 380 | self.pretrained_weights = self.args_common["pretrained_weights"] 381 | 382 | # Dropout probabilities 383 | self.keep_prob = self.args_specific["keep_prob"] 384 | # Dropout type 385 | self.dropout_type = nn.Dropout(self.keep_prob) 386 | 387 | # Batch normalization parameters 388 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 389 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 390 | 391 | # Convolution parameters 392 | self.depth = self.args_specific["depth"] 393 | assert self.depth in [9, 17, 29, 49] 394 | self.filter_counts = self.args_specific["filter_counts"] 395 | self.filter_size = self.args_specific["filter_size"] 396 | 397 | # Downsampling parameters 398 | self.downsampling_type = self.args_specific["downsampling_type"] 399 | self.maxpool_filter_size = self.args_specific["maxpool_filter_size"] 400 | self.k = self.args_specific["kmax"] 401 | 402 | self.embedding = nn.Embedding(self.vocab_size, self.embed_dim, padding_idx=self.padding_id).cpu() 403 | 404 | number_of_layers = self.initialize_number_of_layers() 405 | layers = nn.ModuleList() 406 | 407 | first_conv_layer = nn.Conv1d(in_channels=self.embed_dim, 408 | out_channels=self.filter_count[0], 409 | kernel_size=self.filter_size, 410 | padding=1) 411 | layers.append(first_conv_layer) 412 | 413 | # Add second convolution layer block where input_size is self.filter_count[0], output_size is self.filter_count[0] 414 | for n in range(number_of_layers[0]): 415 | layers.append(LayerBlock(input_channel_size=self.filter_count[0], 416 | filter_count=self.filter_count[0], 417 | conv_filter_size=self.filter_size, 418 | maxpool_filter_size=self.maxpool_filter_size, 419 | kmax_k=self.k, 420 | downsample_type=self.downsampling_type, 421 | use_shortcut=self.use_shortcut)) 422 | 423 | # Add third convolution layer block where input_size is self.filter_count[0], output_size is self.filter_count[1] 424 | layers.append(LayerBlock(input_channel_size=self.filter_count[0], 425 | filter_count=self.filter_count[1], 426 | conv_filter_size=self.filter_size, 427 | maxpool_filter_size=self.maxpool_filter_size, 428 | kmax_k=self.k, 429 | downsample_type=self.downsampling_type, 430 | downsample=True, 431 | 
use_shortcut=self.use_shortcut)) 432 | for n in range(number_of_layers[1] - 1): 433 | layers.append(LayerBlock(input_channel_size=self.filter_count[1], 434 | filter_count=self.filter_count[1], 435 | conv_filter_size=self.filter_size, 436 | maxpool_filter_size=self.maxpool_filter_size, 437 | kmax_k=self.k, 438 | downsample_type=self.downsampling_type, 439 | use_shortcut=self.use_shortcut)) 440 | 441 | # Add fourth convolution layer block where input_size is self.filter_count[1], output_size is self.filter_count[2] 442 | layers.append(LayerBlock(input_channel_size=self.filter_count[1], 443 | filter_count=self.filter_count[2], 444 | conv_filter_size=self.filter_size, 445 | maxpool_filter_size=self.maxpool_filter_size, 446 | kmax_k=self.k, 447 | downsample_type=self.downsampling_type, 448 | downsample=True, 449 | use_shortcut=self.use_shortcut)) 450 | for n in range(number_of_layers[2] - 1): 451 | layers.append(LayerBlock(input_channel_size=self.filter_count[2], 452 | filter_count=self.filter_count[2], 453 | conv_filter_size=self.filter_size, 454 | maxpool_filter_size=self.maxpool_filter_size, 455 | kmax_k=self.k, 456 | downsample_type=self.downsampling_type, 457 | use_shortcut=self.use_shortcut)) 458 | 459 | # Add fifth convolution layer block where input_size is self.filter_count[2], output_size is self.filter_count[3] 460 | layers.append(LayerBlock(input_channel_size=self.filter_count[2], 461 | filter_count=self.filter_count[3], 462 | conv_filter_size=self.filter_size, 463 | maxpool_filter_size=self.maxpool_filter_size, 464 | kmax_k=self.k, 465 | downsample_type=self.downsampling_type, 466 | downsample=True, 467 | use_shortcut=self.use_shortcut)) 468 | for n in range(number_of_layers[2] - 1): 469 | layers.append(LayerBlock(input_channel_size=self.filter_count[3], 470 | filter_count=self.filter_count[3], 471 | conv_filter_size=self.filter_size, 472 | maxpool_filter_size=self.maxpool_filter_size, 473 | kmax_k=self.k, 474 | downsample_type=self.downsampling_type, 475 | use_shortcut=self.use_shortcut)) 476 | 477 | self.all_conv_layers == nn.Sequential(*layers) 478 | self.kmax_pooling == KMaxPooling(k=self.k) 479 | 480 | self.relu = nn.ReLU() 481 | self.fc1 = nn.Linear(self.filter_counts[3] * self.k, 2048) 482 | self.fc2 = nn.Linear(2048, 2048) 483 | self.fc3 = nn.Linear(2048, self.num_class) 484 | 485 | def initialize_number_of_layers(self): 486 | if self.depth == 9: 487 | return [2] * 4 488 | elif self.depth == 17: 489 | return [4] * 4 490 | elif self.depth == 29: 491 | return [10, 10, 4, 4] 492 | elif self.depth == 49: 493 | return [16, 16, 10, 6] 494 | 495 | def forward(self, batch): 496 | kl_loss = torch.Tensor([0.0]) 497 | # Get batch size to beginning 498 | x = batch.permute(1, 0) 499 | # Embedding magic 500 | x = self.embedding(x) 501 | x = x.permute(0, 2, 1) 502 | if "cuda" in str(self.device): 503 | x = x.cuda() 504 | kl_loss = kl_loss.cuda() 505 | x = self.all_conv_layers(x) 506 | x = self.kmax_pooling(x) 507 | x = x.view(x.size(0), -1) 508 | x = self.relu(self.fc1(x)) 509 | x = self.relu(self.fc2(x)) 510 | x = self.fc3(x) 511 | return x, kl_loss 512 | 513 | 514 | class ConvDeconvCNN(nn.Module): 515 | def __init__(self, args): 516 | super(ConvDeconvCNN, self).__init__() 517 | 518 | self.args = args["common_model_properties"] 519 | 520 | # Input/Output dimensions 521 | self.vocab_size = self.args["vocab_size"] 522 | self.embed_dim = self.args["embed_dim"] 523 | 524 | # Embedding parameters 525 | self.padding_id = self.args["padding_id"] 526 | 527 | # Condition parameters 528 | 
self.use_pretrained_embed = self.args["use_pretrained_embed"] 529 | 530 | # Pretrained embedding weights 531 | self.pretrained_weights = self.args["pretrained_weights"] 532 | 533 | # Initialize embeddings 534 | self.embedding = nn.Embedding(self.vocab_size, self.embed_dim, padding_idx=self.padding_id).cpu() 535 | if self.use_pretrained_embed: 536 | logger.info("> Pre-trained Embeddings") 537 | self.embedding.from_pretrained(self.pretrained_weights) 538 | 539 | self.encoder = ConvolutionEncoder(args, self.embedding) 540 | self.decoder = DeconvolutionDecoder(args, self.embedding) 541 | self.classifier = FullyConnectedClassifier(args) 542 | 543 | def forward(self, x): 544 | return self.encoder(self.decoder(self.classifier(x))) 545 | -------------------------------------------------------------------------------- /models/GRU.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | from dropout_models.dropout import Dropout 9 | 10 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 11 | logger = logging.getLogger("GRU") 12 | 13 | 14 | class GRU(nn.Module): 15 | def __init__(self, args): 16 | super(GRU, self).__init__() 17 | self.args_common = args["common_model_properties"] 18 | self.args_specific = args["gru"] 19 | 20 | self.hidden_dim = self.args_specific["hidden_dim"] 21 | self.num_layers = self.args_specific["num_layers"] 22 | self.batch_size = self.args_common["batch_size"] 23 | 24 | self.vocab = self.args_common["vocab"] 25 | 26 | # Device 27 | self.device = self.args_common["device"] 28 | 29 | # Input/Output dimensions 30 | self.embed_num = self.args_common["vocab_size"] 31 | self.embed_dim = self.args_common["embed_dim"] 32 | self.num_class = self.args_common["num_class"] 33 | 34 | # Embedding parameters 35 | self.padding_id = self.args_common["padding_id"] 36 | 37 | # Condition parameters 38 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 39 | self.embed_train_type = self.args_common["embed_train_type"] 40 | self.bidirectional = self.args_specific["bidirectional"] 41 | self.rnn_bias = self.args_specific["bias"] 42 | 43 | # Pretrained embedding weights 44 | self.pretrained_weights = self.args_common["pretrained_weights"] 45 | 46 | # Dropout type 47 | self.dropout_type = self.args_specific["dropout_type"] 48 | 49 | # Dropout probabilities 50 | self.keep_prob = self.args_specific["keep_prob"] 51 | 52 | self.embed = self.initialize_embeddings() 53 | 54 | # It is NOT the inner GRU dropout! 
55 | self.dropout = self.initialize_dropout() 56 | 57 | self.gru = nn.GRU(self.embed_dim, 58 | self.hidden_dim, 59 | dropout=self.keep_prob, 60 | num_layers=self.num_layers, 61 | bidirectional=self.bidirectional, 62 | bias=self.rnn_bias) 63 | 64 | self.hidden = self.init_hidden() 65 | 66 | if self.bidirectional is True: 67 | self.h2o = nn.Linear(self.hidden_dim * 2, self.num_class) 68 | else: 69 | self.h2o = nn.Linear(self.hidden_dim, self.num_class) 70 | 71 | def init_hidden(self, batch_size): 72 | if self.bidirectional is True: 73 | return Variable(torch.zeros((1, batch_size, self.hidden_dim * 2))).to(self.device) 74 | else: 75 | return Variable(torch.zeros((1, batch_size, self.hidden_dim))).to(self.device) 76 | 77 | def initialize_embeddings(self): 78 | logger.info("> Embeddings") 79 | embed = nn.Embedding(num_embeddings=self.embed_num, 80 | embedding_dim=self.embed_dim, 81 | padding_idx=self.padding_id).cpu() 82 | if self.use_pretrained_embed: 83 | logger.info("> Pre-trained Embeddings") 84 | embed.from_pretrained(self.pretrained_weights) 85 | else: 86 | logger.info("> Random Embeddings") 87 | random_embedding_weights = torch.rand(self.embed_num, self.embed_dim) 88 | embed.from_pretrained(random_embedding_weights) 89 | 90 | if self.embed_train_type == "static": 91 | logger.info("> Static Embeddings") 92 | embed.weight.requires_grad = False 93 | elif self.embed_train_type == "nonstatic": 94 | logger.info("> Non-Static Embeddings") 95 | embed.weight.requires_grad = True 96 | return embed 97 | 98 | def initialize_dropout(self): 99 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 100 | logger.info("> Dropout - %s", self.dropout_type) 101 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 102 | elif self.dropout_type == "variational": 103 | logger.info("> Dropout - %s", self.dropout_type) 104 | return Dropout(keep_prob=self.keep_prob, dimension=self.hidden_dim, 105 | dropout_type=self.dropout_type).dropout 106 | else: 107 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 108 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type="bernoulli").dropout 109 | 110 | def forward(self, batch): 111 | kl_loss = torch.Tensor([0.0]) 112 | 113 | x = self.embed(batch) 114 | x = self.dropout(x) 115 | x = x.view(len(x), self.batch_size, -1) 116 | 117 | if "cuda" in str(self.device): 118 | x = x.cuda() 119 | out, self.hidden = self.gru(x, self.hidden) 120 | out = torch.transpose(out, 0, 1) 121 | out = torch.transpose(out, 1, 2) 122 | 123 | out = F.max_pool1d(input=out, kernel_size=out.size(2)).squeeze(2) 124 | out = torch.tanh(out) 125 | 126 | out = self.h2o(out) 127 | out = F.log_softmax(out, dim=1) 128 | return out, kl_loss 129 | -------------------------------------------------------------------------------- /models/LSTM.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | from dropout_models.dropout import Dropout 9 | from crf.CRF import ConditionalRandomField 10 | 11 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 12 | logger = logging.getLogger("LSTM") 13 | 14 | 15 | class LSTMBase(nn.Module): 16 | def __init__(self, args): 17 | super(LSTMBase, self).__init__() 18 | self.args_common = args["common_model_properties"] 19 | self.args_specific = args["lstm"] 20 
| 21 | self.hidden_dim = self.args_specific["hidden_dim"] 22 | self.num_layers = self.args_specific["num_layers"] 23 | self.batch_size = self.args_common["batch_size"] 24 | 25 | self.vocab = self.args_common["vocab"] 26 | 27 | # Device 28 | self.device = self.args_common["device"] 29 | 30 | # Input/Output dimensions 31 | self.embed_num = self.args_common["vocab_size"] 32 | self.embed_dim = self.args_common["embed_dim"] 33 | 34 | # Embedding parameters 35 | self.padding_id = self.args_common["padding_id"] 36 | 37 | # Condition parameters 38 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 39 | self.embed_train_type = self.args_common["embed_train_type"] 40 | self.bidirectional = self.args_specific["bidirectional"] 41 | self.rnn_bias = self.args_specific["bias"] 42 | 43 | # Pretrained embedding weights 44 | self.pretrained_weights = self.args_common["pretrained_weights"] 45 | 46 | # Dropout type 47 | self.dropout_type = self.args_specific["dropout_type"] 48 | 49 | # Dropout probabilities 50 | self.keep_prob = self.args_specific["keep_prob"] 51 | 52 | self.embed = self.initialize_embeddings() 53 | 54 | # It is NOT the inner LSTM dropout! 55 | self.dropout = self.initialize_dropout() 56 | 57 | self.lstm = nn.LSTM(self.embed_dim, 58 | self.hidden_dim, 59 | dropout=self.keep_prob, 60 | num_layers=self.num_layers, 61 | bidirectional=self.bidirectional, 62 | bias=self.rnn_bias, 63 | batch_first=True) 64 | 65 | self.hidden = self.init_hidden(self.batch_size) 66 | 67 | def init_hidden(self, batch_size): 68 | if self.bidirectional is True: 69 | return (Variable(torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(self.device)), 70 | Variable(torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(self.device))) 71 | else: 72 | return (Variable(torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(self.device)), 73 | Variable(torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(self.device))) 74 | 75 | def initialize_embeddings(self): 76 | logger.info("> Embeddings") 77 | embed = nn.Embedding(num_embeddings=self.embed_num, 78 | embedding_dim=self.embed_dim, 79 | padding_idx=self.padding_id).cpu() 80 | if self.use_pretrained_embed: 81 | logger.info("> Pre-trained Embeddings") 82 | embed.from_pretrained(self.pretrained_weights) 83 | else: 84 | logger.info("> Random Embeddings") 85 | random_embedding_weights = torch.rand(self.embed_num, self.embed_dim) 86 | embed.from_pretrained(random_embedding_weights) 87 | 88 | if self.embed_train_type == "static": 89 | logger.info("> Static Embeddings") 90 | embed.weight.requires_grad = False 91 | elif self.embed_train_type == "nonstatic": 92 | logger.info("> Non-Static Embeddings") 93 | embed.weight.requires_grad = True 94 | return embed 95 | 96 | def initialize_dropout(self): 97 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 98 | logger.info("> Dropout - %s", self.dropout_type) 99 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 100 | elif self.dropout_type == "variational": 101 | logger.info("> Dropout - %s", self.dropout_type) 102 | return Dropout(keep_prob=self.keep_prob, dimension=self.hidden_dim, 103 | dropout_type=self.dropout_type).dropout 104 | else: 105 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 106 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type="bernoulli").dropout 107 | 108 | 109 | class LSTM(LSTMBase): 110 | def __init__(self, args): 111 | super(LSTM, 
self).__init__(args) 112 | 113 | self.num_class = self.args_common["num_class"] 114 | 115 | if self.bidirectional is True: 116 | self.h2o = nn.Linear(self.hidden_dim * 2, self.num_class) 117 | else: 118 | self.h2o = nn.Linear(self.hidden_dim, self.num_class) 119 | 120 | def forward(self, batch): 121 | kl_loss = torch.Tensor([0.0]) 122 | 123 | x = self.embed(batch) 124 | x = self.dropout(x) 125 | x = x.view(len(x), self.batch_size, -1) 126 | 127 | if "cuda" in str(self.device): 128 | x = x.cuda() 129 | 130 | out, self.hidden = self.lstm(x, self.hidden) 131 | out = torch.transpose(out, 0, 1) 132 | out = torch.transpose(out, 1, 2) 133 | 134 | out = F.max_pool1d(input=out, kernel_size=out.size(2)).squeeze(2) 135 | out = torch.tanh(out) 136 | 137 | out = self.h2o(out) 138 | out = F.log_softmax(out, dim=1) 139 | return out, kl_loss 140 | 141 | 142 | class LSTMCRF(LSTMBase): 143 | def __init__(self, args): 144 | super(LSTMCRF, self).__init__(args) 145 | 146 | self.num_tags = self.args_common["num_tags"] 147 | 148 | if self.bidirectional is True: 149 | self.h2o = nn.Linear(self.hidden_dim * 2, self.num_tags) 150 | else: 151 | self.h2o = nn.Linear(self.hidden_dim, self.num_tags) 152 | 153 | self.crf = ConditionalRandomField(args).to(self.device) 154 | 155 | def forward(self, batch_x, batch_y): 156 | kl_loss = torch.Tensor([0.0]) 157 | 158 | x = batch_x.permute(1, 0) 159 | y = batch_y.permute(1, 0) 160 | 161 | mask = x.data.gt(1).float() 162 | input_lengths = mask.sum(1).int() 163 | 164 | x = self.embed(x) 165 | if "cuda" in str(self.device): 166 | x = x.cuda() 167 | 168 | x = nn.utils.rnn.pack_padded_sequence(x, input_lengths, batch_first=True) 169 | h, _ = self.lstm(x, self.hidden) 170 | h, _ = nn.utils.rnn.pad_packed_sequence(h, batch_first=True) 171 | h = self.h2o(h) 172 | h *= mask.unsqueeze(2) 173 | 174 | out = self.crf(h, y, mask) 175 | 176 | return out, kl_loss 177 | 178 | def decode(self, batch_x): 179 | x = batch_x.permute(1, 0) 180 | 181 | mask = x.data.gt(1).float() 182 | input_lengths = mask.sum(1).int() 183 | 184 | x = self.embed(x) 185 | if "cuda" in str(self.device): 186 | x = x.cuda() 187 | 188 | x = nn.utils.rnn.pack_padded_sequence(x, input_lengths, batch_first=True) 189 | h, _ = self.lstm(x, self.hidden) 190 | h, _ = nn.utils.rnn.pad_packed_sequence(h, batch_first=True) 191 | h = self.h2o(h) 192 | h *= mask.unsqueeze(2) 193 | 194 | return self.crf.viterbi_decode(h, mask) 195 | -------------------------------------------------------------------------------- /models/Transformer.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | 4 | import matplotlib.pyplot as plt 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from torch.autograd import Variable 10 | 11 | from utils.utils import clones 12 | 13 | 14 | class LayerNormGoogle(nn.Module): 15 | def __init__(self, features, epsilon=1e-6): 16 | super(LayerNormGoogle, self).__init__() 17 | self.a_2 = nn.Parameter(torch.ones(features)) 18 | self.b_2 = nn.Parameter(torch.zeros(features)) 19 | self.epsilon = epsilon 20 | 21 | def forward(self, x): 22 | mean = x.mean(-1, keepdim=True) 23 | std = x.std(-1, keepdim=True) 24 | return self.a_2 * (x - mean) / (std + self.epsilon) + self.b_2 25 | 26 | 27 | class EncoderBlockGoogle(nn.Module): 28 | def __init__(self, layer, num_layers): 29 | super(EncoderBlockGoogle, self).__init__() 30 | self.layers = clones(layer, num_layers) 31 | self.norm = LayerNormGoogle(layer.size) 32 
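    # `clones` is imported from utils.utils (not shown in this file). A minimal sketch of what it is
    # assumed to do, following the Annotated Transformer helper of the same name:
    #
    #     def clones(module, num_layers):
    #         """Produce num_layers independent deep copies of a module as a ModuleList."""
    #         return nn.ModuleList([copy.deepcopy(module) for _ in range(num_layers)])
    #
    # The deep copies matter here: each encoder layer must own its own parameters instead of sharing them.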
| 33 | def forward(self, x, mask): 34 | for layer in self.layers: 35 | x = layer(x, mask) 36 | return self.norm(x) 37 | 38 | 39 | class ResidualConnectionGoogle(nn.Module): 40 | def __init__(self, size, keep_prob): 41 | super(ResidualConnectionGoogle, self).__init__() 42 | self.norm = LayerNormGoogle(size) 43 | # TODO: Use dropout interface 44 | self.dropout = nn.Dropout(keep_prob) 45 | 46 | def forward(self, input, sublayer): 47 | return input + self.dropout(sublayer(self.norm(input))) 48 | 49 | 50 | class EncoderLayerGoogle(nn.Module): 51 | def __init__(self, size, attention, feed_forward, keep_prob): 52 | super(EncoderLayerGoogle, self).__init__() 53 | self.size = size 54 | self.attention = attention 55 | self.feed_forward = feed_forward 56 | # Each encoder layer has two sublayers 57 | self.sublayer = clones(ResidualConnectionGoogle(size, keep_prob), 2) 58 | 59 | def forward(self, x, mask): 60 | x = self.sublayer[0](x, lambda x: self.attention(x, x, x, mask)) 61 | return self.sublayer[1](x, self.feed_forward) 62 | 63 | 64 | class EncoderClassifier(nn.Module): 65 | def __init__(self, embedding, encoder, classifier, device, is_average=True): 66 | super(EncoderClassifier, self).__init__() 67 | self.embedding = embedding 68 | self.encoder = encoder 69 | self.classifier = classifier 70 | self.device = device 71 | self.is_average = is_average 72 | 73 | def forward(self, x, mask=None): 74 | kl_loss = torch.Tensor([0.0]) 75 | # Initial x.size() = [length, batch_size] 76 | x = x.permute(1, 0) 77 | # After permute x.size = [batch_size, length] 78 | x = self.embedding(x) 79 | if "cuda" in str(self.device): 80 | x = x.cuda() 81 | kl_loss = kl_loss.cuda() 82 | x = self.encoder(x, mask) 83 | if self.is_average: 84 | # Averaged sentence representation 85 | x = torch.mean(x, dim=1) 86 | x = self.classifier(x) 87 | return x, kl_loss 88 | 89 | 90 | class Classifier(nn.Module): 91 | def __init__(self, d_model, d_hidden, num_classes, keep_prob): 92 | super(Classifier, self).__init__() 93 | self.linear1 = nn.Linear(d_model, d_hidden) 94 | self.dropout = nn.Dropout(keep_prob) 95 | self.relu = nn.ReLU() 96 | self.linear2 = nn.Linear(d_hidden, num_classes) 97 | 98 | def forward(self, x): 99 | x = self.dropout(self.relu(self.linear1(x))) 100 | x = self.linear2(x) 101 | return x 102 | 103 | 104 | class MultiHeadedAttentionGoogle(nn.Module): 105 | def __init__(self, heads=8, d_model=512, keep_prob=0.1): 106 | super(MultiHeadedAttentionGoogle, self).__init__() 107 | assert d_model % heads == 0 108 | self.d_k = d_model // heads 109 | self.heads = heads 110 | self.linears = clones(nn.Linear(d_model, d_model), 4) 111 | self.attn = None 112 | self.dropout = nn.Dropout(keep_prob) 113 | 114 | def attention(self, query, key, value, mask=None): 115 | # Dot product attention 116 | d_k = query.size(-1) 117 | scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k) 118 | 119 | if mask is not None: 120 | scores = scores.masked_fill(mask == 0, -1e9) 121 | 122 | p_attn = F.softmax(scores, dim=-1) 123 | 124 | if self.dropout is not None: 125 | p_attn = self.dropout(p_attn) 126 | 127 | return torch.matmul(p_attn, value), p_attn 128 | 129 | def forward(self, query, key, value, mask=None): 130 | num_batches = query.size(0) 131 | if mask is not None: 132 | mask = mask.unsqueeze(1) 133 | 134 | # Apply linear projection on the input sequence and split the heads. 
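        # Shape walk-through (descriptive note, inferred from the line below):
        #   after each linear the last dimension stays d_model: (num_batches, seq_len, d_model)
        #   .view(...) splits d_model into heads * d_k:         (num_batches, seq_len, heads, d_k)
        #   .transpose(1, 2) groups by attention head:          (num_batches, heads, seq_len, d_k)
        # Only the first three of the four cloned linears are consumed here; the fourth one,
        # self.linears[-1], is the output projection applied after the heads are merged again.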
135 | query, key, value = [linear(x).view(num_batches, -1, self.heads, self.d_k).transpose(1, 2) 136 | for linear, x in zip(self.linears, (query, key, value))] 137 | 138 | # Apply attention on the projected and splitted vectors 139 | x, self.attn = self.attention(query, key, value, mask=mask) 140 | 141 | # Concat vectors and apply linear 142 | x = x.transpose(1, 2).contiguous().view(num_batches, -1, self.heads * self.d_k) 143 | 144 | return self.linears[-1](x) 145 | 146 | 147 | class PositionalFeedForwardGoogle(nn.Module): 148 | def __init__(self, d_model, d_ff, keep_prob=0.1): 149 | super(PositionalFeedForwardGoogle, self).__init__() 150 | self.w_1 = nn.Linear(d_model, d_ff) 151 | self.w_2 = nn.Linear(d_ff, d_model) 152 | self.dropout = nn.Dropout(keep_prob) 153 | self.relu = nn.ReLU() 154 | 155 | def forward(self, input): 156 | return self.w_2(self.dropout(self.relu(self.w_1(input)))) 157 | 158 | 159 | class Embeddings(nn.Module): 160 | def __init__(self, embed_dim, vocab_size, padding_id, use_pretrained_embed, pretrained_weights, 161 | optional_sqrt_mul=False): 162 | super(Embeddings, self).__init__() 163 | # Initialize embeddings 164 | self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_id).cpu() 165 | if use_pretrained_embed: 166 | self.embedding.from_pretrained(pretrained_weights) 167 | self.embed_dim = embed_dim 168 | self.optional_sqrt_mul = optional_sqrt_mul 169 | 170 | def forward(self, input): 171 | if self.optional_sqrt_mul: 172 | return self.embedding(input) * math.sqrt(self.embed_dim) 173 | else: 174 | return self.embedding(input) 175 | 176 | 177 | class PositionalEncodingGoogle(nn.Module): 178 | def __init__(self, d_model, keep_prob=0.1, max_len=5000): 179 | super(PositionalEncodingGoogle, self).__init__() 180 | self.dropout = nn.Dropout(keep_prob) 181 | 182 | positional_encoding = torch.zeros(max_len, d_model) 183 | pos = torch.arange(0., max_len).unsqueeze(1) 184 | # Log space 185 | div_term = torch.exp(torch.arange(0., d_model, 2) * (-math.log(10000) / d_model)) 186 | 187 | positional_encoding[:, 0::2] = torch.sin(pos * div_term) 188 | positional_encoding[:, 1::2] = torch.cos(pos * div_term) 189 | 190 | positional_encoding = positional_encoding.unsqueeze(0) 191 | self.register_buffer("pe", positional_encoding) 192 | 193 | def forward(self, input): 194 | return self.dropout(input + Variable(self.pe[:, :input.size(1)], requires_grad=False)) 195 | 196 | 197 | class TransformerGoogle: 198 | def __init__(self, args): 199 | super(TransformerGoogle, self).__init__() 200 | 201 | self.args_common = args["common_model_properties"] 202 | self.args_specific = args["transformer_google"] 203 | 204 | # Device 205 | self.device = self.args_common["device"] 206 | 207 | # Input/Output dimensions 208 | self.vocab_size = self.args_common["vocab_size"] 209 | self.embed_dim = self.args_common["embed_dim"] 210 | self.num_class = self.args_common["num_class"] 211 | 212 | # Embedding parameters 213 | self.padding_id = self.args_common["padding_id"] 214 | 215 | # Condition parameters 216 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 217 | self.use_embed_sqrt_mul = self.args_specific["use_embed_sqrt_mul"] 218 | 219 | # Pretrained embedding weights 220 | self.pretrained_weights = self.args_common["pretrained_weights"] 221 | 222 | # Dropout probabilities for each individual part of the full model. 
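        # Despite the "keep_prob" naming, these values are passed straight to nn.Dropout, so they act as
        # drop probabilities. Based on where each one is used in create_classifier_transformer, they cover:
        # the residual connections inside every encoder layer, the positional encoding, the position-wise
        # feed-forward block, the attention weights, and the final classifier head, respectively.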
223 | self.keep_prob_encoder = self.args_specific["keep_prob_encoder"] 224 | self.keep_prob_pe = self.args_specific["keep_prob_pe"] 225 | self.kee_prob_pff = self.args_specific["keep_prob_pff"] 226 | self.keep_prob_attn = self.args_specific["keep_prob_attn"] 227 | self.keep_prob_clf = self.args_specific["keep_prob_clf"] 228 | 229 | # Condition parameter for the transformer type (It only supports classification for now) 230 | self.transformer_type = self.args_specific["transformer_type"] 231 | 232 | # Number of parallel attention layers for MultiHeadedAttention 233 | self.heads = self.args_specific["heads"] 234 | 235 | # Number of encoder layers 236 | self.num_encoder_layers = self.args_specific["num_encoder_layers"] 237 | 238 | # Number of hidden count units for Position-Wise Feed-Forward Network 239 | self.num_hidden_pos_ff = self.args_specific["num_hidden_pos_ff"] 240 | 241 | # Maximum length of an input 242 | self.max_length = self.args_specific["max_length"] 243 | 244 | if self.transformer_type == "classifier": 245 | self.model = self.create_classifier_transformer() 246 | else: 247 | raise ValueError("Transformer can be created as classifier for now!") 248 | 249 | def create_classifier_transformer(self): 250 | c = copy.deepcopy 251 | 252 | # Initialize individual parts of the full model 253 | # attention = torch.nn.MultiheadAttention(num_heads=self.heads, embed_dim=self.embed_dim, 254 | # dropout=self.keep_prob_attn) 255 | attention = MultiHeadedAttentionGoogle(heads=self.heads, d_model=self.embed_dim, keep_prob=self.keep_prob_attn) 256 | 257 | ff = PositionalFeedForwardGoogle(d_model=self.embed_dim, d_ff=self.num_hidden_pos_ff, 258 | keep_prob=self.kee_prob_pff) 259 | 260 | embeddings = Embeddings(self.embed_dim, self.vocab_size, self.padding_id, self.use_pretrained_embed, 261 | self.pretrained_weights, optional_sqrt_mul=self.use_embed_sqrt_mul) 262 | 263 | positional_embeddings = PositionalEncodingGoogle(d_model=self.embed_dim, keep_prob=self.keep_prob_pe, 264 | max_len=self.max_length) 265 | 266 | # Initialize the full model 267 | model = EncoderClassifier(nn.Sequential(embeddings, c(positional_embeddings)), 268 | EncoderBlockGoogle( 269 | EncoderLayerGoogle(self.embed_dim, c(attention), c(ff), self.keep_prob_encoder), 270 | self.num_encoder_layers), 271 | Classifier(self.embed_dim, d_hidden=self.embed_dim // 2, num_classes=self.num_class, 272 | keep_prob=self.keep_prob_clf), 273 | device=self.device) 274 | 275 | # Initialize model parameters 276 | for p in model.parameters(): 277 | if p.dim() > 1: 278 | nn.init.xavier_uniform_(p) 279 | return model 280 | 281 | 282 | if __name__ == '__main__': 283 | print("Transformer tests") 284 | plt.figure(figsize=(15, 5)) 285 | pe = PositionalEncodingGoogle(20, 0) 286 | y = pe.forward(Variable(torch.zeros(1, 100, 20))) 287 | plt.plot(np.arange(100), y[0, :, 4:8].data.numpy()) 288 | plt.legend(["dim %d" % p for p in [4, 5, 6, 7]]) 289 | plt.show() 290 | -------------------------------------------------------------------------------- /models/Transformer_OpenAI.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | 4 | import matplotlib.pyplot as plt 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from torch.autograd import Variable 10 | 11 | from utils.utils import clones 12 | 13 | 14 | 15 | class Embeddings(nn.Module): 16 | def __init__(self, embed_dim, vocab_size, keep_prob, padding_id, use_pretrained_embed, 
pretrained_weights): 17 | super(Embeddings, self).__init__() 18 | # Initialize embeddings 19 | self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_id).cpu() 20 | if use_pretrained_embed: 21 | self.load_pretrained_weights() 22 | self.embed_drop = nn.Dropout(keep_prob) 23 | 24 | def forward(self, input): 25 | x = self.embed_drop(self.embedding(input)) 26 | out = x.sum(dim=2) 27 | return out 28 | 29 | 30 | class LayerNorm(nn.Module): 31 | def __init__(self, features, epsilon=1e-5): 32 | self.a_2 = nn.Parameter(torch.ones(features)) 33 | self.b_2 = nn.Parameter(torch.zeros(features)) 34 | self.epsilon = epsilon 35 | 36 | def forward(self, x): 37 | mean = x.mean(-1, keepdim=True) 38 | std = x.std(-1, keepdim=True) 39 | return self.a_2 * (x - mean) / torch.sqrt(std + self.epsilon) + self.b_2 40 | 41 | 42 | class MultiLayerPerceptron(nn.Module): 43 | def __init__(self, num_state, embed_dim, keep_prob): 44 | self.fc = nn.Conv1d(num_state, 1, embed_dim) 45 | self.proj = nn.Conv1d(embed_dim, 1, num_state) 46 | self.activation = nn.ReLU() 47 | self.dropout = nn.Dropout(keep_prob) 48 | 49 | def forward(self, input): 50 | x = self.activation(self.fc(input)) 51 | x = self.dropout(self.proj(x)) 52 | return x 53 | 54 | 55 | class ModifiedMultiHeadedAttention(nn.Module): 56 | def __init__(self, num_state, n_ctx, num_heads, keep_prob_attention, keep_prob_residual, scale=False): 57 | assert num_state % num_heads == 0 58 | self.bias = torch.tril(torch.ones(n_ctx, n_ctx)).view(1, 1, n_ctx, n_ctx) 59 | self.num_heads = num_heads 60 | self.split_size = num_state 61 | self.scale = scale 62 | self.attn = nn.Conv1d(num_state * 3, 1, num_state) 63 | self.proj = nn.Conv1d(num_state, 1, num_state) 64 | self.attn_dropout = nn.Dropout(keep_prob_attention) 65 | self.residual_dropout = nn.Dropout(keep_prob_residual) 66 | 67 | def attention(self, query, key, value): 68 | weight = torch.matmul(query, key) 69 | if self.scale: 70 | weight = weight / math.sqrt(value.size(-1)) 71 | 72 | # Mask attention weights 73 | bias = self.bias[:, :, :weight.size(-2), :weight.size(-1)] 74 | weight = weight * bias - 1e9 * (1 - bias) 75 | 76 | p_attn = F.softmax(weight, dim=-1) 77 | if self.attn_dropout is not None: 78 | p_attn = self.attn_dropout(p_attn) 79 | return torch.matmul(p_attn, value) 80 | 81 | # Direct c/p from huggingface, which is the equivalent of original tensorflow implementation. 82 | def merge_heads(self, x): 83 | x = x.permute(0, 2, 1, 3) 84 | new_x_shape = x.size()[:-2] + (x.size(-2) * x.size(-1),) 85 | return x.view(*new_x_shape) 86 | 87 | # Direct c/p from huggingface, which is the equivalent of original tensorflow implementation. 
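    # Descriptive note on the reshaping below: split_heads turns (batch, seq_len, num_state) into
    # (batch, heads, seq_len, head_dim); keys are additionally permuted to (batch, heads, head_dim, seq_len)
    # so that torch.matmul(query, key) in attention() needs no extra transpose. Note that the constructor
    # stores the head count as self.num_heads, while this method reads self.n_head.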
88 | def split_heads(self, x, is_key=False): 89 | new_x_shape = x.size()[:-1] + (self.n_head, x.size(-1) // self.n_head) 90 | x = x.view(*new_x_shape) 91 | if is_key: 92 | return x.permute(0, 2, 3, 1) 93 | else: 94 | return x.permute(0, 2, 1, 3) 95 | 96 | def forward(self, input): 97 | x = self.attn(input) 98 | query, key, value = x.split(self.split_size, dim=2) 99 | query = self.split_heads(query) 100 | key = self.split_heads(key, is_key=True) 101 | value = self.split_heads(value) 102 | out = self.proj(self.merge_heads(self.attention(query, key, value))) 103 | return self.residual_dropout(out) 104 | 105 | 106 | class Block(nn.Module): 107 | def __init__(self, embed_dim, num_heads, keep_prob_attention, keep_prob_residual, keep_prob_mlp, n_ctx=512, 108 | scale=False, use_builtin_mha=False): 109 | if use_builtin_mha: 110 | self.attention = nn.MultiheadAttention(embed_dim=embed_dim, 111 | num_heads=num_heads, 112 | dropout=keep_prob_attention) 113 | else: 114 | self.attention = ModifiedMultiHeadedAttention(num_state=embed_dim, 115 | n_ctx=n_ctx, 116 | num_heads=num_heads, 117 | keep_prob_attention=keep_prob_attention, 118 | keep_prob_residual=keep_prob_residual, 119 | scale=scale) 120 | self.layer_norm1 = LayerNorm(embed_dim) 121 | self.mlp = MultiLayerPerceptron(4 * embed_dim, embed_dim, keep_prob_mlp) 122 | self.layer_norm2 = LayerNorm(embed_dim) 123 | 124 | def forward(self, input): 125 | x = self.attn(input) 126 | x_hat = self.ln_1(input + x) 127 | x = self.mlp(x_hat) 128 | x = self.ln_2(x_hat + x) 129 | return x 130 | 131 | class LanguageModelHead(nn.Module): 132 | def __init__(self, embedding, embed_dim): 133 | super(LanguageModelHead, self).__init__() 134 | self.embed_dim = embed_dim 135 | self.decoder = nn.Linear(in_features=embedding.embedding.weight.shape[1], 136 | out_features=embedding.embedding.weight.shape[0], 137 | bias=True) 138 | self.decoder.weight = embedding.embedding.weight 139 | 140 | def forward(self, input): 141 | # Remove last token 142 | x = input[:, :-1].view(-1, self.embed_dim) 143 | x = self.decoder(x) 144 | return x 145 | 146 | 147 | class TransformerOpenAI: 148 | def __init__(self, args): 149 | super(TransformerOpenAI, self).__init__() 150 | 151 | self.args_common = args["common_model_properties"] 152 | self.args_specific = args["transformer_openai"] 153 | 154 | # Device 155 | self.device = self.args_common["device"] 156 | 157 | # Input/Output dimensions 158 | self.vocab_size = self.args_common["vocab_size"] 159 | self.embed_dim = self.args_common["embed_dim"] 160 | self.num_class = self.args_common["num_class"] 161 | 162 | # Embedding parameters 163 | self.padding_id = self.args_common["padding_id"] 164 | 165 | # Condition parameters 166 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 167 | 168 | # Model/Context size 169 | self.d_model = self.args_specific["d_model"] 170 | 171 | # Dropout probabilities for each individual part of the full model. 
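        # Illustrative sketch only (values are invented): based on the keys read in this constructor,
        # the "transformer_openai" section of the configuration is assumed to contain at least
        #   "transformer_openai": {"d_model": 512, "keep_prob_embed": 0.1, "heads": 8, "num_layers": 12}
        # transformer_type is checked further below but is not yet read from the config in this version.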
172 | self.keep_prob_embed = self.args_specific["keep_prob_embed"] 173 | 174 | # Number of parallel attention layers for MultiHeadedAttention 175 | self.heads = self.args_specific["heads"] 176 | 177 | # Number of layers in terms of Blocks 178 | self.num_layers = self.args_specific["num_layers"] 179 | 180 | if self.transformer_type == "classifier": 181 | self.model = self.create_classifier_transformer() 182 | else: 183 | raise ValueError("Transformer can be created as classifier for now!") 184 | 185 | def create_classifier_transformer(self): 186 | c = copy.deepcopy 187 | 188 | embedding = Embeddings(self.embed_dim, self.vocab_size, self.keep_prob_embed, self.padding_id, 189 | self.use_pretrained_embed, self.pretrained_weights) 190 | 191 | 192 | -------------------------------------------------------------------------------- /models/Util_CNN.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class ConvolutionalBlock(nn.Module): 7 | def __init__(self, input_channel_size, filter_count, filter_size, stride): 8 | super(ConvolutionalBlock, self).__init__() 9 | relu = nn.ReLU() 10 | bn = nn.BatchNorm1d(num_features=filter_count) 11 | conv1 = nn.Conv1d(in_channels=input_channel_size, 12 | out_channels=filter_count, 13 | kernel_size=filter_size, 14 | stride=stride, 15 | padding=1) 16 | conv2 = nn.Conv1d(in_channels=filter_count, 17 | out_channels=filter_count, 18 | kernel_size=filter_size, 19 | stride=1, 20 | padding=1) 21 | self.block = nn.Sequential(conv1, bn, relu, conv2, bn, relu) 22 | 23 | def forward(self, input): 24 | return self.block(input) 25 | 26 | 27 | class KMaxPooling(nn.Module): 28 | def __init__(self, k): 29 | super(KMaxPooling, self).__init__() 30 | assert 1 < k 31 | self.k = k 32 | 33 | def forward(self, input): 34 | kmax, _ = input.topk(input.shape(2) // self.k, dim=2) 35 | return kmax 36 | 37 | 38 | class LayerBlock(nn.Module): 39 | def __init__(self, input_channel_size, filter_count, conv_filter_size, maxpool_filter_size, kmax_k=2, 40 | downsample=False, downsample_type="resnet", use_shortcut=True): 41 | super(LayerBlock, self).__init__() 42 | self.downsample = downsample 43 | self.use_shortcut = use_shortcut 44 | 45 | self.pool = None 46 | stride = 1 47 | if self.downsample: 48 | if downsample_type == "resnet": 49 | stride = 2 50 | elif downsample_type == "vgg": 51 | self.pool = nn.MaxPool1d(kernel_size=maxpool_filter_size, stride=2, padding=1) 52 | elif downsample_type == "kmax": 53 | self.pool = self.KMaxPooling(k=kmax_k) 54 | else: 55 | raise KeyError("Downsample_type can be (1) resnet, (2) vgg, or (3) kmax") 56 | 57 | self.convolutional_block = self.ConvolutionalBlock(input_channel_size=input_channel_size, 58 | filter_count=filter_count, 59 | filter_size=conv_filter_size, 60 | stride=stride) 61 | 62 | if use_shortcut and self.downsample: 63 | self.shortcut = nn.Conv1d(in_channels=input_channel_size, 64 | out_channels=filter_count, 65 | kernel_size=1, 66 | stride=2) 67 | 68 | def forward(self, input): 69 | residual = input 70 | if self.downsample and self.pool: 71 | x = self.pool(input) 72 | x = self.convolutional_block(x) 73 | 74 | if self.downsample and self.use_shortcut: 75 | residual = self.shortcut(residual) 76 | 77 | if self.use_shortcut: 78 | x += residual 79 | return x 80 | 81 | 82 | class ConvolutionEncoder(nn.Module): 83 | def __init__(self, args, embedding): 84 | super(ConvolutionEncoder, self).__init__() 85 | self.args_common = 
args["common_model_properties"] 86 | self.args_specific = args["conv_deconv_cnn"] 87 | 88 | # Device 89 | self.device = self.args_common["device"] 90 | 91 | # Input/Output dimensions 92 | self.embed_dim = self.args_common["embed_dim"] 93 | 94 | # Condition parameters 95 | self.use_batch_norm = self.args_common["use_batch_norm"] 96 | 97 | # Batch normalization parameters 98 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 99 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 100 | 101 | # Convolution parameters 102 | self.input_channel = 1 103 | self.filter_counts = self.args_specific["filter_counts"] 104 | self.filter_sizes = self.args_specific["filter_sizes"] 105 | self.strides = self.args_specific["strides"] 106 | 107 | self.embedding = embedding 108 | 109 | # Initialize convolutions 110 | self.conv1 = nn.Conv2d(in_channels=self.input_channel, 111 | out_channels=self.filter_counts[0], 112 | kernel_size=(self.filter_sizes[0], self.embed_dim), 113 | stride=self.strides[0], 114 | bias=True) 115 | self.conv2 = nn.Conv2d(in_channels=self.filter_counts[0], 116 | out_channels=self.filter_counts[1], 117 | kernel_size=(self.filter_sizes[1], 1), 118 | stride=self.strides[1], 119 | bias=True) 120 | self.conv3 = nn.Conv2d(in_channels=self.filter_counts[1], 121 | out_channels=self.filter_counts[2], 122 | kernel_size=(self.filter_sizes[2], 1), 123 | stride=self.strides[2], 124 | bias=True) 125 | 126 | # Initialize batch norms 127 | if self.use_batch_norm: 128 | self.conv1_bn = nn.BatchNorm2d(num_features=self.filter_counts[0], 129 | momentum=self.batch_norm_momentum, 130 | affine=self.batch_norm_affine) 131 | self.conv2_bn = nn.BatchNorm2d(num_features=self.filter_counts[1], 132 | momentum=self.batch_norm_momentum, 133 | affine=self.batch_norm_affine) 134 | 135 | # Well, self-explanatory. 
136 | self.relu = nn.ReLU() 137 | 138 | def forward(self, batch): 139 | batch_permuted = batch.permute(1, 0) 140 | h = self.embed(batch_permuted) 141 | if "cuda" in str(self.device): 142 | h = h.cuda() 143 | 144 | if self.use_batch_norm: 145 | h = self.relu(self.conv1_bn(self.conv1(h))) 146 | h = self.relu(self.conv2_bn(self.conv2(h))) 147 | h = self.relu(self.conv3(h)) 148 | else: 149 | h = self.relu(self.conv1(h)) 150 | h = self.relu(self.conv2(h)) 151 | h = self.relu(self.conv3(h)) 152 | 153 | return h 154 | 155 | 156 | class DeconvolutionDecoder(nn.Module): 157 | def __init__(self, args, embedding): 158 | super(DeconvolutionDecoder, self).__init__() 159 | self.args_common = args["common_model_properties"] 160 | self.args_specific = args["conv_deconv_cnn"] 161 | 162 | # Device 163 | self.device = self.args_common["device"] 164 | 165 | # Input/Output dimensions 166 | self.embed_dim = self.args_common["embed_dim"] 167 | 168 | # Condition parameters 169 | self.use_batch_norm = self.args_common["use_batch_norm"] 170 | 171 | # Batch normalization parameters 172 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 173 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 174 | 175 | # Convolution parameters 176 | self.input_channel = 1 177 | self.filter_counts = list(reversed(self.args_specific["filter_counts"])) 178 | self.filter_sizes = list(reversed(self.args_specific["filter_sizes"])) 179 | self.strides = list(reversed(self.args_specific["strides"])) 180 | self.temperature = args["deconv_temperature"] 181 | 182 | self.embedding = embedding 183 | 184 | # Initialize deconvolutions 185 | self.deconv1 = nn.ConvTranspose2d(in_channels=self.filter_counts[0], 186 | out_channels=self.filter_counts[1], 187 | kernel_size=(self.filter_sizes[0], 1), 188 | stride=self.strides[0], 189 | bias=True) 190 | self.deconv2 = nn.ConvTranspose2d(in_channels=self.filter_counts[1], 191 | out_channels=self.filter_counts[2], 192 | kernel_size=(self.filter_sizes[1], 1), 193 | stride=self.strides[1], 194 | bias=True) 195 | self.deconv3 = nn.ConvTranspose2d(in_channels=self.filter_counts[2], 196 | out_channels=self.input_channel, 197 | kernel_size=(self.filter_sizes[2], self.embed_dim), 198 | stride=self.strides[2], 199 | bias=True) 200 | 201 | # Initialize batch norms 202 | if self.use_batch_norm: 203 | self.deconv1_bn = nn.BatchNorm2d(num_features=self.filter_counts[0], 204 | momentum=self.batch_norm_momentum, 205 | affine=self.batch_norm_affine) 206 | self.deconv2_bn = nn.BatchNorm2d(num_features=self.filter_counts[1], 207 | momentum=self.batch_norm_momentum, 208 | affine=self.batch_norm_affine) 209 | 210 | # Well, self-explanatory. 211 | self.relu = nn.ReLU() 212 | 213 | def forward(self, h): 214 | if self.use_batch_norm: 215 | x_ = self.relu(self.deconv1_bn(self.deconv1(h))) 216 | x_ = self.relu(self.deconv2_bn(self.deconv2(x_))) 217 | x_ = self.relu(self.deconv3(x_)) 218 | else: 219 | x_ = self.relu(self.deconv1(h)) 220 | x_ = self.relu(self.deconv2(x_)) 221 | x_ = self.relu(self.deconv3(x_)) 222 | 223 | x_ = x_.squeeze() 224 | 225 | # p(w^t = v): Probability of w^t to be word v, as w^t is the t'th word of the reconstructed sentence. 
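        # Descriptive note: the block below L2-normalizes the reconstructed embeddings along the embedding
        # dimension, takes a batched matrix product with the transposed word-embedding matrix, divides by
        # the temperature and applies log_softmax over the vocabulary. nn.Variable is presumably meant to
        # be torch.autograd.Variable; with PyTorch >= 0.4 the raw weight tensor would work as well.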
226 | normalized_x_ = torch.norm(x_, p=2, dim=2, keepdim=True) 227 | reconstructed_x_ = x_ / normalized_x_ 228 | 229 | normalized_w = (nn.Variable(self.embedding.weight.data).t()).unsqueeze(0) 230 | normalized_w = normalized_w.expand(reconstructed_x_.size(0), *normalized_w.size()) 231 | probs = torch.bmm(reconstructed_x_, normalized_w) / self.temperature 232 | # Reconstruction log probabilities (not loss) 233 | return F.log_softmax(probs, dim=2) 234 | 235 | 236 | class FullyConnectedClassifier(nn.Module): 237 | def __init__(self, args): 238 | super(FullyConnectedClassifier, self).__init__() 239 | self.args_common = args["common_model_properties"] 240 | self.args_specific = args["conv_deconv_cnn"] 241 | 242 | # This block is not configurable for any network architecture! 243 | # It is designed for Conv-Deconv CNN, hence its input size is the output size of the Encoder CNN. 244 | self.input_size = self.args_specific["filter_counts"][2] 245 | self.hidden_layer_size = self.args_specific["hidden_layer_size"] 246 | self.num_class = self.args_common["num_class"] 247 | self.keep_prob = self.args_common["keep_prob"] 248 | 249 | self.fc1 = nn.Linear(self.input_size, self.hidden_layer_size) 250 | self.fc2 = nn.Linear(self.hidden_layer_size, self.num_class) 251 | 252 | self.dropout = nn.Dropout(self.keep_prob) 253 | 254 | def forward(self, input): 255 | x = self.dropout(self.fc1(input)) 256 | x = self.fc2(x) 257 | # Supervised log probabilities 258 | return F.log_softmax(x, dim=1) 259 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/models/__init__.py -------------------------------------------------------------------------------- /optimizer/custom_optimizer.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import matplotlib.pyplot as plt 4 | import numpy as np 5 | import torch 6 | from torch.nn.utils import clip_grad_norm_ 7 | from torch.optim import Optimizer 8 | 9 | 10 | def warmup_cosine(x, warmup=0.002): 11 | s = 0 12 | if x < warmup: 13 | s = 1 14 | return s * (x / warmup) + (1 - s) * (0.5 * (1 + torch.cos(math.pi * x))) 15 | 16 | 17 | def warmup_constant(x, warmup=0.002): 18 | s = 0 19 | if x < warmup: 20 | s = 1 21 | return s * (x / warmup) + (1 - s) 22 | 23 | 24 | def warmup_linear(x, warmup=0.002): 25 | s = 0 26 | if x < warmup: 27 | s = 1 28 | return (s * (x / warmup) + (1 - s)) * (1 - x) 29 | 30 | 31 | SCHEDULES = { 32 | "cos": warmup_cosine, 33 | "constant": warmup_constant, 34 | "linear": warmup_linear, 35 | } 36 | 37 | 38 | class NoamOptimizer(): 39 | # Direct c/p from Attention is All You Need notebook (famous Harvard's one) 40 | def __init__(self, model_size, factor, warmup, optimizer): 41 | self.optimizer = optimizer 42 | self._step = 0 43 | self.warmup = warmup 44 | self.factor = factor 45 | self.model_size = model_size 46 | self._rate = 0 47 | 48 | def step(self): 49 | # Update parameters and rate 50 | self._step += 1 51 | rate = self.rate() 52 | for p in self.optimizer.param_groups: 53 | p["lr"] = rate 54 | self._rate = rate 55 | self.optimizer.step() 56 | 57 | def rate(self, step=None): 58 | if step is None: 59 | step = self._step 60 | return self.factor * (self.model_size ** (-0.5) * min(step ** (-0.5), step * self.warmup ** (-1.5))) 61 | 62 | 63 | class 
OpenAIAdam(Optimizer): 64 | # Reference to https://github.com/huggingface 65 | # Open AI version of Adam with weight decay 66 | def __init__(self, params, lr, schedule, warmup, t_total, b1=0.9, b2=0.999, e=1e-8, l2=0, vector_l2=False, 67 | max_grad_norm=-1, **kwargs): 68 | assert 0 < lr 69 | assert schedule == "cos" or schedule == "constant" or schedule == "linear" 70 | assert 0 < warmup 71 | assert 0 < b1 <= 1.0 72 | assert 0 < b2 <= 1.0 73 | assert 0 < e 74 | 75 | defaults = dict(lr=lr, schedule=schedule, warmup=warmup, t_total=t_total, b1=b1, b2=b2, e=e, l2=l2, 76 | vector_l2=vector_l2, max_grad_norm=max_grad_norm) 77 | 78 | super(OpenAIAdam, self).__init__(params, defaults) 79 | 80 | def step(self, closure=None): 81 | # Performs a single optimization step 82 | loss = None 83 | if closure is not None: 84 | loss = closure() 85 | 86 | for group in self.param_groups: 87 | for p in group["params"]: 88 | if p.grad is None: 89 | continue 90 | grad = p.grad.data 91 | if grad.is_sparse: 92 | raise RuntimeError("Sparse gradients. Use SparseAdam") 93 | 94 | state = self.state[p] 95 | 96 | # State initialization 97 | if len(state) == 0: 98 | state["step"] = 0 99 | # Exponential moving average of gradient values 100 | state["exp_avg"] = torch.zeros_like(p.data) 101 | # Exponential moving average of squared gradient values 102 | state["exp_avg_sq"] = torch.zeros_like(p.data) 103 | 104 | exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"] 105 | beta1, beta2 = group["b1"], group["b2"] 106 | 107 | state["step"] += 1 108 | 109 | # Add grad clipping 110 | if group["max_grad_norm"] > 0: 111 | clip_grad_norm_(p, group["max_grad_norm"]) 112 | 113 | # Decay the first and second moment running average coefficient 114 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 115 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 116 | denom = exp_avg_sq.sqrt().add_(group["e"]) 117 | 118 | bias_correction1 = 1 - beta1 ** state["step"] 119 | bias_correction2 = 1 - beta2 ** state["step"] 120 | 121 | schedule_fn = SCHEDULES[group["schedule"]] 122 | lr_scheduled = group["lr"] * schedule_fn(state["step"] / group["t_total"], group["warmup"]) 123 | step_size = lr_scheduled * math.sqrt(bias_correction2) / bias_correction1 124 | 125 | p.data.addcdiv_(-step_size, exp_avg, denom) 126 | 127 | # Add weight decay at the end (fixed version) 128 | if (len(p.size()) > 1 or group["vector_l2"]) and group["l2"] > 0: 129 | p.data.add_(-lr_scheduled * group["l2"], p.data) 130 | return loss 131 | 132 | 133 | class Padam(Optimizer): 134 | """Partially Adaptive Momentum Estimation algorithm""" 135 | 136 | def __init__(self, params, lr, amsgrad, e=1e-8, b1=0.9, b2=0.999, partial=0.25, weight_decay=0, max_grad_norm=-1, 137 | **kwargs): 138 | assert 0 < lr 139 | assert 0 < b1 <= 1.0 140 | assert 0 < b2 <= 1.0 141 | assert 0 < e 142 | assert 0 < partial <= 0.5 143 | defaults = dict(lr=lr, b1=b1, b2=b2, e=e, amsgrad=amsgrad, partial=partial, weight_decay=weight_decay, 144 | max_grad_norm=max_grad_norm) 145 | super(Padam, self).__init__(params, defaults) 146 | 147 | def step(self, closure=None): 148 | # Performs a single optimization step 149 | loss = None 150 | if closure is not None: 151 | loss = closure() 152 | 153 | for group in self.param_groups: 154 | for p in group['params']: 155 | if p.grad is None: 156 | continue 157 | grad = p.grad.data 158 | if grad.is_sparse: 159 | raise RuntimeError("Sparse gradients.
Use SparseAdam") 160 | 161 | amsgrad = group['amsgrad'] 162 | partial = group['partial'] 163 | 164 | state = self.state[p] 165 | 166 | # State initialization 167 | if len(state) == 0: 168 | state['step'] = 0 169 | # Exponential moving average of gradient values 170 | state['exp_avg'] = torch.zeros_like(p.data) 171 | # Exponential moving average of squared gradient values 172 | state['exp_avg_sq'] = torch.zeros_like(p.data) 173 | if amsgrad: 174 | # Maintains max of all exp. moving avg. of sq. grad. values 175 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 176 | 177 | exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"] 178 | beta1, beta2 = group["b1"], group["b2"] 179 | if amsgrad: 180 | max_exp_avg_sq = state['max_exp_avg_sq'] 181 | state["step"] += 1 182 | 183 | # Add grad clipping 184 | if group["max_grad_norm"] > 0: 185 | clip_grad_norm_(p, group["max_grad_norm"]) 186 | 187 | if group['weight_decay'] != 0: 188 | grad = grad.add(group['weight_decay'], p.data) 189 | 190 | # Decay the first and second moment running average coefficient 191 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 192 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 193 | if amsgrad: 194 | # Maintains the maximum of all 2nd moment running avg. till now 195 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 196 | # Use the max. for normalizing running avg. of gradient 197 | denom = max_exp_avg_sq.sqrt().add_(group['e']) 198 | else: 199 | denom = exp_avg_sq.sqrt().add_(group["e"]) 200 | 201 | bias_correction1 = 1 - beta1 ** state["step"] 202 | bias_correction2 = 1 - beta2 ** state["step"] 203 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 204 | 205 | p.data.addcdiv_(-step_size, exp_avg, denom ** (partial * 2)) 206 | return loss 207 | 208 | 209 | if __name__ == '__main__': 210 | opts = [NoamOptimizer(512, 1, 4000, None), 211 | NoamOptimizer(512, 1, 8000, None), 212 | NoamOptimizer(256, 1, 4000, None)] 213 | plt.plot(np.arange(1, 20000), [[opt.rate(i) for opt in opts] for i in range(1, 20000)]) 214 | plt.legend(["512:4000", "512:8000", "256:4000"]) 215 | plt.show() 216 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.16.0 2 | torch==1.0.1 3 | torchtext==0.3.1 4 | setuptools==40.8.0 5 | gensim==3.6.0 6 | adabound==0.0.5 7 | matplotlib==3.0.2 8 | spacy==2.0.16 9 | -------------------------------------------------------------------------------- /scorer/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/scorer/__init__.py -------------------------------------------------------------------------------- /scorer/accuracy_scorer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | class AccuracyScorer(object): 5 | def __init__(self, topk=(2,)): 6 | super(AccuracyScorer, self).__init__() 7 | self.topk = topk 8 | 9 | @staticmethod 10 | def calculate_accuracy(predictions, ground_truths): 11 | correct = (torch.max(predictions, 1)[1].view(ground_truths.size()).data == ground_truths.data) 12 | return float(correct.sum()) / len(correct) * 100 13 | 14 | def calculate_topk_accuracy(self, predictions, ground_truths): 15 | """ 16 | Code copied/pasted from PyTorch Imagenet example: 
https://github.com/pytorch/examples/blob/master/imagenet/main.py 17 | Computes the accuracy over the k top predictions for the specified values of k 18 | """ 19 | with torch.no_grad(): 20 | maxk = max(self.topk) 21 | batch_size = ground_truths.size(0) 22 | 23 | _, pred = predictions.topk(maxk, 1, True, True) 24 | pred = pred.t() 25 | correct = pred.eq(ground_truths.view(1, -1).expand_as(pred)) 26 | 27 | res = [] 28 | for k in self.topk: 29 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 30 | res.append(correct_k.mul_(100.0 / batch_size)) 31 | return res 32 | -------------------------------------------------------------------------------- /scorer/ner_scorer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 4 | logger = logging.getLogger("NerScorer") 5 | 6 | class NerScorer(object): 7 | def __init__(self, ner_vocab): 8 | super(NerScorer, self).__init__() 9 | self.ner_vocab = ner_vocab 10 | self.token_accuracy = 0 11 | self.avg_macro_precision = 0 12 | self.avg_macro_recall = 0 13 | self.avg_macro_f1 = 0 14 | self.macro_precision = {} 15 | self.macro_recall = {} 16 | self.macro_f1 = {} 17 | 18 | def token_level_accuracy(self, prediction, ground_truth): 19 | token_count = 0 20 | matched = 0 21 | 22 | for p_seq, gt_seq in zip(prediction, ground_truth): 23 | for p, gt in zip(p_seq, gt_seq): 24 | token_count += 1 25 | if p == gt: 26 | matched += 1 27 | 28 | self.token_accuracy = matched * 100.0 / token_count 29 | 30 | def __initialize_dict(self): 31 | d = {} 32 | for v in self.ner_vocab.stoi: 33 | d[self.ner_vocab.stoi[v]] = 0 34 | return d 35 | 36 | def __add_to_dict(self, d, tag): 37 | if tag in d: 38 | d[tag] += 1 39 | else: 40 | d[tag] = 1 41 | return d 42 | 43 | def __calculate_tag_f1(self, f1, tp, fp, fn): 44 | precision = {} 45 | recall = {} 46 | for tag in tp: 47 | precision[tag] = tp[tag] / (tp[tag] + fp[tag] + 1e-16) 48 | recall[tag] = tp[tag] / (tp[tag] + fn[tag] + 1e-16) 49 | f1[tag] = (2 * precision[tag] * recall[tag] / (precision[tag] + recall[tag] + 1e-16)) * 100 50 | self.macro_f1 = f1 51 | self.macro_precision = precision 52 | self.macro_recall = recall 53 | 54 | def __calculate_mean_f1(self): 55 | self.avg_macro_f1 = sum(self.macro_f1.values()) / float(len(self.macro_f1)) 56 | self.avg_macro_precision = sum(self.macro_precision.values()) / float(len(self.macro_precision)) 57 | self.avg_macro_recall = sum(self.macro_recall.values()) / float(len(self.macro_recall)) 58 | 59 | def f1_score(self, prediction, ground_truth): 60 | true_positives = self.__initialize_dict() 61 | false_positives = self.__initialize_dict() 62 | false_negatives = self.__initialize_dict() 63 | f1 = self.__initialize_dict() 64 | 65 | for p_seq, gt_seq in zip(prediction, ground_truth): 66 | for p, gt in zip(p_seq, gt_seq): 67 | if p == gt: 68 | true_positives = self.__add_to_dict(true_positives, gt) 69 | else: 70 | false_negatives = self.__add_to_dict(false_negatives, gt) 71 | false_positives = self.__add_to_dict(false_positives, p) 72 | self.__calculate_tag_f1(f1, true_positives, false_positives, false_negatives) 73 | self.__calculate_mean_f1() 74 | 75 | def print_detailed_score_log(self): 76 | logger.info("--------------------") 77 | logger.info("Detailed Tag-Based Score") 78 | for tag in self.macro_f1: 79 | logger.info("Tag: {} - Precision: {:.4f} - Recall: {:.4f} - F1: {:.4f}".format(self.ner_vocab.itos[tag], 80 | 
self.macro_precision[tag], 81 | self.macro_recall[tag], 82 | self.macro_f1[tag])) 83 | logger.info("--------------------") 84 | -------------------------------------------------------------------------------- /training/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/training/__init__.py -------------------------------------------------------------------------------- /training/multiple_model_trainer.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | 4 | import torch 5 | import torch.nn as nn 6 | import torch.optim as optim 7 | 8 | from custom_optimizer import OpenAIAdam, Padam 9 | from evaluation.evaluator import Evaluator 10 | from utils.utils import time_since, calculate_accuracy, save_best_model, calculate_topk_accuracy, \ 11 | scheduled_annealing_strategy 12 | 13 | 14 | class MultipleModelTrainer(object): 15 | def __init__(self, training_properties, train_iter, dev_iter, test_iter, device): 16 | self.optimizer_type = training_properties["optimizer"] 17 | self.learning_rate = training_properties["learning_rate"] 18 | self.weight_decay = training_properties["weight_decay"] 19 | self.momentum = training_properties["momentum"] 20 | self.epoch = training_properties["epoch"] 21 | self.topk = training_properties["topk"] 22 | self.print_every = training_properties["print_every_batch_step"] 23 | self.save_every = training_properties["save_every_epoch"] 24 | self.eval_every = training_properties["eval_every"] 25 | self.save_path = training_properties["save_path"] 26 | 27 | self.openAIAdamSchedulerType = training_properties["scheduler_type"] 28 | self.amsgrad = training_properties["amsgrad"] 29 | self.partial_adam = training_properties["partial_adam"] 30 | 31 | self.train_iter = train_iter 32 | self.dev_iter = dev_iter 33 | self.test_iter = test_iter 34 | 35 | self.device = device 36 | 37 | self.dev_evaluator, self.test_evaluator = Evaluator().evaluator_factory("multiple_model_evaluator", self.device) 38 | 39 | def init_optimizer(self, model): 40 | print("Optimizer type is {} !".format(self.optimizer_type)) 41 | 42 | if self.optimizer_type == "Adam": 43 | return optim.Adam(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) 44 | elif self.optimizer_type == "SGD": 45 | return optim.SGD(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay, 46 | momentum=self.momentum) 47 | elif self.optimizer_type == "OpenAIAdam": 48 | return OpenAIAdam(model.parameters(), lr=self.learning_rate, schedule=self.openAIAdamSchedulerType, 49 | warmup=0.002, t_total=len(self.train_iter) * self.epoch) 50 | elif self.optimizer_type == "Padam": 51 | return Padam(model.parameters(), lr=self.learning_rate, amsgrad=self.amsgrad, partial=self.partial_adam, 52 | weight_decay=self.weight_decay) 53 | else: 54 | raise ValueError("Invalid optimizer type! Choose Adam, SGD, Padam or OpenAIAdam!") 55 | 56 | def train_iters_multi_model(self, models, checkpoint=None): 57 | # Under the assumption of models is a list that contains encoder, decoder and classifier in order. 
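        # Hypothetical usage sketch (names are illustrative only, not taken from main.py):
        #   trainer = MultipleModelTrainer(training_properties, train_iter, dev_iter, test_iter, device)
        #   trainer.train_iters_multi_model([encoder, decoder, classifier])
        # where encoder, decoder and classifier would typically be the ConvolutionEncoder,
        # DeconvolutionDecoder and FullyConnectedClassifier defined in models/Util_CNN.py.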
58 | encoder = models[0] 59 | decoder = models[1] 60 | classifier = models[2] 61 | 62 | encoder_optimizer = self.init_optimizer(encoder) 63 | decoder_optimizer = self.init_optimizer(decoder) 64 | classifier_optimizer = self.init_optimizer(classifier) 65 | 66 | reconstruction_criterion = nn.CrossEntropyLoss().to(self.device) 67 | supervised_criterion = nn.NLLLoss().to(self.device) 68 | 69 | start = time.time() 70 | old_path = None 71 | best_vali_acc = -1 72 | best_vali_loss = -1 73 | best_vali_acc_topk = -1 74 | start_epoch = 1 75 | 76 | if checkpoint is not None: 77 | encoder.load_state_dict(checkpoint["encoder_state_dict"]) 78 | decoder.load_state_dict(checkpoint["decoder_state_dict"]) 79 | classifier.load_state_dict(checkpoint["classifier_stat_dict"]) 80 | encoder_optimizer.load_state_dict(checkpoint["encoder_optimizer_state_dict"]) 81 | decoder_optimizer.load_state_dict(checkpoint["decoder_optimizer_state_dict"]) 82 | classifier_optimizer.load_state_dict(checkpoint["classifier_optimizer_state_dict"]) 83 | start_epoch = checkpoint["epoch"] + 1 84 | best_vali_acc = checkpoint["best_vali_acc"] 85 | best_vali_loss = checkpoint["best_vali_loss"] 86 | best_vali_acc_topk = checkpoint["best_vali_acc_topk"] 87 | 88 | print("Training...") 89 | for e in range(start_epoch, self.epoch + 1): 90 | alpha = scheduled_annealing_strategy(epoch=e, max_epoch=self.epoch) 91 | total_loss, reconst_loss, supervised_loss, accuracy, accuracy_topk = self.train(encoder=encoder, 92 | decoder=decoder, 93 | classifier=classifier, 94 | encoder_optimizer=encoder_optimizer, 95 | decoder_optimizer=decoder_optimizer, 96 | classifier_optimizer=classifier_optimizer, 97 | reconst_criterion=reconstruction_criterion, 98 | supervised_criterion=supervised_criterion, 99 | alpha=alpha) 100 | 101 | self.print_epoch(start, e, reconst_loss, supervised_loss, total_loss, accuracy, accuracy_topk) 102 | 103 | if e % self.eval_every == 0: 104 | vali_loss, vali_accuracy, vali_accuracy_topk = self.dev_evaluatorevaluate_iter(encoder=encoder, 105 | decoder=decoder, 106 | classifier=classifier, 107 | input=self.dev_iter, 108 | reconstruction_criterion=reconstruction_criterion, 109 | supervised_criterion=supervised_criterion, 110 | save_path=self.save_path, 111 | topk=self.topk) 112 | if best_vali_acc < vali_accuracy: 113 | best_vali_loss = vali_loss 114 | best_vali_acc = vali_accuracy 115 | best_vali_acc_topk = vali_accuracy_topk 116 | save_best_model(encoder, self.save_path, filename="saved_best_encoder.pt") 117 | save_best_model(decoder, self.save_path, filename="saved_best_decoder.pt") 118 | save_best_model(classifier, self.save_path, filename="saved_best_classifier.pt") 119 | 120 | self.print_validation(vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 121 | best_vali_acc_topk) 122 | 123 | if e % self.save_every == 0: 124 | filename = "saved_model_step{}.pt".format(e) 125 | out_path = os.path.abspath(os.path.join(self.save_path, filename)) 126 | if old_path is not None: 127 | os.remove(old_path) 128 | torch.save({ 129 | "epoch": e, 130 | "best_vali_acc": best_vali_acc, 131 | "best_vali_loss": best_vali_loss, 132 | "best_vali_acc_topk": best_vali_acc_topk, 133 | "encoder_state_dict": encoder.state_dict(), 134 | 'decoder_state_dict': decoder.state_dict(), 135 | 'classifier_state_dict': classifier.state_dict(), 136 | 'encoder_optimizer_state_dict': encoder_optimizer.state_dict(), 137 | 'decoder_optimizer_state_dict': decoder_optimizer.state_dict(), 138 | 'classifier_optimizer_state_dict': 
classifier_optimizer.state_dict() 139 | }, out_path) 140 | old_path = out_path 141 | 142 | test_loss, test_accuracy, test_accuracy_topk = self.test_evaluatorevaluate_iter(encoder=encoder, 143 | decoder=decoder, 144 | classifier=classifier, 145 | input=self.dev_iter, 146 | reconstruction_criterion=reconstruction_criterion, 147 | supervised_criterion=supervised_criterion, 148 | save_path=self.save_path, 149 | topk=self.topk) 150 | self.print_test(test_loss, test_accuracy, test_accuracy_topk) 151 | 152 | def train(self, encoder, decoder, classifier, encoder_optimizer, decoder_optimizer, classifier_optimizer, 153 | reconst_criterion, supervised_criterion, alpha=1): 154 | epoch_reconstruction_loss = 0 155 | epoch_supervised_loss = 0 156 | epoch_total_acc = 0 157 | epoch_total_acc_topk = 0 158 | step = 1 159 | 160 | encoder.train() 161 | decoder.train() 162 | classifier.train() 163 | 164 | for batch in self.train_iter: 165 | encoder_optimizer.zero_grad() 166 | decoder_optimizer.zero_grad() 167 | classifier_optimizer.zero_grad() 168 | 169 | batch_x = batch.sentence.to(self.device) 170 | batch_y = batch.category_labels.to(self.device) 171 | 172 | hidden = encoder(batch_x) 173 | reconstruction_probs = decoder(hidden) 174 | supervised_predictions = classifier(hidden.squeeze()) 175 | 176 | reconstruction_loss = reconst_criterion(reconstruction_probs, batch_x) 177 | supervised_loss = supervised_criterion(supervised_predictions, batch_y) 178 | 179 | total_loss = alpha * reconstruction_loss + supervised_loss 180 | 181 | accuracy = calculate_accuracy(supervised_predictions, batch_y) 182 | accuracy_topk = calculate_topk_accuracy(supervised_predictions, batch_y, topk=self.topk) 183 | 184 | total_loss.backward() 185 | 186 | encoder_optimizer.step() 187 | decoder_optimizer.step() 188 | classifier.step() 189 | 190 | step += 1 191 | 192 | epoch_reconstruction_loss += reconstruction_loss.item() 193 | epoch_supervised_loss += supervised_loss.item() 194 | epoch_total_acc += accuracy 195 | epoch_total_acc_topk += accuracy_topk[0].item() 196 | 197 | if step % self.print_every == 0: 198 | self.print_step(step, reconstruction_loss, supervised_loss, accuracy, accuracy_topk) 199 | torch.cuda.empty_cache() 200 | 201 | epoch_total_loss = epoch_reconstruction_loss + epoch_supervised_loss 202 | return epoch_total_loss / len(self.train_iter), epoch_reconstruction_loss / len( 203 | self.train_iter), epoch_supervised_loss / len( 204 | self.train_iter), epoch_total_acc / len(self.train_iter), epoch_total_acc_topk / len(self.train_iter) 205 | 206 | def print_step(self, step, reconstruction_loss, supervised_loss, accuracy, accuracy_topk): 207 | print("Batch {}/{} - " 208 | "Batch Reconstruction Loss: {:.4f} - " 209 | "Batch Supervised Loss: {:.4f} - " 210 | "Batch Accuracy: {:.4f} - " 211 | "Batch Accuracy Top-{} {:.4f}".format(step, 212 | len(self.train_iter), 213 | reconstruction_loss.item(), 214 | supervised_loss.item(), 215 | accuracy, 216 | self.topk[0], 217 | accuracy_topk[0].item())) 218 | 219 | def print_epoch(self, start, e, reconst_loss, supervised_loss, total_loss, accuracy, accuracy_topk): 220 | print("{} - " 221 | "Epoch {}/{} - " 222 | "Reconstruction Loss: {:.4f} - " 223 | "Supervised Loss: {:.4f} - " 224 | "Loss: {:.4f} - " 225 | "Accuracy: {:.4f} - " 226 | "Accuracy Top-{}: {:.4f}".format(time_since(start, e / self.epoch), 227 | e, 228 | self.epoch, 229 | reconst_loss, 230 | supervised_loss, 231 | total_loss, 232 | accuracy, 233 | self.topk[0], 234 | accuracy_topk)) 235 | 236 | def print_validation(self, 
vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 237 | best_vali_acc_topk): 238 | print( 239 | "Validation Loss: {:.4f} (Best: {:.4f}) - " 240 | "Validation Accuracy: {:.4f} (Best: {:.4f}) - " 241 | "Validation Accuracy Top-{}: {:.4f} (Best: {:.4f})".format(vali_loss, 242 | best_vali_loss, 243 | vali_accuracy, 244 | best_vali_acc, 245 | self.topk[0], 246 | vali_accuracy_topk, 247 | best_vali_acc_topk)) 248 | 249 | def print_test(self, test_loss, test_accuracy, test_accuracy_topk): 250 | print("Test Loss: {:.4f} - " 251 | "Test Accuracy: {:.4f} - " 252 | "Test Accuracy Top-{}: {:.4f}".format(test_loss, 253 | test_accuracy, 254 | self.topk[0], 255 | test_accuracy_topk)) 256 | -------------------------------------------------------------------------------- /training/single_model_ner_trainer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | import os 3 | import time 4 | 5 | import torch 6 | import torch.nn as nn 7 | 8 | from evaluation.evaluator import Evaluator 9 | from models.GRU import GRU 10 | from models.LSTM import LSTMBase 11 | from scorer.ner_scorer import NerScorer 12 | from training.single_model_trainer import SingleModelTrainer 13 | from utils.utils import time_since, save_best_model 14 | 15 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 16 | logger = logging.getLogger("Trainer") 17 | 18 | 19 | class SingleModelNerTrainer(SingleModelTrainer): 20 | def __init__(self, training_properties, datasetloader, device): 21 | super(SingleModelNerTrainer, self).__init__(training_properties, datasetloader, device) 22 | 23 | self.scorer = NerScorer(datasetloader.ner_vocab) 24 | self.dev_evaluator, self.test_evaluator = Evaluator().evaluator_factory("single_model_ner_evaluator", 25 | self.device) 26 | 27 | def train_iters(self, model, checkpoint=None): 28 | optimizer = self.init_optimizer(model) 29 | 30 | start = time.time() 31 | old_path = None 32 | best_vali_f1 = -1 33 | best_vali_token_acc = -1 34 | start_epoch = 1 35 | 36 | if checkpoint is not None: 37 | model.load(checkpoint["model"]) 38 | if self.optimizer_type == "Noam": 39 | optimizer.optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 40 | else: 41 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 42 | start_epoch = checkpoint["epoch"] + 1 43 | best_vali_f1 = checkpoint["best_vali_f1"] 44 | best_vali_token_acc = checkpoint["best_vali_token_acc"] 45 | 46 | del checkpoint 47 | torch.cuda.empty_cache() 48 | 49 | logger.info("Training...") 50 | for e in range(start_epoch, self.epoch + 1): 51 | total_loss, train_f1 = self.train(model=model, 52 | optimizer=optimizer, 53 | scheduler=None) 54 | 55 | self.print_epoch(start, e, total_loss, train_f1) 56 | 57 | if e % self.eval_every == 0: 58 | vali_f1, vali_precision, vali_recall, vali_token_acc = self.dev_evaluator.evaluate_iter(model=model, 59 | input=self.dev_iter, 60 | save_path=self.save_path, 61 | scorer=self.scorer) 62 | if best_vali_f1 < vali_f1: 63 | best_vali_token_acc = vali_token_acc 64 | best_vali_f1 = vali_f1 65 | save_best_model(model, self.save_path) 66 | 67 | self.print_validation(vali_token_acc, best_vali_token_acc, vali_f1, best_vali_f1) 68 | 69 | if e % self.save_every == 0: 70 | filename = "saved_model_step{}.pt".format(e) 71 | out_path = os.path.abspath(os.path.join(self.save_path, filename)) 72 | if old_path is not None: 73 | os.remove(old_path) 74 | if self.optimizer_type == "Noam": 75 | torch.save({ 76 | "epoch": 
e, 77 | "best_vali_f1": best_vali_f1, 78 | "best_vali_token_acc": best_vali_token_acc, 79 | 'model': model, 80 | 'optimizer_state_dict': optimizer.optimizer.state_dict(), 81 | }, out_path) 82 | else: 83 | torch.save({ 84 | "epoch": e, 85 | "best_vali_f1": best_vali_f1, 86 | "best_vali_token_acc": best_vali_token_acc, 87 | 'model': model, 88 | 'optimizer_state_dict': optimizer.state_dict(), 89 | }, out_path) 90 | old_path = out_path 91 | 92 | test_f1, test_precision, test_recall, test_token_acc = self.test_evaluator.evaluate_iter(model=model, 93 | input=self.test_iter, 94 | save_path=self.save_path, 95 | scorer=self.scorer) 96 | 97 | self.print_test(test_token_acc, test_f1) 98 | 99 | def train(self, model, optimizer, scheduler): 100 | epoch_loss = 0 101 | epoch_total_f1 = 0 102 | step = 1 103 | model.train() 104 | 105 | for batch in self.train_iter: 106 | if self.optimizer_type == "Noam": 107 | optimizer.optimizer.zero_grad() 108 | else: 109 | optimizer.zero_grad() 110 | 111 | batch_x = batch.sentence.to(self.device) 112 | batch_y = batch.ner_labels.to(self.device, non_blocking=True) 113 | 114 | if isinstance(model, GRU) or isinstance(model, LSTMBase): 115 | model.hidden = model.init_hidden(batch_x.size(1)) 116 | 117 | try: 118 | loss, kl_loss = model(batch_x, batch_y) 119 | # f1_score = calculate_accuracy(predictions, batch_y) 120 | 121 | loss.backward() 122 | 123 | if 0.0 < self.norm_ratio: 124 | nn.utils.clip_grad_norm_(model.parameters(), self.norm_ratio) 125 | 126 | if self.optimizer_type == "Noam": 127 | optimizer.optimizer.step() 128 | else: 129 | optimizer.step() 130 | 131 | if scheduler is not None and step % 500 == 0: 132 | scheduler.step(step) 133 | 134 | step += 1 135 | 136 | epoch_loss += loss.item() 137 | # epoch_total_f1 += f1_score 138 | 139 | if step % self.print_every == 0: 140 | self.print_step(step, loss.item(), 0) 141 | 142 | torch.cuda.empty_cache() 143 | 144 | except RuntimeError as e: 145 | if 'out of memory' in str(e): 146 | logger.warning('Ran out of memory, skipping batch %d', step) 147 | optimizer.zero_grad() 148 | torch.cuda.empty_cache() 149 | else: 150 | raise e 151 | 152 | return epoch_loss / len(self.train_iter), epoch_total_f1 / len(self.train_iter) 153 | 154 | def print_step(self, step, loss, f1_score): 155 | logger.info("Batch {}/{} - " 156 | "Batch Loss: {:.4f} - " 157 | "Batch F1: {:.4f}".format(step, 158 | len(self.train_iter), 159 | loss, 160 | f1_score)) 161 | 162 | def print_epoch(self, start, e, total_loss, train_f1): 163 | logger.info("{} - " 164 | "Epoch {}/{} - " 165 | "Loss: {:.4f} - " 166 | "F1-Score: {:.4f}".format(time_since(start, e / self.epoch), 167 | e, 168 | self.epoch, 169 | total_loss, 170 | train_f1)) 171 | 172 | def print_validation(self, vali_f1, best_vali_f1, vali_token_acc, best_vali_token_acc): 173 | logger.info("Validation F1: {:.4f} (Best: {:.4f}) - " 174 | "Validation Token Level Accuracy: {:.4f} (Best: {:.4f}) - ".format(vali_token_acc, 175 | best_vali_token_acc, 176 | vali_f1, 177 | best_vali_f1)) 178 | 179 | def print_test(self, test_token_acc, test_f1): 180 | logger.info("Test F1: {:.4f} - " 181 | "Test Token Level Accuracy: {:.4f} - ".format(test_f1, 182 | test_token_acc)) 183 | -------------------------------------------------------------------------------- /training/single_model_trainer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | import os 3 | import time 4 | 5 | import torch 6 | import torch.nn as nn 7 | import torch.optim as optim 8 | 9 | from 
adabound import AdaBound 10 | 11 | from optimizer.custom_optimizer import OpenAIAdam, NoamOptimizer, Padam 12 | from evaluation.evaluator import Evaluator 13 | from models.GRU import GRU 14 | from models.LSTM import LSTMBase 15 | from utils.utils import time_since, save_best_model 16 | from scorer.accuracy_scorer import AccuracyScorer 17 | 18 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 19 | logger = logging.getLogger("Trainer") 20 | 21 | 22 | class SingleModelTrainer(object): 23 | def __init__(self, training_properties, datasetloader, device): 24 | self.task = training_properties["task"] 25 | self.optimizer_type = training_properties["optimizer"] 26 | self.learning_rate = training_properties["learning_rate"] 27 | self.weight_decay = training_properties["weight_decay"] 28 | self.momentum = training_properties["momentum"] 29 | self.norm_ratio = training_properties["norm_ratio"] 30 | self.epoch = training_properties["epoch"] 31 | self.topk = training_properties["topk"] 32 | self.print_every = training_properties["print_every_batch_step"] 33 | self.save_every = training_properties["save_every_epoch"] 34 | self.eval_every = training_properties["eval_every"] 35 | self.save_path = training_properties["save_path"] 36 | 37 | self.openAIAdamSchedulerType = training_properties["scheduler_type"] 38 | self.amsgrad = training_properties["amsgrad"] 39 | self.partial_adam = training_properties["partial_adam"] 40 | 41 | self.final_learning_rate = training_properties["final_learning_rate"] 42 | self.amsbound = training_properties["amsbound"] 43 | 44 | self.train_iter = datasetloader.train_iter 45 | self.dev_iter = datasetloader.val_iter 46 | self.test_iter = datasetloader.test_iter 47 | 48 | self.device = device 49 | 50 | self.scorer = AccuracyScorer(self.topk) 51 | self.dev_evaluator, self.test_evaluator = Evaluator().evaluator_factory("single_model_evaluator", self.device) 52 | 53 | def init_optimizer(self, model): 54 | logger.info("Optimizer type is %s!", self.optimizer_type) 55 | 56 | if self.optimizer_type == "Adam": 57 | return optim.Adam(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) 58 | elif self.optimizer_type == "SGD": 59 | return optim.SGD(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay, 60 | momentum=self.momentum) 61 | elif self.optimizer_type == "OpenAIAdam": 62 | return OpenAIAdam(model.parameters(), lr=self.learning_rate, schedule=self.openAIAdamSchedulerType, 63 | warmup=0.002, t_total=len(self.train_iter) * self.epoch) 64 | elif self.optimizer_type == "Noam": 65 | return NoamOptimizer(300, 1, 400, 66 | optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9)) 67 | elif self.optimizer_type == "Padam": 68 | return Padam(model.parameters(), lr=self.learning_rate, amsgrad=self.amsgrad, partial=self.partial_adam, 69 | weight_decay=self.weight_decay) 70 | elif self.optimizer_type == "adabound": 71 | return AdaBound(model.parameters(), lr=self.learning_rate, final_lr=self.final_learning_rate, 72 | weight_decay=self.weight_decay, amsbound=self.amsbound) 73 | else: 74 | raise ValueError("Invalid optimizer type!
Choose Adam, SGD, Padam, NoamOptimizer or OpenAIAdam!") 75 | 76 | def train_iters(self, model, checkpoint=None): 77 | optimizer = self.init_optimizer(model) 78 | 79 | # scheduler = lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1) 80 | 81 | if isinstance(model, GRU): 82 | criterion = nn.NLLLoss().to(self.device) 83 | else: 84 | criterion = nn.CrossEntropyLoss().to(self.device) 85 | 86 | start = time.time() 87 | old_path = None 88 | best_vali_acc = -1 89 | best_vali_loss = -1 90 | best_vali_acc_topk = -1 91 | start_epoch = 1 92 | 93 | if checkpoint is not None: 94 | model.load_state_dict(checkpoint["model_state_dict"]) 95 | if self.optimizer_type == "Noam": 96 | optimizer.optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 97 | else: 98 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 99 | start_epoch = checkpoint["epoch"] + 1 100 | best_vali_acc = checkpoint["best_vali_acc"] 101 | best_vali_loss = checkpoint["best_vali_loss"] 102 | best_vali_acc_topk = checkpoint["best_vali_acc_topk"] 103 | 104 | # Memory problems of PyTorch is giving me headaches... 105 | del checkpoint 106 | torch.cuda.empty_cache() 107 | 108 | logger.info("Training...") 109 | for e in range(start_epoch, self.epoch + 1): 110 | total_loss, cross_entropy_loss, kl_loss, accuracy, accuracy_topk = self.train(model=model, 111 | optimizer=optimizer, 112 | scheduler=None, 113 | criterion=criterion) 114 | 115 | self.print_epoch(start, e, cross_entropy_loss, kl_loss, total_loss, accuracy, accuracy_topk) 116 | 117 | if e % self.eval_every == 0: 118 | vali_loss, vali_accuracy, vali_accuracy_topk = self.dev_evaluator.evaluate_iter(model=model, 119 | input=self.dev_iter, 120 | criterion=criterion, 121 | save_path=self.save_path, 122 | scorer=self.scorer) 123 | if best_vali_acc < vali_accuracy: 124 | best_vali_loss = vali_loss 125 | best_vali_acc = vali_accuracy 126 | best_vali_acc_topk = vali_accuracy_topk 127 | save_best_model(model, self.save_path) 128 | 129 | self.print_validation(vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 130 | best_vali_acc_topk) 131 | 132 | if e % self.save_every == 0: 133 | filename = "saved_model_step{}.pt".format(e) 134 | out_path = os.path.abspath(os.path.join(self.save_path, filename)) 135 | if old_path is not None: 136 | os.remove(old_path) 137 | if self.optimizer_type == "Noam": 138 | torch.save({ 139 | "epoch": e, 140 | "best_vali_acc": best_vali_acc, 141 | "best_vali_loss": best_vali_loss, 142 | "best_vali_acc_topk": best_vali_acc_topk, 143 | 'model_state_dict': model.state_dict(), 144 | 'optimizer_state_dict': optimizer.optimizer.state_dict(), 145 | }, out_path) 146 | else: 147 | torch.save({ 148 | "epoch": e, 149 | "best_vali_acc": best_vali_acc, 150 | "best_vali_loss": best_vali_loss, 151 | "best_vali_acc_topk": best_vali_acc_topk, 152 | 'model_state_dict': model.state_dict(), 153 | 'optimizer_state_dict': optimizer.state_dict(), 154 | }, out_path) 155 | 156 | old_path = out_path 157 | 158 | test_loss, test_accuracy, test_accuracy_topk = self.test_evaluator.evaluate_iter(model=model, 159 | input=self.test_iter, 160 | criterion=criterion, 161 | save_path=self.save_path, 162 | scorer=self.scorer) 163 | 164 | self.print_test(test_loss, test_accuracy, test_accuracy_topk) 165 | 166 | def train(self, model, optimizer, scheduler, criterion): 167 | epoch_loss = 0 168 | epoch_kl_loss = 0 169 | epoch_total_acc = 0 170 | epoch_total_acc_topk = 0 171 | step = 1 172 | model.train() 173 | 174 | for batch in self.train_iter: 175 | if 
self.optimizer_type == "Noam": 176 | optimizer.optimizer.zero_grad() 177 | else: 178 | optimizer.zero_grad() 179 | 180 | batch_x = batch.sentence.to(self.device) 181 | batch_y = batch.category_labels.to(self.device, non_blocking=True) 182 | 183 | if isinstance(model, GRU) or isinstance(model, LSTMBase): 184 | model.hidden = model.init_hidden(batch_x.size(1)) 185 | 186 | try: 187 | predictions, kl_loss = model(batch_x) 188 | 189 | loss = criterion(predictions, batch_y) 190 | accuracy = self.scorer.calculate_accuracy(predictions, batch_y) 191 | accuracy_topk = self.scorer.calculate_topk_accuracy(predictions, batch_y) 192 | 193 | total_loss = loss + kl_loss / 10 194 | 195 | total_loss.backward() 196 | 197 | if 0.0 < self.norm_ratio: 198 | nn.utils.clip_grad_norm_(model.parameters(), self.norm_ratio) 199 | 200 | if self.optimizer_type == "Noam": 201 | optimizer.optimizer.step() 202 | else: 203 | optimizer.step() 204 | 205 | if scheduler is not None and step % 500 == 0: 206 | scheduler.step(step) 207 | 208 | step += 1 209 | 210 | epoch_loss += loss.item() 211 | epoch_kl_loss += kl_loss.item() 212 | epoch_total_acc += accuracy 213 | epoch_total_acc_topk += accuracy_topk[0].item() 214 | 215 | if step % self.print_every == 0: 216 | self.print_step(step, loss.item(), kl_loss.item(), accuracy, accuracy_topk[0].item()) 217 | 218 | torch.cuda.empty_cache() 219 | 220 | except RuntimeError as e: 221 | if 'out of memory' in str(e): 222 | logger.warning('Ran out of memory, skipping batch %d', step) 223 | optimizer.zero_grad() 224 | torch.cuda.empty_cache() 225 | else: 226 | raise e 227 | 228 | epoch_total_loss = epoch_loss + epoch_kl_loss 229 | return epoch_total_loss / len(self.train_iter), epoch_loss / len(self.train_iter), epoch_kl_loss / len( 230 | self.train_iter), epoch_total_acc / len(self.train_iter), epoch_total_acc_topk / len(self.train_iter) 231 | 232 | def print_step(self, step, loss, kl_loss, accuracy, accuracy_topk): 233 | logger.info("Batch {}/{} - " 234 | "Batch Loss: {:.4f} - " 235 | "Batch KL Loss: {:.4f} - " 236 | "Batch Accuracy: {:.4f} - " 237 | "Batch Accuracy Top-{} {:.4f}".format(step, 238 | len(self.train_iter), 239 | loss, 240 | kl_loss, 241 | accuracy, 242 | self.topk[0], 243 | accuracy_topk)) 244 | 245 | def print_epoch(self, start, e, cross_entropy_loss, kl_loss, total_loss, accuracy, accuracy_topk): 246 | logger.info("{} - " 247 | "Epoch {}/{} - " 248 | "Cross Entropy Loss: {:.4f} - " 249 | "KL Loss: {:.4f} - " 250 | "Loss: {:.4f} - " 251 | "Accuracy: {:.4f} - " 252 | "Accuracy Top-{}: {:.4f}".format(time_since(start, e / self.epoch), 253 | e, 254 | self.epoch, 255 | cross_entropy_loss, 256 | kl_loss, 257 | total_loss, 258 | accuracy, 259 | self.topk[0], 260 | accuracy_topk)) 261 | 262 | def print_validation(self, vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 263 | best_vali_acc_topk): 264 | logger.info("Validation Loss: {:.4f} (Best: {:.4f}) - " 265 | "Validation Accuracy: {:.4f} (Best: {:.4f}) - " 266 | "Validation Accuracy Top-{}: {:.4f} (Best: {:.4f})".format(vali_loss, 267 | best_vali_loss, 268 | vali_accuracy, 269 | best_vali_acc, 270 | self.topk[0], 271 | vali_accuracy_topk, 272 | best_vali_acc_topk)) 273 | 274 | def print_test(self, test_loss, test_accuracy, test_accuracy_topk): 275 | logger.info("Test Loss: {:.4f} - " 276 | "Test Accuracy: {:.4f} - " 277 | "Test Accuracy Top-{}: {:.4f}".format(test_loss, 278 | test_accuracy, 279 | self.topk[0], 280 | test_accuracy_topk)) 281 | 
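A note on the checkpoint format used by `train_iters` above: every `save_every_epoch` interval it serializes the epoch index, the best validation metrics, and the model/optimizer `state_dict`s into one file, and on resume it restores all of them before continuing from `epoch + 1`. Below is a minimal, self-contained sketch of that save/resume round trip; the toy `nn.Linear` model, the metric values, and the file name are placeholders for illustration, not part of the repository.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy model; stands in for the repository's CNN/GRU/LSTM models.
model = nn.Linear(300, 25)
optimizer = optim.Adam(model.parameters(), lr=0.05)

# Save a checkpoint the same way train_iters does for non-Noam optimizers.
checkpoint_path = "saved_model_step3.pt"
torch.save({
    "epoch": 3,
    "best_vali_acc": 0.81,
    "best_vali_loss": 0.47,
    "best_vali_acc_topk": 0.93,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, checkpoint_path)

# Resume: rebuild the objects, then restore their states and the bookkeeping.
checkpoint = torch.load(checkpoint_path)
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1  # continue from the next epoch
```

Note that `SingleModelTrainer` checkpoints `state_dict`s, while `save_best_model` in `utils.py` pickles the whole module via `torch.save(model, out_path)`; both load back fine, but the `state_dict` route is the more portable of the two.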
-------------------------------------------------------------------------------- /training/trainer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | from training.single_model_trainer import SingleModelTrainer 4 | from training.single_model_ner_trainer import SingleModelNerTrainer 5 | 6 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 7 | logger = logging.getLogger("Trainer") 8 | 9 | 10 | class Trainer(object): 11 | @staticmethod 12 | def trainer_factory(type, training_properties, datasetloader, device): 13 | if type == "single_model_trainer": 14 | logger.info("Trainer type is %s", type) 15 | return SingleModelTrainer(training_properties, datasetloader, device) 16 | elif type == "single_model_ner_trainer": 17 | logger.info("Trainer type is %s", type) 18 | return SingleModelNerTrainer(training_properties, datasetloader, device) 19 | else: 20 | raise ValueError("Unrecognized trainer type") 21 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/utils/__init__.py -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | import os 4 | import pickle 5 | import time 6 | 7 | import numpy as np 8 | import torch 9 | import torch.nn as nn 10 | import torch.nn.functional as F 11 | 12 | 13 | def save_vocabulary(vocab, path): 14 | with open(path, 'wb') as fw: 15 | pickle.dump(vocab, fw, protocol=pickle.HIGHEST_PROTOCOL) 16 | 17 | 18 | def load_vocabulary(path): 19 | with open(path, 'rb') as f: 20 | return pickle.load(f) 21 | 22 | 23 | def save_best_model(model, path, filename="saved_best_model.pt"): 24 | out_path = os.path.abspath(os.path.join(path, filename)) 25 | torch.save(model, out_path) 26 | 27 | 28 | def load_best_model(path, filename="saved_best_model.pt"): 29 | out_path = os.path.abspath(os.path.join(path, filename)) 30 | model = torch.load(out_path) 31 | return model 32 | 33 | 34 | def as_minutes(s): 35 | m = math.floor(s / 60) 36 | s -= m * 60 37 | return "%dm %ds" % (m, s) 38 | 39 | 40 | def time_since(since, percent): 41 | now = time.time() 42 | s = now - since 43 | es = s / percent 44 | rs = es - s 45 | return "%s (- %s)" % (as_minutes(s), as_minutes(rs)) 46 | 47 | 48 | # Direct copy/paste from the PyTorch BiLSTM-CRF tutorial 49 | def argmax(vec): 50 | # return the argmax as a python int 51 | _, idx = torch.max(vec, 1) 52 | return idx.item() 53 | 54 | 55 | def log_sum_exp(vec, dim=-1): 56 | max_score, _ = torch.max(vec, dim=dim) 57 | max_score_broadcast = max_score.unsqueeze(dim) 58 | return max_score + \ 59 | torch.log(torch.sum(torch.exp(vec - max_score_broadcast), dim=dim)) 60 | 61 | 62 | def scheduled_annealing_strategy(epoch, max_epoch, max=1.0, min=0.01, gain=0.3):  # sigmoid-shaped decay from max to min 63 | upper_alpha = max - min 64 | lower_alpha = (1 + math.exp(gain * (epoch - (max_epoch // 2)))) 65 | alpha = (upper_alpha / lower_alpha) + min 66 | return alpha 67 | 68 | 69 | def clones(module, num_of_clones): 70 | return nn.ModuleList([copy.deepcopy(module) for _ in range(num_of_clones)]) 71 | 72 | 73 | def subsequent_mask(size): 74 | # Mask out subsequent positions.
It is to prevent positions from attending to subsequent positions 75 | # For more detailed information: 76 | # The Annotated Transformer = https://nlp.seas.harvard.edu/2018/04/03/attention.html 77 | sm = np.triu(np.ones((1, size, size)), k=1).astype("uint8") 78 | return torch.from_numpy(sm) == 0 79 | 80 | 81 | def gelu(x): 82 | # Gaussian Error Linear Unit (tanh approximation) 83 | # Ref: https://github.com/pytorch/pytorch/issues/20464 84 | return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3))) 85 | 86 | --------------------------------------------------------------------------------
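Since `gelu` above implements the tanh approximation of the Gaussian Error Linear Unit, here is a standalone sanity check comparing it against the exact erf-based definition. It is not part of the repository; `gelu_tanh` and `gelu_exact` are illustrative names only.

```python
import math

import torch


def gelu_tanh(x):
    # Tanh approximation of GELU; note the constant is sqrt(2 / pi).
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))


def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1 + torch.erf(x / math.sqrt(2)))


x = torch.linspace(-3, 3, steps=13)
print(torch.max(torch.abs(gelu_tanh(x) - gelu_exact(x))))  # small, on the order of 1e-3 or less
```

The two curves agree to within a few times 1e-4 over typical activation ranges, so the tanh form is a safe drop-in while avoiding the erf call.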