├── LICENSE ├── README.md ├── changes in torchtext.txt ├── config ├── README.md ├── __init__.py ├── config.json └── config.logger ├── crf ├── CRF.py └── __init__.py ├── datahelper ├── __init__.py ├── dataset_reader.py ├── embedding_helper.py └── preprocessor.py ├── dropout_models ├── __init__.py ├── dropout.py ├── gaussian_dropout.py └── variational_dropout.py ├── evaluation ├── __init__.py ├── evaluator.py ├── interactive_evaluator.py ├── multiple_model_evaluator.py ├── single_model_evaluator.py └── single_model_ner_evaluator.py ├── main.py ├── models ├── CNN.py ├── GRU.py ├── LSTM.py ├── Transformer.py ├── Transformer_OpenAI.py ├── Util_CNN.py └── __init__.py ├── optimizer └── custom_optimizer.py ├── requirements.txt ├── scorer ├── __init__.py ├── accuracy_scorer.py └── ner_scorer.py ├── training ├── __init__.py ├── multiple_model_trainer.py ├── single_model_ner_trainer.py ├── single_model_trainer.py └── trainer.py └── utils ├── __init__.py └── utils.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Hikmet Bahadir Sahin 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # README 2 | 3 | ## Update 29-05-2019 4 | 5 | - Yay another push, another update! 6 | - As I said in my previous update yesterday, I continue to develop a specific OpenAI version of the Transformer. 7 | - In today's update, I added the LanguageModelHead definition w.r.t. original Tensorflow implementation + huggingface's PyTorch implementation. 8 | - In addition, I added "Block" layer which contains (1) an attention operation followed by (2) a layer normalization followed by (3) an MLP and finally followed by (4) a layer normalization. 9 | - In overall, the implementation is again similar to the reference codes; however, I added an extention to the "Block" and "Attention" part. 10 | - OpenAI version uses Conv1D for its attention calculations while the "All you need is attention" version uses Linear layer (Nope Conv1D does not equal to a Linear layer: [Source](https://stackoverflow.com/questions/55576314/conv1d-with-kernel-size-1-interpretation)) 11 | - Since I want to use built-in nn.MultiHeadAttention function, just for fun, I put a conditional into the "Block" definition. 
If you trigger it, you will use the attention with linear. Otherwise, your model will call OpenAI version. 12 | - Lastly, the new model is still under construction. I am highly confident that it has many bugs even now =) 13 | 14 | 15 | # Table Of Contents 16 | 17 | - [Introduction](#introduction) 18 | - [Library Dependencies](#library-dependencies) 19 | - [Project Skeleton](#project-skeleton) 20 | - [Project Details](#project-details) 21 | - [To-do](#to-do) 22 | - [How-to-run](#how-to-run) 23 | - [Important Note Before Start](#important-note-before-start) 24 | - [Configuration JSON Format](#configuration-json-format) 25 | - [How to Run Main](#how-to-run-main) 26 | - [Training from Scratch-Training from Checkpoint-Interactive Evaluation](#training-from-scratch-training-from-checkpoint-interactive-evaluation) 27 | - [Results](#results) 28 | - [Test Results for TextCNN](#test-results-for-textcnn) 29 | - [Previous Updates](#previous-updates) 30 | - [January 2019 - Wiki Link](https://github.com/hbahadirsahin/nlp-experiments-in-pytorch/wiki/Previous-Updates-(January-2019)) 31 | - [February 2019 - Wiki Link](https://github.com/hbahadirsahin/nlp-experiments-in-pytorch/wiki/Previous-Updates-(February-2019)) 32 | - [References for Code Development](#references-for-code-development) 33 | 34 | ## Introduction 35 | 36 | This is my personal, pet project which I apply machine learning and natural language processing stuffs by using PyTorch. I stopped working with Tensorflow after some hellish times that I could not do some basic extentions (such fasttext based oov embeddings, details are below). Also, Tensorflow's updates and functionality deprecation rate is annoying for me. 37 | 38 | In this repository, I implement popular learning models and extend them with different minor adjustments (like variational dropouts). Even though it is really slow, I execute experiments by using these models on a dataset which me and my old colleagues in Huawei constructed (details are below, again) and try to announce experiment results. 39 | 40 | ## Library Dependencies 41 | 42 | Before diving into details, the python and library versions are as follows: 43 | 44 | - python 3.6 (works well with 3.7, too) 45 | - torch 1.0.1 46 | - torchtext 0.3.1 47 | - numpy 1.15.4 (due to PyTorch 1.0) 48 | - setuptools 40.8.0 (Hell no idea why pipreqs put this into requirements.txt) 49 | - spacy 2.0.16 (for interactive evaluation only) 50 | - gensim 3.6.0 (for fasttext embeddings, as well as OOV Embedding generation.) 51 | - adabound 0.0.5 52 | 53 | ## Project Skeleton 54 | 55 | I try to keep every part of the project clean and easy to follow. Even though the folders are self explanatory for me, let me explain them for those who may have hard time to understand. 56 | 57 | - `./crf/CRF.py` contains the conditional random field implementation (not finished yet). 58 | - `./datahelper/dataset_reader.py` contains the "DatasetLoader" object that reads a text dataset, splits it into 3 subsets (train/vali/test), creates vocabulary and iterators. It is a little bit hard-coded for the dataset I am using now. However, it is easy to make changes to use it for your own dataset. 59 | - `./datahelper/embedding_helper.py` is a helper class to generate OOV word embeddings. To use Fasttext-based OOV embedding generation, it leverages Gensim! 60 | - `./datahelper/preprocessor.py` contains the "Preprocessor" object and actions to apply on sentences. 61 | - `./dropout_models/gaussian_dropout.py` contains the Gaussian Dropout object. 
62 | - `./dropout_models/variational_dropout.py` contains the Variational Dropout object. 63 | - `./dropout_models/dropout.py` contains the Dropout object which you can select your dropout type among Bernoulli (basic), Gaussian and Variational dropout types. 64 | - `./evaluation/evaluator.py` is the factory for evaluation objects that are used in model trainings as well as interactive evaluation. 65 | - `./evaluation/xyz_evaluator.py` methods are the evaluator functions for specified models. 66 | - `./model/xyz.py` contains network objects. 67 | - `./model/Util_xyz.py` contains custom-defined objects that are used in `xyz`. 68 | - `./optimizer/custom_optimizer.py` contains custom-defined optimizer objects. 69 | - `./scorer/accuracy_scorer.py` contains classification accuracy metric calculations. 70 | - `./scorer/ner_scorer.py` contains NER-task related metric calculations. 71 | - `./training/trainer.py` is a class that returns the necessary trainer for the user's selected learning model 72 | - `./training/xyz_trainer.py` methods are the trainer functions for specified models. 73 | - `./utils/utils.py` contains both utility and common methods that are being used in several places in the project. 74 | - `./main.py` is the main code. To execute this project, one needs to provide a valid `config.json` file which contains the necessary configuration properties. 75 | - `./config/config.json` is the configuration file. 76 | 77 | ## Project Details 78 | 79 | - As the other Tensorflow-based repository, I will use the dataset that me and my old colleagues constructed 3 years ago. "English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset" is publicly available: https://data.mendeley.com/datasets/cdcztymf4k/1 80 | - Text CNN, CharCNN, VDCNN, Conv-Deconv CNN, basic LSTM/GRU and Transformer (Google version) models are currently available to train and evaluate in the repository. More models will be added. 81 | - Fasttext embeddings are used (by default but it can be changed). Eventually, one can use Torchtext to download the "pre-defined" embedding files. However, since Turkish embeddings were not included in, I manually edit the Torchtext backend codes (please check the "changes in the torchtext.txt" file). Also note that, everytime you update Torchtext, you need to re-add those changes again. 82 | - Embeddings (whether random or pretrained) can be "static", "nonstatic", or "multichannel". 83 | - For OOV words, OOVEmbeddingCreator is developed (under datahelper/embedding_helper). There are 5 different basic approaches defined to generate OOV embeddings: (1) zeros vector, (2) ones vector, (3) random vector (between 0, 1), (4) (r1, r2) ranged uniformly random vector, (5) Fasttext CharNgram-based vectors. 84 | - Even though I am focusing on Turkish versions of the dataset, I believe "Preprocessor" can work for English dataset, too. In future, I may add more language specific methods. 85 | - Main code loads properties from config.json (inside config folder). 86 | - I tested all training, evaluation, model/vocabulary saving/loading aspects of the code for several epochs without any problem (except out of memory errors =)). 87 | 88 | ## To-do 89 | 90 | - [x] ~~Better configuration/property reading, handling, instead of hard-coded dictionaries~~ (Update: 11-Jan-2019) 91 | - [x] ~~Variational Dropout. Update: Variational and Gaussian dropout methods are added. 
Reference: [Variational Dropout and 92 | the Local Reparameterization Trick](https://arxiv.org/pdf/1506.02557.pdf)~~ 93 | - [x] ~~Extend main flow and learning models with respect to new dropout models.~~ 94 | - [x] ~~Add character-level data preprocessing.~~ 95 | - [x] ~~Add character-level data loading.~~ 96 | - [ ] Run the current piece of code for the aforementioned datasets and define a text categorization baseline (for both Turkish and English). 97 | - [ ] Variational Dropout related extensions (current version is from 2015 paper but obviously more recent versions are out there for me to implement =)) + bayes by backprop for CNN (a.k.a. Bayesian CNN) 98 | - [ ] Attention. 99 | - [ ] Different learning algorithms (DeepCNN, LSTM, GRU, any-kind-of-hybrid versions of those algorithms, transformers). 100 | - [x] TextCNN 101 | - [x] GRU 102 | - [x] LSTM 103 | - [x] ~~Multilayer CNN~~ (I removed this model and decided to continue with CharCNN and VDCNN instead). 104 | - [x] CharCNN 105 | - [x] VDCNN (Very Deep CNN) 106 | - [x] Transformer (*Attention is All You Need* version) (**Modified for Text Classification/NER!**) 107 | - [ ] Transformer (*Improving Language Understanding by Generative Pre-Training* version) 108 | - [ ] Transformer-XL (*Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context* version) 109 | - [x] Conv-Deconv CNN 110 | - [ ] Encoder-Decoder GRU 111 | - [ ] Encoder-Decoder LSTM 112 | - [ ] Hybrid stuff (Like CNN+LSTM/GRU) 113 | - [x] ~~CRF layer to be able to do NER experiments~~. 114 | - [ ] Add new models that will use CRF as their last layer (such as LSTMCRF, GRUCRF, CNNCRF, etc.) 115 | - [ ] Develop NER-related performance metrics and update training/evaluation flows to use these metrics. 116 | - [ ] For Turkish, I plan to add morphological disambiguation (https://github.com/erayyildiz/Neural-Morphological-Disambiguation-for-Turkish). 117 | - [ ] Different language models. 118 | - [ ] ELMO (pretrained Turkish/English embeddings) 119 | - [ ] BERT (pretrained Turkish/English embeddings) 120 | - [ ] Document length categorization/NER support (Conv-Deconv CNN implementation supports document-length tasks, but more support will come with ELMO and BERT update). 121 | 122 | ## How-to-run 123 | 124 | ### Important Note Before Start 125 | 126 | I had to make some changes in the torchtext backend codes to be able to do several stuffs: 127 | 128 | - I don't know why, torchtext does not split a dataset into 3 subsets (train/val/test) even if there is a function for it. I changed it to fix that issue. Hopefully, one day torchtext will fix it offically =) 129 | - To be able to work with Turkish Fasttext embeddings, I added its respective alias. 130 | - To be able to apply Fasttext's CharNGram to OOV words to generate OOV embeddings, a minor change has been made to Vector object. 131 | - To be able to read any dataset without any problem, a minor change has been made to torchtext's utils.py. 132 | 133 | ### Configuration JSON Format 134 | 135 | To be able to run the main code, you need to provide a valid JSON file which contains 4 main properties. These are `dataset_properties`, `model_properties`, `training_properties`, and `evaluation_properties`: 136 | 137 | - `dataset_properties` contains dataset-related information such as path, embedding, batch information. 138 | - `model_properties` contains model-related parameters. Inside this property, 139 | - `common_model_properties` contains common properties for all models like embeddings, vocabulary size, etc. 
140 | - `model_name` (like text_cnn, char_cnn, etc.) contains model-specific properties. 141 | - `training_properties` contains training-related properties. 142 | - `evaluation_properties` contains evaluation-related properties. 143 | 144 | Details of the `config.json` can be found in "/config/README.md" folder. 145 | 146 | ### How to Run Main 147 | 148 | If you make the necessary changes described in "changes in torchtext.txt" and prepare "config.json", you have two ways to run the code. 149 | 150 | - If you are using an IDE, copy/paste your "config.json" file's path as an argument and press run button. 151 | - If you are an old-school command window lover, type `python main.py --config /path/to/config.json`. 152 | 153 | ### Training from Scratch-Training from Checkpoint-Interactive Evaluation 154 | 155 | You can train your model from 0th epoch until max_epoch, and/or continue your training from xth epoch to the end. You do not need to do anything extra for the first case; however, to be able to continue your training you need to make necessary changes in "config.json": 156 | 157 | - If `dataset_properties/checkpoint_path` is empty, the code will start a new training process. If you type your saved PyTorch model, the main flow will automatically load it and continue from where it left. 158 | - Additionally, you can provide saved vocabulary files for words (`dataset_properties/saved_sentence_vocab` (don't ask why it is sentence)) and labels (`dataset_properties/saved_category_vocab`). 159 | 160 | To be able to activate interactive evaluation, you need to make necessary changes in "config.json": 161 | 162 | - Change `model_properties/common_model_properties/run_mode`'s value to "eval_interactive". 163 | - Provide your model's path to be evaluated and your saved vocabulary files' path by using `evaluation_properties`. 164 | 165 | ## Results 166 | 167 | This section presents the Top-1 and Top-5 test accuracies for **text categorization task** of my experiments. Due to computational resource limit, I cannot test every single parameter/hyperparameter. In general, I hold algorithm parameters same for all experiments; however, I change embedding related parameters. I assume the result table is self-explanatory. As a final note, I *won't* share my best models and I *won't* guarantee reproducibility. Dataset splits (training/validation/test) are deterministic for all experiments, but anything else that needs random initialization is non-deterministic. 168 | 169 | Note: Epoch is set to 20 for all experiments, until further notice (last update: 31-10-2018). However, if I believe that results may improve, I let the experiment run for 10 more epochs (at most 30 epoch per experiments). 170 | 171 | Note 2 (**Update: 22-01-2019**): Most of the English-language experiments are executed in Google Cloud (by using 300$ initial credit). Since, I want to finish as many experiments as possible, I cannot increase the max_epoch from 20 to 30. In this experiments, I saw that validation loss and accuracies were improving in every epoch until the 20th, and I am pretty sure models can improve further. Unfortunately, I chose the maximum number of experiment runs instead of best results for each experiment in this trade-off. 
172 | 173 | ### Test Results for TextCNN 174 | 175 | |#| Language | # Of Categories | Pre-trained Embedding | OOV Embedding | Embedding Training | Top-1 Test Accuracy | Top-5 Test Accuracy | 176 | |-|:--------:|:-----------------------------:|-----------------------|---------------|--------------------|:-------------------:|:-------------------:| 177 | |1|Turkish|25| Fasttext | zeros | static | 49.4565 | 76.2760 | 178 | |2|Turkish|25| Fasttext | zeros | nonstatic | 62.6054 | 86.3384 | 179 | |3|Turkish|25| Fasttext | Fasttext | static | 49.6810 | 75.2684 | 180 | |4|Turkish|25| Fasttext | Fasttext | nonstatic | 63.9391 | 87.9597 | 181 | |5|Turkish|49| Fasttext | zeros | static | 43.5519 | 68.4336 | 182 | |6|Turkish|49| Fasttext | zeros | nonstatic | 56.0081 | 79.8634 | 183 | |7|Turkish|49| Fasttext | Fasttext | static | 43.8025 | 68.8641 | 184 | |8|Turkish|49| Fasttext | Fasttext | nonstatic | 60.4009 | 82.7879 | 185 | |9|English|25| Fasttext | zeros | static | 56.2290 | 83.2425 | 186 | |10|English|25| Fasttext | zeros | nonstatic | 64.2642 | 89.2115 | 187 | |11|English|25| Fasttext | Fasttext | static | 56.5313 | 83.9873 | 188 | |12|English|25| Fasttext | Fasttext | nonstatic | 65.9558 | 91.1536 | 189 | |13|English|49| Fasttext | zeros | static | 51.3862 | 78.7806 | 190 | |14|English|49| Fasttext | zeros | nonstatic | 59.2086* | 84.8054 | 191 | |15|English|49| Fasttext | Fasttext | static | 51.7878 | 79.9472 | 192 | |16|English|49| Fasttext | Fasttext | nonstatic | 55.3833* | 80.4958 | 193 | 194 | * Note that the experiment 14 resulted with a better score than 16, unlike other similar setups. The main reason is, I changed the "learning_rate" of the optimizer to a smaller value for the experiment 16 (well, for the sake of the experiment =)), and it appears that smaller learning rate made the learning process a bit slower (in terms of number of epochs). If I can find a chance to run this experiment again in Google Cloud (a.k.a. have enough credit to run it one more time), I will update the learning rate properly. 195 | 196 | ## Previous Updates 197 | 198 | In this title, I will save the previous updates for me and the visitors to keep track. 199 | 200 | ## May 2019 201 | 202 | ### Update 28-05-2019 203 | 204 | - I've started to work on *Improving Language Understanding by Generative Pre-Training* version of Transformer architecture. 205 | - The project will contain only languange model and classifier heads, and I will also add NER head. Rest of the original heads in the paper can be found in huggingface's github =) 206 | - And yes, I know huggingface has a PyTorch implementation which I wrote to my [References](#references-for-code-development) ages ago. Obviously, I won't c/p the already implemented code since it does not fit to my architectural flow as well as its PyTorch version is pretty old. 207 | - The end product would be provide same results (if you be a good human being and pray to the randomization gods), however, the implementation will differ due to my architectural concerns and capabilities of the latest PyTorch. 208 | - I've also updated my library versions (using PyTorch 1.1 now) but I forgot updating requirement.txt (It will be updated in next commit). 209 | - As a side note, in the last version of PyTorch, MultiHeadedAttention is developed under "torch.nn". For testing and comparison purposes, I tried to use it; however, I could not make it work in my TransformerGoogle code (you can find the commented-out line in model file). 
The return value of this new method does not fit the rest of the methods in the model (it is solvable, but I won't spend time on it for now). 210 | 211 | ## April 2019 212 | 213 | ### Update 30-04-2019 214 | 215 | - I know I said two months ago that I would push more updates, but I failed to deliver since I am a filthy lazy dude =) 216 | - Hopefully, I will push some new stuff in 2-3 weeks. Next week, I have a US conference trip, and I will buy myself a brand new monster laptop for both gaming and faster training times. After that I will: 217 | - Develop the last 2 transformer papers. 218 | - Add BERT and ELMo embedding support to the flow (not the algorithms themselves, but loading/inferring the pretrained models). Maybe I can include GPT-2 models from the latest OpenAI paper, but I am not sure about it for now (since that model is mainly used for text generation). 219 | - Semi-relevant update: After I got several e-mails from researchers around the world (even from the European Commission =)), I decided to improve my [dataset](https://data.mendeley.com/datasets/cdcztymf4k/1). I came up with a plan which I will execute after I push the updates mentioned above. So, if you have any ideas and/or requests for the new dataset, just let me know. 220 | 221 | ## March 2019 222 | 223 | ### Update 02-03-2019 224 | 225 | - Recently, I read the "Adaptive Gradient Methods with Dynamic Bound of Learning Rate" paper ([paper](https://openreview.net/forum?id=Bkg3g2R9FX) - [github](https://github.com/Luolc/AdaBound)) and decided to add it to my project. The presented results are promising, but I have not tested this new optimizer in my own experiments yet. 226 | - Since AdaBound can be installed via `pip install`, I updated requirements.txt. 227 | - `config.json` is also updated; two new parameters related to AdaBound are added. 228 | - Finally, personal issues are settling down (new job, better state of mind =)). 229 | - I plan to add more models for text categorization, starting with the new Transformer codes (check the To-do list for details). 230 | - After the two new Transformer codes, I plan to add ELMo and BERT support (I won't train my own models; I will use pretrained Turkish/English models).
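Regarding the AdaBound bullets in the 02-03-2019 update above: the sketch below shows how an AdaBound optimizer is typically constructed, assuming the `adabound` pip package's documented constructor. In this repository the optimizer is actually selected via the `optimizer` field in `training_properties` together with `learning_rate`, `final_learning_rate`, and `amsbound`, so treat this purely as an illustration (the stand-in model is hypothetical):

```python
import adabound
import torch.nn as nn

model = nn.Linear(300, 25)  # hypothetical stand-in model, for illustration only
optimizer = adabound.AdaBound(
    model.parameters(),
    lr=0.001,        # corresponds to "learning_rate" in training_properties
    final_lr=0.1,    # corresponds to "final_learning_rate" (the SGD-side learning rate)
    amsbound=False,  # corresponds to "amsbound"
)
```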
231 | 232 | 233 | ## References for Code Development 234 | 235 | Below repositories really helped me to write a decent and working code: 236 | - https://github.com/bamtercelboo/cnn-lstm-bilstm-deepcnn-clstm-in-pytorch 237 | - https://github.com/bentrevett/pytorch-sentiment-analysis 238 | - https://github.com/j-min/Dropouts/blob/master/Gaussian_Variational_Dropout.ipynb 239 | - https://github.com/felix-laumann/Bayesian_CNN/ 240 | - https://github.com/kefirski/variational_dropout/ 241 | - https://github.com/dreamgonfly/deep-text-classification-pytorch/ 242 | - https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html 243 | - https://github.com/allenai/allennlp/blob/master/allennlp/modules/conditional_random_field.py 244 | - https://github.com/threelittlemonkeys/lstm-crf-pytorch/ 245 | - https://github.com/ymym3412/textcnn-conv-deconv-pytorch/blob/master/model.py 246 | - http://nlp.seas.harvard.edu/2018/04/03/attention.html 247 | - https://github.com/huggingface/pytorch-openai-transformer-lm 248 | -------------------------------------------------------------------------------- /changes in torchtext.txt: -------------------------------------------------------------------------------- 1 | ##################################################################################### 2 | ################################# dataset.py ######################################## 3 | ##################################################################################### 4 | def check_split_ratio(split_ratio): 5 | """Check that the split ratio argument is not malformed""" 6 | valid_ratio = 0. 7 | if isinstance(split_ratio, float): 8 | # Only the train set relative ratio is provided 9 | # Assert in bounds, validation size is zero 10 | assert 0. < split_ratio < 1., ( 11 | "Split ratio {} not between 0 and 1".format(split_ratio)) 12 | 13 | test_ratio = (1. - split_ratio) / 2 14 | valid_ratio = (1. - split_ratio) / 2 15 | return (split_ratio, test_ratio, valid_ratio) 16 | elif isinstance(split_ratio, list): 17 | # A list of relative ratios is provided 18 | length = len(split_ratio) 19 | assert length == 2 or length == 3, ( 20 | "Length of split ratio list should be 2 or 3, got {}".format(split_ratio)) 21 | 22 | # Normalize if necessary 23 | ratio_sum = sum(split_ratio) 24 | if not ratio_sum == 1.: 25 | split_ratio = [float(ratio) / ratio_sum for ratio in split_ratio] 26 | 27 | if length == 2: 28 | return tuple(split_ratio + [valid_ratio]) 29 | return tuple(split_ratio) 30 | else: 31 | raise ValueError('Split ratio must be float or a list, got {}' 32 | .format(type(split_ratio))) 33 | 34 | ##################################################################################### 35 | ################################# vocab.py ########################################## 36 | ##################################################################################### 37 | class Vectors(object): 38 | .... 39 | def __getitem__(self, token): 40 | if token in self.stoi: 41 | return self.vectors[self.stoi[token]] 42 | else: 43 | return self.unk_init(torch.Tensor(self.dim), token) 44 | ..... 
45 | 46 | ##################################################################################### 47 | ################################# vocab.py ########################################## 48 | ##################################################################################### 49 | pretrained_aliases = { 50 | "charngram.100d": partial(CharNGram), 51 | "fasttext.en.300d": partial(FastText, language="en"), 52 | "fasttext.tr.300d": partial(FastText, language="tr"), # Added Turkish Fasttext 53 | "fasttext.simple.300d": partial(FastText, language="simple"), 54 | "glove.42B.300d": partial(GloVe, name="42B", dim="300"), 55 | "glove.840B.300d": partial(GloVe, name="840B", dim="300"), 56 | "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), 57 | "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), 58 | "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), 59 | "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), 60 | "glove.6B.50d": partial(GloVe, name="6B", dim="50"), 61 | "glove.6B.100d": partial(GloVe, name="6B", dim="100"), 62 | "glove.6B.200d": partial(GloVe, name="6B", dim="200"), 63 | "glove.6B.300d": partial(GloVe, name="6B", dim="300") 64 | } 65 | 66 | ##################################################################################### 67 | ################################# utils.py ########################################## 68 | ##################################################################################### 69 | def unicode_csv_reader(unicode_csv_data, **kwargs): 70 | """Since the standard csv library does not handle unicode in Python 2, we need a wrapper. 71 | Borrowed and slightly modified from the Python docs: 72 | https://docs.python.org/2/library/csv.html#csv-examples""" 73 | if six.PY2: 74 | # csv.py doesn't do Unicode; encode temporarily as UTF-8: 75 | csv_reader = csv.reader(utf_8_encoder(unicode_csv_data), **kwargs) 76 | for row in csv_reader: 77 | # decode UTF-8 back to Unicode, cell by cell: 78 | yield [cell.decode('utf-8') for cell in row] 79 | else: 80 | for line in csv.reader(unicode_csv_data, quotechar=None, **kwargs): 81 | yield line -------------------------------------------------------------------------------- /config/README.md: -------------------------------------------------------------------------------- 1 | # README 2 | 3 | This folder only holds a configuration JSON file. This file is used to define all properties and parameters that are 4 | needed to be execute this project. 5 | 6 | ## Configuration Details 7 | 8 | `config.json` has 4 main property dictionaries: 9 | 10 | - `dataset_properties` defines the dataset reading/loading/preparing related properties. 11 | - `training_properties` defines all training-related properties and parameters algorithm/optimizer type, learning rate, 12 | decay rate, and so on. 13 | - `evaluation_properties` defines the necessary model/vocabulary paths for evaluation purposes. 14 | - `model_properties` defines anything that is needed to create a model. 15 | 16 | ### Dataset Properties 17 | 18 | There are 10 properties under this `dataset_properties`: 19 | 20 | - `data_path`: The original dataset file path (The current version of the code splits a full dataset to train/validation/test sets. But does not allow you to use an already splitted version). 21 | - `stop_word_path`: The stop word file path. 22 | - `embedding_vector`: Embedding alias that torchtext needs/uses while building vocabulary (predefined aliases can be found torchtext's vocab.py file). 
For instance "fasttext.en.300d", "fasttext.tr.300d", etc. 23 | - `vector_cache`: The embedding file path that torchtext creates. To prevent it to download same file over and over again, you need to provide its path. 24 | - `pretrained_embedding_path`: This is the original, Gensim readable, embedding files' path (note that only use case for this is Fasttext-based OOV word generation). 25 | - `checkpoint_path`: The path for saved model file that you want to continue your training. 26 | - `oov_embedding_type`: The property to define how to handle out-of-vocabulary word embeddings. It takes a string and can be "zeros", "ones", "random", "uniform" or "fasttext_oov". 27 | - `batch_size`: Self-explanatory. 28 | - `fixed_length`: Integer parameter to define the maximum length of an input string (lets say for a sentence, it limits the number of words). For values lower than or equal to 0, the dataset loader uses dynamic input size. It is for reducing the memory consumption. 29 | - `min_freq`: Integer parameter to define the minimum frequency condition on words to be appear in vocabulary. For values lower than or equal to 0, the dataset loader will assign every seen word (min_freq=1) in vocabulary. It is for reducing the memory consumption. 30 | 31 | ### Training Properties 32 | 33 | There are 14 properties under this `training_properties` which determines the learning algorithm, optimizer, optimizer's 34 | parameters and printing/saving/evaluating related stuff: 35 | 36 | - `learner`: String parameter to choose which learning algorithm to use. It can be "text_cnn", "gru", "lstm", "char_cnn", 37 | "vdcnn", "conv_deconv_cnn", "transformer_google" and "lstmcrf" (Last Update: 31-01-2019) 38 | - `task`: String parameter to choose which task the learner will be trained on. It can be "classification" or "ner". 39 | - `optimizer`: String parameter to choose which optimizer to use. It can be "Adam", "SGD", "OpenAIAdam", "Noam", "Padam" and "adabound". 40 | - `learning_rate`: Self-explanatory. Takes float value. 41 | - `final_learning_rate`: This float-valued parameter is used in Adabound optimizer which is the learning rate for SGD inside Adabound method. 42 | - `scheduler_type`: String parameter to choose a scheduler for OpenAIAdam optimizer (it has no usage for others). It can 43 | be "cos", "constant" or "linear". 44 | - `amsgrad`: Boolean parameter to choose whether to use amsgrad or not in Padam optimizer. 45 | - `amsbound`: Boolean parameter to choose whether to use amsbound or not in Adabound optimizer. 46 | - `partial_adam`: Float parameter to define the "partial" parameter in Padam. It can take values between (0, 0.5] 47 | - `weight_decay`: Float parameter for L2 normalization term. *Note that for my test cases, any value bigger than 0, 48 | literally fucked my performance.* 49 | - `momentum`: Self-explanatory (it is only for "SGD"). Takes float value. 50 | - `norm_ratio`: Gradient clipping ratio. Takes float value. 51 | - `topk`: Tuple value for top-k accuracy calculations (Default: (1, 5)). It is tuple because I c/p related code from Pytorch's 52 | imagenet example without modifying it. It does not have any effects on training, it is for logging/monitoring purposes. 53 | - `print_every_batch_step`: Print loss and accuracy at every x step. 54 | - `save_every_epoch`: Save the model at every epoch. 55 | - `eval_every`: Run the trained model for validation set at every epoch. 
56 | 57 | ### Evaluation Properties 58 | 59 | There are 3 properties under this `evaluation_properties`: 60 | 61 | - `model_path`: The path for the model file that you want to evaluate. 62 | - `sentence_vocab`: Saved vocabulary (for words) file path. 63 | - `category_vocab`: Saved vocabulary (for labels) file path. 64 | 65 | ### Model Properties 66 | 67 | This is the biggest and longest set of properties. Obviously, it tends to get bigger as long as I add new models. The 68 | root `model_properties` contains several inner dictionaries. The first inner dictionary is `common_model_properties` which 69 | defines the common things that are not change w.r.t. selected learning algorithm. The rest of the inner dictionaries are 70 | the learning algorithms that are developed in this repository. 71 | 72 | - `common_model_properties`: 73 | - `run_mode`: String parameter to define the main executing job. It can be either "train" or "eval_interactive". 74 | - `use_pretrained_embed`: Boolean parameter to define whether the learning algorithm uses pretrained embeddings or not. 75 | - `embed_train_type`: String parameter to define whether the embedding layer is trainable or not. It can be "static", 76 | "nonstatic" or "multichannel". Except Text CNN model, "multichannel" embeddings are not used! 77 | - `use_batch_norm`: Boolean parameter to determine the batch normalization usage. 78 | - `batch_norm_momentum`: Float parameter to define batch normalization momentum parameter. 79 | - `batch_norm_affine`: Boolean parameter to define whether batch normalization uses affine or not. 80 | - `text_cnn`: 81 | - `use_padded_conv`: Boolean parameter to define whether convolution layer pads the input or not. 82 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 83 | "variational" 84 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 85 | - `filter_count`: Integer parameter to define the filter count in the convolutional layer. 86 | - `filter_sizes`: List of integers parameter to define the filter sizes in the convolutional layer. Default value is 87 | [3, 4, 5]. Size of the list is not constant/limited/pre-determined! 88 | - `char_cnn`: 89 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 90 | "variational" 91 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 92 | - `max_sequence_length`: Integer parameter to define the maximum sequence length (in terms of characters). Default value 93 | is 1014 (as it is defined in CharCNN article). 94 | - `feature_size`: String parameter to define network size. It can be "large" (conv_filter_count=1024, linear_unit_count=2048), 95 | "small" (conv_filter_count=256, linear_unit_count=1024) or ""(empty string) 96 | - `filter_count`: Integer parameter to define filter count in the convolutional layer. If feature_size is empty, all 97 | convolution layers will use this parameter as the filter_count value. 98 | - `filter_sizes`: : List of integers parameter to define the filter sizes in the convolutional layers. Default value 99 | is [7, 7, 3, 3, 3, 3]. Size of the list is limited to 6. 100 | - `max_pool_kernels`: List of integers parameter to define the kernel sizes in the maxpooling layers. Default value 101 | is [3, 3, 3]. Size of the list is limited to 3. 102 | - `linear_unit_count`: Integer parameter to define the number of hidden units in the fully connected layer. 
If feature_size 103 | is empty, fully connected layer will use this parameter as the linear_unit_count value. 104 | - `vdcnn`: 105 | - `keep_prob`: Float parameter to define the dropout's keeping probability. This model only use "bernoulli" dropout. 106 | - `depth`: Integer parameter to define the depth of the network. It can be 9, 17, 29, or 49. 107 | - `filter_counts`: List of integers parameter to define the filter counts in the convolutional layers. Default value 108 | is [64, 128, 256, 512]. Size of the list is limited to 4. 109 | - `filter_size`: Integer parameter to define the filter size for convolutional layers. All layers use the same size. 110 | - `use_shortcut`: Boolean parameter to determine shortcut usage in VDCNN model. 111 | - `downsampling_type`: String parameter to define downsampling method. It can be "resnet", "vgg" or kmax". 112 | - `maxpool_filter_size`: Integer parameter that defines kernel size for all maxpooling operations. 113 | - `kmax`: An integer parameter that defines "k" value for KMaxPooling operation. 114 | - `conv_deconv_cnn`: 115 | - `keep_prob`: Float parameter to define the dropout's keeping probability. This model only use "bernoulli" dropout. 116 | - `filter_counts`: List of integers parameter to define the filter counts in the encoder convolutional layers. For 117 | deconvonvolution part, this parameter is reversed. Default value is [300, 600, 500]. Size of the list is limited to 3. 118 | - `filter_sizes`: List of integers parameter to define the filter sizes in the encoder convolutional layers. For 119 | deconvonvolution part, this parameter is reversed. Default value is [5, 5, 12]. Size of the list is limited to 3. 120 | - `strides`: List of integers parameter to define the strides in the encoder convolutional layers. For 121 | deconvonvolution part, this parameter is reversed. Default value is [2, 2, 1]. Size of the list is limited to 3. 122 | - `temperature`: Float parameter to define temperature parameter of the Deconvolution stage. 123 | - `hidden_layer_size`: Integer parameter to define the number of hidden units in the Classifier stage . 124 | - `gru`: 125 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 126 | "variational" 127 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 128 | - `hidden_dim`: Integer parameter to define the hidden dimension. 129 | - `num_layers`: Integer parameter to define the number of GRU layers. 130 | - `bidirectional`: Boolean parameter to define bidirectionality. 131 | - `bias`: Boolean parameter to define the usage of bias. 132 | - `lstm`: 133 | - `dropout_type`: String parameter to choose which dropout method to use. It can be "bernoulli", "gaussian" or 134 | "variational" 135 | - `keep_prob`: Float parameter to define the dropout's keeping probability. 136 | - `hidden_dim`: Integer parameter to define the hidden dimension. 137 | - `num_layers`: Integer parameter to define the number of GRU layers. 138 | - `bidirectional`: Boolean parameter to define bidirectionality. 139 | - `bias`: Boolean parameter to define the usage of bias. 140 | - `transformer_google`: 141 | - `use_embed_sqrt_mul`: Boolean parameter to initialize embeddings by multiplying it with the square root of the 142 | model size. Initially, its value is False. 143 | - `keep_prob_encoder`: Float parameter to define the dropout's keeping probability in encoder. This model only use 144 | "bernoulli" dropout. 
145 | - `keep_prob_pe`: Float parameter to define the dropout's keeping probability in positional embeddings. This model 146 | only use "bernoulli" dropout. 147 | - `keep_prob_pff`": Float parameter to define the dropout's keeping probability in positional feed-forward. This model 148 | only use "bernoulli" dropout. 149 | - `keep_prob_attn`: Float parameter to define the dropout's keeping probability in attention. This model only use 150 | "bernoulli" dropout. 151 | - `keep_prob_clf`: Float parameter to define the dropout's keeping probability in classifier. This model only use 152 | "bernoulli" dropout. 153 | - `transformer_type`: String parameter to define the job of the transformer model. Currently, it can only take "classifier" 154 | value. 155 | - `heads`: Integer parameter to define the number of parallel attention layers. 156 | - `num_encoder_layers`: Integer parameter to define the number of encoder layers. 157 | - `num_hidden_pos_ff`: Integer parameter to define number of hidden units in position-wise feed-forward network. 158 | - `max_length`: Integer parameter to define the maximum length of the input. Default value is 5000. 159 | -------------------------------------------------------------------------------- /config/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/config/__init__.py -------------------------------------------------------------------------------- /config/config.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset_properties": { 3 | "data_path": "./data/turkish_test.DUMP", 4 | "batch_size": 32, 5 | "stop_word_path": "./data/stopwords/turkish", 6 | "embedding_vector": "fasttext.tr.300d", 7 | "vector_cache": "./data/fasttext", 8 | "pretrained_embedding_path": "./data/fasttext/wiki.tr", 9 | "checkpoint_path": "", 10 | "oov_embedding_type": "zeros", 11 | "fixed_length": 0, 12 | "min_freq": 0 13 | }, 14 | "model_properties": { 15 | "common_model_properties": { 16 | "run_mode": "train", 17 | "use_pretrained_embed": true, 18 | "embed_train_type": "static", 19 | "use_batch_norm": true, 20 | "batch_norm_momentum": 0.1, 21 | "batch_norm_affine": false 22 | }, 23 | "text_cnn": { 24 | "use_padded_conv": true, 25 | "dropout_type": "bernoulli", 26 | "keep_prob": 0.5, 27 | "filter_count": 64, 28 | "filter_sizes": [ 29 | 3, 30 | 4, 31 | 5 32 | ] 33 | }, 34 | "char_cnn": { 35 | "dropout_type": "bernoulli", 36 | "keep_prob": 0.5, 37 | "max_sequence_length": 1014, 38 | "feature_size": "large", 39 | "filter_count": 1024, 40 | "filter_sizes": [ 41 | 7, 42 | 7, 43 | 3, 44 | 3, 45 | 3, 46 | 3 47 | ], 48 | "max_pool_kernels": [ 49 | 3, 50 | 3, 51 | 3 52 | ], 53 | "linear_unit_count": 2048 54 | }, 55 | "vdcnn": { 56 | "keep_prob": 0.5, 57 | "depth": 9, 58 | "filter_counts": [ 59 | 64, 60 | 128, 61 | 256, 62 | 512 63 | ], 64 | "filter_size": 3, 65 | "use_shortcut": true, 66 | "downsampling_type": "resnet", 67 | "maxpool_filter_size": 3, 68 | "kmax": 8 69 | }, 70 | "conv_deconv_cnn": { 71 | "keep_prob": 0.5, 72 | "filter_counts": [ 73 | 300, 74 | 600, 75 | 500 76 | ], 77 | "filter_sizes": [ 78 | 5, 79 | 5, 80 | 12 81 | ], 82 | "strides": [ 83 | 2, 84 | 2, 85 | 1 86 | ], 87 | "temperature": 0.01, 88 | "hidden_layer_size": 500 89 | }, 90 | "gru": { 91 | "dropout_type": "bernoulli", 92 | "keep_prob": 0.5, 93 | "hidden_dim": 300, 94 | "num_layers": 1, 95 | "bidirectional": false, 96 
| "bias": true 97 | }, 98 | "lstm": { 99 | "dropout_type": "bernoulli", 100 | "keep_prob": 0.5, 101 | "hidden_dim": 100, 102 | "num_layers": 2, 103 | "bidirectional": true, 104 | "bias": true 105 | }, 106 | "transformer_google": { 107 | "use_embed_sqrt_mul": false, 108 | "keep_prob_encoder": 0.1, 109 | "keep_prob_pe": 0.1, 110 | "keep_prob_pff": 0.1, 111 | "keep_prob_attn": 0.1, 112 | "keep_prob_clf": 0.5, 113 | "transformer_type": "classifier", 114 | "heads": 6, 115 | "num_encoder_layers": 6, 116 | "num_hidden_pos_ff": 2048, 117 | "max_length": 5000 118 | } 119 | }, 120 | "training_properties": { 121 | "learner": "transformer_google", 122 | "task": "classification", 123 | "optimizer": "SGD", 124 | "learning_rate": 0.05, 125 | "final_learning_rate": 0.1, 126 | "scheduler_type": "cos", 127 | "amsgrad": true, 128 | "amsbound": false, 129 | "partial_adam": 0.25, 130 | "weight_decay": 0, 131 | "momentum": 0.9, 132 | "norm_ratio": 5, 133 | "epoch": 10, 134 | "print_every_batch_step": 250, 135 | "save_every_epoch": 1, 136 | "topk": [ 137 | 5, 138 | 1 139 | ], 140 | "eval_every": 1 141 | }, 142 | "evaluation_properties": { 143 | "model_path": "D:/PyTorchNLP/saved/2018-12-06/", 144 | "sentence_vocab": "D:/PyTorchNLP/saved/vocab/sentence_vocab.dat", 145 | "category_vocab": "D:/PyTorchNLP/saved/vocab/category_vocab.dat" 146 | } 147 | } -------------------------------------------------------------------------------- /config/config.logger: -------------------------------------------------------------------------------- 1 | [loggers] 2 | keys=root 3 | 4 | [handlers] 5 | keys=consoleHandler 6 | 7 | [formatters] 8 | keys=sampleFormatter 9 | 10 | [logger_root] 11 | level=INFO 12 | handlers=consoleHandler 13 | 14 | [handler_consoleHandler] 15 | class=StreamHandler 16 | level=INFO 17 | formatter=sampleFormatter 18 | args=(sys.stdout,) 19 | 20 | [formatter_sampleFormatter] 21 | format=%(asctime)s - %(name)s - %(levelname)s - %(message)s -------------------------------------------------------------------------------- /crf/CRF.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from utils.utils import log_sum_exp 5 | 6 | 7 | class ConditionalRandomField(nn.Module): 8 | def __init__(self, args): 9 | super(ConditionalRandomField, self).__init__() 10 | 11 | self.args_common = args["common_model_properties"] 12 | 13 | self.tag_size = self.args_common["num_tags"] 14 | self.start_id = self.args_common["start_id"] 15 | self.end_id = self.args_common["end_id"] 16 | self.pad_id = self.args_common["padding_id"] 17 | 18 | # Device 19 | self.device = self.args_common["device"] 20 | 21 | # Matrix of transition parameters. Entry i,j is the score of transitioning *to* i *from* j 22 | self.transition = nn.Parameter(torch.Tensor(self.tag_size, self.tag_size)).to(self.device) 23 | 24 | self.transition.data[self.start_id, :] = -10000. # no transition to start 25 | self.transition.data[:, self.end_id] = -10000. # no transition from end except to pad 26 | self.transition.data[:, self.pad_id] = -10000. # no transition from pad except to pad 27 | self.transition.data[self.pad_id, :] = -10000. # no transition to pad except from end 28 | self.transition.data[self.pad_id, self.end_id] = 0. 29 | self.transition.data[self.pad_id, self.pad_id] = 0. 
30 | 31 | torch.nn.init.xavier_normal_(self.transition) 32 | 33 | def _forward(self, x, mask): 34 | # initialize forward variables in log space 35 | batch_size, seq_length, tag_size = x.size() 36 | 37 | # Size of init_alphas = [Batch_size, Tag_size] 38 | alpha = torch.full((batch_size, self.tag_size), -10000.).to(self.device) 39 | alpha[:, self.start_id] = 0. 40 | 41 | # Size of transition_scores = [1, Tag_size, Tag_size] 42 | transition_scores = self.transition.unsqueeze(0) 43 | 44 | for i in range(seq_length): 45 | mask_broadcast = mask[:, i].unsqueeze(1) 46 | 47 | # Size of emition_scores = [Batch_size, Tag_Size, 1] 48 | emition_scores = x[:, i].unsqueeze(2) 49 | 50 | # Size of alpha_broadcast: [Batch_Size, Tag_Size, Tag_Size] 51 | alpha_broadcast = log_sum_exp(alpha.unsqueeze(1) + emition_scores + transition_scores) 52 | 53 | # Size of alpha: [Batch_size, Tag_Size] 54 | alpha = alpha_broadcast * mask_broadcast + alpha * (1 - mask_broadcast) 55 | return log_sum_exp(alpha + self.transition[self.end_id]) 56 | 57 | def _score(self, x, tags, mask): 58 | batch_size, seq_length, tag_size = x.size() 59 | 60 | score = torch.zeros(batch_size).to(self.device) 61 | 62 | x = x.unsqueeze(3) 63 | trans = self.transition.unsqueeze(2) 64 | 65 | for t in range(seq_length - 1): # recursion through the sequence 66 | mask_broadcast = mask[:, t] 67 | emition_scores = torch.cat([x[t, tag[t + 1]] for x, tag in zip(x, tags)]) 68 | transition_scores = torch.cat([trans[tag[t + 1], tag[t]] for tag in tags]) 69 | score += (emition_scores + transition_scores) * mask_broadcast 70 | 71 | last_tag = tags.gather(1, mask.sum(1).long().unsqueeze(1)).squeeze(1) 72 | score += self.transition[self.end_id, last_tag] 73 | return score 74 | 75 | def forward(self, input, tags, mask, reduce=True): 76 | forward_score = self._forward(input, mask) 77 | gold_score = self._score(input, tags, mask) 78 | nll = forward_score - gold_score 79 | return nll if not reduce else torch.mean(nll) 80 | 81 | def viterbi_decode(self, x, mask): 82 | batch_size, seq_length, tag_size = x.size() 83 | 84 | backpointers = torch.LongTensor().to(self.device) 85 | # Initialize the viterbi variables in log space 86 | path_score = torch.full((batch_size, self.tag_size), -10000.).to(self.device) 87 | path_score[:, self.start_id] = 0. 
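# NOTE: despite its name, `next_tag` in the loop below indexes sequence positions (time steps), not tags.
# At each step, for every current tag we keep the best-scoring previous tag (the backpointer) and its
# running score; padded positions carry their previous scores over unchanged via the mask.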
88 | 89 | for next_tag in range(seq_length): 90 | mask_broadcast = mask[:, next_tag].unsqueeze(1) 91 | path_score_broadcast = path_score.unsqueeze(1) + self.transition 92 | path_score_broadcast, backpointer_broadcast = torch.max(path_score_broadcast, 2) 93 | path_score_broadcast += x[:, next_tag] 94 | path_score = path_score_broadcast * mask_broadcast + path_score * (1 - mask_broadcast) 95 | backpointers = torch.cat((backpointers, backpointer_broadcast.unsqueeze(1)), 1) 96 | 97 | path_score += self.transition[self.end_id] 98 | best_path_scores, best_tag_ids = torch.max(path_score, 1) 99 | 100 | backpointers = backpointers.tolist() 101 | best_paths = [[tag_id] for tag_id in best_tag_ids.tolist()] 102 | 103 | for batch in range(batch_size): 104 | best_tag = best_tag_ids[batch] 105 | idx = int(mask[batch].sum().item()) 106 | for bptr_t in reversed(backpointers[batch][:idx]): 107 | best_tag = bptr_t[best_tag] 108 | best_paths[batch].append(best_tag) 109 | best_paths[batch].pop() 110 | best_paths[batch].reverse() 111 | 112 | return best_path_scores, best_paths 113 | 114 | 115 | -------------------------------------------------------------------------------- /crf/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/crf/__init__.py -------------------------------------------------------------------------------- /datahelper/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/datahelper/__init__.py -------------------------------------------------------------------------------- /datahelper/dataset_reader.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import logging.config 4 | import random 5 | 6 | from torchtext import data 7 | 8 | from datahelper.preprocessor import Preprocessor 9 | from embedding_helper import OOVEmbeddingCreator 10 | 11 | SEED = 1234 12 | 13 | logging.config.fileConfig(fname='D:/PyTorchNLP/config/config.logger', disable_existing_loggers=False) 14 | logger = logging.getLogger("Dataset Loader") 15 | 16 | 17 | class DatasetLoader(object): 18 | def __init__(self, data_path, vector, task, fix_length=0, min_freq=0, level="word", unk_init=None, 19 | preprocessor=None, 20 | vector_cache=None): 21 | assert data_path is not None and vector is not None 22 | 23 | self.data_path = data_path 24 | self.vector = vector 25 | self.preprocessor = preprocessor 26 | self.vector_cache = vector_cache 27 | self.unk_init = unk_init 28 | self.level = level 29 | self.task = task 30 | 31 | self.sentence_vocab = None 32 | self.category_vocab = None 33 | self.ner_vocab = None 34 | 35 | self.sentence_vocab_vectors = None 36 | 37 | self.train_iter = None 38 | self.val_iter = None 39 | self.test_iter = None 40 | 41 | self.min_freq = min_freq 42 | self.fix_length = fix_length 43 | if min_freq < 0: 44 | self.min_freq = 0 45 | if fix_length <= 0: 46 | self.fix_length = None 47 | 48 | ''' 49 | This method is for character-level stuff. 
50 | Since, torchtext do not let me do preprocess before tokenization in its normal flow (it always applies tokenization 51 | first, then does preprocess), I listened an advice from stackoverflow and wrote my own tokenizer which does 52 | preprocess first, then applies tokenize preprocessed sentence into characters. Not happy with it, but it works =) 53 | :param sentence: 54 | :return: tokenized_sentence 55 | ''' 56 | 57 | def evil_workaround_tokenizer(self, sentence): 58 | preprocessed_sentence = self.preprocessor(sentence) 59 | tokenized_sentence = list(preprocessed_sentence) 60 | return tokenized_sentence 61 | 62 | def create_fields(self, seq_input=True, seq_ner=True, seq_cat=False): 63 | if self.level == "word": 64 | sentence_field = data.Field(sequential=seq_input, preprocessing=self.preprocessor, fix_length=self.fix_length, 65 | init_token="", eos_token="") 66 | elif self.level == "char": 67 | sentence_field = data.Field(sequential=seq_input, tokenize=self.evil_workaround_tokenizer, fix_length=1014) 68 | # sentence_field = data.NestedField(nested_field) 69 | else: 70 | raise KeyError("Sentence_field is undefined!") 71 | 72 | ner_label_field = data.Field(sequential=seq_ner, init_token="", eos_token="", unk_token=None) 73 | category_label_field = data.LabelField(sequential=seq_cat) 74 | return sentence_field, ner_label_field, category_label_field 75 | 76 | def read_dataset(self, batch_size=128, split_ratio=0.7, format="tsv"): 77 | sf, nlf, clf = self.create_fields() 78 | if self.task == "classification": 79 | dataset = data.TabularDataset(path=self.data_path, 80 | format=format, 81 | skip_header=True, 82 | fields=[("category_labels", clf), 83 | ("ner_labels", None), 84 | ("sentence", sf)]) 85 | elif self.task == "ner": 86 | dataset = data.TabularDataset(path=self.data_path, 87 | format=format, 88 | skip_header=True, 89 | fields=[("category_labels", None), 90 | ("ner_labels", nlf), 91 | ("sentence", sf)]) 92 | else: 93 | raise ValueError("Training task is not defined! 
It can be 'classification' or 'ner'") 94 | 95 | logger.info("Splitting dataset into train/dev/test") 96 | train, val, test = self.create_splits(dataset, split_ratio) 97 | logger.info("Splitting done!") 98 | logger.info("Creating vocabulary") 99 | self.create_vocabs(dataset, sf, clf, nlf) 100 | logger.info("Vocabulary created!") 101 | logger.info("Creating iterators") 102 | self.create_iterator(train, val, test, batch_size) 103 | return train, val, test 104 | 105 | @staticmethod 106 | def create_splits(dataset, split_ratio): 107 | return dataset.split(split_ratio=split_ratio, random_state=random.seed(SEED)) 108 | 109 | def create_vocabs(self, train, sentence_field, category_label_field, ner_label_field): 110 | if self.level == "word": 111 | sentence_field.build_vocab(train, vectors=self.vector, vectors_cache=self.vector_cache, 112 | unk_init=self.unk_init, min_freq=self.min_freq) 113 | else: 114 | sentence_field.build_vocab(train) 115 | 116 | self.sentence_vocab = sentence_field.vocab 117 | self.sentence_vocab_vectors = sentence_field.vocab.vectors 118 | 119 | if self.task == "classification": 120 | category_label_field.build_vocab(train) 121 | self.category_vocab = category_label_field.vocab 122 | else: 123 | ner_label_field.build_vocab(train) 124 | self.ner_vocab = ner_label_field.vocab 125 | 126 | def create_iterator(self, train, val, test, batch_size): 127 | self.train_iter, self.val_iter, self.test_iter = data.BucketIterator.splits(datasets=(train, val, test), 128 | batch_sizes=( 129 | batch_size, batch_size, 130 | batch_size), 131 | sort=True, 132 | sort_key=lambda x: -len(x.sentence), 133 | sort_within_batch=False, 134 | repeat=False) 135 | 136 | 137 | if __name__ == '__main__': 138 | stop_word_path = "D:/Anaconda3/nltk_data/corpora/stopwords/turkish" 139 | data_path = "D:/PyTorchNLP/data/turkish_test.DUMP" 140 | vector_cache = "D:/PyTorchNLP/data/fasttext" 141 | level = "word" 142 | is_char_level = False 143 | task = "ner" 144 | 145 | if task == "ner": 146 | stop_word_path = None 147 | 148 | preprocessor = Preprocessor(stop_word_path, 149 | is_remove_digit=False, 150 | is_remove_punctuations=False, 151 | is_char_level=is_char_level) 152 | 153 | unkembedding = OOVEmbeddingCreator(type="zeros", 154 | fasttext_model_path="D:/PyTorchNLP/data/fasttext/wiki.tr") 155 | 156 | dataset_helper = DatasetLoader(data_path=data_path, 157 | vector="fasttext.tr.300d", 158 | preprocessor=preprocessor.preprocess, 159 | level=level, 160 | vector_cache=vector_cache, 161 | unk_init=unkembedding.create_oov_embedding, 162 | min_freq=0, 163 | fix_length=0, 164 | task=task 165 | ) 166 | 167 | print("Reading dataset") 168 | train, val, test = dataset_helper.read_dataset(batch_size=32) 169 | print(len(train), "-", len(val), "-", len(test)) 170 | sentence_vocab = dataset_helper.sentence_vocab 171 | category_vocab = dataset_helper.category_vocab 172 | ner_vocab = dataset_helper.ner_vocab 173 | 174 | print("Vocab:", len(sentence_vocab)) 175 | # print("Vocab:", len(category_vocab)) 176 | print("Most freq:", sentence_vocab.freqs.most_common(20)) 177 | # print("Most freq:", category_vocab.freqs.most_common(20)) 178 | print("Itos:", sentence_vocab.itos[:50]) 179 | print("Itos:", ner_vocab.itos) 180 | print("Stoi:", ner_vocab.stoi) 181 | # print("Stoi:", category_vocab.stoi) 182 | 183 | train_iter = dataset_helper.train_iter 184 | val_iter = dataset_helper.val_iter 185 | test_iter = dataset_helper.test_iter 186 | 187 | print("Train iter size:", len(train_iter)) 188 | print("Val iter size:", len(val_iter)) 189 | 
print("Test iter size:", len(test_iter)) 190 | 191 | for idx, batch in enumerate(val_iter): 192 | batch_x = batch.sentence 193 | # batch_category = batch.category_labels 194 | batch_ner = batch.ner_labels 195 | batch_x = batch_x.permute(1, 0) 196 | batch_ner = batch_ner.permute(1, 0) 197 | print(idx, ":", batch_x.size()) 198 | print(idx, ":", batch_ner.size()) 199 | # print(batch_x) 200 | # print(batch_category) 201 | # print(batch_ner) 202 | # batch_x = torch.reshape(batch_x, (batch_x.size(0), batch_x.size(1)*batch_x.size(2))) 203 | for idx2, x in enumerate(batch_x): 204 | if dataset_helper.level == "word": 205 | s = [sentence_vocab.itos[i] for i in x] 206 | n = [ner_vocab.itos[i] for i in batch_ner[idx2]] 207 | else: 208 | # s = [sentence_vocab.itos[char] for sentence in batch_x for word in sentence for char in word] 209 | s = [sentence_vocab.itos[idx] for idx in x] 210 | 211 | print(idx2, "(", len(x), ")-", x) 212 | print(idx2, "(", len(s), ")-", s) 213 | print(idx2, "(", len(batch_ner[idx2]), ")-", batch_ner[idx2]) 214 | print(idx2, "(", len(n), ")-", n) 215 | print("") 216 | 217 | 218 | -------------------------------------------------------------------------------- /datahelper/embedding_helper.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | from gensim.models import FastText 5 | 6 | logging.config.fileConfig(fname="D:/PyTorchNLP/config/config.logger", disable_existing_loggers=False) 7 | logger = logging.getLogger("Embedding Helper") 8 | 9 | class OOVEmbeddingCreator(object): 10 | def __init__(self, type="zeros", range=(-0.25, 0.25), fasttext_model_path="None"): 11 | assert type == "zeros" or type == "ones" or type == "random" or type == "uniform" or type == "fasttext_oov" 12 | self.type = type 13 | self.range = range 14 | self.fasttext_model_path = fasttext_model_path 15 | self.random_emb = None 16 | self.uniform_emb = None 17 | logger.info("> OOV Embedding mode: %s", self.type) 18 | if self.type == "fasttext_oov": 19 | assert self.fasttext_model_path is not None 20 | logger.info(">> Fasttext model will be loaded and embeddings for OOV words will be calculated by using it!") 21 | logger.info(">> Beware that the process may take a while due to this process!") 22 | self.model = FastText.load_fasttext_format(self.fasttext_model_path) 23 | 24 | def create_oov_embedding(self, vector, word=None): 25 | if self.type == "zeros": 26 | return torch.zeros(vector.size()) 27 | elif self.type == "ones": 28 | return torch.ones(vector.size()) 29 | elif self.type == "random": 30 | if self.random_emb is None: 31 | self.random_emb = torch.randn(vector.size()) 32 | return self.random_emb 33 | elif self.type == "uniform": 34 | if self.uniform_emb is None: 35 | self.uniform_emb = torch.FloatTensor(vector.size()).uniform_(self.range[0], self.range[1]) 36 | return self.uniform_emb 37 | elif self.type == "fasttext_oov": 38 | try: 39 | res = torch.from_numpy(self.model.wv.word_vec(word)) 40 | except KeyError: 41 | res = torch.randn(vector.size()) 42 | return res 43 | -------------------------------------------------------------------------------- /datahelper/preprocessor.py: -------------------------------------------------------------------------------- 1 | import re 2 | import string 3 | 4 | all_letters = string.ascii_letters + ".,;" 5 | 6 | 7 | class Preprocessor(object): 8 | def __init__(self, stop_word_path=None, is_remove_digit=True, is_remove_punctuations=True, is_char_level=False): 9 | self.stop_word_path = 
stop_word_path 10 | if stop_word_path is not None: 11 | self.stop_words = self.load_stop_words(stop_word_path) 12 | self.is_remove_digit = is_remove_digit 13 | self.is_remove_punctuations = is_remove_punctuations 14 | self.is_char_level = is_char_level 15 | 16 | @staticmethod 17 | def load_stop_words(path): 18 | return set(line.strip() for line in open(path)) 19 | 20 | @staticmethod 21 | def remove_line_breaks(sentence): 22 | return sentence.replace("\r", "").replace("\n", "") 23 | 24 | @staticmethod 25 | def remove_punctuations(sentence): 26 | sentence = sentence.replace('"', '') 27 | sentence = sentence.replace("'", "") 28 | return "".join([ch for ch in sentence if ch not in string.punctuation]) 29 | 30 | @staticmethod 31 | def remove_multiple_white_spaces(sentence): 32 | return " ".join(sentence.split()) 33 | 34 | def remove_stop_words(self, sentence): 35 | return " ".join([word for word in sentence.split() if word not in self.stop_words]) 36 | 37 | @staticmethod 38 | def to_lowercase(sentence): 39 | return sentence.lower() 40 | 41 | @staticmethod 42 | def remove_digits(sentence): 43 | return "".join([word for word in sentence if not word.isdigit()]) 44 | 45 | @staticmethod 46 | def replace_digits(sentence): 47 | return re.sub("\d+", "", sentence) 48 | 49 | @staticmethod 50 | def replace_digits_in_char_level(sentence): 51 | return [re.sub("\d+", "", i) for i in sentence] 52 | 53 | @staticmethod 54 | def change_space_to_special_token(sentence): 55 | return ["" if i == " " else i for i in sentence] 56 | 57 | @staticmethod 58 | def remove_alphanumeric(sentence): 59 | return "".join([word for word in sentence if not word.isalnum()]) 60 | 61 | @staticmethod 62 | def remove_non_utf8(sentence): 63 | return bytes(sentence, "utf-8").decode("utf-8", "ignore") 64 | 65 | @staticmethod 66 | def change_currency_characters(sentence): 67 | return sentence.replace('$', 'dolar').replace('£', 'sterlin').replace('€', 'euro') 68 | 69 | @staticmethod 70 | def change_space_to_special_token(sentence): 71 | return ["" if i == " " else i for i in sentence] 72 | 73 | def preprocess(self, sentence): 74 | # TorchText returns a list of words instead of a normal sentence. 75 | # First, create the sentence again. Then, do preprocess. 
Finally, return the preprocessed sentence as a list 76 | # of words 77 | if self.is_char_level: 78 | x = sentence 79 | else: 80 | x = " ".join(sentence) 81 | x = self.to_lowercase(x) 82 | x = self.change_currency_characters(x) 83 | 84 | if self.is_remove_punctuations: 85 | x = self.remove_punctuations(x) 86 | 87 | if self.stop_word_path is not None: 88 | x = self.remove_stop_words(x) 89 | 90 | if self.is_remove_digit: 91 | x = self.remove_digits(x) 92 | elif self.is_remove_digit is False and self.is_char_level is False: 93 | x = self.replace_digits(x) 94 | 95 | x = self.remove_line_breaks(x) 96 | x = self.remove_multiple_white_spaces(x) 97 | 98 | if self.is_char_level: 99 | x = list((x.strip())) 100 | x = self.change_space_to_special_token(x) 101 | x = self.replace_digits_in_char_level(x) 102 | return x 103 | else: 104 | return (x.strip()).split() 105 | 106 | 107 | if __name__ == '__main__': 108 | stop_word_path = "D:/Anaconda3/nltk_data/corpora/stopwords/turkish" 109 | dataset_path = "D:/PyTorchNLP/data/twnertc_basic_tr.DUMP" 110 | 111 | preprocessor = Preprocessor(stop_word_path, 112 | is_remove_digit=False, 113 | is_remove_punctuations=False) 114 | 115 | with open(dataset_path, encoding="utf-8") as dataset: 116 | for counter, line in enumerate(dataset): 117 | line_tokens = line.split('\t') 118 | sentence = line_tokens[2] 119 | print(" > ", sentence) 120 | print(" = ", preprocessor.preprocess(sentence)) 121 | if counter == 10: 122 | break 123 | -------------------------------------------------------------------------------- /dropout_models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/dropout_models/__init__.py -------------------------------------------------------------------------------- /dropout_models/dropout.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from dropout_models.gaussian_dropout import GaussianDropout 3 | from dropout_models.variational_dropout import VariationalDropout 4 | 5 | 6 | class Dropout(object): 7 | def __init__(self, keep_prob=0.5, dimension=None, dropout_type="bernoulli"): 8 | self.keep_prob = keep_prob 9 | self.dimension = dimension 10 | self.dropout_type = dropout_type 11 | self.dropout = self.create_dropout() 12 | 13 | def create_dropout(self): 14 | if self.dropout_type == "bernoulli": 15 | return nn.Dropout(self.keep_prob) 16 | elif self.dropout_type == "gaussian": 17 | return GaussianDropout(prob=self.keep_prob) 18 | elif self.dropout_type == "variational": 19 | return VariationalDropout(prob=self.keep_prob, dimension=self.dimension) 20 | -------------------------------------------------------------------------------- /dropout_models/gaussian_dropout.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class GaussianDropout(nn.Module): 6 | def __init__(self, prob): 7 | super(GaussianDropout, self).__init__() 8 | if 0 < prob <= 0.5: 9 | self.alpha = torch.Tensor([prob / (1.0 - prob)]) 10 | else: 11 | prob = 0.5 12 | self.alpha = torch.Tensor([prob / (1.0 - prob)]) 13 | 14 | def forward(self, x): 15 | # Note: use the "training" flag here; calling self.train() would switch the module to train mode and always evaluate truthy. 16 | if self.training: 17 | # Epsilon ~ N(1, alpha) 18 | epsilon = torch.randn(x.size()) * self.alpha + 1 19 | if x.is_cuda: 20 | epsilon = epsilon.cuda() 21 | # Local reparametrization trick: x_i = ^x_i * epsilon_i 22 | return x * epsilon 23 | else: 24
 | return x 25 | -------------------------------------------------------------------------------- /dropout_models/variational_dropout.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | from torch.autograd import Variable 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Dropout") 9 | 10 | class VariationalDropout(nn.Module): 11 | def __init__(self, prob, dimension=None): 12 | super(VariationalDropout, self).__init__() 13 | 14 | self.dimension = dimension 15 | 16 | alpha = 1.0 17 | if prob <= 0.5: 18 | alpha = prob / (1 - prob) 19 | else: 20 | logger.warning("Caution! With the current probability value ({}), you may get trapped in a local optimum!".format(prob)) 21 | logger.warning("It is suggested that the probability value should be <= 0.5") 22 | alpha = prob / (1 - 0.49) 23 | self.max_alpha = alpha 24 | 25 | log_alpha = torch.log(torch.ones(dimension) * alpha) 26 | self.log_alpha = nn.Parameter(log_alpha) 27 | 28 | self.c = [1.16145124, -1.50204118, 0.58629921] 29 | 30 | def kl(self): 31 | alpha = torch.exp(self.log_alpha) 32 | 33 | kl = -(0.5 * self.log_alpha + self.c[0] * alpha + self.c[1] * alpha ** 2 + self.c[2] * alpha ** 3) 34 | 35 | return torch.mean(kl) 36 | 37 | def forward(self, x): 38 | self.log_alpha.data = torch.clamp(self.log_alpha.data, max=self.max_alpha) 39 | kld = self.kl() 40 | 41 | if self.training: 42 | # Epsilon ~ N(0, 1) 43 | epsilon = Variable(torch.randn(x.size())) 44 | if x.is_cuda: 45 | epsilon = epsilon.cuda() 46 | 47 | alpha = torch.exp(self.log_alpha) 48 | 49 | # Epsilon ~ N(1, alpha) 50 | epsilon = epsilon * alpha 51 | 52 | return x * epsilon, kld 53 | else: 54 | return x, kld 55 | -------------------------------------------------------------------------------- /evaluation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/evaluation/__init__.py -------------------------------------------------------------------------------- /evaluation/evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | from evaluation.interactive_evaluator import InteractiveEvaluator 4 | from evaluation.single_model_ner_evaluator import SingleModelNerEvaluator 5 | from evaluation.single_model_evaluator import SingleModelEvaluator 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Evaluator") 9 | 10 | 11 | class Evaluator(object): 12 | @staticmethod 13 | def evaluator_factory(type, device): 14 | if type == "single_model_evaluator": 15 | logger.info("Evaluator type is %s", type) 16 | dev_evaluator = SingleModelEvaluator(device, is_vali=True) 17 | test_evaluator = SingleModelEvaluator(device, is_vali=False) 18 | return dev_evaluator, test_evaluator 19 | elif type == "single_model_ner_evaluator": 20 | logger.info("Evaluator type is %s", type) 21 | dev_evaluator = SingleModelNerEvaluator(device, is_vali=True) 22 | test_evaluator = SingleModelNerEvaluator(device, is_vali=False) 23 | return dev_evaluator, test_evaluator 24 | elif type == "interactive_evaluator": 25 | return InteractiveEvaluator(device) 26 | else: 27 | raise ValueError("Unrecognized evaluator type") 28 |
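The non-interactive factory branches above return a (dev, test) evaluator pair whose evaluate_iter signature is defined in single_model_evaluator.py. Below is a minimal usage sketch for the classification case; model, criterion, scorer, save_dir and datasetloader are assumed to come from the training setup in main.py and are not created here, so treat the snippet as illustrative rather than as part of the codebase.

import torch

from evaluation.evaluator import Evaluator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the validation/test evaluator pair (classification flavor).
dev_evaluator, test_evaluator = Evaluator.evaluator_factory("single_model_evaluator", device)

# Validation pass: is_vali=True, so the in-memory model is evaluated as-is.
# model, criterion, scorer, save_dir and datasetloader are assumed to exist (see main.py).
val_loss, val_acc, val_acc_topk = dev_evaluator.evaluate_iter(model=model,
                                                              input=datasetloader.val_iter,
                                                              criterion=criterion,
                                                              save_path=save_dir,
                                                              scorer=scorer)

# Test pass: is_vali=False, so the evaluator reloads the best saved model from save_path first.
test_loss, test_acc, test_acc_topk = test_evaluator.evaluate_iter(model=model,
                                                                  input=datasetloader.test_iter,
                                                                  criterion=criterion,
                                                                  save_path=save_dir,
                                                                  scorer=scorer)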
-------------------------------------------------------------------------------- /evaluation/interactive_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import pkg_resources 4 | 5 | try: 6 | pkg_resources.get_distribution("spacy") 7 | except pkg_resources.DistributionNotFound: 8 | print("Spacy has not been found! As sentence tokenizer .split() will be used!") 9 | HAS_SPACY = False 10 | else: 11 | import spacy 12 | 13 | HAS_SPACY = True 14 | import torch 15 | import torch.nn.functional as F 16 | 17 | from utils.utils import load_best_model, load_vocabulary 18 | 19 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 20 | logger = logging.getLogger("Evaluator") 21 | 22 | 23 | class InteractiveEvaluator(object): 24 | def __init__(self, device="cpu"): 25 | self.device = device 26 | 27 | def evaluate_interactive(self, model_path, sentence_vocab_path, category_vocab_path, preprocessor, topk): 28 | sentence_vocab = load_vocabulary(sentence_vocab_path) 29 | category_vocab = load_vocabulary(category_vocab_path) 30 | 31 | model = load_best_model(model_path) 32 | model = model.to(self.device) 33 | model.eval() 34 | 35 | with torch.no_grad(): 36 | while True: 37 | try: 38 | sentence = input("Enter a test sentence (Type q or quit to exit!):") 39 | except ValueError: 40 | logger.error("Invalid input. Try again! (Type q or quit to exit!)") 41 | continue 42 | 43 | if sentence.lower() != "q" and sentence.lower() != "quit": 44 | # Below 5 lines of weird looking code is for tokenizing a test input, correctly. 45 | # Obviously, sentence.split() does not work if the sentence has punctuations to tokenize. 46 | # Example: "a, b c." sentence should be tokenized as "a , b c . ". 47 | # That's why I added spacy tokenizer. And, lucky me it works for Turkish, too =) 48 | # Note that I added this tokenization to preprocessor; however, it takes too much time to prepare a 49 | # whole dataset in training process. 
Since the dataset I am using is already tokenized as it should be, 50 | # I wrote the below code to only evaluation process which is less shorter than my comment to explain 51 | # this situation =) 52 | if HAS_SPACY: 53 | nlp_tokenizer = spacy.load("en") 54 | doc = nlp_tokenizer(sentence.lower()) 55 | tokenized_sentence = [token.text for token in doc] 56 | preprocessed_sentence = preprocessor(tokenized_sentence) 57 | temp = nlp_tokenizer(" ".join(preprocessed_sentence)) 58 | preprocessed_sentence = [token.text for token in temp] 59 | else: 60 | preprocessed_sentence = preprocessor(sentence.lower().split()) 61 | 62 | indexed_test_sentence = [sentence_vocab.stoi[token] for token in preprocessed_sentence] 63 | 64 | tensored_test_sentence = torch.LongTensor(indexed_test_sentence).to(self.device) 65 | 66 | tensored_test_sentence = tensored_test_sentence.unsqueeze(1) 67 | 68 | logit = model(tensored_test_sentence) 69 | probs = F.softmax(logit, dim=1) 70 | 71 | predicted_category_probs, predicted_category_ids = probs.topk(topk, 1, True, True) 72 | 73 | predicted_category_ids = predicted_category_ids.t() 74 | 75 | predicted_labels = [] 76 | for idx in predicted_category_ids: 77 | predicted_labels.append(category_vocab.itos[idx]) 78 | 79 | if topk == 1: 80 | logger.info("Predicted category is {} with probability {}".format(predicted_labels[0], 81 | predicted_category_probs[0][ 82 | 0].item())) 83 | else: 84 | logger.info("Top-{} predicted labels are as follows in order:".format(topk)) 85 | for idx, label in enumerate(predicted_labels): 86 | logger.info("> {} - Predicted category is {} with probability {:.4f}".format(idx + 1, 87 | label, 88 | predicted_category_probs[ 89 | 0][ 90 | idx].item())) 91 | else: 92 | logger.info("Interactive evaluation ends!") 93 | break 94 | -------------------------------------------------------------------------------- /evaluation/multiple_model_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | 5 | from utils.utils import calculate_accuracy, calculate_topk_accuracy, load_best_model 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Evaluator") 9 | 10 | class MultipleModelEvaluator(object): 11 | def __init__(self, device, is_vali): 12 | self.device = device 13 | self.is_vali = is_vali 14 | 15 | def evaluate_iter(self, encoder, decoder, classifier, input, reconst_criterion, supervised_criterion, save_path, 16 | topk): 17 | total_loss = 0 18 | total_acc = 0 19 | total_acc_topk = 0 20 | 21 | if not self.is_vali: 22 | logger.info("Test mode!") 23 | encoder = load_best_model(save_path, filename="saved_best_encoder") 24 | decoder = load_best_model(save_path, filename="saved_best_decoder") 25 | classifier = load_best_model(save_path, filename="saved_best_classifier") 26 | else: 27 | logger.info("Validation mode!") 28 | 29 | encoder.eval() 30 | decoder.eval() 31 | classifier.eval() 32 | 33 | with torch.no_grad(): 34 | for batch in input: 35 | batch_x = batch.sentence.to(self.device) 36 | batch_y = batch.category_labels.to(self.device) 37 | 38 | hidden = encoder(batch_x) 39 | reconstruction_probs = decoder(hidden) 40 | supervised_predictions = classifier(hidden.squeeze()) 41 | 42 | reconstruction_loss = reconst_criterion(reconstruction_probs, batch_x) 43 | supervised_loss = supervised_criterion(supervised_predictions, batch_y) 44 | 45 | accuracy = calculate_accuracy(supervised_predictions, batch_y) 46 | 
accuracy_topk = calculate_topk_accuracy(supervised_predictions, batch_y, topk=topk) 47 | 48 | total_loss = total_loss + reconstruction_loss.item() + supervised_loss.item() 49 | total_acc += accuracy 50 | total_acc_topk += accuracy_topk[0].item() 51 | 52 | torch.cuda.empty_cache() 53 | 54 | current_loss = total_loss / len(input) 55 | current_acc = total_acc / len(input) 56 | current_acc_topk = total_acc_topk / len(input) 57 | 58 | return current_loss, current_acc, current_acc_topk 59 | -------------------------------------------------------------------------------- /evaluation/single_model_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | 5 | from utils.utils import load_best_model 6 | 7 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 8 | logger = logging.getLogger("Evaluator") 9 | 10 | class SingleModelEvaluator(object): 11 | def __init__(self, device, is_vali): 12 | self.device = device 13 | self.is_vali = is_vali 14 | 15 | def evaluate_iter(self, model, input, criterion, save_path, scorer): 16 | total_loss = 0 17 | total_acc = 0 18 | total_acc_topk = 0 19 | 20 | if not self.is_vali: 21 | logger.info("Test mode!") 22 | model = load_best_model(save_path) 23 | else: 24 | logger.info("Validation mode!") 25 | model.eval() 26 | 27 | with torch.no_grad(): 28 | for batch in input: 29 | batch_x = batch.sentence.to(self.device) 30 | batch_y = batch.category_labels.to(self.device) 31 | 32 | predictions, _ = model(batch_x) 33 | 34 | loss = criterion(predictions, batch_y) 35 | accuracy = scorer.calculate_accuracy(predictions, batch_y) 36 | accuracy_topk = scorer.calculate_topk_accuracy(predictions, batch_y) 37 | 38 | total_loss += loss.item() 39 | total_acc += accuracy 40 | total_acc_topk += accuracy_topk[0].item() 41 | 42 | torch.cuda.empty_cache() 43 | 44 | current_loss = total_loss / len(input) 45 | current_acc = total_acc / len(input) 46 | current_acc_topk = total_acc_topk / len(input) 47 | 48 | return current_loss, current_acc, current_acc_topk 49 | -------------------------------------------------------------------------------- /evaluation/single_model_ner_evaluator.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | 5 | from models.GRU import GRU 6 | from models.LSTM import LSTMBase 7 | from utils.utils import load_best_model 8 | 9 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 10 | logger = logging.getLogger("Evaluator") 11 | 12 | 13 | class SingleModelNerEvaluator(object): 14 | def __init__(self, device, is_vali): 15 | self.device = device 16 | self.is_vali = is_vali 17 | 18 | def evaluate_iter(self, model, input, save_path, scorer, detailed_ner_log=True): 19 | total_loss = 0 20 | macro_f1 = 0 21 | macro_precision = 0 22 | macro_recall = 0 23 | total_token_acc = 0 24 | 25 | if not self.is_vali: 26 | logger.info("Test mode!") 27 | model = load_best_model(save_path) 28 | else: 29 | logger.info("Validation mode!") 30 | model.eval() 31 | 32 | full_ground_truth_list = list() 33 | full_prediction_list = list() 34 | 35 | with torch.no_grad(): 36 | for batch in input: 37 | batch_x = batch.sentence.to(self.device) 38 | batch_y = batch.ner_labels.to(self.device) 39 | 40 | if isinstance(model, GRU) or isinstance(model, LSTMBase): 41 | model.hidden = model.init_hidden(batch_x.size(1)) 42 | 43 | pred_scores, predictions = 
model.decode(batch_x) 44 | 45 | batch_y = batch_y.permute(1, 0) 46 | 47 | scorer.token_level_accuracy(predictions, batch_y) 48 | 49 | full_ground_truth_list.extend(batch_y.tolist()) 50 | full_prediction_list.extend(predictions) 51 | 52 | token_level_accuracy = scorer.token_accuracy 53 | 54 | total_token_acc += token_level_accuracy 55 | 56 | torch.cuda.empty_cache() 57 | 58 | scorer.f1_score(full_prediction_list, full_ground_truth_list) 59 | macro_f1 = scorer.avg_macro_f1 60 | macro_precision = scorer.avg_macro_precision 61 | macro_recall = scorer.avg_macro_recall 62 | current_token_acc = total_token_acc / len(input) 63 | 64 | if detailed_ner_log: 65 | scorer.print_detailed_score_log() 66 | 67 | return macro_f1, macro_precision, macro_recall, current_token_acc 68 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import argparse 4 | import datetime 5 | import json 6 | import logging.config 7 | import os 8 | 9 | import torch 10 | 11 | from datahelper.dataset_reader import DatasetLoader 12 | from datahelper.embedding_helper import OOVEmbeddingCreator 13 | from datahelper.preprocessor import Preprocessor 14 | from evaluation.evaluator import Evaluator 15 | from models.CNN import TextCnn, CharCNN, VDCNN, ConvDeconvCNN 16 | from models.GRU import GRU 17 | from models.LSTM import LSTM, LSTMCRF 18 | from models.Transformer import TransformerGoogle 19 | from training.trainer import Trainer 20 | from utils.utils import save_vocabulary 21 | 22 | os.environ['CUDA_LAUNCH_BLOCKING'] = "1" 23 | 24 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 25 | logger = logging.getLogger("Main") 26 | 27 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 28 | 29 | 30 | def initialize_model_and_trainer(model_properties, training_properties, datasetloader, device): 31 | logger.info("Model type is %s", training_properties["learner"]) 32 | if training_properties["learner"] == "text_cnn": 33 | model = TextCnn(model_properties).to(device) 34 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 35 | elif training_properties["learner"] == "gru": 36 | model = GRU(model_properties).to(device) 37 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 38 | elif training_properties["learner"] == "lstm": 39 | model = LSTM(model_properties).to(device) 40 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 41 | elif training_properties["learner"] == "char_cnn": 42 | model = CharCNN(model_properties).to(device) 43 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 44 | elif training_properties["learner"] == "vdcnn": 45 | model = VDCNN(model_properties).to(device) 46 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 47 | elif training_properties["learner"] == "conv_deconv_cnn": 48 | model = ConvDeconvCNN(model_properties) 49 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, datasetloader, device) 50 | elif training_properties["learner"] == "transformer_google": 51 | model = TransformerGoogle(model_properties).model.to(device) 52 | trainer = Trainer.trainer_factory("single_model_trainer", training_properties, 
datasetloader, device) 53 | elif training_properties["learner"] == "lstmcrf": 54 | assert training_properties["task"] == "ner" 55 | model = LSTMCRF(model_properties).to(device) 56 | trainer = Trainer.trainer_factory("single_model_ner_trainer", training_properties, datasetloader, device) 57 | else: 58 | raise ValueError("Model is not defined! Available learner values are : 'text_cnn', 'char_cnn', 'vdcnn', 'gru', " 59 | "'lstm', 'conv_deconv_cnn' and 'transformer_google'") 60 | 61 | return model, trainer 62 | 63 | 64 | if __name__ == '__main__': 65 | parser = argparse.ArgumentParser() 66 | 67 | parser.add_argument("-c", "--config", default="D:/PyTorchNLP/config/config.json", type=str, 68 | help="config.json path. Caution! Default path is hard-coded, local path.") 69 | 70 | args = parser.parse_args() 71 | 72 | config = json.load(open(args.config)) 73 | 74 | dataset_properties = config["dataset_properties"] 75 | model_properties = config["model_properties"] 76 | training_properties = config["training_properties"] 77 | evaluation_properties = config["evaluation_properties"] 78 | 79 | assert model_properties["common_model_properties"]["run_mode"] == "train" or \ 80 | model_properties["common_model_properties"]["run_mode"] == "eval_interactive" 81 | 82 | assert training_properties["task"] == "classification" or training_properties["task"] == "ner" 83 | 84 | logger.info("Initial device is %s", device) 85 | if "cuda" == device: 86 | torch.backends.cudnn.benchmark = True 87 | torch.backends.cudnn.fastest = True 88 | else: 89 | torch.set_num_threads(8) 90 | torch.backends.cudnn.enabled = False 91 | 92 | stop_word_path = dataset_properties["stop_word_path"] 93 | data_path = dataset_properties["data_path"] 94 | vector_cache = dataset_properties["vector_cache"] 95 | fasttext_model_path = dataset_properties["pretrained_embedding_path"] 96 | 97 | oov_embedding_type = dataset_properties["oov_embedding_type"] 98 | batch_size = dataset_properties["batch_size"] 99 | min_freq = dataset_properties["min_freq"] 100 | fix_length = dataset_properties["fixed_length"] 101 | 102 | embedding_vector = dataset_properties["embedding_vector"] 103 | 104 | training_task = training_properties["task"] 105 | 106 | save_dir = os.path.abspath(os.path.join(os.curdir, "saved", datetime.datetime.today().strftime('%Y-%m-%d'))) 107 | save_dir_vocab = os.path.abspath(os.path.join(os.curdir, "saved", "vocab")) 108 | if not os.path.isdir(save_dir): 109 | os.makedirs(save_dir) 110 | if not os.path.isdir(save_dir_vocab): 111 | os.makedirs(save_dir_vocab) 112 | logger.info("Saving directory for models is %s", save_dir) 113 | logger.info("Saving directory for vocabulary files is %s", save_dir_vocab) 114 | training_properties["save_path"] = save_dir 115 | 116 | level = "word" 117 | is_char_level = False 118 | if training_properties["learner"] == "charcnn" or training_properties["learner"] == "vdcnn": 119 | logger.info("Caution: Due to selected learning model, everything will be executed in character-level!") 120 | level = "char" 121 | is_char_level = True 122 | 123 | if training_task == "ner": 124 | stop_word_path = None 125 | 126 | logger.info("Initialize Preprocessor") 127 | preprocessor = Preprocessor(stop_word_path, 128 | is_remove_digit=True, 129 | is_remove_punctuations=False, 130 | is_char_level=is_char_level) 131 | 132 | if model_properties["common_model_properties"]["run_mode"] == "train": 133 | logger.info("Initialize OOVEmbeddingCreator") 134 | unkembedding = OOVEmbeddingCreator(type=oov_embedding_type, 135 | 
fasttext_model_path=fasttext_model_path) 136 | 137 | logger.info("Initialize DatasetLoader") 138 | datasetloader = DatasetLoader(data_path=data_path, 139 | vector=embedding_vector, 140 | preprocessor=preprocessor.preprocess, 141 | level=level, 142 | vector_cache=vector_cache, 143 | unk_init=unkembedding.create_oov_embedding, 144 | min_freq=min_freq, 145 | fix_length=fix_length, 146 | task=training_task 147 | ) 148 | 149 | logger.info("Loading train, validation and test sets") 150 | train, val, test = datasetloader.read_dataset(batch_size=batch_size) 151 | logger.info("Loading vocabularies") 152 | sentence_vocab = datasetloader.sentence_vocab 153 | category_vocab = datasetloader.category_vocab 154 | ner_vocab = datasetloader.ner_vocab 155 | logger.info("Loading embeddings") 156 | pretrained_embeddings = datasetloader.sentence_vocab_vectors 157 | logger.info("Updating properties") 158 | model_properties["common_model_properties"]["device"] = device 159 | 160 | if training_properties["learner"] == "charcnn": 161 | model_properties["common_model_properties"]["vocab_size"] = len(sentence_vocab) 162 | model_properties["common_model_properties"]["embed_dim"] = len(sentence_vocab) - 1 163 | elif training_properties["learner"] == "vdcnn": 164 | model_properties["common_model_properties"]["vocab_size"] = len(sentence_vocab) 165 | model_properties["common_model_properties"]["embed_dim"] = 16 166 | else: 167 | model_properties["common_model_properties"]["vocab_size"] = pretrained_embeddings.size()[0] 168 | model_properties["common_model_properties"]["embed_dim"] = pretrained_embeddings.size()[1] 169 | 170 | if category_vocab is not None: 171 | model_properties["common_model_properties"]["num_class"] = len(category_vocab) 172 | if ner_vocab is not None: 173 | model_properties["common_model_properties"]["ner_vocab"] = ner_vocab 174 | model_properties["common_model_properties"]["num_tags"] = len(ner_vocab) 175 | model_properties["common_model_properties"]["start_id"] = ner_vocab.stoi[""] 176 | model_properties["common_model_properties"]["end_id"] = ner_vocab.stoi[""] 177 | 178 | model_properties["common_model_properties"]["vocab"] = sentence_vocab 179 | model_properties["common_model_properties"]["padding_id"] = sentence_vocab.stoi[""] 180 | model_properties["common_model_properties"]["pretrained_weights"] = pretrained_embeddings 181 | model_properties["common_model_properties"]["batch_size"] = dataset_properties["batch_size"] 182 | 183 | logger.info("Saving vocabulary files") 184 | save_vocabulary(sentence_vocab, os.path.abspath(os.path.join(save_dir_vocab, "sentence_vocab.dat"))) 185 | if category_vocab is not None: 186 | save_vocabulary(category_vocab, os.path.abspath(os.path.join(save_dir_vocab, "category_vocab.dat"))) 187 | if ner_vocab is not None: 188 | save_vocabulary(ner_vocab, os.path.abspath(os.path.join(save_dir_vocab, "ner_vocab.dat"))) 189 | 190 | logger.info("Initialize model and trainer") 191 | model, trainer = initialize_model_and_trainer(model_properties, training_properties, datasetloader, device) 192 | 193 | if dataset_properties["checkpoint_path"] is None or dataset_properties["checkpoint_path"] == "": 194 | logger.info("Train process is starting from scratch!") 195 | trainer.train_iters(model) 196 | else: 197 | checkpoint = torch.load(dataset_properties["checkpoint_path"]) 198 | logger.info("Train process is reloading from epoch {}".format(checkpoint["epoch"])) 199 | trainer.train_iters(model, checkpoint) 200 | 201 | elif model_properties["common_model_properties"]["run_mode"] == 
"eval_interactive": 202 | interactive_evaluator = Evaluator.evaluator_factory("interactive_evaluator", "cpu") 203 | 204 | model_path = evaluation_properties["model_path"] 205 | sentence_vocab_path = evaluation_properties["sentence_vocab"] 206 | category_vocab_path = evaluation_properties["category_vocab"] 207 | 208 | logger.info("Interactive evaluation mode for model {}:".format(model_path)) 209 | 210 | interactive_evaluator.evaluate_interactive(model_path=model_path, 211 | sentence_vocab_path=sentence_vocab_path, 212 | category_vocab_path=category_vocab_path, 213 | preprocessor=preprocessor.preprocess, 214 | topk=training_properties["topk"]) 215 | logger.info("Done!") 216 | -------------------------------------------------------------------------------- /models/CNN.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | 7 | from Util_CNN import KMaxPooling, LayerBlock, ConvolutionEncoder, DeconvolutionDecoder, FullyConnectedClassifier 8 | from dropout_models.dropout import Dropout 9 | 10 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 11 | logger = logging.getLogger("CNN") 12 | 13 | 14 | class TextCnn(nn.Module): 15 | def __init__(self, args): 16 | super(TextCnn, self).__init__() 17 | self.args_common = args["common_model_properties"] 18 | self.args_specific = args["text_cnn"] 19 | 20 | self.vocab = self.args_common["vocab"] 21 | 22 | # Device 23 | self.device = self.args_common["device"] 24 | 25 | # Input/Output dimensions 26 | self.embed_num = self.args_common["vocab_size"] 27 | self.embed_dim = self.args_common["embed_dim"] 28 | self.num_class = self.args_common["num_class"] 29 | 30 | # Embedding parameters 31 | self.padding_id = self.args_common["padding_id"] 32 | 33 | # Condition parameters 34 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 35 | self.embed_train_type = self.args_common["embed_train_type"] 36 | self.use_padded_conv = self.args_specific["use_padded_conv"] 37 | self.use_batch_norm = self.args_common["use_batch_norm"] 38 | 39 | # Pretrained embedding weights 40 | self.pretrained_weights = self.args_common["pretrained_weights"] 41 | 42 | # Dropout type 43 | self.dropout_type = self.args_specific["dropout_type"] 44 | 45 | # Dropout probabilities 46 | keep_prob = self.args_specific["keep_prob"] 47 | 48 | # Batch normalization parameters 49 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 50 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 51 | 52 | # Convolution parameters 53 | self.input_channel = 1 54 | self.filter_count = self.args_specific["filter_count"] 55 | self.filter_sizes = self.args_specific["filter_sizes"] 56 | 57 | # Embedding Layer Initialization 58 | if self.embed_train_type == "multichannel": 59 | self.embed, self.embed_static = self.initialize_embeddings() 60 | self.embed = self.embed.cpu() 61 | self.embed_static = self.embed_static.cpu() 62 | else: 63 | self.embed, _ = self.initialize_embeddings() 64 | self.embed = self.embed.cpu() 65 | 66 | # Convolution Initialization 67 | self.convs = self.initialize_conv_layer() 68 | 69 | # Initialize convolution weights 70 | self.initialize_weights() 71 | 72 | # Flatten conv layers' output 73 | num_flatten_feature = len(self.filter_sizes) * self.filter_count 74 | 75 | # Batch Normalization initialization 76 | if self.use_batch_norm: 77 | logger.info("> Batch Normalization") 78 
| self.initialize_batch_normalization(num_flatten_feature) 79 | 80 | # Dropout initialization 81 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 82 | logger.info("> Dropout - %s", self.dropout_type) 83 | self.dropout = Dropout(keep_prob=keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 84 | elif self.dropout_type == "variational": 85 | logger.info("> Dropout - %s", self.dropout_type) 86 | self.dropout_before_flatten = Dropout(keep_prob=0.2, dimension=num_flatten_feature, 87 | dropout_type=self.dropout_type).dropout 88 | self.dropout_fc1 = Dropout(keep_prob=keep_prob, dimension=num_flatten_feature // 2, 89 | dropout_type=self.dropout_type).dropout 90 | else: 91 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 92 | self.dropout = Dropout(keep_prob=keep_prob, dimension=None, dropout_type="bernoulli").dropout 93 | 94 | # Fully Connected Layer 1 initialization 95 | self.fc1 = nn.Linear(in_features=num_flatten_feature, 96 | out_features=num_flatten_feature // 2, 97 | bias=True) 98 | 99 | # Fully Connected Layer 2 initialization 100 | self.fc2 = nn.Linear(in_features=num_flatten_feature // 2, 101 | out_features=self.num_class, 102 | bias=True) 103 | 104 | def initialize_embeddings(self): 105 | logger.info("> Embeddings") 106 | embed = nn.Embedding(num_embeddings=self.embed_num, 107 | embedding_dim=self.embed_dim, 108 | padding_idx=self.padding_id) 109 | 110 | embed_static = None 111 | # Create 2nd embedding layer for multichannel purpose 112 | if self.embed_train_type == "multichannel": 113 | embed_static = nn.Embedding(num_embeddings=self.embed_num, 114 | embedding_dim=self.embed_dim, 115 | padding_idx=self.padding_id) 116 | 117 | if self.use_pretrained_embed: 118 | logger.info("> Pre-trained Embeddings") 119 | embed.from_pretrained(self.pretrained_weights) 120 | if self.embed_train_type == "multichannel": 121 | embed_static.from_pretrained(self.pretrained_weights) 122 | else: 123 | logger.info("> Random Embeddings") 124 | random_embedding_weights = torch.rand(self.embed_num, self.embed_dim) 125 | embed.from_pretrained(random_embedding_weights) 126 | if self.embed_train_type == "multichannel": 127 | embed_static.from_pretrained(random_embedding_weights) 128 | 129 | if self.embed_train_type == "static": 130 | logger.info("> Static Embeddings") 131 | embed.weight.requires_grad = False 132 | elif self.embed_train_type == "nonstatic": 133 | logger.info("> Non-Static Embeddings") 134 | embed.weight.requires_grad = True 135 | elif self.embed_train_type == "multichannel": 136 | embed.weight.requires_grad = True 137 | embed_static.weight.requires_grad = False 138 | else: 139 | raise ValueError("Embedding train type can be (1) static, (2) nonstatic or (3) multichannel") 140 | return embed, embed_static 141 | 142 | def initialize_conv_layer(self): 143 | if self.use_padded_conv: 144 | logger.info("> Padded convolution") 145 | return nn.ModuleList([nn.Conv2d(in_channels=self.input_channel, 146 | out_channels=self.filter_count, 147 | kernel_size=(filter_size, self.embed_dim), 148 | stride=(1, 1), 149 | padding=(filter_size // 2, 0), 150 | bias=True) for filter_size in self.filter_sizes]) 151 | else: 152 | logger.info("> Without-pad convolution") 153 | return nn.ModuleList([nn.Conv2d(in_channels=self.input_channel, 154 | out_channels=self.filter_count, 155 | kernel_size=(filter_size, self.embed_dim), 156 | bias=True) for filter_size in self.filter_sizes]) 157 | 158 | def initialize_weights(self): 159 | for conv in self.convs: 160 | 
nn.init.xavier_normal_(conv.weight) 161 | conv.bias.data.fill_(0.01) 162 | 163 | def initialize_batch_normalization(self, num_flatten_feature): 164 | self.convs_bn = nn.BatchNorm2d(num_features=self.filter_count, 165 | momentum=self.batch_norm_momentum, 166 | affine=self.batch_norm_affine) 167 | self.fc1_bn = nn.BatchNorm1d(num_features=num_flatten_feature // 2, 168 | momentum=self.batch_norm_momentum, 169 | affine=self.batch_norm_affine) 170 | self.fc2_bn = nn.BatchNorm1d(num_features=self.num_class, 171 | momentum=self.batch_norm_momentum, 172 | affine=self.batch_norm_affine) 173 | 174 | def forward(self, x): 175 | kl_loss = torch.Tensor([0.0]) 176 | # Input shape: [sentence_length, batch_size] 177 | x = x.permute(1, 0) 178 | # X shape: [batch_size, sentence_length] 179 | x = self.embed(x) 180 | if self.embed_train_type == "multichannel": 181 | x_static = self.embed_static(x) 182 | x = torch.stack[(x_static, x), 1] 183 | if "cuda" in str(self.device): 184 | x = x.cuda() 185 | kl_loss = kl_loss.cuda() 186 | # X shape: [batch_size, sentence_length, embedding_dim] 187 | x = x.unsqueeze(1) 188 | # X shape: [batch_size, 1, sentence_length, embedding_dim] 189 | if self.use_batch_norm: 190 | x = [self.convs_bn(F.relu(conv(x))).squeeze(3) for conv in self.convs] 191 | else: 192 | x = [F.relu(conv(x)).squeeze(3) for conv in self.convs] 193 | # X[i] shape: [batch_size, filter_count, sentence_length - filter_size[i]] 194 | x = [F.max_pool1d(conv, conv.size(2)).squeeze(2) for conv in x] 195 | # X[i] shape: [batch_size, filter_count] 196 | if self.dropout_type == "variational": 197 | x, kld = self.dropout_before_flatten(torch.cat(x, dim=1)) 198 | kl_loss += kld.sum() 199 | else: 200 | x = self.dropout(torch.cat(x, dim=1)) 201 | # Fully Connected Layers 202 | if self.use_batch_norm: 203 | if self.dropout_type == "variational": 204 | x, kld = self.dropout_fc1(self.fc1_bn(F.relu(self.fc1(x)))) 205 | kl_loss += kld.sum() 206 | else: 207 | x = self.dropout(self.fc1_bn(F.relu(self.fc1(x)))) 208 | x = self.fc2_bn(self.fc2(x)) 209 | else: 210 | if self.dropout_type == "variational": 211 | x, kld = self.dropout_fc1(F.relu(self.fc1(x))) 212 | kl_loss += kld.sum() 213 | else: 214 | x = self.dropout(F.relu(self.fc1(x))) 215 | x = self.fc2(x) 216 | return x, kl_loss 217 | 218 | 219 | class CharCNN(nn.Module): 220 | def __init__(self, args): 221 | super(CharCNN, self).__init__() 222 | 223 | self.args_common = args["common_model_properties"] 224 | self.args_specific = args["char_cnn"] 225 | 226 | # Device 227 | self.device = self.args_common["device"] 228 | 229 | # Input/Output dimensions 230 | self.vocab_size = self.args_common["vocab_size"] 231 | self.embed_dim = self.args_common["embed_dim"] 232 | self.num_class = self.args_common["num_class"] 233 | 234 | # Embedding parameters 235 | self.padding_id = self.args_common["padding_id"] 236 | 237 | # Dropout type 238 | self.dropout_type = self.args_specific["dropout_type"] 239 | 240 | # Dropout probabilities 241 | self.keep_prob = self.args_specific["keep_prob"] 242 | 243 | # CharCNN specific parameters 244 | self.max_sequence_length = self.args_specific["max_sequence_length"] 245 | 246 | if self.args_specific["feature_size"] == "large": 247 | self.filter_count = 1024 248 | self.linear_unit_count = 2048 249 | elif self.args_specific["feature_size"] == "small": 250 | self.filter_count = 256 251 | self.linear_unit_count = 1024 252 | else: 253 | self.filter_count = self.args_specific["filter_count"] 254 | self.linear_unit_count = self.args_specific["linear_unit_count"] 
255 | 256 | self.filter_sizes = self.args_specific["filter_sizes"] 257 | self.max_pool_kernels = self.args_specific["max_pool_kernels"] 258 | 259 | # Embedding initialization 260 | # As the original CharCNN paper, I initialized char embeddings as one-hot vector. 261 | self.embedding = nn.Embedding(self.vocab_size, self.embed_dim, padding_idx=self.padding_id).cpu() 262 | self.embedding.weight.data = torch.eye(self.vocab_size, self.embed_dim) 263 | self.embedding.weight.reqiures_grad = False 264 | 265 | # Convolution Layer 1 266 | self.conv1 = nn.Conv1d(in_channels=self.embed_dim, out_channels=self.filter_count, 267 | kernel_size=self.filter_sizes[0]) 268 | self.pool1 = nn.MaxPool1d(kernel_size=self.max_pool_kernels[0]) 269 | 270 | # Convolution Layer 2 271 | self.conv2 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 272 | kernel_size=self.filter_sizes[1]) 273 | self.pool2 = nn.MaxPool1d(kernel_size=self.max_pool_kernels[1]) 274 | 275 | # Convolution Layer 3 276 | self.conv3 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 277 | kernel_size=self.filter_sizes[2]) 278 | 279 | # Convolution Layer 4 280 | self.conv4 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 281 | kernel_size=self.filter_sizes[3]) 282 | 283 | # Convolution Layer 5 284 | self.conv5 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 285 | kernel_size=self.filter_sizes[4]) 286 | 287 | # Convolution Layer 6 288 | self.conv6 = nn.Conv1d(in_channels=self.filter_count, out_channels=self.filter_count, 289 | kernel_size=self.filter_sizes[5]) 290 | self.pool3 = nn.MaxPool1d(kernel_size=self.max_pool_kernels[2]) 291 | 292 | # Activation 293 | self.relu = nn.ReLU() 294 | 295 | # Number of features after convolution blocks 296 | num_features = (self.max_sequence_length - 96) // 27 * self.filter_count 297 | 298 | self.initialize_dropout(num_features) 299 | 300 | # Linear Block 1 301 | self.linear1 = nn.Linear(num_features, self.linear_unit_count) 302 | 303 | # Linear Block 2 304 | self.linear2 = nn.Linear(self.linear_unit_count, self.linear_unit_count) 305 | 306 | # Linear Block 3 307 | self.linear3 = nn.Linear(self.linear_unit_count, self.num_class) 308 | 309 | def initialize_dropout(self, num_features): 310 | # Dropout initialization 311 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 312 | logger.info("> Dropout - %s", self.dropout_type) 313 | self.dropout = Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 314 | elif self.dropout_type == "variational": 315 | logger.info("> Dropout - %s", self.dropout_type) 316 | self.dropout = Dropout(keep_prob=self.keep_prob, dimension=num_features, 317 | dropout_type=self.dropout_type).dropout 318 | else: 319 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 320 | self.dropout = Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type="bernoulli").dropout 321 | 322 | def forward(self, batch): 323 | kl_loss = torch.Tensor([0.0]) 324 | # Get batch size to beginning 325 | x = batch.permute(1, 0) 326 | # Embedding magic 327 | x = self.embedding(x) 328 | x = x.permute(0, 2, 1) 329 | if "cuda" in str(self.device): 330 | x = x.cuda() 331 | kl_loss = kl_loss.cuda() 332 | # To Convolution-Pooling 333 | x = self.pool1(self.relu(self.conv1(x))) 334 | x = self.pool2(self.relu(self.conv2(x))) 335 | x = self.relu(self.conv3(x)) 336 | x = self.relu(self.conv4(x)) 337 | x = self.relu(self.conv5(x)) 338 | x = 
self.pool3(self.relu(self.conv6(x))) 339 | # Flatten 340 | x = x.view(x.size(0), -1) 341 | # To Linear 342 | if self.dropout_type == "variational": 343 | x, kld = self.dropout(self.relu(self.linear1(x))) 344 | kl_loss += kld.sum() 345 | else: 346 | x = self.dropout(self.relu(self.linear1(x))) 347 | if self.dropout_type == "variational": 348 | x, kld = self.dropout(self.relu(self.linear2(x))) 349 | kl_loss += kld.sum() 350 | else: 351 | x = self.dropout(self.relu(self.linear2(x))) 352 | x = self.linear3(x) 353 | 354 | return x, kl_loss 355 | 356 | 357 | class VDCNN(nn.Module): 358 | def __init__(self, args): 359 | super(VDCNN, self).__init__() 360 | 361 | self.args_common = args["common_model_properties"] 362 | self.args_specific = args["vdcnn"] 363 | 364 | # Device 365 | self.device = self.args_common["device"] 366 | 367 | # Input/Output dimensions 368 | self.vocab_size = self.args_common["vocab_size"] 369 | self.embed_dim = self.args_common["embed_dim"] 370 | self.num_class = self.args_common["num_class"] 371 | 372 | # Embedding parameters 373 | self.padding_id = self.args_common["padding_id"] 374 | 375 | # Condition parameters 376 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 377 | self.use_shortcut = self.args_specific["use_shortcut"] 378 | 379 | # Pretrained embedding weights 380 | self.pretrained_weights = self.args_common["pretrained_weights"] 381 | 382 | # Dropout probabilities 383 | self.keep_prob = self.args_specific["keep_prob"] 384 | # Dropout type 385 | self.dropout_type = nn.Dropout(self.keep_prob) 386 | 387 | # Batch normalization parameters 388 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 389 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 390 | 391 | # Convolution parameters 392 | self.depth = self.args_specific["depth"] 393 | assert self.depth in [9, 17, 29, 49] 394 | self.filter_counts = self.args_specific["filter_counts"] 395 | self.filter_size = self.args_specific["filter_size"] 396 | 397 | # Downsampling parameters 398 | self.downsampling_type = self.args_specific["downsampling_type"] 399 | self.maxpool_filter_size = self.args_specific["maxpool_filter_size"] 400 | self.k = self.args_specific["kmax"] 401 | 402 | self.embedding = nn.Embedding(self.vocab_size, self.embed_dim, padding_idx=self.padding_id).cpu() 403 | 404 | number_of_layers = self.initialize_number_of_layers() 405 | layers = nn.ModuleList() 406 | 407 | first_conv_layer = nn.Conv1d(in_channels=self.embed_dim, 408 | out_channels=self.filter_count[0], 409 | kernel_size=self.filter_size, 410 | padding=1) 411 | layers.append(first_conv_layer) 412 | 413 | # Add second convolution layer block where input_size is self.filter_count[0], output_size is self.filter_count[0] 414 | for n in range(number_of_layers[0]): 415 | layers.append(LayerBlock(input_channel_size=self.filter_count[0], 416 | filter_count=self.filter_count[0], 417 | conv_filter_size=self.filter_size, 418 | maxpool_filter_size=self.maxpool_filter_size, 419 | kmax_k=self.k, 420 | downsample_type=self.downsampling_type, 421 | use_shortcut=self.use_shortcut)) 422 | 423 | # Add third convolution layer block where input_size is self.filter_count[0], output_size is self.filter_count[1] 424 | layers.append(LayerBlock(input_channel_size=self.filter_count[0], 425 | filter_count=self.filter_count[1], 426 | conv_filter_size=self.filter_size, 427 | maxpool_filter_size=self.maxpool_filter_size, 428 | kmax_k=self.k, 429 | downsample_type=self.downsampling_type, 430 | downsample=True, 431 | 
use_shortcut=self.use_shortcut)) 432 | for n in range(number_of_layers[1] - 1): 433 | layers.append(LayerBlock(input_channel_size=self.filter_count[1], 434 | filter_count=self.filter_count[1], 435 | conv_filter_size=self.filter_size, 436 | maxpool_filter_size=self.maxpool_filter_size, 437 | kmax_k=self.k, 438 | downsample_type=self.downsampling_type, 439 | use_shortcut=self.use_shortcut)) 440 | 441 | # Add fourth convolution layer block where input_size is self.filter_count[1], output_size is self.filter_count[2] 442 | layers.append(LayerBlock(input_channel_size=self.filter_count[1], 443 | filter_count=self.filter_count[2], 444 | conv_filter_size=self.filter_size, 445 | maxpool_filter_size=self.maxpool_filter_size, 446 | kmax_k=self.k, 447 | downsample_type=self.downsampling_type, 448 | downsample=True, 449 | use_shortcut=self.use_shortcut)) 450 | for n in range(number_of_layers[2] - 1): 451 | layers.append(LayerBlock(input_channel_size=self.filter_count[2], 452 | filter_count=self.filter_count[2], 453 | conv_filter_size=self.filter_size, 454 | maxpool_filter_size=self.maxpool_filter_size, 455 | kmax_k=self.k, 456 | downsample_type=self.downsampling_type, 457 | use_shortcut=self.use_shortcut)) 458 | 459 | # Add fifth convolution layer block where input_size is self.filter_count[2], output_size is self.filter_count[3] 460 | layers.append(LayerBlock(input_channel_size=self.filter_count[2], 461 | filter_count=self.filter_count[3], 462 | conv_filter_size=self.filter_size, 463 | maxpool_filter_size=self.maxpool_filter_size, 464 | kmax_k=self.k, 465 | downsample_type=self.downsampling_type, 466 | downsample=True, 467 | use_shortcut=self.use_shortcut)) 468 | for n in range(number_of_layers[2] - 1): 469 | layers.append(LayerBlock(input_channel_size=self.filter_count[3], 470 | filter_count=self.filter_count[3], 471 | conv_filter_size=self.filter_size, 472 | maxpool_filter_size=self.maxpool_filter_size, 473 | kmax_k=self.k, 474 | downsample_type=self.downsampling_type, 475 | use_shortcut=self.use_shortcut)) 476 | 477 | self.all_conv_layers == nn.Sequential(*layers) 478 | self.kmax_pooling == KMaxPooling(k=self.k) 479 | 480 | self.relu = nn.ReLU() 481 | self.fc1 = nn.Linear(self.filter_counts[3] * self.k, 2048) 482 | self.fc2 = nn.Linear(2048, 2048) 483 | self.fc3 = nn.Linear(2048, self.num_class) 484 | 485 | def initialize_number_of_layers(self): 486 | if self.depth == 9: 487 | return [2] * 4 488 | elif self.depth == 17: 489 | return [4] * 4 490 | elif self.depth == 29: 491 | return [10, 10, 4, 4] 492 | elif self.depth == 49: 493 | return [16, 16, 10, 6] 494 | 495 | def forward(self, batch): 496 | kl_loss = torch.Tensor([0.0]) 497 | # Get batch size to beginning 498 | x = batch.permute(1, 0) 499 | # Embedding magic 500 | x = self.embedding(x) 501 | x = x.permute(0, 2, 1) 502 | if "cuda" in str(self.device): 503 | x = x.cuda() 504 | kl_loss = kl_loss.cuda() 505 | x = self.all_conv_layers(x) 506 | x = self.kmax_pooling(x) 507 | x = x.view(x.size(0), -1) 508 | x = self.relu(self.fc1(x)) 509 | x = self.relu(self.fc2(x)) 510 | x = self.fc3(x) 511 | return x, kl_loss 512 | 513 | 514 | class ConvDeconvCNN(nn.Module): 515 | def __init__(self, args): 516 | super(ConvDeconvCNN, self).__init__() 517 | 518 | self.args = args["common_model_properties"] 519 | 520 | # Input/Output dimensions 521 | self.vocab_size = self.args["vocab_size"] 522 | self.embed_dim = self.args["embed_dim"] 523 | 524 | # Embedding parameters 525 | self.padding_id = self.args["padding_id"] 526 | 527 | # Condition parameters 528 | 
self.use_pretrained_embed = self.args["use_pretrained_embed"] 529 | 530 | # Pretrained embedding weights 531 | self.pretrained_weights = self.args["pretrained_weights"] 532 | 533 | # Initialize embeddings 534 | self.embedding = nn.Embedding(self.vocab_size, self.embed_dim, padding_idx=self.padding_id).cpu() 535 | if self.use_pretrained_embed: 536 | logger.info("> Pre-trained Embeddings") 537 | self.embedding.from_pretrained(self.pretrained_weights) 538 | 539 | self.encoder = ConvolutionEncoder(args, self.embedding) 540 | self.decoder = DeconvolutionDecoder(args, self.embedding) 541 | self.classifier = FullyConnectedClassifier(args) 542 | 543 | def forward(self, x): 544 | return self.encoder(self.decoder(self.classifier(x))) 545 | -------------------------------------------------------------------------------- /models/GRU.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | from dropout_models.dropout import Dropout 9 | 10 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 11 | logger = logging.getLogger("GRU") 12 | 13 | 14 | class GRU(nn.Module): 15 | def __init__(self, args): 16 | super(GRU, self).__init__() 17 | self.args_common = args["common_model_properties"] 18 | self.args_specific = args["gru"] 19 | 20 | self.hidden_dim = self.args_specific["hidden_dim"] 21 | self.num_layers = self.args_specific["num_layers"] 22 | self.batch_size = self.args_common["batch_size"] 23 | 24 | self.vocab = self.args_common["vocab"] 25 | 26 | # Device 27 | self.device = self.args_common["device"] 28 | 29 | # Input/Output dimensions 30 | self.embed_num = self.args_common["vocab_size"] 31 | self.embed_dim = self.args_common["embed_dim"] 32 | self.num_class = self.args_common["num_class"] 33 | 34 | # Embedding parameters 35 | self.padding_id = self.args_common["padding_id"] 36 | 37 | # Condition parameters 38 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 39 | self.embed_train_type = self.args_common["embed_train_type"] 40 | self.bidirectional = self.args_specific["bidirectional"] 41 | self.rnn_bias = self.args_specific["bias"] 42 | 43 | # Pretrained embedding weights 44 | self.pretrained_weights = self.args_common["pretrained_weights"] 45 | 46 | # Dropout type 47 | self.dropout_type = self.args_specific["dropout_type"] 48 | 49 | # Dropout probabilities 50 | self.keep_prob = self.args_specific["keep_prob"] 51 | 52 | self.embed = self.initialize_embeddings() 53 | 54 | # It is NOT the inner GRU dropout! 
55 | self.dropout = self.initialize_dropout() 56 | 57 | self.gru = nn.GRU(self.embed_dim, 58 | self.hidden_dim, 59 | dropout=self.keep_prob, 60 | num_layers=self.num_layers, 61 | bidirectional=self.bidirectional, 62 | bias=self.rnn_bias) 63 | 64 | self.hidden = self.init_hidden() 65 | 66 | if self.bidirectional is True: 67 | self.h2o = nn.Linear(self.hidden_dim * 2, self.num_class) 68 | else: 69 | self.h2o = nn.Linear(self.hidden_dim, self.num_class) 70 | 71 | def init_hidden(self, batch_size): 72 | if self.bidirectional is True: 73 | return Variable(torch.zeros((1, batch_size, self.hidden_dim * 2))).to(self.device) 74 | else: 75 | return Variable(torch.zeros((1, batch_size, self.hidden_dim))).to(self.device) 76 | 77 | def initialize_embeddings(self): 78 | logger.info("> Embeddings") 79 | embed = nn.Embedding(num_embeddings=self.embed_num, 80 | embedding_dim=self.embed_dim, 81 | padding_idx=self.padding_id).cpu() 82 | if self.use_pretrained_embed: 83 | logger.info("> Pre-trained Embeddings") 84 | embed.from_pretrained(self.pretrained_weights) 85 | else: 86 | logger.info("> Random Embeddings") 87 | random_embedding_weights = torch.rand(self.embed_num, self.embed_dim) 88 | embed.from_pretrained(random_embedding_weights) 89 | 90 | if self.embed_train_type == "static": 91 | logger.info("> Static Embeddings") 92 | embed.weight.requires_grad = False 93 | elif self.embed_train_type == "nonstatic": 94 | logger.info("> Non-Static Embeddings") 95 | embed.weight.requires_grad = True 96 | return embed 97 | 98 | def initialize_dropout(self): 99 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 100 | logger.info("> Dropout - %s", self.dropout_type) 101 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 102 | elif self.dropout_type == "variational": 103 | logger.info("> Dropout - %s", self.dropout_type) 104 | return Dropout(keep_prob=self.keep_prob, dimension=self.hidden_dim, 105 | dropout_type=self.dropout_type).dropout 106 | else: 107 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 108 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type="bernoulli").dropout 109 | 110 | def forward(self, batch): 111 | kl_loss = torch.Tensor([0.0]) 112 | 113 | x = self.embed(batch) 114 | x = self.dropout(x) 115 | x = x.view(len(x), self.batch_size, -1) 116 | 117 | if "cuda" in str(self.device): 118 | x = x.cuda() 119 | out, self.hidden = self.gru(x, self.hidden) 120 | out = torch.transpose(out, 0, 1) 121 | out = torch.transpose(out, 1, 2) 122 | 123 | out = F.max_pool1d(input=out, kernel_size=out.size(2)).squeeze(2) 124 | out = torch.tanh(out) 125 | 126 | out = self.h2o(out) 127 | out = F.log_softmax(out, dim=1) 128 | return out, kl_loss 129 | -------------------------------------------------------------------------------- /models/LSTM.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | from dropout_models.dropout import Dropout 9 | from crf.CRF import ConditionalRandomField 10 | 11 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 12 | logger = logging.getLogger("LSTM") 13 | 14 | 15 | class LSTMBase(nn.Module): 16 | def __init__(self, args): 17 | super(LSTMBase, self).__init__() 18 | self.args_common = args["common_model_properties"] 19 | self.args_specific = args["lstm"] 20 
| 21 | self.hidden_dim = self.args_specific["hidden_dim"] 22 | self.num_layers = self.args_specific["num_layers"] 23 | self.batch_size = self.args_common["batch_size"] 24 | 25 | self.vocab = self.args_common["vocab"] 26 | 27 | # Device 28 | self.device = self.args_common["device"] 29 | 30 | # Input/Output dimensions 31 | self.embed_num = self.args_common["vocab_size"] 32 | self.embed_dim = self.args_common["embed_dim"] 33 | 34 | # Embedding parameters 35 | self.padding_id = self.args_common["padding_id"] 36 | 37 | # Condition parameters 38 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 39 | self.embed_train_type = self.args_common["embed_train_type"] 40 | self.bidirectional = self.args_specific["bidirectional"] 41 | self.rnn_bias = self.args_specific["bias"] 42 | 43 | # Pretrained embedding weights 44 | self.pretrained_weights = self.args_common["pretrained_weights"] 45 | 46 | # Dropout type 47 | self.dropout_type = self.args_specific["dropout_type"] 48 | 49 | # Dropout probabilities 50 | self.keep_prob = self.args_specific["keep_prob"] 51 | 52 | self.embed = self.initialize_embeddings() 53 | 54 | # It is NOT the inner LSTM dropout! 55 | self.dropout = self.initialize_dropout() 56 | 57 | self.lstm = nn.LSTM(self.embed_dim, 58 | self.hidden_dim, 59 | dropout=self.keep_prob, 60 | num_layers=self.num_layers, 61 | bidirectional=self.bidirectional, 62 | bias=self.rnn_bias, 63 | batch_first=True) 64 | 65 | self.hidden = self.init_hidden(self.batch_size) 66 | 67 | def init_hidden(self, batch_size): 68 | if self.bidirectional is True: 69 | return (Variable(torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(self.device)), 70 | Variable(torch.zeros(self.num_layers * 2, batch_size, self.hidden_dim).to(self.device))) 71 | else: 72 | return (Variable(torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(self.device)), 73 | Variable(torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(self.device))) 74 | 75 | def initialize_embeddings(self): 76 | logger.info("> Embeddings") 77 | embed = nn.Embedding(num_embeddings=self.embed_num, 78 | embedding_dim=self.embed_dim, 79 | padding_idx=self.padding_id).cpu() 80 | if self.use_pretrained_embed: 81 | logger.info("> Pre-trained Embeddings") 82 | embed.from_pretrained(self.pretrained_weights) 83 | else: 84 | logger.info("> Random Embeddings") 85 | random_embedding_weights = torch.rand(self.embed_num, self.embed_dim) 86 | embed.from_pretrained(random_embedding_weights) 87 | 88 | if self.embed_train_type == "static": 89 | logger.info("> Static Embeddings") 90 | embed.weight.requires_grad = False 91 | elif self.embed_train_type == "nonstatic": 92 | logger.info("> Non-Static Embeddings") 93 | embed.weight.requires_grad = True 94 | return embed 95 | 96 | def initialize_dropout(self): 97 | if self.dropout_type == "bernoulli" or self.dropout_type == "gaussian": 98 | logger.info("> Dropout - %s", self.dropout_type) 99 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type=self.dropout_type).dropout 100 | elif self.dropout_type == "variational": 101 | logger.info("> Dropout - %s", self.dropout_type) 102 | return Dropout(keep_prob=self.keep_prob, dimension=self.hidden_dim, 103 | dropout_type=self.dropout_type).dropout 104 | else: 105 | logger.info("> Dropout - Bernoulli (You provide undefined dropout type!)") 106 | return Dropout(keep_prob=self.keep_prob, dimension=None, dropout_type="bernoulli").dropout 107 | 108 | 109 | class LSTM(LSTMBase): 110 | def __init__(self, args): 111 | super(LSTM, 
self).__init__(args) 112 | 113 | self.num_class = self.args_common["num_class"] 114 | 115 | if self.bidirectional is True: 116 | self.h2o = nn.Linear(self.hidden_dim * 2, self.num_class) 117 | else: 118 | self.h2o = nn.Linear(self.hidden_dim, self.num_class) 119 | 120 | def forward(self, batch): 121 | kl_loss = torch.Tensor([0.0]) 122 | 123 | x = self.embed(batch) 124 | x = self.dropout(x) 125 | x = x.view(len(x), self.batch_size, -1) 126 | 127 | if "cuda" in str(self.device): 128 | x = x.cuda() 129 | 130 | out, self.hidden = self.lstm(x, self.hidden) 131 | out = torch.transpose(out, 0, 1) 132 | out = torch.transpose(out, 1, 2) 133 | 134 | out = F.max_pool1d(input=out, kernel_size=out.size(2)).squeeze(2) 135 | out = torch.tanh(out) 136 | 137 | out = self.h2o(out) 138 | out = F.log_softmax(out, dim=1) 139 | return out, kl_loss 140 | 141 | 142 | class LSTMCRF(LSTMBase): 143 | def __init__(self, args): 144 | super(LSTMCRF, self).__init__(args) 145 | 146 | self.num_tags = self.args_common["num_tags"] 147 | 148 | if self.bidirectional is True: 149 | self.h2o = nn.Linear(self.hidden_dim * 2, self.num_tags) 150 | else: 151 | self.h2o = nn.Linear(self.hidden_dim, self.num_tags) 152 | 153 | self.crf = ConditionalRandomField(args).to(self.device) 154 | 155 | def forward(self, batch_x, batch_y): 156 | kl_loss = torch.Tensor([0.0]) 157 | 158 | x = batch_x.permute(1, 0) 159 | y = batch_y.permute(1, 0) 160 | 161 | mask = x.data.gt(1).float() 162 | input_lengths = mask.sum(1).int() 163 | 164 | x = self.embed(x) 165 | if "cuda" in str(self.device): 166 | x = x.cuda() 167 | 168 | x = nn.utils.rnn.pack_padded_sequence(x, input_lengths, batch_first=True) 169 | h, _ = self.lstm(x, self.hidden) 170 | h, _ = nn.utils.rnn.pad_packed_sequence(h, batch_first=True) 171 | h = self.h2o(h) 172 | h *= mask.unsqueeze(2) 173 | 174 | out = self.crf(h, y, mask) 175 | 176 | return out, kl_loss 177 | 178 | def decode(self, batch_x): 179 | x = batch_x.permute(1, 0) 180 | 181 | mask = x.data.gt(1).float() 182 | input_lengths = mask.sum(1).int() 183 | 184 | x = self.embed(x) 185 | if "cuda" in str(self.device): 186 | x = x.cuda() 187 | 188 | x = nn.utils.rnn.pack_padded_sequence(x, input_lengths, batch_first=True) 189 | h, _ = self.lstm(x, self.hidden) 190 | h, _ = nn.utils.rnn.pad_packed_sequence(h, batch_first=True) 191 | h = self.h2o(h) 192 | h *= mask.unsqueeze(2) 193 | 194 | return self.crf.viterbi_decode(h, mask) 195 | -------------------------------------------------------------------------------- /models/Transformer.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | 4 | import matplotlib.pyplot as plt 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from torch.autograd import Variable 10 | 11 | from utils.utils import clones 12 | 13 | 14 | class LayerNormGoogle(nn.Module): 15 | def __init__(self, features, epsilon=1e-6): 16 | super(LayerNormGoogle, self).__init__() 17 | self.a_2 = nn.Parameter(torch.ones(features)) 18 | self.b_2 = nn.Parameter(torch.zeros(features)) 19 | self.epsilon = epsilon 20 | 21 | def forward(self, x): 22 | mean = x.mean(-1, keepdim=True) 23 | std = x.std(-1, keepdim=True) 24 | return self.a_2 * (x - mean) / (std + self.epsilon) + self.b_2 25 | 26 | 27 | class EncoderBlockGoogle(nn.Module): 28 | def __init__(self, layer, num_layers): 29 | super(EncoderBlockGoogle, self).__init__() 30 | self.layers = clones(layer, num_layers) 31 | self.norm = LayerNormGoogle(layer.size) 32 
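    # `clones` is imported from utils.utils (not shown in this file). A minimal sketch of what it is
    # assumed to do, following the Annotated Transformer helper of the same name:
    #
    #     def clones(module, num_layers):
    #         """Produce num_layers independent deep copies of a module as a ModuleList."""
    #         return nn.ModuleList([copy.deepcopy(module) for _ in range(num_layers)])
    #
    # The deep copies matter here: each encoder layer must own its own parameters instead of sharing them.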
| 33 | def forward(self, x, mask): 34 | for layer in self.layers: 35 | x = layer(x, mask) 36 | return self.norm(x) 37 | 38 | 39 | class ResidualConnectionGoogle(nn.Module): 40 | def __init__(self, size, keep_prob): 41 | super(ResidualConnectionGoogle, self).__init__() 42 | self.norm = LayerNormGoogle(size) 43 | # TODO: Use dropout interface 44 | self.dropout = nn.Dropout(keep_prob) 45 | 46 | def forward(self, input, sublayer): 47 | return input + self.dropout(sublayer(self.norm(input))) 48 | 49 | 50 | class EncoderLayerGoogle(nn.Module): 51 | def __init__(self, size, attention, feed_forward, keep_prob): 52 | super(EncoderLayerGoogle, self).__init__() 53 | self.size = size 54 | self.attention = attention 55 | self.feed_forward = feed_forward 56 | # Each encoder layer has two sublayers 57 | self.sublayer = clones(ResidualConnectionGoogle(size, keep_prob), 2) 58 | 59 | def forward(self, x, mask): 60 | x = self.sublayer[0](x, lambda x: self.attention(x, x, x, mask)) 61 | return self.sublayer[1](x, self.feed_forward) 62 | 63 | 64 | class EncoderClassifier(nn.Module): 65 | def __init__(self, embedding, encoder, classifier, device, is_average=True): 66 | super(EncoderClassifier, self).__init__() 67 | self.embedding = embedding 68 | self.encoder = encoder 69 | self.classifier = classifier 70 | self.device = device 71 | self.is_average = is_average 72 | 73 | def forward(self, x, mask=None): 74 | kl_loss = torch.Tensor([0.0]) 75 | # Initial x.size() = [length, batch_size] 76 | x = x.permute(1, 0) 77 | # After permute x.size = [batch_size, length] 78 | x = self.embedding(x) 79 | if "cuda" in str(self.device): 80 | x = x.cuda() 81 | kl_loss = kl_loss.cuda() 82 | x = self.encoder(x, mask) 83 | if self.is_average: 84 | # Averaged sentence representation 85 | x = torch.mean(x, dim=1) 86 | x = self.classifier(x) 87 | return x, kl_loss 88 | 89 | 90 | class Classifier(nn.Module): 91 | def __init__(self, d_model, d_hidden, num_classes, keep_prob): 92 | super(Classifier, self).__init__() 93 | self.linear1 = nn.Linear(d_model, d_hidden) 94 | self.dropout = nn.Dropout(keep_prob) 95 | self.relu = nn.ReLU() 96 | self.linear2 = nn.Linear(d_hidden, num_classes) 97 | 98 | def forward(self, x): 99 | x = self.dropout(self.relu(self.linear1(x))) 100 | x = self.linear2(x) 101 | return x 102 | 103 | 104 | class MultiHeadedAttentionGoogle(nn.Module): 105 | def __init__(self, heads=8, d_model=512, keep_prob=0.1): 106 | super(MultiHeadedAttentionGoogle, self).__init__() 107 | assert d_model % heads == 0 108 | self.d_k = d_model // heads 109 | self.heads = heads 110 | self.linears = clones(nn.Linear(d_model, d_model), 4) 111 | self.attn = None 112 | self.dropout = nn.Dropout(keep_prob) 113 | 114 | def attention(self, query, key, value, mask=None): 115 | # Dot product attention 116 | d_k = query.size(-1) 117 | scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k) 118 | 119 | if mask is not None: 120 | scores = scores.masked_fill(mask == 0, -1e9) 121 | 122 | p_attn = F.softmax(scores, dim=-1) 123 | 124 | if self.dropout is not None: 125 | p_attn = self.dropout(p_attn) 126 | 127 | return torch.matmul(p_attn, value), p_attn 128 | 129 | def forward(self, query, key, value, mask=None): 130 | num_batches = query.size(0) 131 | if mask is not None: 132 | mask = mask.unsqueeze(1) 133 | 134 | # Apply linear projection on the input sequence and split the heads. 
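        # Shape walk-through (descriptive note, inferred from the line below):
        #   after each linear the last dimension stays d_model: (num_batches, seq_len, d_model)
        #   .view(...) splits d_model into heads * d_k:         (num_batches, seq_len, heads, d_k)
        #   .transpose(1, 2) groups by attention head:          (num_batches, heads, seq_len, d_k)
        # Only the first three of the four cloned linears are consumed here; the fourth one,
        # self.linears[-1], is the output projection applied after the heads are merged again.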
135 | query, key, value = [linear(x).view(num_batches, -1, self.heads, self.d_k).transpose(1, 2) 136 | for linear, x in zip(self.linears, (query, key, value))] 137 | 138 | # Apply attention on the projected and splitted vectors 139 | x, self.attn = self.attention(query, key, value, mask=mask) 140 | 141 | # Concat vectors and apply linear 142 | x = x.transpose(1, 2).contiguous().view(num_batches, -1, self.heads * self.d_k) 143 | 144 | return self.linears[-1](x) 145 | 146 | 147 | class PositionalFeedForwardGoogle(nn.Module): 148 | def __init__(self, d_model, d_ff, keep_prob=0.1): 149 | super(PositionalFeedForwardGoogle, self).__init__() 150 | self.w_1 = nn.Linear(d_model, d_ff) 151 | self.w_2 = nn.Linear(d_ff, d_model) 152 | self.dropout = nn.Dropout(keep_prob) 153 | self.relu = nn.ReLU() 154 | 155 | def forward(self, input): 156 | return self.w_2(self.dropout(self.relu(self.w_1(input)))) 157 | 158 | 159 | class Embeddings(nn.Module): 160 | def __init__(self, embed_dim, vocab_size, padding_id, use_pretrained_embed, pretrained_weights, 161 | optional_sqrt_mul=False): 162 | super(Embeddings, self).__init__() 163 | # Initialize embeddings 164 | self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_id).cpu() 165 | if use_pretrained_embed: 166 | self.embedding.from_pretrained(pretrained_weights) 167 | self.embed_dim = embed_dim 168 | self.optional_sqrt_mul = optional_sqrt_mul 169 | 170 | def forward(self, input): 171 | if self.optional_sqrt_mul: 172 | return self.embedding(input) * math.sqrt(self.embed_dim) 173 | else: 174 | return self.embedding(input) 175 | 176 | 177 | class PositionalEncodingGoogle(nn.Module): 178 | def __init__(self, d_model, keep_prob=0.1, max_len=5000): 179 | super(PositionalEncodingGoogle, self).__init__() 180 | self.dropout = nn.Dropout(keep_prob) 181 | 182 | positional_encoding = torch.zeros(max_len, d_model) 183 | pos = torch.arange(0., max_len).unsqueeze(1) 184 | # Log space 185 | div_term = torch.exp(torch.arange(0., d_model, 2) * (-math.log(10000) / d_model)) 186 | 187 | positional_encoding[:, 0::2] = torch.sin(pos * div_term) 188 | positional_encoding[:, 1::2] = torch.cos(pos * div_term) 189 | 190 | positional_encoding = positional_encoding.unsqueeze(0) 191 | self.register_buffer("pe", positional_encoding) 192 | 193 | def forward(self, input): 194 | return self.dropout(input + Variable(self.pe[:, :input.size(1)], requires_grad=False)) 195 | 196 | 197 | class TransformerGoogle: 198 | def __init__(self, args): 199 | super(TransformerGoogle, self).__init__() 200 | 201 | self.args_common = args["common_model_properties"] 202 | self.args_specific = args["transformer_google"] 203 | 204 | # Device 205 | self.device = self.args_common["device"] 206 | 207 | # Input/Output dimensions 208 | self.vocab_size = self.args_common["vocab_size"] 209 | self.embed_dim = self.args_common["embed_dim"] 210 | self.num_class = self.args_common["num_class"] 211 | 212 | # Embedding parameters 213 | self.padding_id = self.args_common["padding_id"] 214 | 215 | # Condition parameters 216 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 217 | self.use_embed_sqrt_mul = self.args_specific["use_embed_sqrt_mul"] 218 | 219 | # Pretrained embedding weights 220 | self.pretrained_weights = self.args_common["pretrained_weights"] 221 | 222 | # Dropout probabilities for each individual part of the full model. 
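        # Despite the "keep_prob" naming, these values are passed straight to nn.Dropout, so they act as
        # drop probabilities. Based on where each one is used in create_classifier_transformer, they cover:
        # the residual connections inside every encoder layer, the positional encoding, the position-wise
        # feed-forward block, the attention weights, and the final classifier head, respectively.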
223 | self.keep_prob_encoder = self.args_specific["keep_prob_encoder"] 224 | self.keep_prob_pe = self.args_specific["keep_prob_pe"] 225 | self.kee_prob_pff = self.args_specific["keep_prob_pff"] 226 | self.keep_prob_attn = self.args_specific["keep_prob_attn"] 227 | self.keep_prob_clf = self.args_specific["keep_prob_clf"] 228 | 229 | # Condition parameter for the transformer type (It only supports classification for now) 230 | self.transformer_type = self.args_specific["transformer_type"] 231 | 232 | # Number of parallel attention layers for MultiHeadedAttention 233 | self.heads = self.args_specific["heads"] 234 | 235 | # Number of encoder layers 236 | self.num_encoder_layers = self.args_specific["num_encoder_layers"] 237 | 238 | # Number of hidden count units for Position-Wise Feed-Forward Network 239 | self.num_hidden_pos_ff = self.args_specific["num_hidden_pos_ff"] 240 | 241 | # Maximum length of an input 242 | self.max_length = self.args_specific["max_length"] 243 | 244 | if self.transformer_type == "classifier": 245 | self.model = self.create_classifier_transformer() 246 | else: 247 | raise ValueError("Transformer can be created as classifier for now!") 248 | 249 | def create_classifier_transformer(self): 250 | c = copy.deepcopy 251 | 252 | # Initialize individual parts of the full model 253 | # attention = torch.nn.MultiheadAttention(num_heads=self.heads, embed_dim=self.embed_dim, 254 | # dropout=self.keep_prob_attn) 255 | attention = MultiHeadedAttentionGoogle(heads=self.heads, d_model=self.embed_dim, keep_prob=self.keep_prob_attn) 256 | 257 | ff = PositionalFeedForwardGoogle(d_model=self.embed_dim, d_ff=self.num_hidden_pos_ff, 258 | keep_prob=self.kee_prob_pff) 259 | 260 | embeddings = Embeddings(self.embed_dim, self.vocab_size, self.padding_id, self.use_pretrained_embed, 261 | self.pretrained_weights, optional_sqrt_mul=self.use_embed_sqrt_mul) 262 | 263 | positional_embeddings = PositionalEncodingGoogle(d_model=self.embed_dim, keep_prob=self.keep_prob_pe, 264 | max_len=self.max_length) 265 | 266 | # Initialize the full model 267 | model = EncoderClassifier(nn.Sequential(embeddings, c(positional_embeddings)), 268 | EncoderBlockGoogle( 269 | EncoderLayerGoogle(self.embed_dim, c(attention), c(ff), self.keep_prob_encoder), 270 | self.num_encoder_layers), 271 | Classifier(self.embed_dim, d_hidden=self.embed_dim // 2, num_classes=self.num_class, 272 | keep_prob=self.keep_prob_clf), 273 | device=self.device) 274 | 275 | # Initialize model parameters 276 | for p in model.parameters(): 277 | if p.dim() > 1: 278 | nn.init.xavier_uniform_(p) 279 | return model 280 | 281 | 282 | if __name__ == '__main__': 283 | print("Transformer tests") 284 | plt.figure(figsize=(15, 5)) 285 | pe = PositionalEncodingGoogle(20, 0) 286 | y = pe.forward(Variable(torch.zeros(1, 100, 20))) 287 | plt.plot(np.arange(100), y[0, :, 4:8].data.numpy()) 288 | plt.legend(["dim %d" % p for p in [4, 5, 6, 7]]) 289 | plt.show() 290 | -------------------------------------------------------------------------------- /models/Transformer_OpenAI.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | 4 | import matplotlib.pyplot as plt 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from torch.autograd import Variable 10 | 11 | from utils.utils import clones 12 | 13 | 14 | 15 | class Embeddings(nn.Module): 16 | def __init__(self, embed_dim, vocab_size, keep_prob, padding_id, use_pretrained_embed, 
pretrained_weights): 17 | super(Embeddings, self).__init__() 18 | # Initialize embeddings 19 | self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_id).cpu() 20 | if use_pretrained_embed: 21 | self.load_pretrained_weights() 22 | self.embed_drop = nn.Dropout(keep_prob) 23 | 24 | def forward(self, input): 25 | x = self.embed_drop(self.embedding(input)) 26 | out = x.sum(dim=2) 27 | return out 28 | 29 | 30 | class LayerNorm(nn.Module): 31 | def __init__(self, features, epsilon=1e-5): 32 | self.a_2 = nn.Parameter(torch.ones(features)) 33 | self.b_2 = nn.Parameter(torch.zeros(features)) 34 | self.epsilon = epsilon 35 | 36 | def forward(self, x): 37 | mean = x.mean(-1, keepdim=True) 38 | std = x.std(-1, keepdim=True) 39 | return self.a_2 * (x - mean) / torch.sqrt(std + self.epsilon) + self.b_2 40 | 41 | 42 | class MultiLayerPerceptron(nn.Module): 43 | def __init__(self, num_state, embed_dim, keep_prob): 44 | self.fc = nn.Conv1d(num_state, 1, embed_dim) 45 | self.proj = nn.Conv1d(embed_dim, 1, num_state) 46 | self.activation = nn.ReLU() 47 | self.dropout = nn.Dropout(keep_prob) 48 | 49 | def forward(self, input): 50 | x = self.activation(self.fc(input)) 51 | x = self.dropout(self.proj(x)) 52 | return x 53 | 54 | 55 | class ModifiedMultiHeadedAttention(nn.Module): 56 | def __init__(self, num_state, n_ctx, num_heads, keep_prob_attention, keep_prob_residual, scale=False): 57 | assert num_state % num_heads == 0 58 | self.bias = torch.tril(torch.ones(n_ctx, n_ctx)).view(1, 1, n_ctx, n_ctx) 59 | self.num_heads = num_heads 60 | self.split_size = num_state 61 | self.scale = scale 62 | self.attn = nn.Conv1d(num_state * 3, 1, num_state) 63 | self.proj = nn.Conv1d(num_state, 1, num_state) 64 | self.attn_dropout = nn.Dropout(keep_prob_attention) 65 | self.residual_dropout = nn.Dropout(keep_prob_residual) 66 | 67 | def attention(self, query, key, value): 68 | weight = torch.matmul(query, key) 69 | if self.scale: 70 | weight = weight / math.sqrt(value.size(-1)) 71 | 72 | # Mask attention weights 73 | bias = self.bias[:, :, :weight.size(-2), :weight.size(-1)] 74 | weight = weight * bias - 1e9 * (1 - bias) 75 | 76 | p_attn = F.softmax(weight, dim=-1) 77 | if self.attn_dropout is not None: 78 | p_attn = self.attn_dropout(p_attn) 79 | return torch.matmul(p_attn, value) 80 | 81 | # Direct c/p from huggingface, which is the equivalent of original tensorflow implementation. 82 | def merge_heads(self, x): 83 | x = x.permute(0, 2, 1, 3) 84 | new_x_shape = x.size()[:-2] + (x.size(-2) * x.size(-1),) 85 | return x.view(*new_x_shape) 86 | 87 | # Direct c/p from huggingface, which is the equivalent of original tensorflow implementation. 
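    # Descriptive note on the reshaping below: split_heads turns (batch, seq_len, num_state) into
    # (batch, heads, seq_len, head_dim); keys are additionally permuted to (batch, heads, head_dim, seq_len)
    # so that torch.matmul(query, key) in attention() needs no extra transpose. Note that the constructor
    # stores the head count as self.num_heads, while this method reads self.n_head.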
88 | def split_heads(self, x, is_key=False): 89 | new_x_shape = x.size()[:-1] + (self.n_head, x.size(-1) // self.n_head) 90 | x = x.view(*new_x_shape) 91 | if is_key: 92 | return x.permute(0, 2, 3, 1) 93 | else: 94 | return x.permute(0, 2, 1, 3) 95 | 96 | def forward(self, input): 97 | x = self.attn(input) 98 | query, key, value = x.split(self.split_size, dim=2) 99 | query = self.split_heads(query) 100 | key = self.split_heads(key, is_key=True) 101 | value = self.split_heads(value) 102 | out = self.proj(self.merge_heads(self.attention(query, key, value))) 103 | return self.residual_dropout(out) 104 | 105 | 106 | class Block(nn.Module): 107 | def __init__(self, embed_dim, num_heads, keep_prob_attention, keep_prob_residual, keep_prob_mlp, n_ctx=512, 108 | scale=False, use_builtin_mha=False): 109 | if use_builtin_mha: 110 | self.attention = nn.MultiheadAttention(embed_dim=embed_dim, 111 | num_heads=num_heads, 112 | dropout=keep_prob_attention) 113 | else: 114 | self.attention = ModifiedMultiHeadedAttention(num_state=embed_dim, 115 | n_ctx=n_ctx, 116 | num_heads=num_heads, 117 | keep_prob_attention=keep_prob_attention, 118 | keep_prob_residual=keep_prob_residual, 119 | scale=scale) 120 | self.layer_norm1 = LayerNorm(embed_dim) 121 | self.mlp = MultiLayerPerceptron(4 * embed_dim, embed_dim, keep_prob_mlp) 122 | self.layer_norm2 = LayerNorm(embed_dim) 123 | 124 | def forward(self, input): 125 | x = self.attn(input) 126 | x_hat = self.ln_1(input + x) 127 | x = self.mlp(x_hat) 128 | x = self.ln_2(x_hat + x) 129 | return x 130 | 131 | class LanguageModelHead(nn.Module): 132 | def __init__(self, embedding, embed_dim): 133 | super(LanguageModelHead, self).__init__() 134 | self.embed_dim = embed_dim 135 | self.decoder = nn.Linear(in_features=embedding.embedding.weight.shape[1], 136 | out_features=embedding.embedding.weight.shape[0], 137 | bias=True) 138 | self.decoder.weight = embedding.embedding.weight 139 | 140 | def forward(self, input): 141 | # Remove last token 142 | x = input[:, :-1].view(-1, self.embed_dim) 143 | x = self.decoder(x) 144 | return x 145 | 146 | 147 | class TransformerOpenAI: 148 | def __init__(self, args): 149 | super(TransformerOpenAI, self).__init__() 150 | 151 | self.args_common = args["common_model_properties"] 152 | self.args_specific = args["transformer_openai"] 153 | 154 | # Device 155 | self.device = self.args_common["device"] 156 | 157 | # Input/Output dimensions 158 | self.vocab_size = self.args_common["vocab_size"] 159 | self.embed_dim = self.args_common["embed_dim"] 160 | self.num_class = self.args_common["num_class"] 161 | 162 | # Embedding parameters 163 | self.padding_id = self.args_common["padding_id"] 164 | 165 | # Condition parameters 166 | self.use_pretrained_embed = self.args_common["use_pretrained_embed"] 167 | 168 | # Model/Context size 169 | self.d_model = self.args_specific["d_model"] 170 | 171 | # Dropout probabilities for each individual part of the full model. 
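        # Illustrative sketch only (values are invented): based on the keys read in this constructor,
        # the "transformer_openai" section of the configuration is assumed to contain at least
        #   "transformer_openai": {"d_model": 512, "keep_prob_embed": 0.1, "heads": 8, "num_layers": 12}
        # transformer_type is checked further below but is not yet read from the config in this version.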
172 | self.keep_prob_embed = self.args_specific["keep_prob_embed"] 173 | 174 | # Number of parallel attention layers for MultiHeadedAttention 175 | self.heads = self.args_specific["heads"] 176 | 177 | # Number of layers in terms of Blocks 178 | self.num_layers = self.args_specific["num_layers"] 179 | 180 | if self.transformer_type == "classifier": 181 | self.model = self.create_classifier_transformer() 182 | else: 183 | raise ValueError("Transformer can be created as classifier for now!") 184 | 185 | def create_classifier_transformer(self): 186 | c = copy.deepcopy 187 | 188 | embedding = Embeddings(self.embed_dim, self.vocab_size, self.keep_prob_embed, self.padding_id, 189 | self.use_pretrained_embed, self.pretrained_weights) 190 | 191 | 192 | -------------------------------------------------------------------------------- /models/Util_CNN.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class ConvolutionalBlock(nn.Module): 7 | def __init__(self, input_channel_size, filter_count, filter_size, stride): 8 | super(ConvolutionalBlock, self).__init__() 9 | relu = nn.ReLU() 10 | bn = nn.BatchNorm1d(num_features=filter_count) 11 | conv1 = nn.Conv1d(in_channels=input_channel_size, 12 | out_channels=filter_count, 13 | kernel_size=filter_size, 14 | stride=stride, 15 | padding=1) 16 | conv2 = nn.Conv1d(in_channels=filter_count, 17 | out_channels=filter_count, 18 | kernel_size=filter_size, 19 | stride=1, 20 | padding=1) 21 | self.block = nn.Sequential(conv1, bn, relu, conv2, bn, relu) 22 | 23 | def forward(self, input): 24 | return self.block(input) 25 | 26 | 27 | class KMaxPooling(nn.Module): 28 | def __init__(self, k): 29 | super(KMaxPooling, self).__init__() 30 | assert 1 < k 31 | self.k = k 32 | 33 | def forward(self, input): 34 | kmax, _ = input.topk(input.shape(2) // self.k, dim=2) 35 | return kmax 36 | 37 | 38 | class LayerBlock(nn.Module): 39 | def __init__(self, input_channel_size, filter_count, conv_filter_size, maxpool_filter_size, kmax_k=2, 40 | downsample=False, downsample_type="resnet", use_shortcut=True): 41 | super(LayerBlock, self).__init__() 42 | self.downsample = downsample 43 | self.use_shortcut = use_shortcut 44 | 45 | self.pool = None 46 | stride = 1 47 | if self.downsample: 48 | if downsample_type == "resnet": 49 | stride = 2 50 | elif downsample_type == "vgg": 51 | self.pool = nn.MaxPool1d(kernel_size=maxpool_filter_size, stride=2, padding=1) 52 | elif downsample_type == "kmax": 53 | self.pool = self.KMaxPooling(k=kmax_k) 54 | else: 55 | raise KeyError("Downsample_type can be (1) resnet, (2) vgg, or (3) kmax") 56 | 57 | self.convolutional_block = self.ConvolutionalBlock(input_channel_size=input_channel_size, 58 | filter_count=filter_count, 59 | filter_size=conv_filter_size, 60 | stride=stride) 61 | 62 | if use_shortcut and self.downsample: 63 | self.shortcut = nn.Conv1d(in_channels=input_channel_size, 64 | out_channels=filter_count, 65 | kernel_size=1, 66 | stride=2) 67 | 68 | def forward(self, input): 69 | residual = input 70 | if self.downsample and self.pool: 71 | x = self.pool(input) 72 | x = self.convolutional_block(x) 73 | 74 | if self.downsample and self.use_shortcut: 75 | residual = self.shortcut(residual) 76 | 77 | if self.use_shortcut: 78 | x += residual 79 | return x 80 | 81 | 82 | class ConvolutionEncoder(nn.Module): 83 | def __init__(self, args, embedding): 84 | super(ConvolutionEncoder, self).__init__() 85 | self.args_common = 
args["common_model_properties"] 86 | self.args_specific = args["conv_deconv_cnn"] 87 | 88 | # Device 89 | self.device = self.args_common["device"] 90 | 91 | # Input/Output dimensions 92 | self.embed_dim = self.args_common["embed_dim"] 93 | 94 | # Condition parameters 95 | self.use_batch_norm = self.args_common["use_batch_norm"] 96 | 97 | # Batch normalization parameters 98 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 99 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 100 | 101 | # Convolution parameters 102 | self.input_channel = 1 103 | self.filter_counts = self.args_specific["filter_counts"] 104 | self.filter_sizes = self.args_specific["filter_sizes"] 105 | self.strides = self.args_specific["strides"] 106 | 107 | self.embedding = embedding 108 | 109 | # Initialize convolutions 110 | self.conv1 = nn.Conv2d(in_channels=self.input_channel, 111 | out_channels=self.filter_counts[0], 112 | kernel_size=(self.filter_sizes[0], self.embed_dim), 113 | stride=self.strides[0], 114 | bias=True) 115 | self.conv2 = nn.Conv2d(in_channels=self.filter_counts[0], 116 | out_channels=self.filter_counts[1], 117 | kernel_size=(self.filter_sizes[1], 1), 118 | stride=self.strides[1], 119 | bias=True) 120 | self.conv3 = nn.Conv2d(in_channels=self.filter_counts[1], 121 | out_channels=self.filter_counts[2], 122 | kernel_size=(self.filter_sizes[2], 1), 123 | stride=self.strides[2], 124 | bias=True) 125 | 126 | # Initialize batch norms 127 | if self.use_batch_norm: 128 | self.conv1_bn = nn.BatchNorm2d(num_features=self.filter_counts[0], 129 | momentum=self.batch_norm_momentum, 130 | affine=self.batch_norm_affine) 131 | self.conv2_bn = nn.BatchNorm2d(num_features=self.filter_counts[1], 132 | momentum=self.batch_norm_momentum, 133 | affine=self.batch_norm_affine) 134 | 135 | # Well, self-explanatory. 
136 | self.relu = nn.ReLU() 137 | 138 | def forward(self, batch): 139 | batch_permuted = batch.permute(1, 0) 140 | h = self.embed(batch_permuted) 141 | if "cuda" in str(self.device): 142 | h = h.cuda() 143 | 144 | if self.use_batch_norm: 145 | h = self.relu(self.conv1_bn(self.conv1(h))) 146 | h = self.relu(self.conv2_bn(self.conv2(h))) 147 | h = self.relu(self.conv3(h)) 148 | else: 149 | h = self.relu(self.conv1(h)) 150 | h = self.relu(self.conv2(h)) 151 | h = self.relu(self.conv3(h)) 152 | 153 | return h 154 | 155 | 156 | class DeconvolutionDecoder(nn.Module): 157 | def __init__(self, args, embedding): 158 | super(DeconvolutionDecoder, self).__init__() 159 | self.args_common = args["common_model_properties"] 160 | self.args_specific = args["conv_deconv_cnn"] 161 | 162 | # Device 163 | self.device = self.args_common["device"] 164 | 165 | # Input/Output dimensions 166 | self.embed_dim = self.args_common["embed_dim"] 167 | 168 | # Condition parameters 169 | self.use_batch_norm = self.args_common["use_batch_norm"] 170 | 171 | # Batch normalization parameters 172 | self.batch_norm_momentum = self.args_common["batch_norm_momentum"] 173 | self.batch_norm_affine = self.args_common["batch_norm_affine"] 174 | 175 | # Convolution parameters 176 | self.input_channel = 1 177 | self.filter_counts = list(reversed(self.args_specific["filter_counts"])) 178 | self.filter_sizes = list(reversed(self.args_specific["filter_sizes"])) 179 | self.strides = list(reversed(self.args_specific["strides"])) 180 | self.temperature = args["deconv_temperature"] 181 | 182 | self.embedding = embedding 183 | 184 | # Initialize deconvolutions 185 | self.deconv1 = nn.ConvTranspose2d(in_channels=self.filter_counts[0], 186 | out_channels=self.filter_counts[1], 187 | kernel_size=(self.filter_sizes[0], 1), 188 | stride=self.strides[0], 189 | bias=True) 190 | self.deconv2 = nn.ConvTranspose2d(in_channels=self.filter_counts[1], 191 | out_channels=self.filter_counts[2], 192 | kernel_size=(self.filter_sizes[1], 1), 193 | stride=self.strides[1], 194 | bias=True) 195 | self.deconv3 = nn.ConvTranspose2d(in_channels=self.filter_counts[2], 196 | out_channels=self.input_channel, 197 | kernel_size=(self.filter_sizes[2], self.embed_dim), 198 | stride=self.strides[2], 199 | bias=True) 200 | 201 | # Initialize batch norms 202 | if self.use_batch_norm: 203 | self.deconv1_bn = nn.BatchNorm2d(num_features=self.filter_counts[0], 204 | momentum=self.batch_norm_momentum, 205 | affine=self.batch_norm_affine) 206 | self.deconv2_bn = nn.BatchNorm2d(num_features=self.filter_counts[1], 207 | momentum=self.batch_norm_momentum, 208 | affine=self.batch_norm_affine) 209 | 210 | # Well, self-explanatory. 211 | self.relu = nn.ReLU() 212 | 213 | def forward(self, h): 214 | if self.use_batch_norm: 215 | x_ = self.relu(self.deconv1_bn(self.deconv1(h))) 216 | x_ = self.relu(self.deconv2_bn(self.deconv2(x_))) 217 | x_ = self.relu(self.deconv3(x_)) 218 | else: 219 | x_ = self.relu(self.deconv1(h)) 220 | x_ = self.relu(self.deconv2(x_)) 221 | x_ = self.relu(self.deconv3(x_)) 222 | 223 | x_ = x_.squeeze() 224 | 225 | # p(w^t = v): Probability of w^t to be word v, as w^t is the t'th word of the reconstructed sentence. 
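        # Descriptive note: the block below L2-normalizes the reconstructed embeddings along the embedding
        # dimension, takes a batched matrix product with the transposed word-embedding matrix, divides by
        # the temperature and applies log_softmax over the vocabulary. nn.Variable is presumably meant to
        # be torch.autograd.Variable; with PyTorch >= 0.4 the raw weight tensor would work as well.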
226 | normalized_x_ = torch.norm(x_, p=2, dim=2, keepdim=True) 227 | reconstructed_x_ = x_ / normalized_x_ 228 | 229 | normalized_w = (nn.Variable(self.embedding.weight.data).t()).unsqueeze(0) 230 | normalized_w = normalized_w.expand(reconstructed_x_.size(0), *normalized_w.size()) 231 | probs = torch.bmm(reconstructed_x_, normalized_w) / self.temperature 232 | # Reconstruction log probabilities (not loss) 233 | return F.log_softmax(probs, dim=2) 234 | 235 | 236 | class FullyConnectedClassifier(nn.Module): 237 | def __init__(self, args): 238 | super(FullyConnectedClassifier, self).__init__() 239 | self.args_common = args["common_model_properties"] 240 | self.args_specific = args["conv_deconv_cnn"] 241 | 242 | # This block is not configurable for any network architecture! 243 | # It is designed for Conv-Deconv CNN, hence its input size is the output size of the Encoder CNN. 244 | self.input_size = self.args_specific["filter_counts"][2] 245 | self.hidden_layer_size = self.args_specific["hidden_layer_size"] 246 | self.num_class = self.args_common["num_class"] 247 | self.keep_prob = self.args_common["keep_prob"] 248 | 249 | self.fc1 = nn.Linear(self.input_size, self.hidden_layer_size) 250 | self.fc2 = nn.Linear(self.hidden_layer_size, self.num_class) 251 | 252 | self.dropout = nn.Dropout(self.keep_prob) 253 | 254 | def forward(self, input): 255 | x = self.dropout(self.fc1(input)) 256 | x = self.fc2(x) 257 | # Supervised log probabilities 258 | return F.log_softmax(x, dim=1) 259 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/models/__init__.py -------------------------------------------------------------------------------- /optimizer/custom_optimizer.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import matplotlib.pyplot as plt 4 | import numpy as np 5 | import torch 6 | from torch.nn.utils import clip_grad_norm_ 7 | from torch.optim import Optimizer 8 | 9 | 10 | def warmup_cosine(x, warmup=0.002): 11 | s = 0 12 | if x < warmup: 13 | s = 1 14 | return s * (x / warmup) + (1 - s) * (0.5 * (1 + torch.cos(math.pi * x))) 15 | 16 | 17 | def warmup_constant(x, warmup=0.002): 18 | s = 0 19 | if x < warmup: 20 | s = 1 21 | return s * (x / warmup) + (1 - s) 22 | 23 | 24 | def warmup_linear(x, warmup=0.002): 25 | s = 0 26 | if x < warmup: 27 | s = 1 28 | return (s * (x / warmup) + (1 - s)) * (1 - x) 29 | 30 | 31 | SCHEDULES = { 32 | "cos": warmup_cosine, 33 | "constant": warmup_constant, 34 | "linear": warmup_linear, 35 | } 36 | 37 | 38 | class NoamOptimizer(): 39 | # Direct c/p from Attention is All You Need notebook (famous Harvard's one) 40 | def __init__(self, model_size, factor, warmup, optimizer): 41 | self.optimizer = optimizer 42 | self._step = 0 43 | self.warmup = warmup 44 | self.factor = factor 45 | self.model_size = model_size 46 | self._rate = 0 47 | 48 | def step(self): 49 | # Update parameters and rate 50 | self._step += 1 51 | rate = self.rate() 52 | for p in self.optimizer.param_groups: 53 | p["lr"] = rate 54 | self._rate = rate 55 | self.optimizer.step() 56 | 57 | def rate(self, step=None): 58 | if step is None: 59 | step = self._step 60 | return self.factor * (self.model_size ** (-0.5) * min(step ** (-0.5), step * self.warmup ** (-1.5))) 61 | 62 | 63 | class 
OpenAIAdam(Optimizer): 64 | # Reference to https://github.com/huggingface 65 | # Open AI version of Adam with weight decay 66 | def __init__(self, params, lr, schedule, warmup, t_total, b1=0.9, b2=0.999, e=1e-8, l2=0, vector_l2=False, 67 | max_grad_norm=-1, **kwargs): 68 | assert 0 < lr 69 | assert schedule == "cos" or schedule == "constant" or schedule == "linear" 70 | assert 0 < warmup 71 | assert 0 < b1 <= 1.0 72 | assert 0 < b2 <= 1.0 73 | assert 0 < e 74 | 75 | defaults = dict(lr=lr, schedule=schedule, warmup=warmup, t_total=t_total, b1=b1, b2=b2, e=e, l2=l2, 76 | vector_l2=vector_l2, max_grad_norm=max_grad_norm) 77 | 78 | super(OpenAIAdam, self).__init__(params, defaults) 79 | 80 | def step(self, closure=None): 81 | # Performs a single optimization step 82 | loss = None 83 | if closure is not None: 84 | loss = closure() 85 | 86 | for group in self.param_groups: 87 | for p in group["params"]: 88 | if p.grad is None: 89 | continue 90 | grad = p.grad.data 91 | if grad.is_sparse: 92 | raise RuntimeError("Sparse gradients. Use SparseAdam") 93 | 94 | state = self.state[p] 95 | 96 | # State initialization 97 | if len(state) == 0: 98 | state["step"] = 0 99 | # Exponential moving average of gradient values 100 | state["exp_avg"] = torch.zeros_like(p.data) 101 | # Exponential moving average of squared gradient values 102 | state["exp_avg_sq"] = torch.zeros_like(p.data) 103 | 104 | exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"] 105 | beta1, beta2 = group["b1"], group["b2"] 106 | 107 | state["step"] += 1 108 | 109 | # Add grad clipping 110 | if group["max_grad_norm"] > 0: 111 | clip_grad_norm_(p, group["max_grad_norm"]) 112 | 113 | # Decay the first and second moment running average coefficient 114 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 115 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 116 | denom = exp_avg_sq.sqrt().add_(group["e"]) 117 | 118 | bias_correction1 = 1 - beta1 ** state["step"] 119 | bias_correction2 = 1 - beta2 ** state["step"] 120 | 121 | schedule_fn = SCHEDULES[group["schedule"]] 122 | lr_scheduled = group["lr"] * schedule_fn(state["step"] / group["t_total"], group["warmup"]) 123 | step_size = lr_scheduled * math.sqrt(bias_correction2) / bias_correction1 124 | 125 | p.data.addcdiv_(-step_size, exp_avg, denom) 126 | 127 | # Add weight decay at the end (fixed version) 128 | if (len(p.size()) > 1 or group["vector_l2"]) and group["l2"] > 0: 129 | p.data.add_(-lr_scheduled * group["l2"], p.data) 130 | return loss 131 | 132 | 133 | class Padam(Optimizer): 134 | """Partially Adaptive Momentum Estimation algorithm""" 135 | 136 | def __init__(self, params, lr, amsgrad, e=1e-8, b1=0.9, b2=0.999, partial=0.25, weight_decay=0, max_grad_norm=-1, 137 | **kwargs): 138 | assert 0 < lr 139 | assert 0 < b1 <= 1.0 140 | assert 0 < b2 <= 1.0 141 | assert 0 < e 142 | assert 0 < partial <= 0.5 143 | defaults = dict(lr=lr, b1=b1, b2=b2, e=e, amsgrad=amsgrad, partial=partial, weight_decay=weight_decay, 144 | max_grad_norm=max_grad_norm) 145 | super(Padam, self).__init__(params, defaults) 146 | 147 | def step(self, closure=None): 148 | # Performs a single optimization step 149 | loss = None 150 | if closure is not None: 151 | loss = closure() 152 | 153 | for group in self.param_groups: 154 | for p in group['params']: 155 | if p.grad is None: 156 | continue 157 | grad = p.grad.data 158 | if grad.is_sparse: 159 | raise RuntimeError("Sparse gradients.
Use SparseAdam") 160 | 161 | amsgrad = group['amsgrad'] 162 | partial = group['partial'] 163 | 164 | state = self.state[p] 165 | 166 | # State initialization 167 | if len(state) == 0: 168 | state['step'] = 0 169 | # Exponential moving average of gradient values 170 | state['exp_avg'] = torch.zeros_like(p.data) 171 | # Exponential moving average of squared gradient values 172 | state['exp_avg_sq'] = torch.zeros_like(p.data) 173 | if amsgrad: 174 | # Maintains max of all exp. moving avg. of sq. grad. values 175 | state['max_exp_avg_sq'] = torch.zeros_like(p.data) 176 | 177 | exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"] 178 | beta1, beta2 = group["b1"], group["b2"] 179 | if amsgrad: 180 | max_exp_avg_sq = state['max_exp_avg_sq'] 181 | state["step"] += 1 182 | 183 | # Add grad clipping 184 | if group["max_grad_norm"] > 0: 185 | clip_grad_norm_(p, group["max_grad_norm"]) 186 | 187 | if group['weight_decay'] != 0: 188 | grad = grad.add(group['weight_decay'], p.data) 189 | 190 | # Decay the first and second moment running average coefficient 191 | exp_avg.mul_(beta1).add_(1 - beta1, grad) 192 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad) 193 | if amsgrad: 194 | # Maintains the maximum of all 2nd moment running avg. till now 195 | torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) 196 | # Use the max. for normalizing running avg. of gradient 197 | denom = max_exp_avg_sq.sqrt().add_(group['e']) 198 | else: 199 | denom = exp_avg_sq.sqrt().add_(group["e"]) 200 | 201 | bias_correction1 = 1 - beta1 ** state["step"] 202 | bias_correction2 = 1 - beta2 ** state["step"] 203 | step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1 204 | 205 | p.data.addcdiv_(-step_size, exp_avg, denom ** (partial * 2)) 206 | return loss 207 | 208 | 209 | if __name__ == '__main__': 210 | opts = [NoamOptimizer(512, 1, 4000, None), 211 | NoamOptimizer(512, 1, 8000, None), 212 | NoamOptimizer(256, 1, 4000, None)] 213 | plt.plot(np.arange(1, 20000), [[opt.rate(i) for opt in opts] for i in range(1, 20000)]) 214 | plt.legend(["512:4000", "512:8000", "256:4000"]) 215 | plt.show() 216 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.16.0 2 | torch==1.0.1 3 | torchtext==0.3.1 4 | setuptools==40.8.0 5 | gensim==3.6.0 6 | adabound==0.0.5 7 | matplotlib==3.0.2 8 | spacy==2.0.16 9 | -------------------------------------------------------------------------------- /scorer/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/scorer/__init__.py -------------------------------------------------------------------------------- /scorer/accuracy_scorer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | class AccuracyScorer(object): 5 | def __init__(self, topk=(2,)): 6 | super(AccuracyScorer, self).__init__() 7 | self.topk = topk 8 | 9 | @staticmethod 10 | def calculate_accuracy(predictions, ground_truths): 11 | correct = (torch.max(predictions, 1)[1].view(ground_truths.size()).data == ground_truths.data) 12 | return float(correct.sum()) / len(correct) * 100 13 | 14 | def calculate_topk_accuracy(self, predictions, ground_truths): 15 | """ 16 | Code copied/pasted from PyTorch Imagenet example: 
https://github.com/pytorch/examples/blob/master/imagenet/main.py 17 | Computes the accuracy over the k top predictions for the specified values of k 18 | """ 19 | with torch.no_grad(): 20 | maxk = max(self.topk) 21 | batch_size = ground_truths.size(0) 22 | 23 | _, pred = predictions.topk(maxk, 1, True, True) 24 | pred = pred.t() 25 | correct = pred.eq(ground_truths.view(1, -1).expand_as(pred)) 26 | 27 | res = [] 28 | for k in self.topk: 29 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 30 | res.append(correct_k.mul_(100.0 / batch_size)) 31 | return res 32 | -------------------------------------------------------------------------------- /scorer/ner_scorer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 4 | logger = logging.getLogger("NerScorer") 5 | 6 | class NerScorer(object): 7 | def __init__(self, ner_vocab): 8 | super(NerScorer, self).__init__() 9 | self.ner_vocab = ner_vocab 10 | self.token_accuracy = 0 11 | self.avg_macro_precision = 0 12 | self.avg_macro_recall = 0 13 | self.avg_macro_f1 = 0 14 | self.macro_precision = {} 15 | self.macro_recall = {} 16 | self.macro_f1 = {} 17 | 18 | def token_level_accuracy(self, prediction, ground_truth): 19 | token_count = 0 20 | matched = 0 21 | 22 | for p_seq, gt_seq in zip(prediction, ground_truth): 23 | for p, gt in zip(p_seq, gt_seq): 24 | token_count += 1 25 | if p == gt: 26 | matched += 1 27 | 28 | self.token_accuracy = matched * 100.0 / token_count 29 | 30 | def __initialize_dict(self): 31 | d = {} 32 | for v in self.ner_vocab.stoi: 33 | d[self.ner_vocab.stoi[v]] = 0 34 | return d 35 | 36 | def __add_to_dict(self, d, tag): 37 | if tag in d: 38 | d[tag] += 1 39 | else: 40 | d[tag] = 1 41 | return d 42 | 43 | def __calculate_tag_f1(self, f1, tp, fp, fn): 44 | precision = {} 45 | recall = {} 46 | for tag in tp: 47 | precision[tag] = tp[tag] / (tp[tag] + fp[tag] + 1e-16) 48 | recall[tag] = tp[tag] / (tp[tag] + fn[tag] + 1e-16) 49 | f1[tag] = (2 * precision[tag] * recall[tag] / (precision[tag] + recall[tag] + 1e-16)) * 100 50 | self.macro_f1 = f1 51 | self.macro_precision = precision 52 | self.macro_recall = recall 53 | 54 | def __calculate_mean_f1(self): 55 | self.avg_macro_f1 = sum(self.macro_f1.values()) / float(len(self.macro_f1)) 56 | self.avg_macro_precision = sum(self.macro_precision.values()) / float(len(self.macro_precision)) 57 | self.avg_macro_recall = sum(self.macro_recall.values()) / float(len(self.macro_recall)) 58 | 59 | def f1_score(self, prediction, ground_truth): 60 | true_positives = self.__initialize_dict() 61 | false_positives = self.__initialize_dict() 62 | false_negatives = self.__initialize_dict() 63 | f1 = self.__initialize_dict() 64 | 65 | for p_seq, gt_seq in zip(prediction, ground_truth): 66 | for p, gt in zip(p_seq, gt_seq): 67 | if p == gt: 68 | true_positives = self.__add_to_dict(true_positives, gt) 69 | else: 70 | false_negatives = self.__add_to_dict(false_negatives, gt) 71 | false_positives = self.__add_to_dict(false_positives, p) 72 | self.__calculate_tag_f1(f1, true_positives, false_positives, false_negatives) 73 | self.__calculate_mean_f1() 74 | 75 | def print_detailed_score_log(self): 76 | logger.info("--------------------") 77 | logger.info("Detailed Tag-Based Score") 78 | for tag in self.macro_f1: 79 | logger.info("Tag: {} - Precision: {:.4f} - Recall: {:.4f} - F1: {:.4f}".format(self.ner_vocab.itos[tag], 80 | 
self.macro_precision[tag], 81 | self.macro_recall[tag], 82 | self.macro_f1[tag])) 83 | logger.info("--------------------") 84 | -------------------------------------------------------------------------------- /training/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/training/__init__.py -------------------------------------------------------------------------------- /training/multiple_model_trainer.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | 4 | import torch 5 | import torch.nn as nn 6 | import torch.optim as optim 7 | 8 | from custom_optimizer import OpenAIAdam, Padam 9 | from evaluation.evaluator import Evaluator 10 | from utils.utils import time_since, calculate_accuracy, save_best_model, calculate_topk_accuracy, \ 11 | scheduled_annealing_strategy 12 | 13 | 14 | class MultipleModelTrainer(object): 15 | def __init__(self, training_properties, train_iter, dev_iter, test_iter, device): 16 | self.optimizer_type = training_properties["optimizer"] 17 | self.learning_rate = training_properties["learning_rate"] 18 | self.weight_decay = training_properties["weight_decay"] 19 | self.momentum = training_properties["momentum"] 20 | self.epoch = training_properties["epoch"] 21 | self.topk = training_properties["topk"] 22 | self.print_every = training_properties["print_every_batch_step"] 23 | self.save_every = training_properties["save_every_epoch"] 24 | self.eval_every = training_properties["eval_every"] 25 | self.save_path = training_properties["save_path"] 26 | 27 | self.openAIAdamSchedulerType = training_properties["scheduler_type"] 28 | self.amsgrad = training_properties["amsgrad"] 29 | self.partial_adam = training_properties["partial_adam"] 30 | 31 | self.train_iter = train_iter 32 | self.dev_iter = dev_iter 33 | self.test_iter = test_iter 34 | 35 | self.device = device 36 | 37 | self.dev_evaluator, self.test_evaluator = Evaluator().evaluator_factory("multiple_model_evaluator", self.device) 38 | 39 | def init_optimizer(self, model): 40 | print("Optimizer type is {} !".format(self.optimizer_type)) 41 | 42 | if self.optimizer_type == "Adam": 43 | return optim.Adam(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) 44 | elif self.optimizer_type == "SGD": 45 | return optim.SGD(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay, 46 | momentum=self.momentum) 47 | elif self.optimizer_type == "OpenAIAdam": 48 | return OpenAIAdam(model.parameters(), lr=self.learning_rate, schedule=self.openAIAdamSchedulerType, 49 | warmup=0.002, t_total=len(self.train_iter) * self.epoch) 50 | elif self.optimizer_type == "Padam": 51 | return Padam(model.parameters(), lr=self.learning_rate, amsgrad=self.amsgrad, partial=self.partial_adam, 52 | weight_decay=self.weight_decay) 53 | else: 54 | raise ValueError("Invalid optimizer type! Choose Adam, SGD, Padam or OpenAIAdam!") 55 | 56 | def train_iters_multi_model(self, models, checkpoint=None): 57 | # Under the assumption of models is a list that contains encoder, decoder and classifier in order. 
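        # Hypothetical usage sketch (names are illustrative only, not taken from main.py):
        #   trainer = MultipleModelTrainer(training_properties, train_iter, dev_iter, test_iter, device)
        #   trainer.train_iters_multi_model([encoder, decoder, classifier])
        # where encoder, decoder and classifier would typically be the ConvolutionEncoder,
        # DeconvolutionDecoder and FullyConnectedClassifier defined in models/Util_CNN.py.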
58 | encoder = models[0] 59 | decoder = models[1] 60 | classifier = models[2] 61 | 62 | encoder_optimizer = self.init_optimizer(encoder) 63 | decoder_optimizer = self.init_optimizer(decoder) 64 | classifier_optimizer = self.init_optimizer(classifier) 65 | 66 | reconstruction_criterion = nn.CrossEntropyLoss().to(self.device) 67 | supervised_criterion = nn.NLLLoss().to(self.device) 68 | 69 | start = time.time() 70 | old_path = None 71 | best_vali_acc = -1 72 | best_vali_loss = -1 73 | best_vali_acc_topk = -1 74 | start_epoch = 1 75 | 76 | if checkpoint is not None: 77 | encoder.load_state_dict(checkpoint["encoder_state_dict"]) 78 | decoder.load_state_dict(checkpoint["decoder_state_dict"]) 79 | classifier.load_state_dict(checkpoint["classifier_stat_dict"]) 80 | encoder_optimizer.load_state_dict(checkpoint["encoder_optimizer_state_dict"]) 81 | decoder_optimizer.load_state_dict(checkpoint["decoder_optimizer_state_dict"]) 82 | classifier_optimizer.load_state_dict(checkpoint["classifier_optimizer_state_dict"]) 83 | start_epoch = checkpoint["epoch"] + 1 84 | best_vali_acc = checkpoint["best_vali_acc"] 85 | best_vali_loss = checkpoint["best_vali_loss"] 86 | best_vali_acc_topk = checkpoint["best_vali_acc_topk"] 87 | 88 | print("Training...") 89 | for e in range(start_epoch, self.epoch + 1): 90 | alpha = scheduled_annealing_strategy(epoch=e, max_epoch=self.epoch) 91 | total_loss, reconst_loss, supervised_loss, accuracy, accuracy_topk = self.train(encoder=encoder, 92 | decoder=decoder, 93 | classifier=classifier, 94 | encoder_optimizer=encoder_optimizer, 95 | decoder_optimizer=decoder_optimizer, 96 | classifier_optimizer=classifier_optimizer, 97 | reconst_criterion=reconstruction_criterion, 98 | supervised_criterion=supervised_criterion, 99 | alpha=alpha) 100 | 101 | self.print_epoch(start, e, reconst_loss, supervised_loss, total_loss, accuracy, accuracy_topk) 102 | 103 | if e % self.eval_every == 0: 104 | vali_loss, vali_accuracy, vali_accuracy_topk = self.dev_evaluatorevaluate_iter(encoder=encoder, 105 | decoder=decoder, 106 | classifier=classifier, 107 | input=self.dev_iter, 108 | reconstruction_criterion=reconstruction_criterion, 109 | supervised_criterion=supervised_criterion, 110 | save_path=self.save_path, 111 | topk=self.topk) 112 | if best_vali_acc < vali_accuracy: 113 | best_vali_loss = vali_loss 114 | best_vali_acc = vali_accuracy 115 | best_vali_acc_topk = vali_accuracy_topk 116 | save_best_model(encoder, self.save_path, filename="saved_best_encoder.pt") 117 | save_best_model(decoder, self.save_path, filename="saved_best_decoder.pt") 118 | save_best_model(classifier, self.save_path, filename="saved_best_classifier.pt") 119 | 120 | self.print_validation(vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 121 | best_vali_acc_topk) 122 | 123 | if e % self.save_every == 0: 124 | filename = "saved_model_step{}.pt".format(e) 125 | out_path = os.path.abspath(os.path.join(self.save_path, filename)) 126 | if old_path is not None: 127 | os.remove(old_path) 128 | torch.save({ 129 | "epoch": e, 130 | "best_vali_acc": best_vali_acc, 131 | "best_vali_loss": best_vali_loss, 132 | "best_vali_acc_topk": best_vali_acc_topk, 133 | "encoder_state_dict": encoder.state_dict(), 134 | 'decoder_state_dict': decoder.state_dict(), 135 | 'classifier_state_dict': classifier.state_dict(), 136 | 'encoder_optimizer_state_dict': encoder_optimizer.state_dict(), 137 | 'decoder_optimizer_state_dict': decoder_optimizer.state_dict(), 138 | 'classifier_optimizer_state_dict': 
classifier_optimizer.state_dict() 139 | }, out_path) 140 | old_path = out_path 141 | 142 | test_loss, test_accuracy, test_accuracy_topk = self.test_evaluatorevaluate_iter(encoder=encoder, 143 | decoder=decoder, 144 | classifier=classifier, 145 | input=self.dev_iter, 146 | reconstruction_criterion=reconstruction_criterion, 147 | supervised_criterion=supervised_criterion, 148 | save_path=self.save_path, 149 | topk=self.topk) 150 | self.print_test(test_loss, test_accuracy, test_accuracy_topk) 151 | 152 | def train(self, encoder, decoder, classifier, encoder_optimizer, decoder_optimizer, classifier_optimizer, 153 | reconst_criterion, supervised_criterion, alpha=1): 154 | epoch_reconstruction_loss = 0 155 | epoch_supervised_loss = 0 156 | epoch_total_acc = 0 157 | epoch_total_acc_topk = 0 158 | step = 1 159 | 160 | encoder.train() 161 | decoder.train() 162 | classifier.train() 163 | 164 | for batch in self.train_iter: 165 | encoder_optimizer.zero_grad() 166 | decoder_optimizer.zero_grad() 167 | classifier_optimizer.zero_grad() 168 | 169 | batch_x = batch.sentence.to(self.device) 170 | batch_y = batch.category_labels.to(self.device) 171 | 172 | hidden = encoder(batch_x) 173 | reconstruction_probs = decoder(hidden) 174 | supervised_predictions = classifier(hidden.squeeze()) 175 | 176 | reconstruction_loss = reconst_criterion(reconstruction_probs, batch_x) 177 | supervised_loss = supervised_criterion(supervised_predictions, batch_y) 178 | 179 | total_loss = alpha * reconstruction_loss + supervised_loss 180 | 181 | accuracy = calculate_accuracy(supervised_predictions, batch_y) 182 | accuracy_topk = calculate_topk_accuracy(supervised_predictions, batch_y, topk=self.topk) 183 | 184 | total_loss.backward() 185 | 186 | encoder_optimizer.step() 187 | decoder_optimizer.step() 188 | classifier.step() 189 | 190 | step += 1 191 | 192 | epoch_reconstruction_loss += reconstruction_loss.item() 193 | epoch_supervised_loss += supervised_loss.item() 194 | epoch_total_acc += accuracy 195 | epoch_total_acc_topk += accuracy_topk[0].item() 196 | 197 | if step % self.print_every == 0: 198 | self.print_step(step, reconstruction_loss, supervised_loss, accuracy, accuracy_topk) 199 | torch.cuda.empty_cache() 200 | 201 | epoch_total_loss = epoch_reconstruction_loss + epoch_supervised_loss 202 | return epoch_total_loss / len(self.train_iter), epoch_reconstruction_loss / len( 203 | self.train_iter), epoch_supervised_loss / len( 204 | self.train_iter), epoch_total_acc / len(self.train_iter), epoch_total_acc_topk / len(self.train_iter) 205 | 206 | def print_step(self, step, reconstruction_loss, supervised_loss, accuracy, accuracy_topk): 207 | print("Batch {}/{} - " 208 | "Batch Reconstruction Loss: {:.4f} - " 209 | "Batch Supervised Loss: {:.4f} - " 210 | "Batch Accuracy: {:.4f} - " 211 | "Batch Accuracy Top-{} {:.4f}".format(step, 212 | len(self.train_iter), 213 | reconstruction_loss.item(), 214 | supervised_loss.item(), 215 | accuracy, 216 | self.topk[0], 217 | accuracy_topk[0].item())) 218 | 219 | def print_epoch(self, start, e, reconst_loss, supervised_loss, total_loss, accuracy, accuracy_topk): 220 | print("{} - " 221 | "Epoch {}/{} - " 222 | "Reconstruction Loss: {:.4f} - " 223 | "Supervised Loss: {:.4f} - " 224 | "Loss: {:.4f} - " 225 | "Accuracy: {:.4f} - " 226 | "Accuracy Top-{}: {:.4f}".format(time_since(start, e / self.epoch), 227 | e, 228 | self.epoch, 229 | reconst_loss, 230 | supervised_loss, 231 | total_loss, 232 | accuracy, 233 | self.topk[0], 234 | accuracy_topk)) 235 | 236 | def print_validation(self, 
vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 237 | best_vali_acc_topk): 238 | print( 239 | "Validation Loss: {:.4f} (Best: {:.4f}) - " 240 | "Validation Accuracy: {:.4f} (Best: {:.4f}) - " 241 | "Validation Accuracy Top-{}: {:.4f} (Best: {:.4f})".format(vali_loss, 242 | best_vali_loss, 243 | vali_accuracy, 244 | best_vali_acc, 245 | self.topk[0], 246 | vali_accuracy_topk, 247 | best_vali_acc_topk)) 248 | 249 | def print_test(self, test_loss, test_accuracy, test_accuracy_topk): 250 | print("Test Loss: {:.4f} - " 251 | "Test Accuracy: {:.4f} - " 252 | "Test Accuracy Top-{}: {:.4f}".format(test_loss, 253 | test_accuracy, 254 | self.topk[0], 255 | test_accuracy_topk)) 256 | -------------------------------------------------------------------------------- /training/single_model_ner_trainer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | import os 3 | import time 4 | 5 | import torch 6 | import torch.nn as nn 7 | 8 | from evaluation.evaluator import Evaluator 9 | from models.GRU import GRU 10 | from models.LSTM import LSTMBase 11 | from scorer.ner_scorer import NerScorer 12 | from training.single_model_trainer import SingleModelTrainer 13 | from utils.utils import time_since, save_best_model 14 | 15 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 16 | logger = logging.getLogger("Trainer") 17 | 18 | 19 | class SingleModelNerTrainer(SingleModelTrainer): 20 | def __init__(self, training_properties, datasetloader, device): 21 | super(SingleModelNerTrainer, self).__init__(training_properties, datasetloader, device) 22 | 23 | self.scorer = NerScorer(datasetloader.ner_vocab) 24 | self.dev_evaluator, self.test_evaluator = Evaluator().evaluator_factory("single_model_ner_evaluator", 25 | self.device) 26 | 27 | def train_iters(self, model, checkpoint=None): 28 | optimizer = self.init_optimizer(model) 29 | 30 | start = time.time() 31 | old_path = None 32 | best_vali_f1 = -1 33 | best_vali_token_acc = -1 34 | start_epoch = 1 35 | 36 | if checkpoint is not None: 37 | model.load(checkpoint["model"]) 38 | if self.optimizer_type == "Noam": 39 | optimizer.optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 40 | else: 41 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 42 | start_epoch = checkpoint["epoch"] + 1 43 | best_vali_f1 = checkpoint["best_vali_f1"] 44 | best_vali_token_acc = checkpoint["best_vali_token_acc"] 45 | 46 | del checkpoint 47 | torch.cuda.empty_cache() 48 | 49 | logger.info("Training...") 50 | for e in range(start_epoch, self.epoch + 1): 51 | total_loss, train_f1 = self.train(model=model, 52 | optimizer=optimizer, 53 | scheduler=None) 54 | 55 | self.print_epoch(start, e, total_loss, train_f1) 56 | 57 | if e % self.eval_every == 0: 58 | vali_f1, vali_precision, vali_recall, vali_token_acc = self.dev_evaluator.evaluate_iter(model=model, 59 | input=self.dev_iter, 60 | save_path=self.save_path, 61 | scorer=self.scorer) 62 | if best_vali_f1 < vali_f1: 63 | best_vali_token_acc = vali_token_acc 64 | best_vali_f1 = vali_f1 65 | save_best_model(model, self.save_path) 66 | 67 | self.print_validation(vali_token_acc, best_vali_token_acc, vali_f1, best_vali_f1) 68 | 69 | if e % self.save_every == 0: 70 | filename = "saved_model_step{}.pt".format(e) 71 | out_path = os.path.abspath(os.path.join(self.save_path, filename)) 72 | if old_path is not None: 73 | os.remove(old_path) 74 | if self.optimizer_type == "Noam": 75 | torch.save({ 76 | "epoch": 
e, 77 | "best_vali_f1": best_vali_f1, 78 | "best_vali_token_acc": best_vali_token_acc, 79 | 'model': model, 80 | 'optimizer_state_dict': optimizer.optimizer.state_dict(), 81 | }, out_path) 82 | else: 83 | torch.save({ 84 | "epoch": e, 85 | "best_vali_f1": best_vali_f1, 86 | "best_vali_token_acc": best_vali_token_acc, 87 | 'model': model, 88 | 'optimizer_state_dict': optimizer.state_dict(), 89 | }, out_path) 90 | old_path = out_path 91 | 92 | test_f1, test_precision, test_recall, test_token_acc = self.test_evaluator.evaluate_iter(model=model, 93 | input=self.test_iter, 94 | save_path=self.save_path, 95 | scorer=self.scorer) 96 | 97 | self.print_test(test_token_acc, test_f1) 98 | 99 | def train(self, model, optimizer, scheduler): 100 | epoch_loss = 0 101 | epoch_total_f1 = 0 102 | step = 1 103 | model.train() 104 | 105 | for batch in self.train_iter: 106 | if self.optimizer_type == "Noam": 107 | optimizer.optimizer.zero_grad() 108 | else: 109 | optimizer.zero_grad() 110 | 111 | batch_x = batch.sentence.to(self.device) 112 | batch_y = batch.ner_labels.to(self.device, non_blocking=True) 113 | 114 | if isinstance(model, GRU) or isinstance(model, LSTMBase): 115 | model.hidden = model.init_hidden(batch_x.size(1)) 116 | 117 | try: 118 | loss, kl_loss = model(batch_x, batch_y) 119 | # f1_score = calculate_accuracy(predictions, batch_y) 120 | 121 | loss.backward() 122 | 123 | if 0.0 < self.norm_ratio: 124 | nn.utils.clip_grad_norm_(model.parameters(), self.norm_ratio) 125 | 126 | if self.optimizer_type == "Noam": 127 | optimizer.optimizer.step() 128 | else: 129 | optimizer.step() 130 | 131 | if scheduler is not None and step % 500 == 0: 132 | scheduler.step(step) 133 | 134 | step += 1 135 | 136 | epoch_loss += loss.item() 137 | # epoch_total_f1 += f1_score 138 | 139 | if step % self.print_every == 0: 140 | self.print_step(step, loss.item(), 0) 141 | 142 | torch.cuda.empty_cache() 143 | 144 | except RuntimeError as e: 145 | if 'out of memory' in str(e): 146 | logger.warning('Ran out of memory, skipping batch %d', step) 147 | optimizer.zero_grad() 148 | torch.cuda.empty_cache() 149 | else: 150 | raise e 151 | 152 | return epoch_loss / len(self.train_iter), epoch_total_f1 / len(self.train_iter) 153 | 154 | def print_step(self, step, loss, f1_score): 155 | logger.info("Batch {}/{} - " 156 | "Batch Loss: {:.4f} - " 157 | "Batch F1: {:.4f}".format(step, 158 | len(self.train_iter), 159 | loss, 160 | f1_score)) 161 | 162 | def print_epoch(self, start, e, total_loss, train_f1): 163 | logger.info("{} - " 164 | "Epoch {}/{} - " 165 | "Loss: {:.4f} - " 166 | "F1-Score: {:.4f}".format(time_since(start, e / self.epoch), 167 | e, 168 | self.epoch, 169 | total_loss, 170 | train_f1)) 171 | 172 | def print_validation(self, vali_f1, best_vali_f1, vali_token_acc, best_vali_token_acc): 173 | logger.info("Validation F1: {:.4f} (Best: {:.4f}) - " 174 | "Validation Token Level Accuracy: {:.4f} (Best: {:.4f}) - ".format(vali_token_acc, 175 | best_vali_token_acc, 176 | vali_f1, 177 | best_vali_f1)) 178 | 179 | def print_test(self, test_token_acc, test_f1): 180 | logger.info("Test F1: {:.4f} - " 181 | "Test Token Level Accuracy: {:.4f} - ".format(test_f1, 182 | test_token_acc)) 183 | -------------------------------------------------------------------------------- /training/single_model_trainer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | import os 3 | import time 4 | 5 | import torch 6 | import torch.nn as nn 7 | import torch.optim as optim 8 | 9 | from 
adabound import AdaBound 10 | 11 | from optimizer.custom_optimizer import OpenAIAdam, NoamOptimizer, Padam 12 | from evaluation.evaluator import Evaluator 13 | from models.GRU import GRU 14 | from models.LSTM import LSTMBase 15 | from utils.utils import time_since, save_best_model 16 | from scorer.accuracy_scorer import AccuracyScorer 17 | 18 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 19 | logger = logging.getLogger("Trainer") 20 | 21 | 22 | class SingleModelTrainer(object): 23 | def __init__(self, training_properties, datasetloader, device): 24 | self.task = training_properties["task"] 25 | self.optimizer_type = training_properties["optimizer"] 26 | self.learning_rate = training_properties["learning_rate"] 27 | self.weight_decay = training_properties["weight_decay"] 28 | self.momentum = training_properties["momentum"] 29 | self.norm_ratio = training_properties["norm_ratio"] 30 | self.epoch = training_properties["epoch"] 31 | self.topk = training_properties["topk"] 32 | self.print_every = training_properties["print_every_batch_step"] 33 | self.save_every = training_properties["save_every_epoch"] 34 | self.eval_every = training_properties["eval_every"] 35 | self.save_path = training_properties["save_path"] 36 | 37 | self.openAIAdamSchedulerType = training_properties["scheduler_type"] 38 | self.amsgrad = training_properties["amsgrad"] 39 | self.partial_adam = training_properties["partial_adam"] 40 | 41 | self.final_learning_rate = training_properties["final_learning_rate"] 42 | self.amsbound = training_properties["amsbound"] 43 | 44 | self.train_iter = datasetloader.train_iter 45 | self.dev_iter = datasetloader.val_iter 46 | self.test_iter = datasetloader.test_iter 47 | 48 | self.device = device 49 | 50 | self.scorer = AccuracyScorer(self.topk) 51 | self.dev_evaluator, self.test_evaluator = Evaluator().evaluator_factory("single_model_evaluator", self.device) 52 | 53 | def init_optimizer(self, model): 54 | logger.info("Optimizer type is %s!", self.optimizer_type) 55 | 56 | if self.optimizer_type == "Adam": 57 | return optim.Adam(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) 58 | elif self.optimizer_type == "SGD": 59 | return optim.SGD(model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay, 60 | momentum=self.momentum) 61 | elif self.optimizer_type == "OpenAIAdam": 62 | return OpenAIAdam(model.parameters(), lr=self.learning_rate, schedule=self.openAIAdamSchedulerType, 63 | warmup=0.002, t_total=len(self.train_iter) * self.epoch) 64 | elif self.optimizer_type == "Noam": 65 | return NoamOptimizer(300, 1, 400, 66 | optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9)) 67 | elif self.optimizer_type == "Padam": 68 | return Padam(model.parameters(), lr=self.learning_rate, amsgrad=self.amsgrad, partial=self.partial_adam, 69 | weight_decay=self.weight_decay) 70 | elif self.optimizer_type == "adabound": 71 | return AdaBound(model.parameters(), lr=self.learning_rate, final_lr=self.final_learning_rate, 72 | weight_decay=self.weight_decay, amsbound=self.amsbound) 73 | else: 74 | raise ValueError("Invalid optimizer type!
Choose Adam, SGD, Padam, NoamOptimizer or OpenAIAdam!") 75 | 76 | def train_iters(self, model, checkpoint=None): 77 | optimizer = self.init_optimizer(model) 78 | 79 | # scheduler = lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1) 80 | 81 | if isinstance(model, GRU): 82 | criterion = nn.NLLLoss().to(self.device) 83 | else: 84 | criterion = nn.CrossEntropyLoss().to(self.device) 85 | 86 | start = time.time() 87 | old_path = None 88 | best_vali_acc = -1 89 | best_vali_loss = -1 90 | best_vali_acc_topk = -1 91 | start_epoch = 1 92 | 93 | if checkpoint is not None: 94 | model.load_state_dict(checkpoint["model_state_dict"]) 95 | if self.optimizer_type == "Noam": 96 | optimizer.optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 97 | else: 98 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 99 | start_epoch = checkpoint["epoch"] + 1 100 | best_vali_acc = checkpoint["best_vali_acc"] 101 | best_vali_loss = checkpoint["best_vali_loss"] 102 | best_vali_acc_topk = checkpoint["best_vali_acc_topk"] 103 | 104 | # Memory problems of PyTorch is giving me headaches... 105 | del checkpoint 106 | torch.cuda.empty_cache() 107 | 108 | logger.info("Training...") 109 | for e in range(start_epoch, self.epoch + 1): 110 | total_loss, cross_entropy_loss, kl_loss, accuracy, accuracy_topk = self.train(model=model, 111 | optimizer=optimizer, 112 | scheduler=None, 113 | criterion=criterion) 114 | 115 | self.print_epoch(start, e, cross_entropy_loss, kl_loss, total_loss, accuracy, accuracy_topk) 116 | 117 | if e % self.eval_every == 0: 118 | vali_loss, vali_accuracy, vali_accuracy_topk = self.dev_evaluator.evaluate_iter(model=model, 119 | input=self.dev_iter, 120 | criterion=criterion, 121 | save_path=self.save_path, 122 | scorer=self.scorer) 123 | if best_vali_acc < vali_accuracy: 124 | best_vali_loss = vali_loss 125 | best_vali_acc = vali_accuracy 126 | best_vali_acc_topk = vali_accuracy_topk 127 | save_best_model(model, self.save_path) 128 | 129 | self.print_validation(vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 130 | best_vali_acc_topk) 131 | 132 | if e % self.save_every == 0: 133 | filename = "saved_model_step{}.pt".format(e) 134 | out_path = os.path.abspath(os.path.join(self.save_path, filename)) 135 | if old_path is not None: 136 | os.remove(old_path) 137 | if self.optimizer_type == "Noam": 138 | torch.save({ 139 | "epoch": e, 140 | "best_vali_acc": best_vali_acc, 141 | "best_vali_loss": best_vali_loss, 142 | "best_vali_acc_topk": best_vali_acc_topk, 143 | 'model_state_dict': model.state_dict(), 144 | 'optimizer_state_dict': optimizer.optimizer.state_dict(), 145 | }, out_path) 146 | else: 147 | torch.save({ 148 | "epoch": e, 149 | "best_vali_acc": best_vali_acc, 150 | "best_vali_loss": best_vali_loss, 151 | "best_vali_acc_topk": best_vali_acc_topk, 152 | 'model_state_dict': model.state_dict(), 153 | 'optimizer_state_dict': optimizer.state_dict(), 154 | }, out_path) 155 | 156 | old_path = out_path 157 | 158 | test_loss, test_accuracy, test_accuracy_topk = self.test_evaluator.evaluate_iter(model=model, 159 | input=self.test_iter, 160 | criterion=criterion, 161 | save_path=self.save_path, 162 | scorer=self.scorer) 163 | 164 | self.print_test(test_loss, test_accuracy, test_accuracy_topk) 165 | 166 | def train(self, model, optimizer, scheduler, criterion): 167 | epoch_loss = 0 168 | epoch_kl_loss = 0 169 | epoch_total_acc = 0 170 | epoch_total_acc_topk = 0 171 | step = 1 172 | model.train() 173 | 174 | for batch in self.train_iter: 175 | if 
self.optimizer_type == "Noam": 176 | optimizer.optimizer.zero_grad() 177 | else: 178 | optimizer.zero_grad() 179 | 180 | batch_x = batch.sentence.to(self.device) 181 | batch_y = batch.category_labels.to(self.device, non_blocking=True) 182 | 183 | if isinstance(model, GRU) or isinstance(model, LSTMBase): 184 | model.hidden = model.init_hidden(batch_x.size(1)) 185 | 186 | try: 187 | predictions, kl_loss = model(batch_x) 188 | 189 | loss = criterion(predictions, batch_y) 190 | accuracy = self.scorer.calculate_accuracy(predictions, batch_y) 191 | accuracy_topk = self.scorer.calculate_topk_accuracy(predictions, batch_y) 192 | 193 | total_loss = loss + kl_loss / 10 194 | 195 | total_loss.backward() 196 | 197 | if 0.0 < self.norm_ratio: 198 | nn.utils.clip_grad_norm_(model.parameters(), self.norm_ratio) 199 | 200 | if self.optimizer_type == "Noam": 201 | optimizer.optimizer.step() 202 | else: 203 | optimizer.step() 204 | 205 | if scheduler is not None and step % 500 == 0: 206 | scheduler.step(step) 207 | 208 | step += 1 209 | 210 | epoch_loss += loss.item() 211 | epoch_kl_loss += kl_loss.item() 212 | epoch_total_acc += accuracy 213 | epoch_total_acc_topk += accuracy_topk[0].item() 214 | 215 | if step % self.print_every == 0: 216 | self.print_step(step, loss.item(), kl_loss.item(), accuracy, accuracy_topk[0].item()) 217 | 218 | torch.cuda.empty_cache() 219 | 220 | except RuntimeError as e: 221 | if 'out of memory' in str(e): 222 | logger.warning('Ran out of memory, skipping batch %d', step) 223 | optimizer.zero_grad() 224 | torch.cuda.empty_cache() 225 | else: 226 | raise e 227 | 228 | epoch_total_loss = epoch_loss + epoch_kl_loss 229 | return epoch_total_loss / len(self.train_iter), epoch_loss / len(self.train_iter), epoch_kl_loss / len( 230 | self.train_iter), epoch_total_acc / len(self.train_iter), epoch_total_acc_topk / len(self.train_iter) 231 | 232 | def print_step(self, step, loss, kl_loss, accuracy, accuracy_topk): 233 | logger.info("Batch {}/{} - " 234 | "Batch Loss: {:.4f} - " 235 | "Batch KL Loss: {:.4f} - " 236 | "Batch Accuracy: {:.4f} - " 237 | "Batch Accuracy Top-{} {:.4f}".format(step, 238 | len(self.train_iter), 239 | loss, 240 | kl_loss, 241 | accuracy, 242 | self.topk[0], 243 | accuracy_topk)) 244 | 245 | def print_epoch(self, start, e, cross_entropy_loss, kl_loss, total_loss, accuracy, accuracy_topk): 246 | logger.info("{} - " 247 | "Epoch {}/{} - " 248 | "Cross Entropy Loss: {:.4f} - " 249 | "KL Loss: {:.4f} - " 250 | "Loss: {:.4f} - " 251 | "Accuracy: {:.4f} - " 252 | "Accuracy Top-{}: {:.4f}".format(time_since(start, e / self.epoch), 253 | e, 254 | self.epoch, 255 | cross_entropy_loss, 256 | kl_loss, 257 | total_loss, 258 | accuracy, 259 | self.topk[0], 260 | accuracy_topk)) 261 | 262 | def print_validation(self, vali_loss, best_vali_loss, vali_accuracy, best_vali_acc, vali_accuracy_topk, 263 | best_vali_acc_topk): 264 | logger.info("Validation Loss: {:.4f} (Best: {:.4f}) - " 265 | "Validation Accuracy: {:.4f} (Best: {:.4f}) - " 266 | "Validation Accuracy Top-{}: {:.4f} (Best: {:.4f})".format(vali_loss, 267 | best_vali_loss, 268 | vali_accuracy, 269 | best_vali_acc, 270 | self.topk[0], 271 | vali_accuracy_topk, 272 | best_vali_acc_topk)) 273 | 274 | def print_test(self, test_loss, test_accuracy, test_accuracy_topk): 275 | logger.info("Test Loss: {:.4f} - " 276 | "Test Accuracy: {:.4f} - " 277 | "Test Accuracy Top-{}: {:.4f}".format(test_loss, 278 | test_accuracy, 279 | self.topk[0], 280 | test_accuracy_topk)) 281 | 
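A note on the checkpoint format used by `train_iters` above: every `save_every_epoch` interval it serializes the epoch index, the best validation metrics, and the model/optimizer `state_dict`s into one file, and on resume it restores all of them before continuing from `epoch + 1`. Below is a minimal, self-contained sketch of that save/resume round trip; the toy `nn.Linear` model, the metric values, and the file name are placeholders for illustration, not part of the repository.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy model; stands in for the repository's CNN/GRU/LSTM models.
model = nn.Linear(300, 25)
optimizer = optim.Adam(model.parameters(), lr=0.05)

# Save a checkpoint the same way train_iters does for non-Noam optimizers.
checkpoint_path = "saved_model_step3.pt"
torch.save({
    "epoch": 3,
    "best_vali_acc": 0.81,
    "best_vali_loss": 0.47,
    "best_vali_acc_topk": 0.93,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, checkpoint_path)

# Resume: rebuild the objects, then restore their states and the bookkeeping.
checkpoint = torch.load(checkpoint_path)
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1  # continue from the next epoch
```

Note that `SingleModelTrainer` checkpoints `state_dict`s, while `save_best_model` in `utils.py` pickles the whole module via `torch.save(model, out_path)`; both load back fine, but the `state_dict` route is the more portable of the two.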
-------------------------------------------------------------------------------- /training/trainer.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | 3 | from training.single_model_trainer import SingleModelTrainer 4 | from training.single_model_ner_trainer import SingleModelNerTrainer 5 | 6 | logging.config.fileConfig(fname='./config/config.logger', disable_existing_loggers=False) 7 | logger = logging.getLogger("Trainer") 8 | 9 | 10 | class Trainer(object): 11 | @staticmethod 12 | def trainer_factory(type, training_properties, datasetloader, device): 13 | if type == "single_model_trainer": 14 | logger.info("Trainer type is %s", type) 15 | return SingleModelTrainer(training_properties, datasetloader, device) 16 | elif type == "single_model_ner_trainer": 17 | logger.info("Trainer type is %s", type) 18 | return SingleModelNerTrainer(training_properties, datasetloader, device) 19 | else: 20 | raise ValueError("Unrecognized trainer type") 21 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hbahadirsahin/nlp-experiments-in-pytorch/5d05a53146dffd707e4d037230656f980d7be05c/utils/__init__.py -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import math 3 | import os 4 | import pickle 5 | import time 6 | 7 | import numpy as np 8 | import torch 9 | import torch.nn as nn 10 | import torch.nn.functional as F 11 | 12 | 13 | def save_vocabulary(vocab, path): 14 | with open(path, 'wb') as fw: 15 | pickle.dump(vocab, fw, protocol=pickle.HIGHEST_PROTOCOL) 16 | 17 | 18 | def load_vocabulary(path): 19 | with open(path, 'rb') as f: 20 | return pickle.load(f) 21 | 22 | 23 | def save_best_model(model, path, filename="saved_best_model.pt"): 24 | out_path = os.path.abspath(os.path.join(path, filename)) 25 | torch.save(model, out_path) 26 | 27 | 28 | def load_best_model(path, filename="saved_best_model.pt"): 29 | out_path = os.path.abspath(os.path.join(path, filename)) 30 | model = torch.load(out_path) 31 | return model 32 | 33 | 34 | def as_minutes(s): 35 | m = math.floor(s / 60) 36 | s -= m * 60 37 | return "%dm %ds" % (m, s) 38 | 39 | 40 | def time_since(since, percent): 41 | now = time.time() 42 | s = now - since 43 | es = s / percent 44 | rs = es - s 45 | return "%s (- %s)" % (as_minutes(s), as_minutes(rs)) 46 | 47 | 48 | # Direct copy/paste from the PyTorch BiLSTM-CRF tutorial 49 | def argmax(vec): 50 | # return the argmax as a python int 51 | _, idx = torch.max(vec, 1) 52 | return idx.item() 53 | 54 | 55 | def log_sum_exp(vec, dim=-1): 56 | max_score, _ = torch.max(vec, dim=dim) 57 | max_score_broadcast = max_score.unsqueeze(dim) 58 | return max_score + \ 59 | torch.log(torch.sum(torch.exp(vec - max_score_broadcast), dim=dim)) 60 | 61 | 62 | def scheduled_annealing_strategy(epoch, max_epoch, max=1.0, min=0.01, gain=0.3):  # sigmoid-shaped decay from max to min 63 | upper_alpha = max - min 64 | lower_alpha = (1 + math.exp(gain * (epoch - (max_epoch // 2)))) 65 | alpha = (upper_alpha / lower_alpha) + min 66 | return alpha 67 | 68 | 69 | def clones(module, num_of_clones): 70 | return nn.ModuleList([copy.deepcopy(module) for _ in range(num_of_clones)]) 71 | 72 | 73 | def subsequent_mask(size): 74 | # Mask out subsequent positions.
It is to prevent positions from attending to subsequent positions 75 | # For more detailed information: 76 | # The Annotated Transformer = https://nlp.seas.harvard.edu/2018/04/03/attention.html 77 | sm = np.triu(np.ones((1, size, size)), k=1).astype("uint8") 78 | return torch.from_numpy(sm) == 0 79 | 80 | 81 | def gelu(x): 82 | # Gaussian Error Linear Unit (tanh approximation) 83 | # Ref: https://github.com/pytorch/pytorch/issues/20464 84 | return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3))) 85 | 86 | --------------------------------------------------------------------------------
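Since `gelu` above implements the tanh approximation of the Gaussian Error Linear Unit, here is a standalone sanity check comparing it against the exact erf-based definition. It is not part of the repository; `gelu_tanh` and `gelu_exact` are illustrative names only.

```python
import math

import torch


def gelu_tanh(x):
    # Tanh approximation of GELU; note the constant is sqrt(2 / pi).
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))


def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1 + torch.erf(x / math.sqrt(2)))


x = torch.linspace(-3, 3, steps=13)
print(torch.max(torch.abs(gelu_tanh(x) - gelu_exact(x))))  # small, on the order of 1e-3 or less
```

The two curves agree to within a few times 1e-4 over typical activation ranges, so the tanh form is a safe drop-in while avoiding the erf call.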