├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── NOTICE ├── README.md └── code ├── README.md └── scripts ├── bert_alone.py ├── bert_joint.py ├── bert_mt.py ├── bert_soft_align.py ├── conlleval.pl ├── layers.py ├── loss.py ├── lstm_alone.py ├── lstm_joint.py ├── lstm_mt.py ├── translate_and_align.py └── utils.py /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to.
As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. 
For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## MultiAtis++ Corpus 2 | 3 | ### Description 4 | 5 | The ATIS (Air Travel Information Services) collection was developed to support the research and development of speech understanding systems [1]. The original English data includes intent and slot annotations, and was later extended to Hindi and Turkish [2]. MultiATIS++ further extends ATIS to 6 more languages and hence covers a total of 9 languages: English, Spanish, German, French, Portuguese, Chinese, Japanese, Hindi and Turkish. These languages belong to a diverse set of language families: Indo-European, Sino-Tibetan, Japonic and Altaic. 6 | 7 | The MultiATIS++ corpus has been released to foster further research in the domain of multilingual/cross-lingual natural language understanding. 8 | 9 | For more details, please check the paper: 10 | Xu, W., Haider, B. and Mansour, S., 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353 (https://arxiv.org/abs/2004.14353) 11 | 12 | ### Accessing MultiAtis++ 13 | 14 | To obtain a copy of the *MultiAtis++* data, please visit: 15 | https://catalog.ldc.upenn.edu/LDC2021T04 16 | 17 | Please send your queries/comments to multiatis@amazon.com. 18 | 19 | ### Citation 20 | 21 | Please cite [3] when referring to the MultiATIS++ dataset.
22 | 23 | 24 | ## Soft-Align Implementation 25 | 26 | Implementation of the *soft-align* method introduced in [3] will be available here soon. 27 | 28 | 29 | ## Security 30 | 31 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 32 | 33 | ## License 34 | 35 | This project is licensed under the Apache-2.0 License. 36 | 37 | ## References 38 | 39 | [1] LDC93S5 ATIS2, LDC94S19 ATIS3 Training Data, LDC95S26 ATIS3 Test Data 40 | 41 | [2] Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck. (Almost) Zero-Shot Cross-Lingual Spoken Language Understanding. IEEE ICASSP 2018. 42 | 43 | [3] Weijia Xu, Batool Haider, Saab Mansour. 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353. 44 | -------------------------------------------------------------------------------- /code/README.md: -------------------------------------------------------------------------------- 1 | ### Environment 2 | ``` 3 | pip install numpy scipy scikit-learn 4 | pip install --upgrade "mxnet>=1.6.0" 5 | pip install gluonnlp 6 | ``` 7 | 8 | ### Data 9 | + Multilingual ATIS dataset in EN, ES, DE, ZH, JA, PT, FR, HI, and TR. 10 | 11 | ### Preparation 12 | + Download the BERT-related [code](https://gluon-nlp.mxnet.io/model_zoo/bert/index.html). 13 | + Decompress the downloaded archive and place the folder in `code`. 14 | + Install [fast-align](https://github.com/clab/fast_align). 15 | 16 | ### Run 17 | + For supervised experiments, run `python lstm_alone.py $seed` or `python bert_alone.py $seed` to train the biLSTM/BERT supervised model (`$seed` is a random seed number). 18 | + For multilingual experiments, run `python lstm_joint.py $seed` or `python bert_joint.py $seed` to train the biLSTM/BERT multilingual model. 19 | + For cross-lingual transfer using *MT+fast-align*, first run `python translate_and_align.py $lang` to translate the English utterances to the target language `$lang` and project the slot labels using fast-align. Then run `python lstm_mt.py $seed` or `python bert_mt.py $seed` to train the biLSTM/BERT model. 20 | + For cross-lingual transfer using *MT+soft-align*, first run `python translate_and_align.py $lang` to translate the English utterances to the target language `$lang`. Then run `python bert_soft_align.py $seed` to train the soft-alignment model. 21 | -------------------------------------------------------------------------------- /code/scripts/bert_alone.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0), mx.gpu(1)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'bert.'
+ str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class BERTForICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into BERT to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, bert, num_slot_labels, num_intents, hidden_size=768, dropout=.1, prefix=None, params=None): 49 | super(BERTForICSL, self).__init__(prefix=prefix, params=params) 50 | self.bert = bert 51 | with self.name_scope(): 52 | self.dropout = nn.Dropout(rate=dropout) 53 | # IC/SL classifier 54 | self.slot_classifier = nn.Dense(units=num_slot_labels, 55 | in_units=hidden_size, 56 | flatten=False) 57 | self.intent_classifier = nn.Dense(units=num_intents, 58 | in_units=hidden_size, 59 | flatten=False) 60 | 61 | def encode(self, inputs, valid_length): 62 | types = mx.nd.zeros_like(inputs) 63 | encoded = self.bert(inputs, types, valid_length) 64 | encoded = self.dropout(encoded) 65 | return encoded 66 | 67 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 68 | """Generate unnormalized scores for the given input sequences. 69 | 70 | Parameters 71 | ---------- 72 | inputs : NDArray, shape (batch_size, seq_length) 73 | Input words for the sequences. 74 | valid_length : NDArray or None, shape (batch_size) 75 | Valid length of the sequence. This is used to mask the padded tokens. 76 | 77 | Returns 78 | ------- 79 | intent_prediction: NDArray 80 | Shape (batch_size, num_intents) 81 | slot_prediction : NDArray 82 | Shape (batch_size, seq_length, num_slot_labels) 83 | """ 84 | # hidden: (batch_size, seq_length, hidden_size) 85 | hidden = self.encode(inputs, valid_length) 86 | # get intent and slot label predictions 87 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 88 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 89 | return intent_prediction, slot_prediction 90 | 91 | 92 | def train(model_name, train_input, dev_input): 93 | """Training function.""" 94 | ## Arguments 95 | log_interval = 100 96 | batch_size = 32 97 | lr = 1e-5 98 | optimizer = 'adam' 99 | accumulate = None 100 | epochs = 20 101 | 102 | ## Load BERT model and vocabulary 103 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 104 | dataset_name='wiki_multilingual_uncased', 105 | pretrained=True, 106 | ctx=ctx, 107 | use_pooler=False, 108 | use_decoder=False, 109 | use_classifier=False) 110 | 111 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 112 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 113 | model.hybridize(static_alloc=True) 114 | 115 | icsl_loss_function = ICSLLoss() 116 | icsl_loss_function.hybridize(static_alloc=True) 117 | 118 | ic_metric = mx.metric.Accuracy() 119 | sl_metric = mx.metric.Accuracy() 120 | 121 | ## Load labeled data 122 | field_separator = nlp.data.Splitter('\t') 123 | # fields to select from the file: utterance, slot labels, intent, uid 124 | field_indices = [1, 3, 4, 0] 125 | train_data = nlp.data.TSVDataset(filename=train_input, 126 | field_separator=field_separator, 127 | num_discard_samples=1, 128 | field_indices=field_indices) 129 | 130 | # use the vocabulary from pre-trained model for tokenization 131 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 132 | train_data_transform = 
train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 133 | # create data loader 134 | pad_token_id = vocabulary[PAD] 135 | pad_label_id = label2idx[PAD] 136 | batchify_fn = nlp.data.batchify.Tuple( 137 | nlp.data.batchify.Stack(), 138 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 139 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 140 | nlp.data.batchify.Stack('float32'), 141 | nlp.data.batchify.Stack('float32')) 142 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 143 | batch_size=batch_size, 144 | shuffle=True) 145 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 146 | batchify_fn=batchify_fn, 147 | batch_sampler=train_sampler) 148 | 149 | optimizer_params = {'learning_rate': lr} 150 | trainer = gluon.Trainer(model.collect_params(), optimizer, 151 | optimizer_params, update_on_kvstore=False) 152 | 153 | # Collect differentiable parameters 154 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 155 | # Set grad_req if gradient accumulation is required 156 | if accumulate: 157 | for p in params: 158 | p[1].grad_req = 'add' 159 | # Fix BERT embeddings if required 160 | for p in model.collect_params().items(): 161 | if 'embed' in p[0]: 162 | p[1].grad_req = 'null' 163 | 164 | epoch_tic = time.time() 165 | total_num = 0 166 | log_num = 0 167 | best_score = (0, 0) 168 | for epoch_id in range(epochs): 169 | step_loss = 0 170 | tic = time.time() 171 | # train on labeled data 172 | for batch_id, data in enumerate(train_dataloader): 173 | # forward and backward 174 | with mx.autograd.record(): 175 | if data[0].shape[0] < len(ctx): 176 | data = split_and_load(data, [ctx[0]]) 177 | else: 178 | data = split_and_load(data, ctx) 179 | for chunk in data: 180 | _, token_ids, slot_label, intent_label, valid_length = chunk 181 | 182 | log_num += len(token_ids) 183 | total_num += len(token_ids) 184 | 185 | # forward computation 186 | intent_pred, slot_pred = model(token_ids, valid_length) 187 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 188 | 189 | if accumulate: 190 | ls = ls / accumulate 191 | ls.backward() 192 | step_loss += ls.asscalar() 193 | 194 | # update 195 | if not accumulate or (batch_id + 1) % accumulate == 0: 196 | trainer.allreduce_grads() 197 | nlp.utils.clip_grad_global_norm(params, 1) 198 | trainer.update(1, ignore_stale_grad=True) 199 | 200 | if (batch_id + 1) % log_interval == 0: 201 | toc = time.time() 202 | # update metrics 203 | ic_metric.update([intent_label], [intent_pred]) 204 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 205 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 206 | .format(epoch_id, 207 | batch_id, 208 | len(train_dataloader), 209 | log_num / (toc - tic), 210 | trainer.learning_rate, 211 | step_loss / log_interval, 212 | ic_metric.get()[1], 213 | sl_metric.get()[1])) 214 | tic = time.time() 215 | step_loss = 0 216 | log_num = 0 217 | 218 | mx.nd.waitall() 219 | epoch_toc = time.time() 220 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 221 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 222 | # evaluate on development set 223 | log.info('Evaluate on development set:') 224 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 225 | if slot_f1 > best_score[1]: 226 | best_score = (intent_acc, 
slot_f1) 227 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 228 | 229 | 230 | def evaluate(model=None, model_name='', eval_input=''): 231 | """Evaluate the model on validation dataset. 232 | """ 233 | ## Load model 234 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 235 | dataset_name='wiki_multilingual_uncased', 236 | pretrained=True, 237 | ctx=ctx, 238 | use_pooler=False, 239 | use_decoder=False, 240 | use_classifier=False) 241 | if model is None: 242 | assert model_name != '' 243 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 244 | model.initialize(ctx=ctx) 245 | model.hybridize(static_alloc=True) 246 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 247 | 248 | idx2label = {} 249 | for label, idx in label2idx.items(): 250 | idx2label[idx] = label 251 | ## Load dev dataset 252 | field_separator = nlp.data.Splitter('\t') 253 | field_indices = [1, 3, 4, 0] 254 | eval_data = nlp.data.TSVDataset(filename=eval_input, 255 | field_separator=field_separator, 256 | num_discard_samples=1, 257 | field_indices=field_indices) 258 | 259 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 260 | 261 | dev_alignment = {} 262 | eval_data_transform = [] 263 | for sample in eval_data: 264 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 265 | eval_data_transform += [sample] 266 | dev_alignment[sample[0]] = alignment 267 | log.info('The number of examples after preprocessing: {}' 268 | .format(len(eval_data_transform))) 269 | 270 | test_batch_size = 16 271 | pad_token_id = vocabulary[PAD] 272 | pad_label_id = label2idx[PAD] 273 | batchify_fn = nlp.data.batchify.Tuple( 274 | nlp.data.batchify.Stack(), 275 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 276 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 277 | nlp.data.batchify.Stack('float32'), 278 | nlp.data.batchify.Stack('float32')) 279 | eval_dataloader = mx.gluon.data.DataLoader( 280 | eval_data_transform, 281 | batchify_fn=batchify_fn, 282 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 283 | 284 | _Result = collections.namedtuple( 285 | '_Result', ['intent', 'slot_labels']) 286 | all_results = {} 287 | 288 | total_num = 0 289 | for data in eval_dataloader: 290 | example_ids, token_ids, _, _, valid_length = data 291 | total_num += len(token_ids) 292 | # load data to GPU 293 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 294 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 295 | 296 | # forward computation 297 | intent_pred, slot_pred = model(token_ids, valid_length) 298 | intent_pred = intent_pred.asnumpy() 299 | slot_pred = slot_pred.asnumpy() 300 | valid_length = valid_length.asnumpy() 301 | 302 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 303 | eid = eid.asscalar() 304 | length = int(length) - 2 305 | intent_id = y_intent.argmax(axis=-1) 306 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 307 | slot_names = [idx2label[idx] for idx in slot_ids] 308 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 309 | if eid not in all_results: 310 | all_results[eid] = _Result(intent_id, merged_slot_names) 311 | 312 | example_ids, utterances, labels, intents = load_tsv(eval_input) 313 | pred_intents = [] 314 | label_intents = [] 315 | for eid, intent in zip(example_ids, intents): 316 | label_intents.append(label2index(intent2idx, intent)) 317 
| pred_intents.append(all_results[eid].intent) 318 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 319 | log.info("Intent Accuracy: %.4f" % intent_acc) 320 | 321 | pred_icsl = [] 322 | label_icsl = [] 323 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 324 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 325 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 326 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 327 | log.info("Exact Match: %.4f" % exact_match) 328 | 329 | with open(conll_prediction_file, "w") as fw: 330 | for eid, utterance, labels in zip(example_ids, utterances, labels): 331 | preds = all_results[eid].slot_labels 332 | for w, l, p in zip(utterance, labels, preds): 333 | fw.write(' '.join([w, l, p]) + '\n') 334 | fw.write('\n') 335 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 336 | with open(conll_prediction_file) as f: 337 | stdout = proc.communicate(f.read().encode())[0] 338 | result = stdout.decode('utf-8').split('\n')[1] 339 | slot_f1 = float(result.split()[-1].strip()) 340 | log.info("Slot Labeling: %s" % result) 341 | return intent_acc, slot_f1 342 | 343 | # extract labels 344 | train_input = data_dir + 'atis_train.tsv' 345 | intent2idx, label2idx = get_label_indices(train_input) 346 | 347 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 348 | log.info('Train on %s:' % lang) 349 | model_name = 'model_bert_' + lang + '.' + str(random_seed) 350 | if lang == 'EN': 351 | train_input = data_dir + 'atis_train.tsv' 352 | dev_input = data_dir + 'atis_dev.tsv' 353 | else: 354 | train_input = data_dir + 'atis_train_' + lang + '.tsv' 355 | dev_input = data_dir + 'atis_dev_' + lang + '.tsv' 356 | train(model_name, train_input, dev_input) 357 | 358 | log.info('==========Supervised learning==========') 359 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 360 | log.info('Evaluate on %s:' % lang) 361 | model_name = 'model_bert_' + lang + '.' + str(random_seed) 362 | if lang == 'EN': 363 | test_input = data_dir + 'atis_test.tsv' 364 | else: 365 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 366 | evaluate(model_name=model_name, eval_input=test_input) 367 | 368 | log.info('==========Transfer learning==========') 369 | src_lang = 'EN' 370 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 371 | log.info('Evaluate on %s:' % lang) 372 | model_name = 'model_bert_' + src_lang + '.' 
+ str(random_seed) 373 | if lang == 'EN': 374 | test_input = data_dir + 'atis_test.tsv' 375 | else: 376 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 377 | evaluate(model_name=model_name, eval_input=test_input) 378 | -------------------------------------------------------------------------------- /code/scripts/bert_joint.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0), mx.gpu(1)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'bert_joint.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class BERTForICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into BERT to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, bert, num_slot_labels, num_intents, hidden_size=768, dropout=.1, prefix=None, params=None): 49 | super(BERTForICSL, self).__init__(prefix=prefix, params=params) 50 | self.bert = bert 51 | with self.name_scope(): 52 | self.dropout = nn.Dropout(rate=dropout) 53 | # IC/SL classifier 54 | self.slot_classifier = nn.Dense(units=num_slot_labels, 55 | in_units=hidden_size, 56 | flatten=False) 57 | self.intent_classifier = nn.Dense(units=num_intents, 58 | in_units=hidden_size, 59 | flatten=False) 60 | 61 | def encode(self, inputs, valid_length): 62 | types = mx.nd.zeros_like(inputs) 63 | encoded = self.bert(inputs, types, valid_length) 64 | encoded = self.dropout(encoded) 65 | return encoded 66 | 67 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 68 | """Generate unnormalized scores for the given input sequences. 69 | 70 | Parameters 71 | ---------- 72 | inputs : NDArray, shape (batch_size, seq_length) 73 | Input words for the sequences. 74 | valid_length : NDArray or None, shape (batch_size) 75 | Valid length of the sequence. This is used to mask the padded tokens. 
76 | 77 | Returns 78 | ------- 79 | intent_prediction: NDArray 80 | Shape (batch_size, num_intents) 81 | slot_prediction : NDArray 82 | Shape (batch_size, seq_length, num_slot_labels) 83 | """ 84 | # hidden: (batch_size, seq_length, hidden_size) 85 | hidden = self.encode(inputs, valid_length) 86 | # get intent and slot label predictions 87 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 88 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 89 | return intent_prediction, slot_prediction 90 | 91 | 92 | def train(model_name, train_input, dev_input): 93 | """Training function.""" 94 | ## Arguments 95 | log_interval = 100 96 | batch_size = 32 97 | lr = 1e-5 98 | optimizer = 'adam' 99 | accumulate = None 100 | epochs = 20 101 | 102 | ## Load BERT model and vocabulary 103 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 104 | dataset_name='wiki_multilingual_uncased', 105 | pretrained=True, 106 | ctx=ctx, 107 | use_pooler=False, 108 | use_decoder=False, 109 | use_classifier=False) 110 | 111 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 112 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 113 | model.hybridize(static_alloc=True) 114 | 115 | icsl_loss_function = ICSLLoss() 116 | icsl_loss_function.hybridize(static_alloc=True) 117 | 118 | ic_metric = mx.metric.Accuracy() 119 | sl_metric = mx.metric.Accuracy() 120 | 121 | ## Load labeled data 122 | field_separator = nlp.data.Splitter('\t') 123 | # fields to select from the file: utterance, slot labels, intent, uid 124 | field_indices = [1, 3, 4, 0] 125 | train_data = nlp.data.TSVDataset(filename=train_input, 126 | field_separator=field_separator, 127 | num_discard_samples=1, 128 | field_indices=field_indices) 129 | 130 | # use the vocabulary from pre-trained model for tokenization 131 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 132 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 133 | # create data loader 134 | pad_token_id = vocabulary[PAD] 135 | pad_label_id = label2idx[PAD] 136 | batchify_fn = nlp.data.batchify.Tuple( 137 | nlp.data.batchify.Stack(), 138 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 139 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 140 | nlp.data.batchify.Stack('float32'), 141 | nlp.data.batchify.Stack('float32')) 142 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 143 | batch_size=batch_size, 144 | shuffle=True) 145 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 146 | batchify_fn=batchify_fn, 147 | batch_sampler=train_sampler) 148 | 149 | optimizer_params = {'learning_rate': lr} 150 | trainer = gluon.Trainer(model.collect_params(), optimizer, 151 | optimizer_params, update_on_kvstore=False) 152 | 153 | # Collect differentiable parameters 154 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 155 | # Set grad_req if gradient accumulation is required 156 | if accumulate: 157 | for p in params: 158 | p[1].grad_req = 'add' 159 | # Fix BERT embeddings if required 160 | for p in model.collect_params().items(): 161 | if 'embed' in p[0]: 162 | p[1].grad_req = 'null' 163 | 164 | epoch_tic = time.time() 165 | total_num = 0 166 | log_num = 0 167 | best_score = (0, 0) 168 | for epoch_id in range(epochs): 169 | step_loss = 0 170 | tic = time.time() 171 | # train on labeled data 172 | for batch_id, data in enumerate(train_dataloader): 173 | 
# forward and backward 174 | with mx.autograd.record(): 175 | if data[0].shape[0] < len(ctx): 176 | data = split_and_load(data, [ctx[0]]) 177 | else: 178 | data = split_and_load(data, ctx) 179 | for chunk in data: 180 | _, token_ids, slot_label, intent_label, valid_length = chunk 181 | 182 | log_num += len(token_ids) 183 | total_num += len(token_ids) 184 | 185 | # forward computation 186 | intent_pred, slot_pred = model(token_ids, valid_length) 187 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 188 | 189 | if accumulate: 190 | ls = ls / accumulate 191 | ls.backward() 192 | step_loss += ls.asscalar() 193 | 194 | # update 195 | if not accumulate or (batch_id + 1) % accumulate == 0: 196 | trainer.allreduce_grads() 197 | nlp.utils.clip_grad_global_norm(params, 1) 198 | trainer.update(1, ignore_stale_grad=True) 199 | 200 | if (batch_id + 1) % log_interval == 0: 201 | toc = time.time() 202 | # update metrics 203 | ic_metric.update([intent_label], [intent_pred]) 204 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 205 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 206 | .format(epoch_id, 207 | batch_id, 208 | len(train_dataloader), 209 | log_num / (toc - tic), 210 | trainer.learning_rate, 211 | step_loss / log_interval, 212 | ic_metric.get()[1], 213 | sl_metric.get()[1])) 214 | tic = time.time() 215 | step_loss = 0 216 | log_num = 0 217 | 218 | mx.nd.waitall() 219 | epoch_toc = time.time() 220 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 221 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 222 | # evaluate on development set 223 | log.info('Evaluate on development set:') 224 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 225 | if slot_f1 > best_score[1]: 226 | best_score = (intent_acc, slot_f1) 227 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 228 | 229 | 230 | def evaluate(model=None, model_name='', eval_input=''): 231 | """Evaluate the model on validation dataset. 
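    Parameters
    ----------
    model : BERTForICSL or None
        A trained model instance. If None, ``model_name`` must be provided and
        the saved parameters are loaded from ``model_dir``.
    model_name : str
        Name of the saved parameter file (without the ``.params`` extension);
        used only when ``model`` is None.
    eval_input : str
        Path to the evaluation TSV file.

    Returns
    -------
    intent_acc : float
        Intent classification accuracy.
    slot_f1 : float
        Slot labeling F1 score reported by conlleval.pl.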
232 | """ 233 | ## Load model 234 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 235 | dataset_name='wiki_multilingual_uncased', 236 | pretrained=True, 237 | ctx=ctx, 238 | use_pooler=False, 239 | use_decoder=False, 240 | use_classifier=False) 241 | if model is None: 242 | assert model_name != '' 243 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 244 | model.initialize(ctx=ctx) 245 | model.hybridize(static_alloc=True) 246 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 247 | 248 | idx2label = {} 249 | for label, idx in label2idx.items(): 250 | idx2label[idx] = label 251 | ## Load dev dataset 252 | field_separator = nlp.data.Splitter('\t') 253 | field_indices = [1, 3, 4, 0] 254 | eval_data = nlp.data.TSVDataset(filename=eval_input, 255 | field_separator=field_separator, 256 | num_discard_samples=1, 257 | field_indices=field_indices) 258 | 259 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 260 | 261 | dev_alignment = {} 262 | eval_data_transform = [] 263 | for sample in eval_data: 264 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 265 | eval_data_transform += [sample] 266 | dev_alignment[sample[0]] = alignment 267 | log.info('The number of examples after preprocessing: {}' 268 | .format(len(eval_data_transform))) 269 | 270 | test_batch_size = 16 271 | pad_token_id = vocabulary[PAD] 272 | pad_label_id = label2idx[PAD] 273 | batchify_fn = nlp.data.batchify.Tuple( 274 | nlp.data.batchify.Stack(), 275 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 276 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 277 | nlp.data.batchify.Stack('float32'), 278 | nlp.data.batchify.Stack('float32')) 279 | eval_dataloader = mx.gluon.data.DataLoader( 280 | eval_data_transform, 281 | batchify_fn=batchify_fn, 282 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 283 | 284 | _Result = collections.namedtuple( 285 | '_Result', ['intent', 'slot_labels']) 286 | all_results = {} 287 | 288 | total_num = 0 289 | for data in eval_dataloader: 290 | example_ids, token_ids, _, _, valid_length = data 291 | total_num += len(token_ids) 292 | # load data to GPU 293 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 294 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 295 | 296 | # forward computation 297 | intent_pred, slot_pred = model(token_ids, valid_length) 298 | intent_pred = intent_pred.asnumpy() 299 | slot_pred = slot_pred.asnumpy() 300 | valid_length = valid_length.asnumpy() 301 | 302 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 303 | eid = eid.asscalar() 304 | length = int(length) - 2 305 | intent_id = y_intent.argmax(axis=-1) 306 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 307 | slot_names = [idx2label[idx] for idx in slot_ids] 308 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 309 | if eid not in all_results: 310 | all_results[eid] = _Result(intent_id, merged_slot_names) 311 | 312 | example_ids, utterances, labels, intents = load_tsv(eval_input) 313 | pred_intents = [] 314 | label_intents = [] 315 | for eid, intent in zip(example_ids, intents): 316 | label_intents.append(label2index(intent2idx, intent)) 317 | pred_intents.append(all_results[eid].intent) 318 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 319 | log.info("Intent Accuracy: %.4f" % intent_acc) 320 | 321 | pred_icsl = [] 322 
| label_icsl = [] 323 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 324 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 325 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 326 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 327 | log.info("Exact Match: %.4f" % exact_match) 328 | 329 | with open(conll_prediction_file, "w") as fw: 330 | for eid, utterance, labels in zip(example_ids, utterances, labels): 331 | preds = all_results[eid].slot_labels 332 | for w, l, p in zip(utterance, labels, preds): 333 | fw.write(' '.join([w, l, p]) + '\n') 334 | fw.write('\n') 335 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 336 | with open(conll_prediction_file) as f: 337 | stdout = proc.communicate(f.read().encode())[0] 338 | result = stdout.decode('utf-8').split('\n')[1] 339 | slot_f1 = float(result.split()[-1].strip()) 340 | log.info("Slot Labeling: %s" % result) 341 | return intent_acc, slot_f1 342 | 343 | # extract labels 344 | train_input = data_dir + 'atis_train.tsv' 345 | intent2idx, label2idx = get_label_indices(train_input) 346 | 347 | log.info('Train on all languages:') 348 | model_name = 'model_bert_joint.' + str(random_seed) 349 | train_input = data_dir + 'atis_train_all.tsv' 350 | dev_input = data_dir + 'atis_dev.tsv' 351 | train(model_name, train_input, dev_input) 352 | 353 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 354 | log.info('Evaluate on %s:' % lang) 355 | model_name = 'model_bert_joint.' + str(random_seed) 356 | if lang == 'EN': 357 | test_input = data_dir + 'atis_test.tsv' 358 | else: 359 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 360 | evaluate(model_name=model_name, eval_input=test_input) 361 | -------------------------------------------------------------------------------- /code/scripts/bert_mt.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0), mx.gpu(1)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'bert_mt.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class BERTForICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into BERT to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 
46 | """ 47 | 48 | def __init__(self, bert, num_slot_labels, num_intents, hidden_size=768, dropout=.1, prefix=None, params=None): 49 | super(BERTForICSL, self).__init__(prefix=prefix, params=params) 50 | self.bert = bert 51 | with self.name_scope(): 52 | self.dropout = nn.Dropout(rate=dropout) 53 | # IC/SL classifier 54 | self.slot_classifier = nn.Dense(units=num_slot_labels, 55 | in_units=hidden_size, 56 | flatten=False) 57 | self.intent_classifier = nn.Dense(units=num_intents, 58 | in_units=hidden_size, 59 | flatten=False) 60 | 61 | def encode(self, inputs, valid_length): 62 | types = mx.nd.zeros_like(inputs) 63 | encoded = self.bert(inputs, types, valid_length) 64 | encoded = self.dropout(encoded) 65 | return encoded 66 | 67 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 68 | """Generate unnormalized scores for the given input sequences. 69 | 70 | Parameters 71 | ---------- 72 | inputs : NDArray, shape (batch_size, seq_length) 73 | Input words for the sequences. 74 | valid_length : NDArray or None, shape (batch_size) 75 | Valid length of the sequence. This is used to mask the padded tokens. 76 | 77 | Returns 78 | ------- 79 | intent_prediction: NDArray 80 | Shape (batch_size, num_intents) 81 | slot_prediction : NDArray 82 | Shape (batch_size, seq_length, num_slot_labels) 83 | """ 84 | # hidden: (batch_size, seq_length, hidden_size) 85 | hidden = self.encode(inputs, valid_length) 86 | # get intent and slot label predictions 87 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 88 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 89 | return intent_prediction, slot_prediction 90 | 91 | 92 | def train(model_name, train_input): 93 | """Training function.""" 94 | ## Arguments 95 | log_interval = 100 96 | batch_size = 32 97 | lr = 1e-5 98 | optimizer = 'adam' 99 | accumulate = None 100 | epochs = 20 101 | 102 | ## Load BERT model and vocabulary 103 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 104 | dataset_name='wiki_multilingual_uncased', 105 | pretrained=True, 106 | ctx=ctx, 107 | use_pooler=False, 108 | use_decoder=False, 109 | use_classifier=False) 110 | 111 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 112 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 113 | model.hybridize(static_alloc=True) 114 | 115 | icsl_loss_function = ICSLLoss() 116 | icsl_loss_function.hybridize(static_alloc=True) 117 | 118 | ic_metric = mx.metric.Accuracy() 119 | sl_metric = mx.metric.Accuracy() 120 | 121 | ## Load labeled data 122 | field_separator = nlp.data.Splitter('\t') 123 | # fields to select from the file: utterance, slot labels, intent, uid 124 | field_indices = [1, 3, 4, 0] 125 | train_data = nlp.data.TSVDataset(filename=train_input, 126 | field_separator=field_separator, 127 | num_discard_samples=1, 128 | field_indices=field_indices) 129 | 130 | # use the vocabulary from pre-trained model for tokenization 131 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 132 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 133 | # create data loader 134 | pad_token_id = vocabulary[PAD] 135 | pad_label_id = label2idx[PAD] 136 | batchify_fn = nlp.data.batchify.Tuple( 137 | nlp.data.batchify.Stack(), 138 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 139 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 140 | nlp.data.batchify.Stack('float32'), 141 | nlp.data.batchify.Stack('float32')) 142 | 
train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 143 | batch_size=batch_size, 144 | shuffle=True) 145 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 146 | batchify_fn=batchify_fn, 147 | batch_sampler=train_sampler) 148 | 149 | optimizer_params = {'learning_rate': lr} 150 | trainer = gluon.Trainer(model.collect_params(), optimizer, 151 | optimizer_params, update_on_kvstore=False) 152 | 153 | # Collect differentiable parameters 154 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 155 | # Set grad_req if gradient accumulation is required 156 | if accumulate: 157 | for p in params: 158 | p[1].grad_req = 'add' 159 | # Fix BERT embeddings if required 160 | for p in model.collect_params().items(): 161 | if 'embed' in p[0]: 162 | p[1].grad_req = 'null' 163 | 164 | epoch_tic = time.time() 165 | total_num = 0 166 | log_num = 0 167 | for epoch_id in range(epochs): 168 | step_loss = 0 169 | tic = time.time() 170 | # train on labeled data 171 | for batch_id, data in enumerate(train_dataloader): 172 | # forward and backward 173 | with mx.autograd.record(): 174 | if data[0].shape[0] < len(ctx): 175 | data = split_and_load(data, [ctx[0]]) 176 | else: 177 | data = split_and_load(data, ctx) 178 | for chunk in data: 179 | _, token_ids, slot_label, intent_label, valid_length = chunk 180 | 181 | log_num += len(token_ids) 182 | total_num += len(token_ids) 183 | 184 | # forward computation 185 | intent_pred, slot_pred = model(token_ids, valid_length) 186 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 187 | 188 | if accumulate: 189 | ls = ls / accumulate 190 | ls.backward() 191 | step_loss += ls.asscalar() 192 | 193 | # update 194 | if not accumulate or (batch_id + 1) % accumulate == 0: 195 | trainer.allreduce_grads() 196 | nlp.utils.clip_grad_global_norm(params, 1) 197 | trainer.update(1, ignore_stale_grad=True) 198 | 199 | if (batch_id + 1) % log_interval == 0: 200 | toc = time.time() 201 | # update metrics 202 | ic_metric.update([intent_label], [intent_pred]) 203 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 204 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 205 | .format(epoch_id, 206 | batch_id, 207 | len(train_dataloader), 208 | log_num / (toc - tic), 209 | trainer.learning_rate, 210 | step_loss / log_interval, 211 | ic_metric.get()[1], 212 | sl_metric.get()[1])) 213 | tic = time.time() 214 | step_loss = 0 215 | log_num = 0 216 | 217 | mx.nd.waitall() 218 | epoch_toc = time.time() 219 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 220 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 221 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 222 | 223 | 224 | def evaluate(model=None, model_name='', eval_input=''): 225 | """Evaluate the model on validation dataset. 
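    Parameters
    ----------
    model : BERTForICSL or None
        A trained model instance. If None, ``model_name`` must be provided and
        the saved parameters are loaded from ``model_dir``.
    model_name : str
        Name of the saved parameter file (without the ``.params`` extension);
        used only when ``model`` is None.
    eval_input : str
        Path to the evaluation TSV file.

    Returns
    -------
    intent_acc : float
        Intent classification accuracy.
    slot_f1 : float
        Slot labeling F1 score reported by conlleval.pl.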
226 | """ 227 | ## Load model 228 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 229 | dataset_name='wiki_multilingual_uncased', 230 | pretrained=True, 231 | ctx=ctx, 232 | use_pooler=False, 233 | use_decoder=False, 234 | use_classifier=False) 235 | if model is None: 236 | assert model_name != '' 237 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 238 | model.initialize(ctx=ctx) 239 | model.hybridize(static_alloc=True) 240 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 241 | 242 | idx2label = {} 243 | for label, idx in label2idx.items(): 244 | idx2label[idx] = label 245 | ## Load dev dataset 246 | field_separator = nlp.data.Splitter('\t') 247 | field_indices = [1, 3, 4, 0] 248 | eval_data = nlp.data.TSVDataset(filename=eval_input, 249 | field_separator=field_separator, 250 | num_discard_samples=1, 251 | field_indices=field_indices) 252 | 253 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 254 | 255 | dev_alignment = {} 256 | eval_data_transform = [] 257 | for sample in eval_data: 258 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 259 | eval_data_transform += [sample] 260 | dev_alignment[sample[0]] = alignment 261 | log.info('The number of examples after preprocessing: {}' 262 | .format(len(eval_data_transform))) 263 | 264 | test_batch_size = 16 265 | pad_token_id = vocabulary[PAD] 266 | pad_label_id = label2idx[PAD] 267 | batchify_fn = nlp.data.batchify.Tuple( 268 | nlp.data.batchify.Stack(), 269 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 270 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 271 | nlp.data.batchify.Stack('float32'), 272 | nlp.data.batchify.Stack('float32')) 273 | eval_dataloader = mx.gluon.data.DataLoader( 274 | eval_data_transform, 275 | batchify_fn=batchify_fn, 276 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 277 | 278 | _Result = collections.namedtuple( 279 | '_Result', ['intent', 'slot_labels']) 280 | all_results = {} 281 | 282 | total_num = 0 283 | for data in eval_dataloader: 284 | example_ids, token_ids, _, _, valid_length = data 285 | total_num += len(token_ids) 286 | # load data to GPU 287 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 288 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 289 | 290 | # forward computation 291 | intent_pred, slot_pred = model(token_ids, valid_length) 292 | intent_pred = intent_pred.asnumpy() 293 | slot_pred = slot_pred.asnumpy() 294 | valid_length = valid_length.asnumpy() 295 | 296 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 297 | eid = eid.asscalar() 298 | length = int(length) - 2 299 | intent_id = y_intent.argmax(axis=-1) 300 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 301 | slot_names = [idx2label[idx] for idx in slot_ids] 302 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 303 | if eid not in all_results: 304 | all_results[eid] = _Result(intent_id, merged_slot_names) 305 | 306 | example_ids, utterances, labels, intents = load_tsv(eval_input) 307 | pred_intents = [] 308 | label_intents = [] 309 | for eid, intent in zip(example_ids, intents): 310 | label_intents.append(label2index(intent2idx, intent)) 311 | pred_intents.append(all_results[eid].intent) 312 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 313 | log.info("Intent Accuracy: %.4f" % intent_acc) 314 | 315 | pred_icsl = [] 316 
| label_icsl = [] 317 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 318 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 319 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 320 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 321 | log.info("Exact Match: %.4f" % exact_match) 322 | 323 | with open(conll_prediction_file, "w") as fw: 324 | for eid, utterance, labels in zip(example_ids, utterances, labels): 325 | preds = all_results[eid].slot_labels 326 | for w, l, p in zip(utterance, labels, preds): 327 | fw.write(' '.join([w, l, p]) + '\n') 328 | fw.write('\n') 329 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 330 | with open(conll_prediction_file) as f: 331 | stdout = proc.communicate(f.read().encode())[0] 332 | result = stdout.decode('utf-8').split('\n')[1] 333 | slot_f1 = float(result.split()[-1].strip()) 334 | log.info("Slot Labeling: %s" % result) 335 | return intent_acc, slot_f1 336 | 337 | 338 | # extract labels 339 | train_input = data_dir + 'atis_train.tsv' 340 | intent2idx, label2idx = get_label_indices(train_input) 341 | 342 | # Train 343 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 344 | log.info('Train on %s:' % lang) 345 | model_name = 'model_bert_mt_' + lang + '.' + str(random_seed) 346 | train_input = data_dir + 'train_translated_' + lang + '.tsv' 347 | train(model_name, train_input) 348 | 349 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 350 | log.info('Evaluate on %s:' % lang) 351 | model_name = 'model_bert_mt_' + lang + '.' + str(random_seed) 352 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 353 | evaluate(model_name=model_name, eval_input=test_input) 354 | -------------------------------------------------------------------------------- /code/scripts/bert_soft_align.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | from mxnet.gluon.loss import Loss, SoftmaxCELoss 16 | 17 | from layers import * 18 | from loss import * 19 | from utils import * 20 | 21 | random_seed = int(sys.argv[1]) 22 | warnings.filterwarnings('ignore') 23 | data_dir = "../data/" 24 | model_dir = "../exp/" 25 | conll_prediction_file = data_dir + "conll.pred" 26 | 27 | PAD = '[PAD]' 28 | INF_INT = int(1e18) 29 | mx.random.seed(random_seed) 30 | ctx = [mx.gpu(i) for i in range(4)] 31 | 32 | log = logging.getLogger('gluonnlp') 33 | log.setLevel(logging.DEBUG) 34 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 35 | fh = logging.FileHandler(os.path.join(model_dir, 'bert_align.' + str(random_seed) + '.log'), mode='w') 36 | fh.setLevel(logging.INFO) 37 | fh.setFormatter(formatter) 38 | console = logging.StreamHandler() 39 | console.setLevel(logging.INFO) 40 | console.setFormatter(formatter) 41 | log.addHandler(console) 42 | log.addHandler(fh) 43 | 44 | class MultiTaskICSL(Block): 45 | """Model for IC/SL task. 46 | 47 | The model feeds token ids into BERT to get the sequence 48 | representations, then apply two dense layers for IC/SL task. 
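    In addition to the two classifiers, the model ties an LM output layer
    to the BERT word embeddings and adds an attention map layer:
    ``translate_and_predict`` encodes the source utterance with BERT,
    attends to it with the target embeddings (a soft alignment), and
    predicts the translation from the decoder output and the target-side
    slot labels from the attention output, while the intent is predicted
    from the source [CLS] position.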
49 | """ 50 | 51 | def __init__(self, bert, vocab_size, num_slot_labels, num_intents, hidden_size=768, dropout=.1, attn_temperature=.1, prefix=None, params=None): 52 | super(MultiTaskICSL, self).__init__(prefix=prefix, params=params) 53 | self.bert = bert 54 | with self.name_scope(): 55 | self.dropout = nn.Dropout(rate=dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | in_units=hidden_size, 59 | flatten=False) 60 | self.intent_classifier = nn.Dense(units=num_intents, 61 | in_units=hidden_size, 62 | flatten=False) 63 | # LM output layer 64 | self.lm_output_layer = nn.Dense(units=vocab_size, 65 | in_units=hidden_size, 66 | params=self.bert.word_embed.params, 67 | flatten=False) 68 | # attention map layer 69 | self.attention_map_layer = AttentionMapCell(units=hidden_size, 70 | hidden_size=hidden_size * 2, 71 | attn_temperature=attn_temperature) 72 | 73 | def encode(self, inputs, valid_length): 74 | types = mx.nd.zeros_like(inputs) 75 | encoded = self.bert(inputs, types, valid_length) 76 | encoded = self.dropout(encoded) 77 | return encoded 78 | 79 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 80 | """Generate unnormalized scores for the given input sequences. 81 | 82 | Parameters 83 | ---------- 84 | inputs : NDArray, shape (batch_size, seq_length) 85 | Input words for the sequences. 86 | valid_length : NDArray or None, shape (batch_size) 87 | Valid length of the sequence. This is used to mask the padded tokens. 88 | 89 | Returns 90 | ------- 91 | intent_prediction: NDArray 92 | Shape (batch_size, num_intents) 93 | slot_prediction : NDArray 94 | Shape (batch_size, seq_length, num_slot_labels) 95 | """ 96 | # hidden: (batch_size, seq_length, hidden_size) 97 | hidden = self.encode(inputs, valid_length) 98 | # get intent and slot label predictions 99 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 100 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 101 | return intent_prediction, slot_prediction 102 | 103 | def translate_and_predict(self, source, target, src_valid_length): 104 | """Generate unnormalized scores for the given input sequences. 105 | 106 | Parameters 107 | ---------- 108 | source : NDArray, shape (batch_size, src_seq_length) 109 | Input words for the source sequences. 110 | target : NDArray, shape (batch_size, tgt_seq_length) 111 | Input words for the target sequences. 112 | src_valid_length : NDArray or None, shape (batch_size) 113 | Valid length of the source sequence. This is used to mask the padded tokens. 
114 | 115 | Returns 116 | ------- 117 | translation : NDArray 118 | Shape (batch_size, tgt_seq_length, vocab_size) 119 | intent_prediction: NDArray 120 | Shape (batch_size, num_intents) 121 | slot_prediction : NDArray 122 | Shape (batch_size, tgt_seq_length, num_slot_labels) 123 | """ 124 | # src_len_mask: (batch_size, tgt_seq_length, src_seq_length) 125 | src_len_mask = None 126 | if src_valid_length is not None: 127 | dtype = src_valid_length.dtype 128 | ctx = src_valid_length.context 129 | src_len_mask = mx.nd.broadcast_lesser( 130 | mx.nd.arange(source.shape[1], ctx=ctx, dtype=dtype).reshape((1, -1)), 131 | src_valid_length.reshape((-1, 1))) 132 | src_len_mask = mx.nd.broadcast_axes(mx.nd.expand_dims(src_len_mask, axis=1), axis=1, size=target.shape[1]) 133 | # src_encoded: (batch_size, src_seq_length, hidden_size) 134 | src_encoded = self.encode(source, src_valid_length) 135 | # tgt_embed: (batch_size, tgt_seq_length, hidden_size) 136 | tgt_embed = self.bert.word_embed(target) 137 | # (batch_size, tgt_seq_length, hidden_size) 138 | decoded, attn_output = self.attention_map_layer(tgt_embed, src_encoded, src_len_mask) 139 | # translation: (batch_size, tgt_seq_length - 1, vocab_size) 140 | translation = self.lm_output_layer(decoded[:, 1:, :]) 141 | # get intent and slot label predictions 142 | intent_prediction = self.intent_classifier(src_encoded[:, 0, :]) 143 | slot_prediction = self.slot_classifier(attn_output[:, 1:, :]) 144 | return translation, intent_prediction, slot_prediction 145 | 146 | def train(model_name, train_input, para_input): 147 | """Training function.""" 148 | ## Arguments 149 | log_interval = 100 150 | batch_size = 32 151 | lr = 1e-5 152 | optimizer = 'adam' 153 | accumulate = None 154 | epochs = 20 155 | mt_batches_per_epoch = 200 156 | icsl_batches_per_epoch = 200 157 | 158 | ## Load BERT model and vocabulary 159 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 160 | dataset_name='wiki_multilingual_uncased', 161 | pretrained=True, 162 | ctx=ctx, 163 | use_pooler=False, 164 | use_decoder=False, 165 | use_classifier=False) 166 | 167 | model = MultiTaskICSL(bert, len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 168 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 169 | model.hybridize(static_alloc=True) 170 | 171 | icsl_loss_function = ICSLLoss() 172 | icsl_loss_function.hybridize(static_alloc=True) 173 | ce_loss_function = SoftmaxCELoss() 174 | ce_loss_function.hybridize(static_alloc=True) 175 | mce_loss_function = SoftmaxCEMaskedLoss() 176 | mce_loss_function.hybridize(static_alloc=True) 177 | 178 | ic_metric = mx.metric.Accuracy() 179 | sl_metric = mx.metric.Accuracy() 180 | 181 | ## Load labeled data 182 | field_separator = nlp.data.Splitter('\t') 183 | # fields to select from the file: utterance, slot labels, intent, uid 184 | field_indices = [1, 3, 4, 0] 185 | train_data = nlp.data.TSVDataset(filename=train_input, 186 | field_separator=field_separator, 187 | num_discard_samples=1, 188 | field_indices=field_indices) 189 | 190 | # use the vocabulary from pre-trained model for tokenization 191 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 192 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 193 | # create data loader 194 | pad_token_id = vocabulary[PAD] 195 | pad_label_id = label2idx[PAD] 196 | batchify_fn = nlp.data.batchify.Tuple( 197 | nlp.data.batchify.Stack(), 198 | nlp.data.batchify.Pad(axis=0, 
pad_val=pad_token_id), 199 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 200 | nlp.data.batchify.Stack('float32'), 201 | nlp.data.batchify.Stack('float32')) 202 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 203 | batch_size=batch_size, 204 | shuffle=True) 205 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 206 | batchify_fn=batchify_fn, 207 | batch_sampler=train_sampler) 208 | 209 | ## Load parallel data 210 | field_separator = nlp.data.Splitter('\t') 211 | # fields to select from the file: utterance, uid 212 | field_indices = [0, 1, 2, 3] 213 | para_data = nlp.data.TSVDataset(filename=para_input, 214 | field_separator=field_separator, 215 | num_discard_samples=0, 216 | field_indices=field_indices) 217 | 218 | # use the vocabulary from pre-trained model for tokenization 219 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 220 | para_data_transform = para_data.transform(fn=lambda x: parallel_icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)) 221 | # create data loader 222 | batchify_fn = nlp.data.batchify.Tuple( 223 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 224 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 225 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 226 | nlp.data.batchify.Stack('float32'), 227 | nlp.data.batchify.Stack('float32'), 228 | nlp.data.batchify.Stack('float32')) 229 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[0]) for item in para_data_transform], 230 | batch_size=batch_size, 231 | shuffle=True) 232 | para_dataloader = mx.gluon.data.DataLoader(para_data_transform, 233 | batchify_fn=batchify_fn, 234 | batch_sampler=train_sampler) 235 | 236 | optimizer_params = {'learning_rate': lr} 237 | trainer = gluon.Trainer(model.collect_params(), optimizer, 238 | optimizer_params, update_on_kvstore=False) 239 | optimizer_params = {'learning_rate': lr} 240 | mt_trainer = gluon.Trainer(model.collect_params(), optimizer, 241 | optimizer_params, update_on_kvstore=False) 242 | 243 | # Collect differentiable parameters 244 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 245 | # Set grad_req if gradient accumulation is required 246 | if accumulate: 247 | for p in params: 248 | p[1].grad_req = 'add' 249 | # Fix BERT embeddings if required 250 | for p in model.collect_params().items(): 251 | if 'embed' in p[0]: 252 | p[1].grad_req = 'null' 253 | 254 | epoch_tic = time.time() 255 | total_num = 0 256 | log_num = 0 257 | for epoch_id in range(epochs): 258 | mt_loss, icsl_loss, step_loss = 0, 0, 0 259 | tic = time.time() 260 | 261 | # train on parallel data 262 | para_data_iterator = iter(para_dataloader) 263 | num_batches = mt_batches_per_epoch if epoch_id > 0 else INF_INT 264 | for batch_id in range(num_batches): 265 | data = next(para_data_iterator, None) 266 | if data is None: 267 | break 268 | # forward and backward 269 | with mx.autograd.record(): 270 | if data[0].shape[0] < len(ctx): 271 | data = split_and_load(data, [ctx[0]]) 272 | else: 273 | data = split_and_load(data, ctx) 274 | for chunk in data: 275 | source, target, slot_label, intent_label, src_valid_len, tgt_valid_len = chunk 276 | 277 | # forward computation 278 | translation, intent_pred, slot_pred = model.translate_and_predict(source, target, src_valid_len) 279 | mt_ls = mce_loss_function(translation, target[:, 1:], tgt_valid_len - 1).mean() 280 | icsl_ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, 
tgt_valid_len - 2).mean() 281 | ls = mt_ls + icsl_ls 282 | 283 | if accumulate: 284 | ls = ls / accumulate 285 | ls.backward() 286 | mt_loss += mt_ls.asscalar() 287 | icsl_loss += icsl_ls.asscalar() 288 | 289 | # update 290 | if not accumulate or (batch_id + 1) % accumulate == 0: 291 | mt_trainer.allreduce_grads() 292 | nlp.utils.clip_grad_global_norm(params, 1) 293 | mt_trainer.update(1, ignore_stale_grad=True) 294 | if (batch_id + 1) % log_interval == 0: 295 | log.info('Epoch: {}, Batch: {}/{}, lr={:.7f}, mt_loss={:.4f}, icsl_loss={:.4f}' 296 | .format(epoch_id, 297 | batch_id, 298 | len(para_dataloader), 299 | mt_trainer.learning_rate, 300 | mt_loss / log_interval, 301 | icsl_loss / log_interval)) 302 | mt_loss = 0 303 | icsl_loss = 0 304 | 305 | # train on labeled data 306 | train_data_iterator = iter(train_dataloader) 307 | for batch_id in range(icsl_batches_per_epoch): 308 | data = next(train_data_iterator, None) 309 | if data is None: 310 | break 311 | # forward and backward 312 | with mx.autograd.record(): 313 | if data[0].shape[0] < len(ctx): 314 | data = split_and_load(data, [ctx[0]]) 315 | else: 316 | data = split_and_load(data, ctx) 317 | for chunk in data: 318 | _, token_ids, slot_label, intent_label, valid_length = chunk 319 | 320 | log_num += len(token_ids) 321 | total_num += len(token_ids) 322 | 323 | # forward computation 324 | intent_pred, slot_pred = model(token_ids, valid_length) 325 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 326 | 327 | if accumulate: 328 | ls = ls / accumulate 329 | ls.backward() 330 | step_loss += ls.asscalar() 331 | 332 | # update 333 | if not accumulate or (batch_id + 1) % accumulate == 0: 334 | trainer.allreduce_grads() 335 | nlp.utils.clip_grad_global_norm(params, 1) 336 | trainer.update(1, ignore_stale_grad=True) 337 | 338 | if (batch_id + 1) % log_interval == 0: 339 | toc = time.time() 340 | # update metrics 341 | ic_metric.update([intent_label], [intent_pred]) 342 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 343 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 344 | .format(epoch_id, 345 | batch_id, 346 | len(train_dataloader), 347 | log_num / (toc - tic), 348 | trainer.learning_rate, 349 | step_loss / log_interval, 350 | ic_metric.get()[1], 351 | sl_metric.get()[1])) 352 | tic = time.time() 353 | step_loss = 0 354 | log_num = 0 355 | 356 | mx.nd.waitall() 357 | epoch_toc = time.time() 358 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 359 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 360 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 361 | 362 | 363 | def evaluate(model=None, model_name='', eval_input=''): 364 | """Evaluate the model on validation dataset. 
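    Writes token-level predictions in CoNLL format to ``conll_prediction_file``,
    scores them with ``conlleval.pl``, and returns ``(intent_acc, slot_f1)``.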
365 | """ 366 | ## Load model 367 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 368 | dataset_name='wiki_multilingual_uncased', 369 | pretrained=True, 370 | ctx=ctx, 371 | use_pooler=False, 372 | use_decoder=False, 373 | use_classifier=False) 374 | if model is None: 375 | assert model_name != '' 376 | model = MultiTaskICSL(bert, len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 377 | model.initialize(ctx=ctx) 378 | model.hybridize(static_alloc=True) 379 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 380 | 381 | idx2label = {} 382 | for label, idx in label2idx.items(): 383 | idx2label[idx] = label 384 | ## Load dev dataset 385 | field_separator = nlp.data.Splitter('\t') 386 | field_indices = [1, 3, 4, 0] 387 | eval_data = nlp.data.TSVDataset(filename=eval_input, 388 | field_separator=field_separator, 389 | num_discard_samples=1, 390 | field_indices=field_indices) 391 | 392 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 393 | 394 | dev_alignment = {} 395 | eval_data_transform = [] 396 | for sample in eval_data: 397 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 398 | eval_data_transform += [sample] 399 | dev_alignment[sample[0]] = alignment 400 | log.info('The number of examples after preprocessing: {}' 401 | .format(len(eval_data_transform))) 402 | 403 | test_batch_size = 16 404 | pad_token_id = vocabulary[PAD] 405 | pad_label_id = label2idx[PAD] 406 | batchify_fn = nlp.data.batchify.Tuple( 407 | nlp.data.batchify.Stack(), 408 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 409 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 410 | nlp.data.batchify.Stack('float32'), 411 | nlp.data.batchify.Stack('float32')) 412 | eval_dataloader = mx.gluon.data.DataLoader( 413 | eval_data_transform, 414 | batchify_fn=batchify_fn, 415 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 416 | 417 | _Result = collections.namedtuple( 418 | '_Result', ['intent', 'slot_labels']) 419 | all_results = {} 420 | 421 | total_num = 0 422 | for data in eval_dataloader: 423 | example_ids, token_ids, _, _, valid_length = data 424 | total_num += len(token_ids) 425 | # load data to GPU 426 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 427 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 428 | 429 | # forward computation 430 | intent_pred, slot_pred = model(token_ids, valid_length) 431 | intent_pred = intent_pred.asnumpy() 432 | slot_pred = slot_pred.asnumpy() 433 | valid_length = valid_length.asnumpy() 434 | 435 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 436 | eid = eid.asscalar() 437 | length = int(length) - 2 438 | intent_id = y_intent.argmax(axis=-1) 439 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 440 | slot_names = [idx2label[idx] for idx in slot_ids] 441 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 442 | if eid not in all_results: 443 | all_results[eid] = _Result(intent_id, merged_slot_names) 444 | 445 | example_ids, utterances, labels, intents = load_tsv(eval_input) 446 | pred_intents = [] 447 | label_intents = [] 448 | for eid, intent in zip(example_ids, intents): 449 | label_intents.append(label2index(intent2idx, intent)) 450 | pred_intents.append(all_results[eid].intent) 451 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 452 | log.info("Intent Accuracy: %.4f" % intent_acc) 453 | 454 | 
pred_icsl = [] 455 | label_icsl = [] 456 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 457 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 458 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 459 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 460 | log.info("Exact Match: %.4f" % exact_match) 461 | 462 | with open(conll_prediction_file, "w") as fw: 463 | for eid, utterance, labels in zip(example_ids, utterances, labels): 464 | preds = all_results[eid].slot_labels 465 | for w, l, p in zip(utterance, labels, preds): 466 | fw.write(' '.join([w, l, p]) + '\n') 467 | fw.write('\n') 468 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 469 | with open(conll_prediction_file) as f: 470 | stdout = proc.communicate(f.read().encode())[0] 471 | result = stdout.decode('utf-8').split('\n')[1] 472 | slot_f1 = float(result.split()[-1].strip()) 473 | log.info("Slot Labeling: %s" % result) 474 | return intent_acc, slot_f1 475 | 476 | 477 | # extract labels 478 | train_input = data_dir + 'atis_train.tsv' 479 | intent2idx, label2idx = get_label_indices(train_input) 480 | 481 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 482 | log.info('Train on %s:' % lang) 483 | model_name = 'model_bert_align_' + lang + '.' + str(random_seed) 484 | train_input = data_dir + 'atis_train.tsv' 485 | para_input = data_dir + 'train_para_' + lang + '.tsv' 486 | train(model_name, train_input, para_input) 487 | 488 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 489 | log.info('Evaluate on %s:' % lang) 490 | model_name = 'model_bert_align_' + lang + '.' + str(random_seed) 491 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 492 | evaluate(model_name=model_name, eval_input=test_input) 493 | -------------------------------------------------------------------------------- /code/scripts/conlleval.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl -w 2 | # conlleval: evaluate result of processing CoNLL-2000 shared task 3 | # usage: conlleval [-l] [-r] [-d delimiterTag] [-o oTag] < file 4 | # README: http://cnts.uia.ac.be/conll2000/chunking/output.html 5 | # options: l: generate LaTeX output for tables like in 6 | # http://cnts.uia.ac.be/conll2003/ner/example.tex 7 | # r: accept raw result tags (without B- and I- prefix; 8 | # assumes one word per chunk) 9 | # d: alternative delimiter tag (default is single space) 10 | # o: alternative outside tag (default is O) 11 | # note: the file should contain lines with items separated 12 | # by $delimiter characters (default space). The final 13 | # two items should contain the correct tag and the 14 | # guessed tag in that order. Sentences should be 15 | # separated from each other by empty lines or lines 16 | # with $boundary fields (default -X-). 17 | # url: http://lcg-www.uia.ac.be/conll2000/chunking/ 18 | # started: 1998-09-25 19 | # version: 2004-01-26 20 | # author: Erik Tjong Kim Sang 21 | 22 | use strict; 23 | 24 | my $false = 0; 25 | my $true = 42; 26 | 27 | my $boundary = "-X-"; # sentence boundary 28 | my $correct; # current corpus chunk tag (I,O,B) 29 | my $correctChunk = 0; # number of correctly identified chunks 30 | my $correctTags = 0; # number of correct chunk tags 31 | my $correctType; # type of current corpus chunk tag (NP,VP,etc.) 
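# Note: the Python scripts in this repository pipe their prediction file
# (one "word gold-tag predicted-tag" triple per line, utterances separated
# by blank lines) into this script on stdin and read the overall FB1 score
# from the last field of the second output line. A hypothetical input
# fragment (ATIS-style labels, for illustration only):
#   show O O
#   flights O O
#   from O O
#   boston B-fromloc.city_name B-fromloc.city_name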
32 | my $delimiter = " "; # field delimiter 33 | my $FB1 = 0.0; # FB1 score (Van Rijsbergen 1979) 34 | my $firstItem; # first feature (for sentence boundary checks) 35 | my $foundCorrect = 0; # number of chunks in corpus 36 | my $foundGuessed = 0; # number of identified chunks 37 | my $guessed; # current guessed chunk tag 38 | my $guessedType; # type of current guessed chunk tag 39 | my $i; # miscellaneous counter 40 | my $inCorrect = $false; # currently processed chunk is correct until now 41 | my $lastCorrect = "O"; # previous chunk tag in corpus 42 | my $latex = 0; # generate LaTeX formatted output 43 | my $lastCorrectType = ""; # type of previously identified chunk tag 44 | my $lastGuessed = "O"; # previously identified chunk tag 45 | my $lastGuessedType = ""; # type of previous chunk tag in corpus 46 | my $lastType; # temporary storage for detecting duplicates 47 | my $line; # line 48 | my $nbrOfFeatures = -1; # number of features per line 49 | my $precision = 0.0; # precision score 50 | my $oTag = "O"; # outside tag, default O 51 | my $raw = 0; # raw input: add B to every token 52 | my $recall = 0.0; # recall score 53 | my $tokenCounter = 0; # token counter (ignores sentence breaks) 54 | 55 | my %correctChunk = (); # number of correctly identified chunks per type 56 | my %foundCorrect = (); # number of chunks in corpus per type 57 | my %foundGuessed = (); # number of identified chunks per type 58 | 59 | my @features; # features on line 60 | my @sortedTypes; # sorted list of chunk type names 61 | 62 | # sanity check 63 | while (@ARGV and $ARGV[0] =~ /^-/) { 64 | if ($ARGV[0] eq "-l") { $latex = 1; shift(@ARGV); } 65 | elsif ($ARGV[0] eq "-r") { $raw = 1; shift(@ARGV); } 66 | elsif ($ARGV[0] eq "-d") { 67 | shift(@ARGV); 68 | if (not defined $ARGV[0]) { 69 | die "conlleval: -d requires delimiter character"; 70 | } 71 | $delimiter = shift(@ARGV); 72 | } elsif ($ARGV[0] eq "-o") { 73 | shift(@ARGV); 74 | if (not defined $ARGV[0]) { 75 | die "conlleval: -o requires delimiter character"; 76 | } 77 | $oTag = shift(@ARGV); 78 | } else { die "conlleval: unknown argument $ARGV[0]\n"; } 79 | } 80 | if (@ARGV) { die "conlleval: unexpected command line argument\n"; } 81 | # process input 82 | while () { 83 | chomp($line = $_); 84 | @features = split(/$delimiter/,$line); 85 | if ($nbrOfFeatures < 0) { $nbrOfFeatures = $#features; } 86 | elsif ($nbrOfFeatures != $#features and @features != 0) { 87 | printf STDERR "unexpected number of features: %d (%d)\n", 88 | $#features+1,$nbrOfFeatures+1; 89 | exit(1); 90 | } 91 | if (@features == 0 or 92 | $features[0] eq $boundary) { @features = ($boundary,"O","O"); } 93 | if (@features < 2) { 94 | die "conlleval: unexpected number of features in line $line\n"; 95 | } 96 | if ($raw) { 97 | if ($features[$#features] eq $oTag) { $features[$#features] = "O"; } 98 | if ($features[$#features-1] eq $oTag) { $features[$#features-1] = "O"; } 99 | if ($features[$#features] ne "O") { 100 | $features[$#features] = "B-$features[$#features]"; 101 | } 102 | if ($features[$#features-1] ne "O") { 103 | $features[$#features-1] = "B-$features[$#features-1]"; 104 | } 105 | } 106 | # 20040126 ET code which allows hyphens in the types 107 | if ($features[$#features] =~ /^([^-]*)-(.*)$/) { 108 | $guessed = $1; 109 | $guessedType = $2; 110 | } else { 111 | $guessed = $features[$#features]; 112 | $guessedType = ""; 113 | } 114 | pop(@features); 115 | if ($features[$#features] =~ /^([^-]*)-(.*)$/) { 116 | $correct = $1; 117 | $correctType = $2; 118 | } else { 119 | $correct = 
$features[$#features]; 120 | $correctType = ""; 121 | } 122 | pop(@features); 123 | # ($guessed,$guessedType) = split(/-/,pop(@features)); 124 | # ($correct,$correctType) = split(/-/,pop(@features)); 125 | $guessedType = $guessedType ? $guessedType : ""; 126 | $correctType = $correctType ? $correctType : ""; 127 | $firstItem = shift(@features); 128 | 129 | # 1999-06-26 sentence breaks should always be counted as out of chunk 130 | if ( $firstItem eq $boundary ) { $guessed = "O"; } 131 | 132 | if ($inCorrect) { 133 | if ( &endOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) and 134 | &endOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) and 135 | $lastGuessedType eq $lastCorrectType) { 136 | $inCorrect=$false; 137 | $correctChunk++; 138 | $correctChunk{$lastCorrectType} = $correctChunk{$lastCorrectType} ? 139 | $correctChunk{$lastCorrectType}+1 : 1; 140 | } elsif ( 141 | &endOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) != 142 | &endOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) or 143 | $guessedType ne $correctType ) { 144 | $inCorrect=$false; 145 | } 146 | } 147 | 148 | if ( &startOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) and 149 | &startOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) and 150 | $guessedType eq $correctType) { $inCorrect = $true; } 151 | 152 | if ( &startOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) ) { 153 | $foundCorrect++; 154 | $foundCorrect{$correctType} = $foundCorrect{$correctType} ? 155 | $foundCorrect{$correctType}+1 : 1; 156 | } 157 | if ( &startOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) ) { 158 | $foundGuessed++; 159 | $foundGuessed{$guessedType} = $foundGuessed{$guessedType} ? 160 | $foundGuessed{$guessedType}+1 : 1; 161 | } 162 | if ( $firstItem ne $boundary ) { 163 | if ( $correct eq $guessed and $guessedType eq $correctType ) { 164 | $correctTags++; 165 | } 166 | $tokenCounter++; 167 | } 168 | 169 | $lastGuessed = $guessed; 170 | $lastCorrect = $correct; 171 | $lastGuessedType = $guessedType; 172 | $lastCorrectType = $correctType; 173 | } 174 | if ($inCorrect) { 175 | $correctChunk++; 176 | $correctChunk{$lastCorrectType} = $correctChunk{$lastCorrectType} ? 177 | $correctChunk{$lastCorrectType}+1 : 1; 178 | } 179 | 180 | if (not $latex) { 181 | # compute overall precision, recall and FB1 (default values are 0.0) 182 | $precision = 100*$correctChunk/$foundGuessed if ($foundGuessed > 0); 183 | $recall = 100*$correctChunk/$foundCorrect if ($foundCorrect > 0); 184 | $FB1 = 2*$precision*$recall/($precision+$recall) 185 | if ($precision+$recall > 0); 186 | 187 | # print overall performance 188 | printf "processed $tokenCounter tokens with $foundCorrect phrases; "; 189 | printf "found: $foundGuessed phrases; correct: $correctChunk.\n"; 190 | if ($tokenCounter>0) { 191 | printf "accuracy: %6.2f%%; ",100*$correctTags/$tokenCounter; 192 | printf "precision: %6.2f%%; ",$precision; 193 | printf "recall: %6.2f%%; ",$recall; 194 | printf "FB1: %6.2f\n",$FB1; 195 | } 196 | } 197 | 198 | # sort chunk type names 199 | undef($lastType); 200 | @sortedTypes = (); 201 | foreach $i (sort (keys %foundCorrect,keys %foundGuessed)) { 202 | if (not($lastType) or $lastType ne $i) { 203 | push(@sortedTypes,($i)); 204 | } 205 | $lastType = $i; 206 | } 207 | # print performance per chunk type 208 | if (not $latex) { 209 | for $i (@sortedTypes) { 210 | $correctChunk{$i} = $correctChunk{$i} ? 
$correctChunk{$i} : 0; 211 | if (not($foundGuessed{$i})) { $foundGuessed{$i} = 0; $precision = 0.0; } 212 | else { $precision = 100*$correctChunk{$i}/$foundGuessed{$i}; } 213 | if (not($foundCorrect{$i})) { $recall = 0.0; } 214 | else { $recall = 100*$correctChunk{$i}/$foundCorrect{$i}; } 215 | if ($precision+$recall == 0.0) { $FB1 = 0.0; } 216 | else { $FB1 = 2*$precision*$recall/($precision+$recall); } 217 | printf "%17s: ",$i; 218 | printf "precision: %6.2f%%; ",$precision; 219 | printf "recall: %6.2f%%; ",$recall; 220 | printf "FB1: %6.2f %d\n",$FB1,$foundGuessed{$i}; 221 | } 222 | } else { 223 | print " & Precision & Recall & F\$_{\\beta=1} \\\\\\hline"; 224 | for $i (@sortedTypes) { 225 | $correctChunk{$i} = $correctChunk{$i} ? $correctChunk{$i} : 0; 226 | if (not($foundGuessed{$i})) { $precision = 0.0; } 227 | else { $precision = 100*$correctChunk{$i}/$foundGuessed{$i}; } 228 | if (not($foundCorrect{$i})) { $recall = 0.0; } 229 | else { $recall = 100*$correctChunk{$i}/$foundCorrect{$i}; } 230 | if ($precision+$recall == 0.0) { $FB1 = 0.0; } 231 | else { $FB1 = 2*$precision*$recall/($precision+$recall); } 232 | printf "\n%-7s & %6.2f\\%% & %6.2f\\%% & %6.2f \\\\", 233 | $i,$precision,$recall,$FB1; 234 | } 235 | print "\\hline\n"; 236 | $precision = 0.0; 237 | $recall = 0; 238 | $FB1 = 0.0; 239 | $precision = 100*$correctChunk/$foundGuessed if ($foundGuessed > 0); 240 | $recall = 100*$correctChunk/$foundCorrect if ($foundCorrect > 0); 241 | $FB1 = 2*$precision*$recall/($precision+$recall) 242 | if ($precision+$recall > 0); 243 | printf "Overall & %6.2f\\%% & %6.2f\\%% & %6.2f \\\\\\hline\n", 244 | $precision,$recall,$FB1; 245 | } 246 | 247 | exit 0; 248 | 249 | # endOfChunk: checks if a chunk ended between the previous and current word 250 | # arguments: previous and current chunk tags, previous and current types 251 | # note: this code is capable of handling other chunk representations 252 | # than the default CoNLL-2000 ones, see EACL'99 paper of Tjong 253 | # Kim Sang and Veenstra http://xxx.lanl.gov/abs/cs.CL/9907006 254 | 255 | sub endOfChunk { 256 | my $prevTag = shift(@_); 257 | my $tag = shift(@_); 258 | my $prevType = shift(@_); 259 | my $type = shift(@_); 260 | my $chunkEnd = $false; 261 | 262 | if ( $prevTag eq "B" and $tag eq "B" ) { $chunkEnd = $true; } 263 | if ( $prevTag eq "B" and $tag eq "O" ) { $chunkEnd = $true; } 264 | if ( $prevTag eq "I" and $tag eq "B" ) { $chunkEnd = $true; } 265 | if ( $prevTag eq "I" and $tag eq "O" ) { $chunkEnd = $true; } 266 | 267 | if ( $prevTag eq "E" and $tag eq "E" ) { $chunkEnd = $true; } 268 | if ( $prevTag eq "E" and $tag eq "I" ) { $chunkEnd = $true; } 269 | if ( $prevTag eq "E" and $tag eq "O" ) { $chunkEnd = $true; } 270 | if ( $prevTag eq "I" and $tag eq "O" ) { $chunkEnd = $true; } 271 | 272 | if ($prevTag ne "O" and $prevTag ne "." 
and $prevType ne $type) { 273 | $chunkEnd = $true; 274 | } 275 | 276 | # corrected 1998-12-22: these chunks are assumed to have length 1 277 | if ( $prevTag eq "]" ) { $chunkEnd = $true; } 278 | if ( $prevTag eq "[" ) { $chunkEnd = $true; } 279 | 280 | return($chunkEnd); 281 | } 282 | 283 | # startOfChunk: checks if a chunk started between the previous and current word 284 | # arguments: previous and current chunk tags, previous and current types 285 | # note: this code is capable of handling other chunk representations 286 | # than the default CoNLL-2000 ones, see EACL'99 paper of Tjong 287 | # Kim Sang and Veenstra http://xxx.lanl.gov/abs/cs.CL/9907006 288 | 289 | sub startOfChunk { 290 | my $prevTag = shift(@_); 291 | my $tag = shift(@_); 292 | my $prevType = shift(@_); 293 | my $type = shift(@_); 294 | my $chunkStart = $false; 295 | 296 | if ( $prevTag eq "B" and $tag eq "B" ) { $chunkStart = $true; } 297 | if ( $prevTag eq "I" and $tag eq "B" ) { $chunkStart = $true; } 298 | if ( $prevTag eq "O" and $tag eq "B" ) { $chunkStart = $true; } 299 | if ( $prevTag eq "O" and $tag eq "I" ) { $chunkStart = $true; } 300 | 301 | if ( $prevTag eq "E" and $tag eq "E" ) { $chunkStart = $true; } 302 | if ( $prevTag eq "E" and $tag eq "I" ) { $chunkStart = $true; } 303 | if ( $prevTag eq "O" and $tag eq "E" ) { $chunkStart = $true; } 304 | if ( $prevTag eq "O" and $tag eq "I" ) { $chunkStart = $true; } 305 | 306 | if ($tag ne "O" and $tag ne "." and $prevType ne $type) { 307 | $chunkStart = $true; 308 | } 309 | 310 | # corrected 1998-12-22: these chunks are assumed to have length 1 311 | if ( $tag eq "[" ) { $chunkStart = $true; } 312 | if ( $tag eq "]" ) { $chunkStart = $true; } 313 | 314 | return($chunkStart); 315 | } 316 | -------------------------------------------------------------------------------- /code/scripts/layers.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | from gluonnlp.model.attention_cell import AttentionCell, _masked_softmax 4 | from gluonnlp.model.block import L2Normalization 5 | from gluonnlp.model.transformer import PositionwiseFFN 6 | from mxnet.gluon import nn, HybridBlock 7 | 8 | class ScaledDotProductAttentionCell(AttentionCell): 9 | """Dot product attention between the query and the key. 10 | 11 | Depending on parameters, defined as:: 12 | 13 | units is None: 14 | score = 15 | units is not None and luong_style is False: 16 | score = 17 | units is not None and luong_style is True: 18 | score = 19 | 20 | Parameters 21 | ---------- 22 | units: int or None, default None 23 | Project the query and key to vectors with `units` dimension 24 | before applying the attention. If set to None, 25 | the query vector and the key vector are directly used to compute the attention and 26 | should have the same dimension:: 27 | 28 | If the units is None, 29 | score = 30 | Else if the units is not None and luong_style is False: 31 | score = 32 | Else if the units is not None and luong_style is True: 33 | score = 34 | 35 | luong_style: bool, default False 36 | If turned on, the score will be:: 37 | 38 | score = 39 | 40 | `units` must be the same as the dimension of the key vector 41 | scaled: bool, default True 42 | Whether to divide the attention weights by the sqrt of the query dimension. 
43 | This is first proposed in "[NIPS2017] Attention is all you need.":: 44 | 45 | score = / sqrt(dim_q) 46 | 47 | normalized: bool, default False 48 | If turned on, the cosine distance is used, i.e:: 49 | 50 | score = 51 | 52 | use_bias : bool, default True 53 | Whether to use bias in the projection layers. 54 | dropout : float, default 0.0 55 | Attention dropout 56 | weight_initializer : str or `Initializer` or None, default None 57 | Initializer of the weights 58 | bias_initializer : str or `Initializer`, default 'zeros' 59 | Initializer of the bias 60 | prefix : str or None, default None 61 | See document of `Block`. 62 | params : str or None, default None 63 | See document of `Block`. 64 | """ 65 | 66 | def __init__(self, units=None, luong_style=False, scaled=True, normalized=False, use_bias=True, 67 | dropout=0.0, temperature=1.0, weight_initializer=None, bias_initializer='zeros', 68 | prefix=None, params=None): 69 | super(ScaledDotProductAttentionCell, self).__init__(prefix=prefix, params=params) 70 | self._units = units 71 | self._scaled = scaled 72 | self._normalized = normalized 73 | self._use_bias = use_bias 74 | self._luong_style = luong_style 75 | self._dropout = dropout 76 | self._temperature = temperature 77 | if self._luong_style: 78 | assert units is not None, 'Luong style attention is not available without explicitly ' \ 79 | 'setting the units' 80 | with self.name_scope(): 81 | self._dropout_layer = nn.Dropout(dropout) 82 | if units is not None: 83 | with self.name_scope(): 84 | self._proj_query = nn.Dense(units=self._units, use_bias=self._use_bias, 85 | flatten=False, weight_initializer=weight_initializer, 86 | bias_initializer=bias_initializer, prefix='query_') 87 | if not self._luong_style: 88 | self._proj_key = nn.Dense(units=self._units, use_bias=self._use_bias, 89 | flatten=False, weight_initializer=weight_initializer, 90 | bias_initializer=bias_initializer, prefix='key_') 91 | if self._normalized: 92 | with self.name_scope(): 93 | self._l2_norm = L2Normalization(axis=-1) 94 | 95 | def _compute_weight(self, F, query, key, mask=None): 96 | if self._units is not None: 97 | query = self._proj_query(query) 98 | if not self._luong_style: 99 | key = self._proj_key(key) 100 | elif F == mx.nd: 101 | assert query.shape[-1] == key.shape[-1], 'Luong style attention requires key to ' \ 102 | 'have the same dim as the projected ' \ 103 | 'query. Received key {}, query {}.'.format( 104 | key.shape, query.shape) 105 | if self._normalized: 106 | query = self._l2_norm(query) 107 | key = self._l2_norm(key) 108 | if self._scaled: 109 | query = F.contrib.div_sqrt_dim(query) 110 | 111 | att_score = F.batch_dot(query, key, transpose_b=True) / self._temperature 112 | 113 | att_weights = self._dropout_layer(_masked_softmax(F, att_score, mask, self._dtype)) 114 | return att_weights 115 | 116 | 117 | class AttentionMapCell(HybridBlock): 118 | """Structure of the Transformer Decoder Cell. 119 | 120 | Parameters 121 | ---------- 122 | attention_cell : AttentionCell or str, default 'multi_head' 123 | Arguments of the attention cell. 
124 | Can be 'multi_head', 'scaled_luong', 'scaled_dot', 'dot', 'cosine', 'normed_mlp', 'mlp' 125 | units : int 126 | Number of units for the output 127 | hidden_size : int 128 | number of units in the hidden layer of position-wise feed-forward networks 129 | num_heads : int 130 | Number of heads in multi-head attention 131 | scaled : bool 132 | Whether to scale the softmax input by the sqrt of the input dimension 133 | in multi-head attention 134 | dropout : float 135 | use_residual : bool 136 | output_attention: bool 137 | Whether to output the attention weights 138 | weight_initializer : str or Initializer 139 | Initializer for the input weights matrix, used for the linear 140 | transformation of the inputs. 141 | bias_initializer : str or Initializer 142 | Initializer for the bias vector. 143 | prefix : str, default 'rnn_' 144 | Prefix for name of `Block`s 145 | (and name of weight if params is `None`). 146 | params : Parameter or None 147 | Container for weight sharing between cells. 148 | Created if `None`. 149 | """ 150 | def __init__(self, units=128, hidden_size=512, dropout=0.0, use_residual=True, 151 | attn_temperature=1.0, weight_initializer=None, bias_initializer='zeros', 152 | prefix=None, params=None): 153 | super(AttentionMapCell, self).__init__(prefix=prefix, params=params) 154 | self._units = units 155 | self._dropout = dropout 156 | with self.name_scope(): 157 | if dropout: 158 | self.dropout_layer = nn.Dropout(rate=dropout) 159 | self.attention_cell = ScaledDotProductAttentionCell(temperature=attn_temperature, 160 | scaled=True, 161 | normalized=False) 162 | self.proj_layer = nn.Dense(units=units, flatten=False, 163 | use_bias=False, 164 | weight_initializer=weight_initializer, 165 | bias_initializer=bias_initializer, 166 | prefix='proj_inter_') 167 | self.ffn = PositionwiseFFN(hidden_size=hidden_size, 168 | units=units, 169 | use_residual=use_residual, 170 | dropout=dropout, 171 | weight_initializer=weight_initializer, 172 | bias_initializer=bias_initializer) 173 | 174 | self.layer_norm = nn.LayerNorm() 175 | 176 | def hybrid_forward(self, F, inputs, mem_value, mem_mask=None): #pylint: disable=unused-argument 177 | # pylint: disable=arguments-differ 178 | """Transformer Decoder Attention Cell. 179 | 180 | Parameters 181 | ---------- 182 | inputs : Symbol or NDArray 183 | Input sequence. Shape (batch_size, length, C_in) 184 | mem_value : Symbol or NDArrays 185 | Memory value, i.e. output of the encoder. Shape (batch_size, mem_length, C_in) 186 | mem_mask : Symbol or NDArray or None 187 | Mask for mem_value. Shape (batch_size, length, mem_length) 188 | 189 | Returns 190 | ------- 191 | decoder_cell_outputs: list 192 | Outputs of the decoder cell. Contains: 193 | 194 | - outputs of the transformer decoder cell. 
Shape (batch_size, length, C_out) 195 | - additional_outputs of all the transformer decoder cell 196 | """ 197 | attention_outputs, attention_weights = \ 198 | self.attention_cell(inputs, mem_value, mem_value, mem_mask) 199 | outputs = self.proj_layer(attention_outputs) 200 | if self._dropout: 201 | outputs = self.dropout_layer(outputs) 202 | outputs = self.layer_norm(outputs) 203 | outputs = self.ffn(outputs) 204 | return outputs, attention_outputs 205 | -------------------------------------------------------------------------------- /code/scripts/loss.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from mxnet.gluon.loss import Loss, SoftmaxCELoss 4 | 5 | class SoftmaxCEMaskedLoss(SoftmaxCELoss): 6 | """Wrapper of the SoftmaxCELoss that supports valid_length as the input 7 | 8 | """ 9 | def hybrid_forward(self, F, pred, label, valid_length): # pylint: disable=arguments-differ 10 | """ 11 | Parameters 12 | ---------- 13 | F 14 | pred : Symbol or NDArray 15 | Shape (batch_size, length, V) 16 | label : Symbol or NDArray 17 | Shape (batch_size, length) 18 | valid_length : Symbol or NDArray 19 | Shape (batch_size, ) 20 | Returns 21 | ------- 22 | loss : Symbol or NDArray 23 | Shape (batch_size) 24 | """ 25 | if self._sparse_label: 26 | sample_weight = F.cast(F.expand_dims(F.ones_like(label), axis=-1), dtype=np.float32) 27 | else: 28 | sample_weight = F.ones_like(label) 29 | 30 | sample_weight = F.SequenceMask(sample_weight, 31 | sequence_length=valid_length, 32 | use_sequence_length=True, 33 | axis=1) 34 | 35 | return super(SoftmaxCEMaskedLoss, self).hybrid_forward(F, pred, label, sample_weight) 36 | 37 | 38 | class ICSLLoss(Loss): 39 | """Loss for IC/SL task. 40 | 41 | """ 42 | 43 | def __init__(self, sparse_label=True, weight=None, batch_axis=0, **kwargs): # pylint: disable=unused-argument 44 | super(ICSLLoss, self).__init__( 45 | weight=weight, batch_axis=batch_axis, **kwargs) 46 | self.ce_loss = SoftmaxCELoss() 47 | self.masked_ce_loss = SoftmaxCEMaskedLoss(sparse_label=sparse_label) 48 | 49 | def hybrid_forward(self, F, intent_pred, slot_pred, intent_label, slot_label, valid_length): # pylint: disable=arguments-differ 50 | """ 51 | Parameters 52 | ---------- 53 | intent_pred : intent prediction, shape (batch_size, num_intents) 54 | slot_pred : slot prediction, shape (batch_size, seq_length, num_slot_labels) 55 | intent_label : intent label, shape (batch_size) 56 | slot_label: slot label, shape (batch_size, seq_length) 57 | 58 | Returns 59 | ------- 60 | outputs : NDArray 61 | Shape (batch_size) 62 | """ 63 | intent_loss = self.ce_loss(intent_pred, intent_label) 64 | slot_loss = self.masked_ce_loss(slot_pred, slot_label, valid_length) 65 | return intent_loss + slot_loss 66 | -------------------------------------------------------------------------------- /code/scripts/lstm_alone.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = 
'[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'lstm.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class ICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into a biLSTM to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, vocab_size, num_slot_labels, num_intents, embed_size=256, rnn_hidden_size=128, rnn_layers=1, rnn_dropout=.1, embed_dropout=.1, prefix=None, params=None): 49 | super(ICSL, self).__init__(prefix=prefix, params=params) 50 | with self.name_scope(): 51 | # Embedding 52 | self.word_embed = nn.Embedding(input_dim=vocab_size, output_dim=embed_size) 53 | self.embed_dropout = nn.Dropout(rate=embed_dropout) 54 | # RNN encoder 55 | self.rnn = rnn.LSTM(rnn_hidden_size, num_layers=rnn_layers, bidirectional=True, dropout=rnn_dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | flatten=False) 59 | self.intent_classifier = nn.Dense(units=num_intents, 60 | flatten=False) 61 | 62 | def forward(self, inputs): # pylint: disable=arguments-differ 63 | """Generate unnormalized scores for the given input sequences. 64 | 65 | Parameters 66 | ---------- 67 | inputs : NDArray, shape (batch_size, seq_length) 68 | Input words for the sequences. 
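            These are the token ids produced by ``icsl_transform`` with the
            BERT tokenizer (as in ``train``); position 0 feeds the intent
            classifier and slot scores are produced for positions 1 onward.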
69 | 70 | Returns 71 | ------- 72 | intent_prediction: NDArray 73 | Shape (batch_size, num_intents) 74 | slot_prediction : NDArray 75 | Shape (batch_size, seq_length, num_slot_labels) 76 | """ 77 | # embed: (batch_size, seq_length, embed_size) 78 | embed = self.word_embed(inputs) 79 | embed = self.embed_dropout(embed) 80 | # hidden: (seq_length, batch_size, rnn_hidden_size * 2) 81 | hidden = self.rnn(embed.swapaxes(0, 1)) 82 | # hidden: (batch_size, seq_length, rnn_hidden_size * 2) 83 | hidden = hidden.swapaxes(0, 1) 84 | # get intent and slot label predictions 85 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 86 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 87 | return intent_prediction, slot_prediction 88 | 89 | 90 | def train(model_name, train_input, dev_input): 91 | """Training function.""" 92 | ## Arguments 93 | log_interval = 100 94 | batch_size = 32 95 | lr = 1e-3 96 | optimizer = 'adam' 97 | accumulate = None 98 | epochs = 20 99 | 100 | ## Load BERT model and vocabulary 101 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 102 | dataset_name='wiki_multilingual_uncased', 103 | pretrained=True, 104 | ctx=ctx, 105 | use_pooler=False, 106 | use_decoder=False, 107 | use_classifier=False) 108 | 109 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 110 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 111 | model.hybridize(static_alloc=True) 112 | 113 | icsl_loss_function = ICSLLoss() 114 | icsl_loss_function.hybridize(static_alloc=True) 115 | 116 | ic_metric = mx.metric.Accuracy() 117 | sl_metric = mx.metric.Accuracy() 118 | 119 | ## Load labeled data 120 | field_separator = nlp.data.Splitter('\t') 121 | # fields to select from the file: utterance, slot labels, intent, uid 122 | field_indices = [1, 3, 4, 0] 123 | train_data = nlp.data.TSVDataset(filename=train_input, 124 | field_separator=field_separator, 125 | num_discard_samples=1, 126 | field_indices=field_indices) 127 | 128 | # use the vocabulary from pre-trained model for tokenization 129 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 130 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 131 | # create data loader 132 | pad_token_id = vocabulary[PAD] 133 | pad_label_id = label2idx[PAD] 134 | batchify_fn = nlp.data.batchify.Tuple( 135 | nlp.data.batchify.Stack(), 136 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 137 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 138 | nlp.data.batchify.Stack('float32'), 139 | nlp.data.batchify.Stack('float32')) 140 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 141 | batch_size=batch_size, 142 | shuffle=True) 143 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 144 | batchify_fn=batchify_fn, 145 | batch_sampler=train_sampler) 146 | 147 | optimizer_params = {'learning_rate': lr} 148 | trainer = gluon.Trainer(model.collect_params(), optimizer, 149 | optimizer_params, update_on_kvstore=False) 150 | 151 | # Collect differentiable parameters 152 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 153 | # Set grad_req if gradient accumulation is required 154 | if accumulate: 155 | for p in params: 156 | p[1].grad_req = 'add' 157 | 158 | epoch_tic = time.time() 159 | total_num = 0 160 | log_num = 0 161 | best_score = (0, 0) 162 | for epoch_id in range(epochs): 163 | step_loss = 0 164 | tic = time.time() 165 
| # train on labeled data 166 | for batch_id, data in enumerate(train_dataloader): 167 | # forward and backward 168 | with mx.autograd.record(): 169 | if data[0].shape[0] < len(ctx): 170 | data = split_and_load(data, [ctx[0]]) 171 | else: 172 | data = split_and_load(data, ctx) 173 | for chunk in data: 174 | _, token_ids, slot_label, intent_label, valid_length = chunk 175 | 176 | log_num += len(token_ids) 177 | total_num += len(token_ids) 178 | 179 | # forward computation 180 | intent_pred, slot_pred = model(token_ids) 181 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 182 | 183 | if accumulate: 184 | ls = ls / accumulate 185 | ls.backward() 186 | step_loss += ls.asscalar() 187 | 188 | # update 189 | if not accumulate or (batch_id + 1) % accumulate == 0: 190 | trainer.allreduce_grads() 191 | nlp.utils.clip_grad_global_norm(params, 1) 192 | trainer.update(1, ignore_stale_grad=True) 193 | 194 | if (batch_id + 1) % log_interval == 0: 195 | toc = time.time() 196 | # update metrics 197 | ic_metric.update([intent_label], [intent_pred]) 198 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 199 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 200 | .format(epoch_id, 201 | batch_id, 202 | len(train_dataloader), 203 | log_num / (toc - tic), 204 | trainer.learning_rate, 205 | step_loss / log_interval, 206 | ic_metric.get()[1], 207 | sl_metric.get()[1])) 208 | 209 | tic = time.time() 210 | step_loss = 0 211 | log_num = 0 212 | 213 | mx.nd.waitall() 214 | epoch_toc = time.time() 215 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 216 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 217 | 218 | # evaluate on development set 219 | log.info('Evaluate on development set:') 220 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 221 | if slot_f1 > best_score[1]: 222 | best_score = (intent_acc, slot_f1) 223 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 224 | 225 | 226 | def evaluate(model=None, model_name='', eval_input=''): 227 | """Evaluate the model on validation dataset. 
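    The pre-trained BERT model is loaded here only to reuse its vocabulary
    for tokenization; the biLSTM ``ICSL`` model itself does not use the BERT
    encoder. Returns ``(intent_acc, slot_f1)`` scored with ``conlleval.pl``.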
228 | """ 229 | ## Load model 230 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 231 | dataset_name='wiki_multilingual_uncased', 232 | pretrained=True, 233 | ctx=ctx, 234 | use_pooler=False, 235 | use_decoder=False, 236 | use_classifier=False) 237 | if model is None: 238 | assert model_name != '' 239 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 240 | model.initialize(ctx=ctx) 241 | model.hybridize(static_alloc=True) 242 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 243 | 244 | idx2label = {} 245 | for label, idx in label2idx.items(): 246 | idx2label[idx] = label 247 | ## Load dev dataset 248 | field_separator = nlp.data.Splitter('\t') 249 | field_indices = [1, 3, 4, 0] 250 | eval_data = nlp.data.TSVDataset(filename=eval_input, 251 | field_separator=field_separator, 252 | num_discard_samples=1, 253 | field_indices=field_indices) 254 | 255 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 256 | 257 | dev_alignment = {} 258 | eval_data_transform = [] 259 | for sample in eval_data: 260 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 261 | eval_data_transform += [sample] 262 | dev_alignment[sample[0]] = alignment 263 | log.info('The number of examples after preprocessing: {}' 264 | .format(len(eval_data_transform))) 265 | 266 | test_batch_size = 16 267 | pad_token_id = vocabulary[PAD] 268 | pad_label_id = label2idx[PAD] 269 | batchify_fn = nlp.data.batchify.Tuple( 270 | nlp.data.batchify.Stack(), 271 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 272 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 273 | nlp.data.batchify.Stack('float32'), 274 | nlp.data.batchify.Stack('float32')) 275 | eval_dataloader = mx.gluon.data.DataLoader( 276 | eval_data_transform, 277 | batchify_fn=batchify_fn, 278 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 279 | 280 | _Result = collections.namedtuple( 281 | '_Result', ['intent', 'slot_labels']) 282 | all_results = {} 283 | 284 | total_num = 0 285 | for data in eval_dataloader: 286 | example_ids, token_ids, _, _, valid_length = data 287 | total_num += len(token_ids) 288 | # load data to GPU 289 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 290 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 291 | 292 | # forward computation 293 | intent_pred, slot_pred = model(token_ids) 294 | intent_pred = intent_pred.asnumpy() 295 | slot_pred = slot_pred.asnumpy() 296 | valid_length = valid_length.asnumpy() 297 | 298 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 299 | eid = eid.asscalar() 300 | length = int(length) - 2 301 | intent_id = y_intent.argmax(axis=-1) 302 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 303 | slot_names = [idx2label[idx] for idx in slot_ids] 304 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 305 | if eid not in all_results: 306 | all_results[eid] = _Result(intent_id, merged_slot_names) 307 | 308 | example_ids, utterances, labels, intents = load_tsv(eval_input) 309 | pred_intents = [] 310 | label_intents = [] 311 | for eid, intent in zip(example_ids, intents): 312 | label_intents.append(label2index(intent2idx, intent)) 313 | pred_intents.append(all_results[eid].intent) 314 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 315 | log.info("Intent Accuracy: %.4f" % intent_acc) 316 | 317 | pred_icsl = [] 318 | 
label_icsl = [] 319 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 320 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 321 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 322 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 323 | log.info("Exact Match: %.4f" % exact_match) 324 | 325 | with open(conll_prediction_file, "w") as fw: 326 | for eid, utterance, labels in zip(example_ids, utterances, labels): 327 | preds = all_results[eid].slot_labels 328 | for w, l, p in zip(utterance, labels, preds): 329 | fw.write(' '.join([w, l, p]) + '\n') 330 | fw.write('\n') 331 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 332 | with open(conll_prediction_file) as f: 333 | stdout = proc.communicate(f.read().encode())[0] 334 | result = stdout.decode('utf-8').split('\n')[1] 335 | slot_f1 = float(result.split()[-1].strip()) 336 | log.info("Slot Labeling: %s" % result) 337 | return intent_acc, slot_f1 338 | 339 | 340 | # extract labels 341 | train_input = data_dir + 'atis_train.tsv' 342 | intent2idx, label2idx = get_label_indices(train_input) 343 | 344 | # Train 345 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 346 | log.info('Train on %s:' % lang) 347 | model_name = 'model_lstm_' + lang + '.' + str(random_seed) 348 | if lang == 'EN': 349 | train_input = data_dir + 'atis_train.tsv' 350 | dev_input = data_dir + 'atis_dev.tsv' 351 | else: 352 | train_input = data_dir + 'atis_train_' + lang + '.tsv' 353 | dev_input = data_dir + 'atis_dev_' + lang + '.tsv' 354 | train(model_name, train_input, dev_input) 355 | 356 | # Evaluate 357 | log.info('==========Supervised learning==========') 358 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 359 | log.info('Evaluate on %s:' % lang) 360 | model_name = 'model_lstm_' + lang + '.' + str(random_seed) 361 | if lang == 'EN': 362 | test_input = data_dir + 'atis_test.tsv' 363 | else: 364 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 365 | evaluate(model_name=model_name, eval_input=test_input) 366 | 367 | log.info('==========Transfer learning==========') 368 | src_lang = 'EN' 369 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 370 | log.info('Evaluate on %s:' % lang) 371 | model_name = 'model_lstm_' + src_lang + '.' 
+ str(random_seed) 372 | if lang == 'EN': 373 | test_input = data_dir + 'atis_test.tsv' 374 | else: 375 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 376 | evaluate(model_name=model_name, eval_input=test_input) 377 | -------------------------------------------------------------------------------- /code/scripts/lstm_joint.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'lstm_joint.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class ICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into a biLSTM to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, vocab_size, num_slot_labels, num_intents, embed_size=256, rnn_hidden_size=128, rnn_layers=1, rnn_dropout=.1, embed_dropout=.1, prefix=None, params=None): 49 | super(ICSL, self).__init__(prefix=prefix, params=params) 50 | with self.name_scope(): 51 | # Embedding 52 | self.word_embed = nn.Embedding(input_dim=vocab_size, output_dim=embed_size) 53 | self.embed_dropout = nn.Dropout(rate=embed_dropout) 54 | # RNN encoder 55 | self.rnn = rnn.LSTM(rnn_hidden_size, num_layers=rnn_layers, bidirectional=True, dropout=rnn_dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | flatten=False) 59 | self.intent_classifier = nn.Dense(units=num_intents, 60 | flatten=False) 61 | 62 | def forward(self, inputs): # pylint: disable=arguments-differ 63 | """Generate unnormalized scores for the given input sequences. 64 | 65 | Parameters 66 | ---------- 67 | inputs : NDArray, shape (batch_size, seq_length) 68 | Input words for the sequences. 
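            As built by icsl_transform, each sequence starts with [CLS] and
            ends with [SEP]; the hidden state at the [CLS] position is used
            for intent classification, while the remaining positions are fed
            to the slot classifier.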
69 | 70 | Returns 71 | ------- 72 | intent_prediction: NDArray 73 | Shape (batch_size, num_intents) 74 | slot_prediction : NDArray 75 | Shape (batch_size, seq_length, num_slot_labels) 76 | """ 77 | # embed: (batch_size, seq_length, embed_size) 78 | embed = self.word_embed(inputs) 79 | embed = self.embed_dropout(embed) 80 | # hidden: (seq_length, batch_size, rnn_hidden_size * 2) 81 | hidden = self.rnn(embed.swapaxes(0, 1)) 82 | # hidden: (batch_size, seq_length, rnn_hidden_size * 2) 83 | hidden = hidden.swapaxes(0, 1) 84 | # get intent and slot label predictions 85 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 86 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 87 | return intent_prediction, slot_prediction 88 | 89 | 90 | def train(model_name, train_input, dev_input): 91 | """Training function.""" 92 | ## Arguments 93 | log_interval = 100 94 | batch_size = 32 95 | lr = 1e-3 96 | optimizer = 'adam' 97 | accumulate = None 98 | epochs = 20 99 | 100 | ## Load BERT model and vocabulary 101 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 102 | dataset_name='wiki_multilingual_uncased', 103 | pretrained=True, 104 | ctx=ctx, 105 | use_pooler=False, 106 | use_decoder=False, 107 | use_classifier=False) 108 | 109 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 110 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 111 | model.hybridize(static_alloc=True) 112 | 113 | icsl_loss_function = ICSLLoss() 114 | icsl_loss_function.hybridize(static_alloc=True) 115 | 116 | ic_metric = mx.metric.Accuracy() 117 | sl_metric = mx.metric.Accuracy() 118 | 119 | ## Load labeled data 120 | field_separator = nlp.data.Splitter('\t') 121 | # fields to select from the file: utterance, slot labels, intent, uid 122 | field_indices = [1, 3, 4, 0] 123 | train_data = nlp.data.TSVDataset(filename=train_input, 124 | field_separator=field_separator, 125 | num_discard_samples=1, 126 | field_indices=field_indices) 127 | 128 | # use the vocabulary from pre-trained model for tokenization 129 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 130 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 131 | # create data loader 132 | pad_token_id = vocabulary[PAD] 133 | pad_label_id = label2idx[PAD] 134 | batchify_fn = nlp.data.batchify.Tuple( 135 | nlp.data.batchify.Stack(), 136 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 137 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 138 | nlp.data.batchify.Stack('float32'), 139 | nlp.data.batchify.Stack('float32')) 140 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 141 | batch_size=batch_size, 142 | shuffle=True) 143 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 144 | batchify_fn=batchify_fn, 145 | batch_sampler=train_sampler) 146 | 147 | optimizer_params = {'learning_rate': lr} 148 | trainer = gluon.Trainer(model.collect_params(), optimizer, 149 | optimizer_params, update_on_kvstore=False) 150 | 151 | # Collect differentiable parameters 152 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 153 | # Set grad_req if gradient accumulation is required 154 | if accumulate: 155 | for p in params: 156 | p[1].grad_req = 'add' 157 | 158 | epoch_tic = time.time() 159 | total_num = 0 160 | log_num = 0 161 | best_score = (0, 0) 162 | for epoch_id in range(epochs): 163 | step_loss = 0 164 | tic = time.time() 165 
| # train on labeled data 166 | for batch_id, data in enumerate(train_dataloader): 167 | # forward and backward 168 | with mx.autograd.record(): 169 | if data[0].shape[0] < len(ctx): 170 | data = split_and_load(data, [ctx[0]]) 171 | else: 172 | data = split_and_load(data, ctx) 173 | for chunk in data: 174 | _, token_ids, slot_label, intent_label, valid_length = chunk 175 | 176 | log_num += len(token_ids) 177 | total_num += len(token_ids) 178 | 179 | # forward computation 180 | intent_pred, slot_pred = model(token_ids) 181 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 182 | 183 | if accumulate: 184 | ls = ls / accumulate 185 | ls.backward() 186 | step_loss += ls.asscalar() 187 | 188 | # update 189 | if not accumulate or (batch_id + 1) % accumulate == 0: 190 | trainer.allreduce_grads() 191 | nlp.utils.clip_grad_global_norm(params, 1) 192 | trainer.update(1, ignore_stale_grad=True) 193 | 194 | if (batch_id + 1) % log_interval == 0: 195 | toc = time.time() 196 | # update metrics 197 | ic_metric.update([intent_label], [intent_pred]) 198 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 199 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 200 | .format(epoch_id, 201 | batch_id, 202 | len(train_dataloader), 203 | log_num / (toc - tic), 204 | trainer.learning_rate, 205 | step_loss / log_interval, 206 | ic_metric.get()[1], 207 | sl_metric.get()[1])) 208 | 209 | tic = time.time() 210 | step_loss = 0 211 | log_num = 0 212 | 213 | mx.nd.waitall() 214 | epoch_toc = time.time() 215 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 216 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 217 | 218 | # evaluate on development set 219 | log.info('Evaluate on development set:') 220 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 221 | if slot_f1 > best_score[1]: 222 | best_score = (intent_acc, slot_f1) 223 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 224 | 225 | 226 | def evaluate(model=None, model_name='', eval_input=''): 227 | """Evaluate the model on validation dataset. 
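
    Parameters
    ----------
    model : ICSL, optional
        Model instance to evaluate directly. If None, a fresh ICSL model is
        created and its parameters are loaded from model_dir using model_name.
    model_name : str
        Name of the saved parameter file (without the '.params' suffix);
        required when model is None.
    eval_input : str
        Path to the TSV file to evaluate on.

    Returns
    -------
    intent_acc : float
        Intent classification accuracy.
    slot_f1 : float
        Slot labeling F1 score reported by conlleval.pl.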
228 | """ 229 | ## Load model 230 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 231 | dataset_name='wiki_multilingual_uncased', 232 | pretrained=True, 233 | ctx=ctx, 234 | use_pooler=False, 235 | use_decoder=False, 236 | use_classifier=False) 237 | if model is None: 238 | assert model_name != '' 239 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 240 | model.initialize(ctx=ctx) 241 | model.hybridize(static_alloc=True) 242 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 243 | 244 | idx2label = {} 245 | for label, idx in label2idx.items(): 246 | idx2label[idx] = label 247 | ## Load dev dataset 248 | field_separator = nlp.data.Splitter('\t') 249 | field_indices = [1, 3, 4, 0] 250 | eval_data = nlp.data.TSVDataset(filename=eval_input, 251 | field_separator=field_separator, 252 | num_discard_samples=1, 253 | field_indices=field_indices) 254 | 255 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 256 | 257 | dev_alignment = {} 258 | eval_data_transform = [] 259 | for sample in eval_data: 260 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 261 | eval_data_transform += [sample] 262 | dev_alignment[sample[0]] = alignment 263 | log.info('The number of examples after preprocessing: {}' 264 | .format(len(eval_data_transform))) 265 | 266 | test_batch_size = 16 267 | pad_token_id = vocabulary[PAD] 268 | pad_label_id = label2idx[PAD] 269 | batchify_fn = nlp.data.batchify.Tuple( 270 | nlp.data.batchify.Stack(), 271 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 272 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 273 | nlp.data.batchify.Stack('float32'), 274 | nlp.data.batchify.Stack('float32')) 275 | eval_dataloader = mx.gluon.data.DataLoader( 276 | eval_data_transform, 277 | batchify_fn=batchify_fn, 278 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 279 | 280 | _Result = collections.namedtuple( 281 | '_Result', ['intent', 'slot_labels']) 282 | all_results = {} 283 | 284 | total_num = 0 285 | for data in eval_dataloader: 286 | example_ids, token_ids, _, _, valid_length = data 287 | total_num += len(token_ids) 288 | # load data to GPU 289 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 290 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 291 | 292 | # forward computation 293 | intent_pred, slot_pred = model(token_ids) 294 | intent_pred = intent_pred.asnumpy() 295 | slot_pred = slot_pred.asnumpy() 296 | valid_length = valid_length.asnumpy() 297 | 298 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 299 | eid = eid.asscalar() 300 | length = int(length) - 2 301 | intent_id = y_intent.argmax(axis=-1) 302 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 303 | slot_names = [idx2label[idx] for idx in slot_ids] 304 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 305 | if eid not in all_results: 306 | all_results[eid] = _Result(intent_id, merged_slot_names) 307 | 308 | example_ids, utterances, labels, intents = load_tsv(eval_input) 309 | pred_intents = [] 310 | label_intents = [] 311 | for eid, intent in zip(example_ids, intents): 312 | label_intents.append(label2index(intent2idx, intent)) 313 | pred_intents.append(all_results[eid].intent) 314 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 315 | log.info("Intent Accuracy: %.4f" % intent_acc) 316 | 317 | pred_icsl = [] 318 | 
label_icsl = [] 319 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 320 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 321 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 322 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 323 | log.info("Exact Match: %.4f" % exact_match) 324 | 325 | with open(conll_prediction_file, "w") as fw: 326 | for eid, utterance, labels in zip(example_ids, utterances, labels): 327 | preds = all_results[eid].slot_labels 328 | for w, l, p in zip(utterance, labels, preds): 329 | fw.write(' '.join([w, l, p]) + '\n') 330 | fw.write('\n') 331 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 332 | with open(conll_prediction_file) as f: 333 | stdout = proc.communicate(f.read().encode())[0] 334 | result = stdout.decode('utf-8').split('\n')[1] 335 | slot_f1 = float(result.split()[-1].strip()) 336 | log.info("Slot Labeling: %s" % result) 337 | return intent_acc, slot_f1 338 | 339 | 340 | # extract labels 341 | train_input = data_dir + 'atis_train.tsv' 342 | intent2idx, label2idx = get_label_indices(train_input) 343 | 344 | # Train 345 | log.info('Train on all languages:') 346 | model_name = 'model_lstm_joint.' + str(random_seed) 347 | train_input = data_dir + 'atis_train_all.tsv' 348 | dev_input = data_dir + 'atis_dev.tsv' 349 | train(model_name, train_input, dev_input) 350 | 351 | # Evaluate 352 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 353 | log.info('Evaluate on %s:' % lang) 354 | model_name = 'model_lstm_joint.' + str(random_seed) 355 | if lang == 'EN': 356 | test_input = data_dir + 'atis_test.tsv' 357 | else: 358 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 359 | evaluate(model_name=model_name, eval_input=test_input) 360 | -------------------------------------------------------------------------------- /code/scripts/lstm_mt.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(2)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'lstm_mt.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class ICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into a biLSTM to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 
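
    Parameters
    ----------
    vocab_size : int
        Vocabulary size for the embedding layer (the scripts pass the size of
        the multilingual BERT vocabulary).
    num_slot_labels : int
        Number of slot label classes.
    num_intents : int
        Number of intent classes.
    embed_size : int, default 256
        Word embedding dimension.
    rnn_hidden_size : int, default 128
        Hidden size of each LSTM direction.
    rnn_layers : int, default 1
        Number of biLSTM layers.
    rnn_dropout : float, default .1
        Dropout rate for the LSTM.
    embed_dropout : float, default .1
        Dropout rate applied to the word embeddings.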
46 | """ 47 | 48 | def __init__(self, vocab_size, num_slot_labels, num_intents, embed_size=256, rnn_hidden_size=128, rnn_layers=1, rnn_dropout=.1, embed_dropout=.1, prefix=None, params=None): 49 | super(ICSL, self).__init__(prefix=prefix, params=params) 50 | with self.name_scope(): 51 | # Embedding 52 | self.word_embed = nn.Embedding(input_dim=vocab_size, output_dim=embed_size) 53 | self.embed_dropout = nn.Dropout(rate=embed_dropout) 54 | # RNN encoder 55 | self.rnn = rnn.LSTM(rnn_hidden_size, num_layers=rnn_layers, bidirectional=True, dropout=rnn_dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | flatten=False) 59 | self.intent_classifier = nn.Dense(units=num_intents, 60 | flatten=False) 61 | 62 | def forward(self, inputs): # pylint: disable=arguments-differ 63 | """Generate unnormalized scores for the given input sequences. 64 | 65 | Parameters 66 | ---------- 67 | inputs : NDArray, shape (batch_size, seq_length) 68 | Input words for the sequences. 69 | 70 | Returns 71 | ------- 72 | intent_prediction: NDArray 73 | Shape (batch_size, num_intents) 74 | slot_prediction : NDArray 75 | Shape (batch_size, seq_length, num_slot_labels) 76 | """ 77 | # embed: (batch_size, seq_length, embed_size) 78 | embed = self.word_embed(inputs) 79 | embed = self.embed_dropout(embed) 80 | # hidden: (seq_length, batch_size, rnn_hidden_size * 2) 81 | hidden = self.rnn(embed.swapaxes(0, 1)) 82 | # hidden: (batch_size, seq_length, rnn_hidden_size * 2) 83 | hidden = hidden.swapaxes(0, 1) 84 | # get intent and slot label predictions 85 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 86 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 87 | return intent_prediction, slot_prediction 88 | 89 | 90 | def train(model_name, train_input): 91 | """Training function.""" 92 | ## Arguments 93 | log_interval = 100 94 | batch_size = 32 95 | lr = 1e-3 96 | optimizer = 'adam' 97 | accumulate = None 98 | epochs = 20 99 | 100 | ## Load BERT model and vocabulary 101 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 102 | dataset_name='wiki_multilingual_uncased', 103 | pretrained=True, 104 | ctx=ctx, 105 | use_pooler=False, 106 | use_decoder=False, 107 | use_classifier=False) 108 | 109 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 110 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 111 | model.hybridize(static_alloc=True) 112 | 113 | icsl_loss_function = ICSLLoss() 114 | icsl_loss_function.hybridize(static_alloc=True) 115 | 116 | ic_metric = mx.metric.Accuracy() 117 | sl_metric = mx.metric.Accuracy() 118 | 119 | ## Load labeled data 120 | field_separator = nlp.data.Splitter('\t') 121 | # fields to select from the file: utterance, slot labels, intent, uid 122 | field_indices = [1, 3, 4, 0] 123 | train_data = nlp.data.TSVDataset(filename=train_input, 124 | field_separator=field_separator, 125 | num_discard_samples=1, 126 | field_indices=field_indices) 127 | 128 | # use the vocabulary from pre-trained model for tokenization 129 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 130 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 131 | # create data loader 132 | pad_token_id = vocabulary[PAD] 133 | pad_label_id = label2idx[PAD] 134 | batchify_fn = nlp.data.batchify.Tuple( 135 | nlp.data.batchify.Stack(), 136 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 137 | nlp.data.batchify.Pad(axis=0, 
pad_val=pad_label_id), 138 | nlp.data.batchify.Stack('float32'), 139 | nlp.data.batchify.Stack('float32')) 140 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 141 | batch_size=batch_size, 142 | shuffle=True) 143 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 144 | batchify_fn=batchify_fn, 145 | batch_sampler=train_sampler) 146 | 147 | optimizer_params = {'learning_rate': lr} 148 | trainer = gluon.Trainer(model.collect_params(), optimizer, 149 | optimizer_params, update_on_kvstore=False) 150 | 151 | # Collect differentiable parameters 152 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 153 | # Set grad_req if gradient accumulation is required 154 | if accumulate: 155 | for p in params: 156 | p[1].grad_req = 'add' 157 | 158 | epoch_tic = time.time() 159 | total_num = 0 160 | log_num = 0 161 | for epoch_id in range(epochs): 162 | step_loss = 0 163 | tic = time.time() 164 | # train on labeled data 165 | for batch_id, data in enumerate(train_dataloader): 166 | # forward and backward 167 | with mx.autograd.record(): 168 | if data[0].shape[0] < len(ctx): 169 | data = split_and_load(data, [ctx[0]]) 170 | else: 171 | data = split_and_load(data, ctx) 172 | for chunk in data: 173 | _, token_ids, slot_label, intent_label, valid_length = chunk 174 | 175 | log_num += len(token_ids) 176 | total_num += len(token_ids) 177 | 178 | # forward computation 179 | intent_pred, slot_pred = model(token_ids) 180 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 181 | 182 | if accumulate: 183 | ls = ls / accumulate 184 | ls.backward() 185 | step_loss += ls.asscalar() 186 | 187 | # update 188 | if not accumulate or (batch_id + 1) % accumulate == 0: 189 | trainer.allreduce_grads() 190 | nlp.utils.clip_grad_global_norm(params, 1) 191 | trainer.update(1, ignore_stale_grad=True) 192 | 193 | if (batch_id + 1) % log_interval == 0: 194 | toc = time.time() 195 | # update metrics 196 | ic_metric.update([intent_label], [intent_pred]) 197 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 198 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 199 | .format(epoch_id, 200 | batch_id, 201 | len(train_dataloader), 202 | log_num / (toc - tic), 203 | trainer.learning_rate, 204 | step_loss / log_interval, 205 | ic_metric.get()[1], 206 | sl_metric.get()[1])) 207 | 208 | tic = time.time() 209 | step_loss = 0 210 | log_num = 0 211 | 212 | mx.nd.waitall() 213 | epoch_toc = time.time() 214 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 215 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 216 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 217 | 218 | 219 | def evaluate(model=None, model_name='', eval_input=''): 220 | """Evaluate the model on validation dataset. 
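
    Returns
    -------
    intent_acc : float
        Intent classification accuracy on eval_input.
    slot_f1 : float
        Slot labeling F1 score computed by conlleval.pl.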
221 | """ 222 | ## Load model 223 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 224 | dataset_name='wiki_multilingual_uncased', 225 | pretrained=True, 226 | ctx=ctx, 227 | use_pooler=False, 228 | use_decoder=False, 229 | use_classifier=False) 230 | if model is None: 231 | assert model_name != '' 232 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 233 | model.initialize(ctx=ctx) 234 | model.hybridize(static_alloc=True) 235 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 236 | 237 | idx2label = {} 238 | for label, idx in label2idx.items(): 239 | idx2label[idx] = label 240 | ## Load dev dataset 241 | field_separator = nlp.data.Splitter('\t') 242 | field_indices = [1, 3, 4, 0] 243 | eval_data = nlp.data.TSVDataset(filename=eval_input, 244 | field_separator=field_separator, 245 | num_discard_samples=1, 246 | field_indices=field_indices) 247 | 248 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 249 | 250 | dev_alignment = {} 251 | eval_data_transform = [] 252 | for sample in eval_data: 253 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 254 | eval_data_transform += [sample] 255 | dev_alignment[sample[0]] = alignment 256 | log.info('The number of examples after preprocessing: {}' 257 | .format(len(eval_data_transform))) 258 | 259 | test_batch_size = 16 260 | pad_token_id = vocabulary[PAD] 261 | pad_label_id = label2idx[PAD] 262 | batchify_fn = nlp.data.batchify.Tuple( 263 | nlp.data.batchify.Stack(), 264 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 265 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 266 | nlp.data.batchify.Stack('float32'), 267 | nlp.data.batchify.Stack('float32')) 268 | eval_dataloader = mx.gluon.data.DataLoader( 269 | eval_data_transform, 270 | batchify_fn=batchify_fn, 271 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 272 | 273 | _Result = collections.namedtuple( 274 | '_Result', ['intent', 'slot_labels']) 275 | all_results = {} 276 | 277 | total_num = 0 278 | for data in eval_dataloader: 279 | example_ids, token_ids, _, _, valid_length = data 280 | total_num += len(token_ids) 281 | # load data to GPU 282 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 283 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 284 | 285 | # forward computation 286 | intent_pred, slot_pred = model(token_ids) 287 | intent_pred = intent_pred.asnumpy() 288 | slot_pred = slot_pred.asnumpy() 289 | valid_length = valid_length.asnumpy() 290 | 291 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 292 | eid = eid.asscalar() 293 | length = int(length) - 2 294 | intent_id = y_intent.argmax(axis=-1) 295 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 296 | slot_names = [idx2label[idx] for idx in slot_ids] 297 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 298 | if eid not in all_results: 299 | all_results[eid] = _Result(intent_id, merged_slot_names) 300 | 301 | example_ids, utterances, labels, intents = load_tsv(eval_input) 302 | pred_intents = [] 303 | label_intents = [] 304 | for eid, intent in zip(example_ids, intents): 305 | label_intents.append(label2index(intent2idx, intent)) 306 | pred_intents.append(all_results[eid].intent) 307 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 308 | log.info("Intent Accuracy: %.4f" % intent_acc) 309 | 310 | pred_icsl = [] 311 | 
label_icsl = [] 312 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 313 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 314 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 315 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 316 | log.info("Exact Match: %.4f" % exact_match) 317 | 318 | with open(conll_prediction_file, "w") as fw: 319 | for eid, utterance, labels in zip(example_ids, utterances, labels): 320 | preds = all_results[eid].slot_labels 321 | for w, l, p in zip(utterance, labels, preds): 322 | fw.write(' '.join([w, l, p]) + '\n') 323 | fw.write('\n') 324 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 325 | with open(conll_prediction_file) as f: 326 | stdout = proc.communicate(f.read().encode())[0] 327 | result = stdout.decode('utf-8').split('\n')[1] 328 | slot_f1 = float(result.split()[-1].strip()) 329 | log.info("Slot Labeling: %s" % result) 330 | return intent_acc, slot_f1 331 | 332 | # extract labels 333 | train_input = data_dir + 'atis_train.tsv' 334 | intent2idx, label2idx = get_label_indices(train_input) 335 | 336 | # Train 337 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 338 | log.info('Train on %s:' % lang) 339 | model_name = 'model_lstm_mt_' + lang + '.' + str(random_seed) 340 | train_input = data_dir + 'train_translated_' + lang + '.tsv' 341 | train(model_name, train_input) 342 | 343 | # Evaluate 344 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 345 | log.info('Evaluate on %s:' % lang) 346 | model_name = 'model_lstm_mt_' + lang + '.' + str(random_seed) 347 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 348 | evaluate(model_name=model_name, eval_input=test_input) 349 | -------------------------------------------------------------------------------- /code/scripts/translate_and_align.py: -------------------------------------------------------------------------------- 1 | import nlu_constants 2 | import boto3 3 | import csv 4 | import json 5 | import jieba 6 | import sys 7 | import subprocess 8 | 9 | lang = sys.argv[1] 10 | 11 | data_dir = "../data/" 12 | train_tsv = "atis_train.tsv" 13 | valid_tsv = "atis_dev.tsv" 14 | train_target = "train_translated_" + lang.upper() + ".tsv" 15 | valid_target = "dev_translated_" + lang.upper() + ".tsv" 16 | source_lang = "en" 17 | target_lang = lang 18 | 19 | 20 | def idx2label(sources, source_labels): 21 | # we remove the BI tags because the order of BI may be changed after translation. 
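    # The result is a list with one {token_index: slot_type} dict per utterance,
    # e.g. labels ['O', 'O', 'B-city', 'I-city'] become {2: 'city', 3: 'city'}
    # (illustrative label names); the B-/I- prefixes are restored later by
    # align_labels after projecting labels onto the translated tokens.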
22 | ret_token_sls = [] # return a list of dictionary where 23 | for ix, ex in enumerate(sources): 24 | lbs = source_labels[ix] 25 | assert len(ex) == len(lbs) 26 | sls = {} 27 | for jx, token in enumerate(ex): 28 | if lbs[jx] != 'O': 29 | sls[jx] = lbs[jx][2:] 30 | ret_token_sls.append(sls) 31 | return ret_token_sls 32 | 33 | 34 | def load_tsv(fn): 35 | sources = [] 36 | source_labels = [] 37 | with open(fn) as tsvFile: 38 | tsvReader = csv.DictReader(tsvFile, delimiter="\t") 39 | for ix, line in enumerate(tsvReader): 40 | sources.append(line[nlu_constants.UTTERANCE_SYMBOL].split(' ')) 41 | source_labels.append(line[nlu_constants.SLOT_LABEL_SYMBOL].split(' ')) 42 | return sources, source_labels 43 | 44 | 45 | for source_tsv, target_tsv in [(train_tsv, train_target), (valid_tsv, valid_target)]: 46 | sources, source_labels = load_tsv(data_dir + source_tsv) 47 | token_sls = idx2label(sources, source_labels) 48 | with open(data_dir + target_tsv[:-4] + "_idx2label" + ".json", "w") as fw: 49 | json.dump(token_sls, fw) 50 | translator = boto3.client(service_name='translate', use_ssl=True, region_name='us-east-1') 51 | targets = [] 52 | for source in sources: 53 | result = translator.translate_text(Text=" ".join(source), SourceLanguageCode=source_lang, 54 | TargetLanguageCode=target_lang) 55 | targets.append(result.get('TranslatedText')) 56 | with open(data_dir + target_tsv[:-4] + ".json", "w") as fw: 57 | json.dump(targets, fw) 58 | with open(data_dir + target_tsv[:-4] + ".json") as f: 59 | targets = json.load(f) 60 | tokenized_targets = [] 61 | for target in targets: 62 | if target_lang == 'zh': 63 | target = target.replace(",", "") 64 | segs = [t.strip() for t in list(jieba.cut(target)) if not t.isspace()] 65 | else: 66 | target = target.replace(",", " ") 67 | segs = target.strip().split(' ') 68 | tokenized_targets.append(segs) 69 | with open(data_dir + target_tsv[:-4] + "_token" + ".json", "w") as fw: 70 | json.dump(tokenized_targets, fw) 71 | 72 | with open(data_dir + "atis_train_valid", "w") as fw: 73 | for source_tsv, target_tsv in [(train_tsv, train_target), (valid_tsv, valid_target)]: 74 | source_utterances, _ = load_tsv(data_dir + source_tsv) 75 | with open(data_dir + target_tsv[:-4] + "_token" + ".json") as f: 76 | target_utterances = json.load(f) 77 | for ix in range(len(source_utterances)): 78 | fw.write((" ".join(source_utterances[ix]) + " ||| " + " ".join(target_utterances[ix]) + "\n")) 79 | 80 | with open(data_dir + "forward.align", "w") as f: 81 | command = ["./fast_align/build/fast_align", "-i", data_dir + "atis_train_valid", "-d", "-o", "-v"] 82 | process = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE) 83 | stdout = process.communicate()[0].decode('utf-8') 84 | f.write(stdout) 85 | 86 | lens = [0] 87 | for target_tsv in [train_target, valid_target]: 88 | with open(data_dir + target_tsv[:-4] + "_token" + ".json") as f: 89 | lens.append(len(json.load(f)) + lens[-1]) 90 | lens = lens[1:] 91 | 92 | s2t_set = [] 93 | with open(data_dir + "forward.align") as f: 94 | s2t_indexes = [] 95 | len_idx = 0 96 | for ix, l in enumerate(f): 97 | if ix == lens[len_idx]: 98 | s2t_set.append(s2t_indexes) 99 | s2t_indexes = [] 100 | len_idx += 1 101 | if ix >= lens[-1]: 102 | break 103 | segs = l.split() 104 | s2t_idx = {} 105 | for seg in segs: 106 | st = seg.split('-') 107 | s2t_idx[int(st[0])] = int(st[1]) 108 | s2t_indexes.append(s2t_idx) 109 | if len(s2t_indexes) > 0: 110 | s2t_set.append(s2t_indexes) 111 | 112 | 113 | def align_labels(source_utterances, 
target_utterances, s2t_indexes, source_idx2labels): 114 | # generate target labels. 115 | ret_target_labels = [] 116 | for ix, tokens in enumerate(source_utterances): 117 | template = ['O'] * len(target_utterances[ix]) # generate template labels 118 | for jx in range(len(source_utterances[ix])): 119 | if jx in s2t_indexes[ix] and str(jx) in source_idx2labels[ix]: 120 | template[s2t_indexes[ix][jx]] = source_idx2labels[ix][str(jx)] 121 | # add BI labels 122 | state = 'O' 123 | for jx in range(len(template)): 124 | if template[jx] != 'O' and (state == 'O' or state != template[jx]): 125 | state = template[jx] 126 | template[jx] = 'B-' + template[jx] 127 | elif template[jx] != 'O' and state == template[jx]: 128 | template[jx] = 'I-' + template[jx] 129 | elif template[jx] == 'O': 130 | state = 'O' 131 | ret_target_labels.append(template) 132 | return ret_target_labels 133 | 134 | 135 | for ix, (source_tsv, target_tsv) in enumerate([(train_tsv, train_target), (valid_tsv, valid_target)]): 136 | source_utterances, source_labels = load_tsv(data_dir + source_tsv) 137 | with open(data_dir + target_tsv[:-4] + "_token" + ".json") as f: 138 | target_tokens = json.load(f) 139 | s2t_indexes = s2t_set[ix] 140 | with open(data_dir + target_tsv[:-4] + "_idx2label" + ".json") as f: 141 | token_sls = json.load(f) 142 | target_labels = align_labels(source_utterances, target_tokens, s2t_indexes, token_sls) 143 | with open(data_dir + source_tsv) as tsvFile: # also gen tsv format for DiSAN 144 | tsvReader = csv.DictReader(tsvFile, delimiter="\t") 145 | with open(data_dir + target_tsv, "w") as tsvFileW: 146 | tsvWriter = csv.DictWriter(tsvFileW, fieldnames=tsvReader.fieldnames, delimiter="\t") 147 | tsvWriter.writeheader() 148 | for jx, line in enumerate(tsvReader): 149 | line[nlu_constants.UTTERANCE_SYMBOL] = ' '.join(target_tokens[jx]) 150 | line[nlu_constants.SLOT_LABEL_SYMBOL] = ' '.join(target_labels[jx]) 151 | tsvWriter.writerow(line) 152 | -------------------------------------------------------------------------------- /code/scripts/utils.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import mxnet as mx 3 | import numpy as np 4 | import random 5 | 6 | from mxnet import gluon 7 | 8 | PAD = '[PAD]' 9 | 10 | def process_seq_labels(label, pred, ignore_id=-1): 11 | # label: (batch_size * seq_length) 12 | label = label.reshape(-3).asnumpy() 13 | # pred: (batch_size * seq_length, num_labels) 14 | pred = pred.reshape(-3, 0).asnumpy() 15 | # ignore ignore_id 16 | keep_idx = np.where(label != ignore_id) 17 | label = mx.nd.array(label[keep_idx]) 18 | pred = mx.nd.array(pred[keep_idx]) 19 | return [label], [pred] 20 | 21 | def split_and_load(arrs, ctx): 22 | """split and load arrays to a list of contexts""" 23 | assert isinstance(arrs, (list, tuple)) 24 | if len(ctx) == 1: 25 | return [[arr.as_in_context(ctx[0]) for arr in arrs]] 26 | else: 27 | # split and load 28 | loaded_arrs = [gluon.utils.split_and_load(arr, ctx, even_split=False) for arr in arrs] 29 | return zip(*loaded_arrs) 30 | 31 | def label2index(map, key): 32 | return map[key] if key in map else len(map) 33 | 34 | def load_tsv(fn): 35 | example_ids = [] 36 | utterances = [] 37 | labels = [] 38 | intents = [] 39 | with open(fn) as tsvFile: 40 | tsvReader = csv.DictReader(tsvFile, delimiter="\t") 41 | for line in tsvReader: 42 | example_ids.append(int(line['u_id'])) 43 | utterances.append(line['utterance'].split(' ')) 44 | labels.append(line['slot-labels'].split(' ')) 45 | intents.append(line['intent']) 
46 | return example_ids, utterances, labels, intents 47 | 48 | def get_label_indices(input_file): 49 | _, _, train_labels, train_intents = load_tsv(input_file) 50 | 51 | intent2idx = {} 52 | for intent in train_intents: 53 | if intent not in intent2idx: 54 | intent2idx[intent] = len(intent2idx) 55 | 56 | label2idx = {} 57 | for labels in train_labels: 58 | for l in labels: 59 | if l not in label2idx: 60 | label2idx[l] = len(label2idx) 61 | 62 | new_labels = [] 63 | for label in label2idx.keys(): 64 | if label.startswith('B'): 65 | cont_label = 'I' + label[1:] 66 | if cont_label not in label2idx: 67 | new_labels.append(cont_label) 68 | for label in new_labels: 69 | label2idx[label] = len(label2idx) 70 | if PAD not in label2idx: 71 | label2idx[PAD] = len(label2idx) 72 | return intent2idx, label2idx 73 | 74 | def merge_slots(slots, alignment): 75 | merged_slots = [] 76 | start_idx = alignment[0] 77 | for end_idx in alignment[1:]: 78 | tag = slots[start_idx] 79 | for slot in slots[start_idx: end_idx]: 80 | if slot.startswith('B') and tag == 'O': 81 | tag = slot 82 | elif slot.startswith('I') and tag == 'O': 83 | tag = slot 84 | start_idx = end_idx 85 | merged_slots.append(tag) 86 | return merged_slots 87 | 88 | def icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer): 89 | eid = int(sample[3]) 90 | out_sample = [] 91 | tag_alignment = [] 92 | bert_tokens = ['[CLS]'] 93 | bert_tags = [] 94 | for w, tag in zip(sample[0].split(), sample[1].split()): 95 | tag_alignment.append(len(bert_tags)) 96 | bert_toks = bert_tokenizer(w) 97 | bert_tokens.extend(bert_toks) 98 | if tag.startswith('B'): 99 | cont_tag = 'I' + tag[1:] 100 | bert_tags.extend([tag] + [cont_tag] * (len(bert_toks) - 1)) 101 | else: 102 | bert_tags.extend([tag] * len(bert_toks)) 103 | bert_tokens += ['[SEP]'] 104 | bert_tags += [PAD] 105 | # add example id 106 | out_sample += [eid] 107 | # add token ids 108 | out_sample += [[vocabulary[tok] for tok in bert_tokens]] 109 | # add slot labels 110 | out_sample += [[label2index(label2idx, tag) for tag in bert_tags]] 111 | # add intent label 112 | out_sample += [label2index(intent2idx, sample[2])] 113 | # add valid length 114 | valid_len = len(bert_tokens) 115 | out_sample += [valid_len] 116 | return out_sample, tag_alignment 117 | 118 | def parallel_icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer): 119 | out_sample = [] 120 | target = ['[CLS]'] 121 | bert_tags = [] 122 | for w, tag in zip(sample[1].split(), sample[2].split()): 123 | bert_toks = bert_tokenizer(w) 124 | target.extend(bert_toks) 125 | if tag.startswith('B'): 126 | cont_tag = 'I' + tag[1:] 127 | bert_tags.extend([tag] + [cont_tag] * (len(bert_toks) - 1)) 128 | else: 129 | bert_tags.extend([tag] * len(bert_toks)) 130 | target += ['[SEP]'] 131 | bert_tags += [PAD] 132 | source = ['[CLS]'] + bert_tokenizer(sample[0]) + ['[SEP]'] 133 | # add source ids 134 | out_sample += [[vocabulary[tok] for tok in source]] 135 | # add target ids 136 | out_sample += [[vocabulary[tok] for tok in target]] 137 | # add slot labels 138 | out_sample += [[label2index(label2idx, tag) for tag in bert_tags]] 139 | # add intent label 140 | out_sample += [label2index(intent2idx, sample[3])] 141 | # add source valid length 142 | out_sample += [len(source)] 143 | # add target valid length 144 | out_sample += [len(target)] 145 | return out_sample 146 | --------------------------------------------------------------------------------
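Editor's note: the snippet below is a minimal usage sketch, not part of the repository. It shows how the word-to-subword alignment recorded by icsl_transform is combined with merge_slots (both defined in code/scripts/utils.py) to map subword-level slot predictions back to word-level labels, mirroring what the evaluate functions in the lstm_*.py scripts do. The utterance, tokenization, and slot names are invented for illustration, and the snippet assumes it is run from code/scripts/ with mxnet installed (utils.py imports it).

from utils import merge_slots

# Suppose the utterance words are ['fly', 'to', 'kolkata'] and the BERT tokenizer
# splits 'kolkata' into two subwords. icsl_transform then records the alignment
# [0, 1, 2] (index of each word's first subword, [CLS] excluded) and there are
# 4 label-bearing subword positions in total.
subword_slots = ['O', 'O', 'B-city', 'I-city']  # per-subword model predictions
alignment = [0, 1, 2]                           # first-subword index per word
valid_length = 4                                # subword count without [CLS]/[SEP]

word_slots = merge_slots(subword_slots, alignment + [valid_length])
print(word_slots)  # ['O', 'O', 'B-city']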