├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── NOTICE ├── README.md └── code ├── README.md └── scripts ├── bert_alone.py ├── bert_joint.py ├── bert_mt.py ├── bert_soft_align.py ├── conlleval.pl ├── layers.py ├── loss.py ├── lstm_alone.py ├── lstm_joint.py ├── lstm_mt.py ├── translate_and_align.py └── utils.py /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to.
As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. 
For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## MultiAtis++ Corpus 2 | 3 | ### Description 4 | 5 | The ATIS (Air Travel Information Services) collection was developed to support the research and development of speech understanding systems [1]. The original English data includes intent and slot annotations, and was later extended to Hindi and Turkish [2]. MultiATIS++ further extends ATIS to 6 more languages and hence covers a total of 9 languages: English, Spanish, German, French, Portuguese, Chinese, Japanese, Hindi and Turkish. These languages belong to a diverse set of language families: Indo-European, Sino-Tibetan, Japonic and Altaic. 6 | 7 | The MultiATIS++ corpus has been released to foster further research in the domain of multilingual/cross-lingual natural language understanding. 8 | 9 | For more details, please check the paper: 10 | Xu, W., Haider, B. and Mansour, S., 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353 (https://arxiv.org/abs/2004.14353) 11 | 12 | ### Accessing MultiAtis++ 13 | 14 | To obtain a copy of the *MultiAtis++* data, please visit: 15 | https://catalog.ldc.upenn.edu/LDC2021T04 16 | 17 | Please send your queries/comments to multiatis@amazon.com. 18 | 19 | ### Citation 20 | 21 | Please cite [3] when referring to the MultiATIS++ dataset.
22 | 23 | 24 | ## Soft-Align Implementation 25 | 26 | Implementation of the *soft-align* method introduced in [3] will be available here soon. 27 | 28 | 29 | ## Security 30 | 31 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 32 | 33 | ## License 34 | 35 | This project is licensed under the Apache-2.0 License. 36 | 37 | ## References 38 | 39 | [1] LDC93S5 ATIS2, LDC94S19 ATIS3 Training Data, LDC95S26 ATIS3 Test Data 40 | 41 | [2] Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck. (Almost) Zero-Shot Cross-Lingual Spoken Language Understanding. IEEE ICASSP 2018. 42 | 43 | [3] Weijia Xu, Batool Haider, Saab Mansour. 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353. 44 | -------------------------------------------------------------------------------- /code/README.md: -------------------------------------------------------------------------------- 1 | ### Environment 2 | ``` 3 | pip install numpy scipy scikit-learn 4 | pip install --upgrade "mxnet>=1.6.0" 5 | pip install gluonnlp 6 | ``` 7 | 8 | ### Data 9 | + Multilingual ATIS dataset in EN, ES, DE, ZH, JA, PT, FR, HI, and TR. 10 | 11 | ### Preparation 12 | + Download the BERT-related [code](https://gluon-nlp.mxnet.io/model_zoo/bert/index.html). 13 | + Decompress the downloaded archive and place the folder in `code`. 14 | + Install [fast-align](https://github.com/clab/fast_align). 15 | 16 | ### Run 17 | + For supervised experiments, run `python lstm_alone.py $seed` or `python bert_alone.py $seed` to train the biLSTM/BERT supervised model (`$seed` is a random seed number). 18 | + For multilingual experiments, run `python lstm_joint.py $seed` or `python bert_joint.py $seed` to train the biLSTM/BERT multilingual model. 19 | + For cross-lingual transfer using *MT+fast-align*, first run `python translate_and_align.py $lang` to translate the English utterances to the target language `$lang` and project the slot labels using fast-align. Then run `python lstm_mt.py $seed` or `python bert_mt.py $seed` to train the biLSTM/BERT model. 20 | + For cross-lingual transfer using *MT+soft-align*, first run `python translate_and_align.py $lang` to translate the English utterances to the target language `$lang`. Then run `python bert_soft_align.py $seed` to train the soft-alignment model. 21 | -------------------------------------------------------------------------------- /code/scripts/bert_alone.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0), mx.gpu(1)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'bert.'
+ str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class BERTForICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into BERT to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, bert, num_slot_labels, num_intents, hidden_size=768, dropout=.1, prefix=None, params=None): 49 | super(BERTForICSL, self).__init__(prefix=prefix, params=params) 50 | self.bert = bert 51 | with self.name_scope(): 52 | self.dropout = nn.Dropout(rate=dropout) 53 | # IC/SL classifier 54 | self.slot_classifier = nn.Dense(units=num_slot_labels, 55 | in_units=hidden_size, 56 | flatten=False) 57 | self.intent_classifier = nn.Dense(units=num_intents, 58 | in_units=hidden_size, 59 | flatten=False) 60 | 61 | def encode(self, inputs, valid_length): 62 | types = mx.nd.zeros_like(inputs) 63 | encoded = self.bert(inputs, types, valid_length) 64 | encoded = self.dropout(encoded) 65 | return encoded 66 | 67 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 68 | """Generate unnormalized scores for the given input sequences. 69 | 70 | Parameters 71 | ---------- 72 | inputs : NDArray, shape (batch_size, seq_length) 73 | Input words for the sequences. 74 | valid_length : NDArray or None, shape (batch_size) 75 | Valid length of the sequence. This is used to mask the padded tokens. 76 | 77 | Returns 78 | ------- 79 | intent_prediction: NDArray 80 | Shape (batch_size, num_intents) 81 | slot_prediction : NDArray 82 | Shape (batch_size, seq_length, num_slot_labels) 83 | """ 84 | # hidden: (batch_size, seq_length, hidden_size) 85 | hidden = self.encode(inputs, valid_length) 86 | # get intent and slot label predictions 87 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 88 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 89 | return intent_prediction, slot_prediction 90 | 91 | 92 | def train(model_name, train_input, dev_input): 93 | """Training function.""" 94 | ## Arguments 95 | log_interval = 100 96 | batch_size = 32 97 | lr = 1e-5 98 | optimizer = 'adam' 99 | accumulate = None 100 | epochs = 20 101 | 102 | ## Load BERT model and vocabulary 103 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 104 | dataset_name='wiki_multilingual_uncased', 105 | pretrained=True, 106 | ctx=ctx, 107 | use_pooler=False, 108 | use_decoder=False, 109 | use_classifier=False) 110 | 111 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 112 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 113 | model.hybridize(static_alloc=True) 114 | 115 | icsl_loss_function = ICSLLoss() 116 | icsl_loss_function.hybridize(static_alloc=True) 117 | 118 | ic_metric = mx.metric.Accuracy() 119 | sl_metric = mx.metric.Accuracy() 120 | 121 | ## Load labeled data 122 | field_separator = nlp.data.Splitter('\t') 123 | # fields to select from the file: utterance, slot labels, intent, uid 124 | field_indices = [1, 3, 4, 0] 125 | train_data = nlp.data.TSVDataset(filename=train_input, 126 | field_separator=field_separator, 127 | num_discard_samples=1, 128 | field_indices=field_indices) 129 | 130 | # use the vocabulary from pre-trained model for tokenization 131 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 132 | train_data_transform = 
train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 133 | # create data loader 134 | pad_token_id = vocabulary[PAD] 135 | pad_label_id = label2idx[PAD] 136 | batchify_fn = nlp.data.batchify.Tuple( 137 | nlp.data.batchify.Stack(), 138 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 139 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 140 | nlp.data.batchify.Stack('float32'), 141 | nlp.data.batchify.Stack('float32')) 142 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 143 | batch_size=batch_size, 144 | shuffle=True) 145 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 146 | batchify_fn=batchify_fn, 147 | batch_sampler=train_sampler) 148 | 149 | optimizer_params = {'learning_rate': lr} 150 | trainer = gluon.Trainer(model.collect_params(), optimizer, 151 | optimizer_params, update_on_kvstore=False) 152 | 153 | # Collect differentiable parameters 154 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 155 | # Set grad_req if gradient accumulation is required 156 | if accumulate: 157 | for p in params: 158 | p[1].grad_req = 'add' 159 | # Fix BERT embeddings if required 160 | for p in model.collect_params().items(): 161 | if 'embed' in p[0]: 162 | p[1].grad_req = 'null' 163 | 164 | epoch_tic = time.time() 165 | total_num = 0 166 | log_num = 0 167 | best_score = (0, 0) 168 | for epoch_id in range(epochs): 169 | step_loss = 0 170 | tic = time.time() 171 | # train on labeled data 172 | for batch_id, data in enumerate(train_dataloader): 173 | # forward and backward 174 | with mx.autograd.record(): 175 | if data[0].shape[0] < len(ctx): 176 | data = split_and_load(data, [ctx[0]]) 177 | else: 178 | data = split_and_load(data, ctx) 179 | for chunk in data: 180 | _, token_ids, slot_label, intent_label, valid_length = chunk 181 | 182 | log_num += len(token_ids) 183 | total_num += len(token_ids) 184 | 185 | # forward computation 186 | intent_pred, slot_pred = model(token_ids, valid_length) 187 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 188 | 189 | if accumulate: 190 | ls = ls / accumulate 191 | ls.backward() 192 | step_loss += ls.asscalar() 193 | 194 | # update 195 | if not accumulate or (batch_id + 1) % accumulate == 0: 196 | trainer.allreduce_grads() 197 | nlp.utils.clip_grad_global_norm(params, 1) 198 | trainer.update(1, ignore_stale_grad=True) 199 | 200 | if (batch_id + 1) % log_interval == 0: 201 | toc = time.time() 202 | # update metrics 203 | ic_metric.update([intent_label], [intent_pred]) 204 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 205 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 206 | .format(epoch_id, 207 | batch_id, 208 | len(train_dataloader), 209 | log_num / (toc - tic), 210 | trainer.learning_rate, 211 | step_loss / log_interval, 212 | ic_metric.get()[1], 213 | sl_metric.get()[1])) 214 | tic = time.time() 215 | step_loss = 0 216 | log_num = 0 217 | 218 | mx.nd.waitall() 219 | epoch_toc = time.time() 220 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 221 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 222 | # evaluate on development set 223 | log.info('Evaluate on development set:') 224 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 225 | if slot_f1 > best_score[1]: 226 | best_score = (intent_acc, 
slot_f1) 227 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 228 | 229 | 230 | def evaluate(model=None, model_name='', eval_input=''): 231 | """Evaluate the model on validation dataset. 232 | """ 233 | ## Load model 234 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 235 | dataset_name='wiki_multilingual_uncased', 236 | pretrained=True, 237 | ctx=ctx, 238 | use_pooler=False, 239 | use_decoder=False, 240 | use_classifier=False) 241 | if model is None: 242 | assert model_name != '' 243 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 244 | model.initialize(ctx=ctx) 245 | model.hybridize(static_alloc=True) 246 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 247 | 248 | idx2label = {} 249 | for label, idx in label2idx.items(): 250 | idx2label[idx] = label 251 | ## Load dev dataset 252 | field_separator = nlp.data.Splitter('\t') 253 | field_indices = [1, 3, 4, 0] 254 | eval_data = nlp.data.TSVDataset(filename=eval_input, 255 | field_separator=field_separator, 256 | num_discard_samples=1, 257 | field_indices=field_indices) 258 | 259 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 260 | 261 | dev_alignment = {} 262 | eval_data_transform = [] 263 | for sample in eval_data: 264 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 265 | eval_data_transform += [sample] 266 | dev_alignment[sample[0]] = alignment 267 | log.info('The number of examples after preprocessing: {}' 268 | .format(len(eval_data_transform))) 269 | 270 | test_batch_size = 16 271 | pad_token_id = vocabulary[PAD] 272 | pad_label_id = label2idx[PAD] 273 | batchify_fn = nlp.data.batchify.Tuple( 274 | nlp.data.batchify.Stack(), 275 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 276 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 277 | nlp.data.batchify.Stack('float32'), 278 | nlp.data.batchify.Stack('float32')) 279 | eval_dataloader = mx.gluon.data.DataLoader( 280 | eval_data_transform, 281 | batchify_fn=batchify_fn, 282 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 283 | 284 | _Result = collections.namedtuple( 285 | '_Result', ['intent', 'slot_labels']) 286 | all_results = {} 287 | 288 | total_num = 0 289 | for data in eval_dataloader: 290 | example_ids, token_ids, _, _, valid_length = data 291 | total_num += len(token_ids) 292 | # load data to GPU 293 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 294 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 295 | 296 | # forward computation 297 | intent_pred, slot_pred = model(token_ids, valid_length) 298 | intent_pred = intent_pred.asnumpy() 299 | slot_pred = slot_pred.asnumpy() 300 | valid_length = valid_length.asnumpy() 301 | 302 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 303 | eid = eid.asscalar() 304 | length = int(length) - 2 305 | intent_id = y_intent.argmax(axis=-1) 306 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 307 | slot_names = [idx2label[idx] for idx in slot_ids] 308 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 309 | if eid not in all_results: 310 | all_results[eid] = _Result(intent_id, merged_slot_names) 311 | 312 | example_ids, utterances, labels, intents = load_tsv(eval_input) 313 | pred_intents = [] 314 | label_intents = [] 315 | for eid, intent in zip(example_ids, intents): 316 | label_intents.append(label2index(intent2idx, intent)) 317 
| pred_intents.append(all_results[eid].intent) 318 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 319 | log.info("Intent Accuracy: %.4f" % intent_acc) 320 | 321 | pred_icsl = [] 322 | label_icsl = [] 323 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 324 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 325 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 326 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 327 | log.info("Exact Match: %.4f" % exact_match) 328 | 329 | with open(conll_prediction_file, "w") as fw: 330 | for eid, utterance, labels in zip(example_ids, utterances, labels): 331 | preds = all_results[eid].slot_labels 332 | for w, l, p in zip(utterance, labels, preds): 333 | fw.write(' '.join([w, l, p]) + '\n') 334 | fw.write('\n') 335 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 336 | with open(conll_prediction_file) as f: 337 | stdout = proc.communicate(f.read().encode())[0] 338 | result = stdout.decode('utf-8').split('\n')[1] 339 | slot_f1 = float(result.split()[-1].strip()) 340 | log.info("Slot Labeling: %s" % result) 341 | return intent_acc, slot_f1 342 | 343 | # extract labels 344 | train_input = data_dir + 'atis_train.tsv' 345 | intent2idx, label2idx = get_label_indices(train_input) 346 | 347 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 348 | log.info('Train on %s:' % lang) 349 | model_name = 'model_bert_' + lang + '.' + str(random_seed) 350 | if lang == 'EN': 351 | train_input = data_dir + 'atis_train.tsv' 352 | dev_input = data_dir + 'atis_dev.tsv' 353 | else: 354 | train_input = data_dir + 'atis_train_' + lang + '.tsv' 355 | dev_input = data_dir + 'atis_dev_' + lang + '.tsv' 356 | train(model_name, train_input, dev_input) 357 | 358 | log.info('==========Supervised learning==========') 359 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 360 | log.info('Evaluate on %s:' % lang) 361 | model_name = 'model_bert_' + lang + '.' + str(random_seed) 362 | if lang == 'EN': 363 | test_input = data_dir + 'atis_test.tsv' 364 | else: 365 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 366 | evaluate(model_name=model_name, eval_input=test_input) 367 | 368 | log.info('==========Transfer learning==========') 369 | src_lang = 'EN' 370 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 371 | log.info('Evaluate on %s:' % lang) 372 | model_name = 'model_bert_' + src_lang + '.' 
+ str(random_seed) 373 | if lang == 'EN': 374 | test_input = data_dir + 'atis_test.tsv' 375 | else: 376 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 377 | evaluate(model_name=model_name, eval_input=test_input) 378 | -------------------------------------------------------------------------------- /code/scripts/bert_joint.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0), mx.gpu(1)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'bert_joint.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class BERTForICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into BERT to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, bert, num_slot_labels, num_intents, hidden_size=768, dropout=.1, prefix=None, params=None): 49 | super(BERTForICSL, self).__init__(prefix=prefix, params=params) 50 | self.bert = bert 51 | with self.name_scope(): 52 | self.dropout = nn.Dropout(rate=dropout) 53 | # IC/SL classifier 54 | self.slot_classifier = nn.Dense(units=num_slot_labels, 55 | in_units=hidden_size, 56 | flatten=False) 57 | self.intent_classifier = nn.Dense(units=num_intents, 58 | in_units=hidden_size, 59 | flatten=False) 60 | 61 | def encode(self, inputs, valid_length): 62 | types = mx.nd.zeros_like(inputs) 63 | encoded = self.bert(inputs, types, valid_length) 64 | encoded = self.dropout(encoded) 65 | return encoded 66 | 67 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 68 | """Generate unnormalized scores for the given input sequences. 69 | 70 | Parameters 71 | ---------- 72 | inputs : NDArray, shape (batch_size, seq_length) 73 | Input words for the sequences. 74 | valid_length : NDArray or None, shape (batch_size) 75 | Valid length of the sequence. This is used to mask the padded tokens. 
76 | 77 | Returns 78 | ------- 79 | intent_prediction: NDArray 80 | Shape (batch_size, num_intents) 81 | slot_prediction : NDArray 82 | Shape (batch_size, seq_length, num_slot_labels) 83 | """ 84 | # hidden: (batch_size, seq_length, hidden_size) 85 | hidden = self.encode(inputs, valid_length) 86 | # get intent and slot label predictions 87 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 88 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 89 | return intent_prediction, slot_prediction 90 | 91 | 92 | def train(model_name, train_input, dev_input): 93 | """Training function.""" 94 | ## Arguments 95 | log_interval = 100 96 | batch_size = 32 97 | lr = 1e-5 98 | optimizer = 'adam' 99 | accumulate = None 100 | epochs = 20 101 | 102 | ## Load BERT model and vocabulary 103 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 104 | dataset_name='wiki_multilingual_uncased', 105 | pretrained=True, 106 | ctx=ctx, 107 | use_pooler=False, 108 | use_decoder=False, 109 | use_classifier=False) 110 | 111 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 112 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 113 | model.hybridize(static_alloc=True) 114 | 115 | icsl_loss_function = ICSLLoss() 116 | icsl_loss_function.hybridize(static_alloc=True) 117 | 118 | ic_metric = mx.metric.Accuracy() 119 | sl_metric = mx.metric.Accuracy() 120 | 121 | ## Load labeled data 122 | field_separator = nlp.data.Splitter('\t') 123 | # fields to select from the file: utterance, slot labels, intent, uid 124 | field_indices = [1, 3, 4, 0] 125 | train_data = nlp.data.TSVDataset(filename=train_input, 126 | field_separator=field_separator, 127 | num_discard_samples=1, 128 | field_indices=field_indices) 129 | 130 | # use the vocabulary from pre-trained model for tokenization 131 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 132 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 133 | # create data loader 134 | pad_token_id = vocabulary[PAD] 135 | pad_label_id = label2idx[PAD] 136 | batchify_fn = nlp.data.batchify.Tuple( 137 | nlp.data.batchify.Stack(), 138 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 139 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 140 | nlp.data.batchify.Stack('float32'), 141 | nlp.data.batchify.Stack('float32')) 142 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 143 | batch_size=batch_size, 144 | shuffle=True) 145 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 146 | batchify_fn=batchify_fn, 147 | batch_sampler=train_sampler) 148 | 149 | optimizer_params = {'learning_rate': lr} 150 | trainer = gluon.Trainer(model.collect_params(), optimizer, 151 | optimizer_params, update_on_kvstore=False) 152 | 153 | # Collect differentiable parameters 154 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 155 | # Set grad_req if gradient accumulation is required 156 | if accumulate: 157 | for p in params: 158 | p[1].grad_req = 'add' 159 | # Fix BERT embeddings if required 160 | for p in model.collect_params().items(): 161 | if 'embed' in p[0]: 162 | p[1].grad_req = 'null' 163 | 164 | epoch_tic = time.time() 165 | total_num = 0 166 | log_num = 0 167 | best_score = (0, 0) 168 | for epoch_id in range(epochs): 169 | step_loss = 0 170 | tic = time.time() 171 | # train on labeled data 172 | for batch_id, data in enumerate(train_dataloader): 173 | 
# forward and backward 174 | with mx.autograd.record(): 175 | if data[0].shape[0] < len(ctx): 176 | data = split_and_load(data, [ctx[0]]) 177 | else: 178 | data = split_and_load(data, ctx) 179 | for chunk in data: 180 | _, token_ids, slot_label, intent_label, valid_length = chunk 181 | 182 | log_num += len(token_ids) 183 | total_num += len(token_ids) 184 | 185 | # forward computation 186 | intent_pred, slot_pred = model(token_ids, valid_length) 187 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 188 | 189 | if accumulate: 190 | ls = ls / accumulate 191 | ls.backward() 192 | step_loss += ls.asscalar() 193 | 194 | # update 195 | if not accumulate or (batch_id + 1) % accumulate == 0: 196 | trainer.allreduce_grads() 197 | nlp.utils.clip_grad_global_norm(params, 1) 198 | trainer.update(1, ignore_stale_grad=True) 199 | 200 | if (batch_id + 1) % log_interval == 0: 201 | toc = time.time() 202 | # update metrics 203 | ic_metric.update([intent_label], [intent_pred]) 204 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 205 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 206 | .format(epoch_id, 207 | batch_id, 208 | len(train_dataloader), 209 | log_num / (toc - tic), 210 | trainer.learning_rate, 211 | step_loss / log_interval, 212 | ic_metric.get()[1], 213 | sl_metric.get()[1])) 214 | tic = time.time() 215 | step_loss = 0 216 | log_num = 0 217 | 218 | mx.nd.waitall() 219 | epoch_toc = time.time() 220 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 221 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 222 | # evaluate on development set 223 | log.info('Evaluate on development set:') 224 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 225 | if slot_f1 > best_score[1]: 226 | best_score = (intent_acc, slot_f1) 227 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 228 | 229 | 230 | def evaluate(model=None, model_name='', eval_input=''): 231 | """Evaluate the model on validation dataset. 
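    Parameters
    ----------
    model : BERTForICSL or None
        A trained model instance. If None, ``model_name`` must be provided and
        the saved parameters are loaded from ``model_dir``.
    model_name : str
        Name of the saved parameter file (without the ``.params`` extension);
        used only when ``model`` is None.
    eval_input : str
        Path to the evaluation TSV file.

    Returns
    -------
    intent_acc : float
        Intent classification accuracy.
    slot_f1 : float
        Slot labeling F1 score reported by conlleval.pl.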
232 | """ 233 | ## Load model 234 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 235 | dataset_name='wiki_multilingual_uncased', 236 | pretrained=True, 237 | ctx=ctx, 238 | use_pooler=False, 239 | use_decoder=False, 240 | use_classifier=False) 241 | if model is None: 242 | assert model_name != '' 243 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 244 | model.initialize(ctx=ctx) 245 | model.hybridize(static_alloc=True) 246 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 247 | 248 | idx2label = {} 249 | for label, idx in label2idx.items(): 250 | idx2label[idx] = label 251 | ## Load dev dataset 252 | field_separator = nlp.data.Splitter('\t') 253 | field_indices = [1, 3, 4, 0] 254 | eval_data = nlp.data.TSVDataset(filename=eval_input, 255 | field_separator=field_separator, 256 | num_discard_samples=1, 257 | field_indices=field_indices) 258 | 259 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 260 | 261 | dev_alignment = {} 262 | eval_data_transform = [] 263 | for sample in eval_data: 264 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 265 | eval_data_transform += [sample] 266 | dev_alignment[sample[0]] = alignment 267 | log.info('The number of examples after preprocessing: {}' 268 | .format(len(eval_data_transform))) 269 | 270 | test_batch_size = 16 271 | pad_token_id = vocabulary[PAD] 272 | pad_label_id = label2idx[PAD] 273 | batchify_fn = nlp.data.batchify.Tuple( 274 | nlp.data.batchify.Stack(), 275 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 276 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 277 | nlp.data.batchify.Stack('float32'), 278 | nlp.data.batchify.Stack('float32')) 279 | eval_dataloader = mx.gluon.data.DataLoader( 280 | eval_data_transform, 281 | batchify_fn=batchify_fn, 282 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 283 | 284 | _Result = collections.namedtuple( 285 | '_Result', ['intent', 'slot_labels']) 286 | all_results = {} 287 | 288 | total_num = 0 289 | for data in eval_dataloader: 290 | example_ids, token_ids, _, _, valid_length = data 291 | total_num += len(token_ids) 292 | # load data to GPU 293 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 294 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 295 | 296 | # forward computation 297 | intent_pred, slot_pred = model(token_ids, valid_length) 298 | intent_pred = intent_pred.asnumpy() 299 | slot_pred = slot_pred.asnumpy() 300 | valid_length = valid_length.asnumpy() 301 | 302 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 303 | eid = eid.asscalar() 304 | length = int(length) - 2 305 | intent_id = y_intent.argmax(axis=-1) 306 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 307 | slot_names = [idx2label[idx] for idx in slot_ids] 308 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 309 | if eid not in all_results: 310 | all_results[eid] = _Result(intent_id, merged_slot_names) 311 | 312 | example_ids, utterances, labels, intents = load_tsv(eval_input) 313 | pred_intents = [] 314 | label_intents = [] 315 | for eid, intent in zip(example_ids, intents): 316 | label_intents.append(label2index(intent2idx, intent)) 317 | pred_intents.append(all_results[eid].intent) 318 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 319 | log.info("Intent Accuracy: %.4f" % intent_acc) 320 | 321 | pred_icsl = [] 322 
| label_icsl = [] 323 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 324 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 325 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 326 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 327 | log.info("Exact Match: %.4f" % exact_match) 328 | 329 | with open(conll_prediction_file, "w") as fw: 330 | for eid, utterance, labels in zip(example_ids, utterances, labels): 331 | preds = all_results[eid].slot_labels 332 | for w, l, p in zip(utterance, labels, preds): 333 | fw.write(' '.join([w, l, p]) + '\n') 334 | fw.write('\n') 335 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 336 | with open(conll_prediction_file) as f: 337 | stdout = proc.communicate(f.read().encode())[0] 338 | result = stdout.decode('utf-8').split('\n')[1] 339 | slot_f1 = float(result.split()[-1].strip()) 340 | log.info("Slot Labeling: %s" % result) 341 | return intent_acc, slot_f1 342 | 343 | # extract labels 344 | train_input = data_dir + 'atis_train.tsv' 345 | intent2idx, label2idx = get_label_indices(train_input) 346 | 347 | log.info('Train on all languages:') 348 | model_name = 'model_bert_joint.' + str(random_seed) 349 | train_input = data_dir + 'atis_train_all.tsv' 350 | dev_input = data_dir + 'atis_dev.tsv' 351 | train(model_name, train_input, dev_input) 352 | 353 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 354 | log.info('Evaluate on %s:' % lang) 355 | model_name = 'model_bert_joint.' + str(random_seed) 356 | if lang == 'EN': 357 | test_input = data_dir + 'atis_test.tsv' 358 | else: 359 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 360 | evaluate(model_name=model_name, eval_input=test_input) 361 | -------------------------------------------------------------------------------- /code/scripts/bert_mt.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0), mx.gpu(1)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'bert_mt.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class BERTForICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into BERT to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 
46 | """ 47 | 48 | def __init__(self, bert, num_slot_labels, num_intents, hidden_size=768, dropout=.1, prefix=None, params=None): 49 | super(BERTForICSL, self).__init__(prefix=prefix, params=params) 50 | self.bert = bert 51 | with self.name_scope(): 52 | self.dropout = nn.Dropout(rate=dropout) 53 | # IC/SL classifier 54 | self.slot_classifier = nn.Dense(units=num_slot_labels, 55 | in_units=hidden_size, 56 | flatten=False) 57 | self.intent_classifier = nn.Dense(units=num_intents, 58 | in_units=hidden_size, 59 | flatten=False) 60 | 61 | def encode(self, inputs, valid_length): 62 | types = mx.nd.zeros_like(inputs) 63 | encoded = self.bert(inputs, types, valid_length) 64 | encoded = self.dropout(encoded) 65 | return encoded 66 | 67 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 68 | """Generate unnormalized scores for the given input sequences. 69 | 70 | Parameters 71 | ---------- 72 | inputs : NDArray, shape (batch_size, seq_length) 73 | Input words for the sequences. 74 | valid_length : NDArray or None, shape (batch_size) 75 | Valid length of the sequence. This is used to mask the padded tokens. 76 | 77 | Returns 78 | ------- 79 | intent_prediction: NDArray 80 | Shape (batch_size, num_intents) 81 | slot_prediction : NDArray 82 | Shape (batch_size, seq_length, num_slot_labels) 83 | """ 84 | # hidden: (batch_size, seq_length, hidden_size) 85 | hidden = self.encode(inputs, valid_length) 86 | # get intent and slot label predictions 87 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 88 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 89 | return intent_prediction, slot_prediction 90 | 91 | 92 | def train(model_name, train_input): 93 | """Training function.""" 94 | ## Arguments 95 | log_interval = 100 96 | batch_size = 32 97 | lr = 1e-5 98 | optimizer = 'adam' 99 | accumulate = None 100 | epochs = 20 101 | 102 | ## Load BERT model and vocabulary 103 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 104 | dataset_name='wiki_multilingual_uncased', 105 | pretrained=True, 106 | ctx=ctx, 107 | use_pooler=False, 108 | use_decoder=False, 109 | use_classifier=False) 110 | 111 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 112 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 113 | model.hybridize(static_alloc=True) 114 | 115 | icsl_loss_function = ICSLLoss() 116 | icsl_loss_function.hybridize(static_alloc=True) 117 | 118 | ic_metric = mx.metric.Accuracy() 119 | sl_metric = mx.metric.Accuracy() 120 | 121 | ## Load labeled data 122 | field_separator = nlp.data.Splitter('\t') 123 | # fields to select from the file: utterance, slot labels, intent, uid 124 | field_indices = [1, 3, 4, 0] 125 | train_data = nlp.data.TSVDataset(filename=train_input, 126 | field_separator=field_separator, 127 | num_discard_samples=1, 128 | field_indices=field_indices) 129 | 130 | # use the vocabulary from pre-trained model for tokenization 131 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 132 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 133 | # create data loader 134 | pad_token_id = vocabulary[PAD] 135 | pad_label_id = label2idx[PAD] 136 | batchify_fn = nlp.data.batchify.Tuple( 137 | nlp.data.batchify.Stack(), 138 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 139 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 140 | nlp.data.batchify.Stack('float32'), 141 | nlp.data.batchify.Stack('float32')) 142 | 
train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 143 | batch_size=batch_size, 144 | shuffle=True) 145 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 146 | batchify_fn=batchify_fn, 147 | batch_sampler=train_sampler) 148 | 149 | optimizer_params = {'learning_rate': lr} 150 | trainer = gluon.Trainer(model.collect_params(), optimizer, 151 | optimizer_params, update_on_kvstore=False) 152 | 153 | # Collect differentiable parameters 154 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 155 | # Set grad_req if gradient accumulation is required 156 | if accumulate: 157 | for p in params: 158 | p[1].grad_req = 'add' 159 | # Fix BERT embeddings if required 160 | for p in model.collect_params().items(): 161 | if 'embed' in p[0]: 162 | p[1].grad_req = 'null' 163 | 164 | epoch_tic = time.time() 165 | total_num = 0 166 | log_num = 0 167 | for epoch_id in range(epochs): 168 | step_loss = 0 169 | tic = time.time() 170 | # train on labeled data 171 | for batch_id, data in enumerate(train_dataloader): 172 | # forward and backward 173 | with mx.autograd.record(): 174 | if data[0].shape[0] < len(ctx): 175 | data = split_and_load(data, [ctx[0]]) 176 | else: 177 | data = split_and_load(data, ctx) 178 | for chunk in data: 179 | _, token_ids, slot_label, intent_label, valid_length = chunk 180 | 181 | log_num += len(token_ids) 182 | total_num += len(token_ids) 183 | 184 | # forward computation 185 | intent_pred, slot_pred = model(token_ids, valid_length) 186 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 187 | 188 | if accumulate: 189 | ls = ls / accumulate 190 | ls.backward() 191 | step_loss += ls.asscalar() 192 | 193 | # update 194 | if not accumulate or (batch_id + 1) % accumulate == 0: 195 | trainer.allreduce_grads() 196 | nlp.utils.clip_grad_global_norm(params, 1) 197 | trainer.update(1, ignore_stale_grad=True) 198 | 199 | if (batch_id + 1) % log_interval == 0: 200 | toc = time.time() 201 | # update metrics 202 | ic_metric.update([intent_label], [intent_pred]) 203 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 204 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 205 | .format(epoch_id, 206 | batch_id, 207 | len(train_dataloader), 208 | log_num / (toc - tic), 209 | trainer.learning_rate, 210 | step_loss / log_interval, 211 | ic_metric.get()[1], 212 | sl_metric.get()[1])) 213 | tic = time.time() 214 | step_loss = 0 215 | log_num = 0 216 | 217 | mx.nd.waitall() 218 | epoch_toc = time.time() 219 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 220 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 221 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 222 | 223 | 224 | def evaluate(model=None, model_name='', eval_input=''): 225 | """Evaluate the model on validation dataset. 
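    Parameters
    ----------
    model : BERTForICSL or None
        A trained model instance. If None, ``model_name`` must be provided and
        the saved parameters are loaded from ``model_dir``.
    model_name : str
        Name of the saved parameter file (without the ``.params`` extension);
        used only when ``model`` is None.
    eval_input : str
        Path to the evaluation TSV file.

    Returns
    -------
    intent_acc : float
        Intent classification accuracy.
    slot_f1 : float
        Slot labeling F1 score reported by conlleval.pl.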
226 | """ 227 | ## Load model 228 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 229 | dataset_name='wiki_multilingual_uncased', 230 | pretrained=True, 231 | ctx=ctx, 232 | use_pooler=False, 233 | use_decoder=False, 234 | use_classifier=False) 235 | if model is None: 236 | assert model_name != '' 237 | model = BERTForICSL(bert, num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 238 | model.initialize(ctx=ctx) 239 | model.hybridize(static_alloc=True) 240 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 241 | 242 | idx2label = {} 243 | for label, idx in label2idx.items(): 244 | idx2label[idx] = label 245 | ## Load dev dataset 246 | field_separator = nlp.data.Splitter('\t') 247 | field_indices = [1, 3, 4, 0] 248 | eval_data = nlp.data.TSVDataset(filename=eval_input, 249 | field_separator=field_separator, 250 | num_discard_samples=1, 251 | field_indices=field_indices) 252 | 253 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 254 | 255 | dev_alignment = {} 256 | eval_data_transform = [] 257 | for sample in eval_data: 258 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 259 | eval_data_transform += [sample] 260 | dev_alignment[sample[0]] = alignment 261 | log.info('The number of examples after preprocessing: {}' 262 | .format(len(eval_data_transform))) 263 | 264 | test_batch_size = 16 265 | pad_token_id = vocabulary[PAD] 266 | pad_label_id = label2idx[PAD] 267 | batchify_fn = nlp.data.batchify.Tuple( 268 | nlp.data.batchify.Stack(), 269 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 270 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 271 | nlp.data.batchify.Stack('float32'), 272 | nlp.data.batchify.Stack('float32')) 273 | eval_dataloader = mx.gluon.data.DataLoader( 274 | eval_data_transform, 275 | batchify_fn=batchify_fn, 276 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 277 | 278 | _Result = collections.namedtuple( 279 | '_Result', ['intent', 'slot_labels']) 280 | all_results = {} 281 | 282 | total_num = 0 283 | for data in eval_dataloader: 284 | example_ids, token_ids, _, _, valid_length = data 285 | total_num += len(token_ids) 286 | # load data to GPU 287 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 288 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 289 | 290 | # forward computation 291 | intent_pred, slot_pred = model(token_ids, valid_length) 292 | intent_pred = intent_pred.asnumpy() 293 | slot_pred = slot_pred.asnumpy() 294 | valid_length = valid_length.asnumpy() 295 | 296 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 297 | eid = eid.asscalar() 298 | length = int(length) - 2 299 | intent_id = y_intent.argmax(axis=-1) 300 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 301 | slot_names = [idx2label[idx] for idx in slot_ids] 302 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 303 | if eid not in all_results: 304 | all_results[eid] = _Result(intent_id, merged_slot_names) 305 | 306 | example_ids, utterances, labels, intents = load_tsv(eval_input) 307 | pred_intents = [] 308 | label_intents = [] 309 | for eid, intent in zip(example_ids, intents): 310 | label_intents.append(label2index(intent2idx, intent)) 311 | pred_intents.append(all_results[eid].intent) 312 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 313 | log.info("Intent Accuracy: %.4f" % intent_acc) 314 | 315 | pred_icsl = [] 316 
| label_icsl = [] 317 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 318 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 319 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 320 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 321 | log.info("Exact Match: %.4f" % exact_match) 322 | 323 | with open(conll_prediction_file, "w") as fw: 324 | for eid, utterance, labels in zip(example_ids, utterances, labels): 325 | preds = all_results[eid].slot_labels 326 | for w, l, p in zip(utterance, labels, preds): 327 | fw.write(' '.join([w, l, p]) + '\n') 328 | fw.write('\n') 329 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 330 | with open(conll_prediction_file) as f: 331 | stdout = proc.communicate(f.read().encode())[0] 332 | result = stdout.decode('utf-8').split('\n')[1] 333 | slot_f1 = float(result.split()[-1].strip()) 334 | log.info("Slot Labeling: %s" % result) 335 | return intent_acc, slot_f1 336 | 337 | 338 | # extract labels 339 | train_input = data_dir + 'atis_train.tsv' 340 | intent2idx, label2idx = get_label_indices(train_input) 341 | 342 | # Train 343 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 344 | log.info('Train on %s:' % lang) 345 | model_name = 'model_bert_mt_' + lang + '.' + str(random_seed) 346 | train_input = data_dir + 'train_translated_' + lang + '.tsv' 347 | train(model_name, train_input) 348 | 349 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 350 | log.info('Evaluate on %s:' % lang) 351 | model_name = 'model_bert_mt_' + lang + '.' + str(random_seed) 352 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 353 | evaluate(model_name=model_name, eval_input=test_input) 354 | -------------------------------------------------------------------------------- /code/scripts/bert_soft_align.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | from mxnet.gluon.loss import Loss, SoftmaxCELoss 16 | 17 | from layers import * 18 | from loss import * 19 | from utils import * 20 | 21 | random_seed = int(sys.argv[1]) 22 | warnings.filterwarnings('ignore') 23 | data_dir = "../data/" 24 | model_dir = "../exp/" 25 | conll_prediction_file = data_dir + "conll.pred" 26 | 27 | PAD = '[PAD]' 28 | INF_INT = int(1e18) 29 | mx.random.seed(random_seed) 30 | ctx = [mx.gpu(i) for i in range(4)] 31 | 32 | log = logging.getLogger('gluonnlp') 33 | log.setLevel(logging.DEBUG) 34 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 35 | fh = logging.FileHandler(os.path.join(model_dir, 'bert_align.' + str(random_seed) + '.log'), mode='w') 36 | fh.setLevel(logging.INFO) 37 | fh.setFormatter(formatter) 38 | console = logging.StreamHandler() 39 | console.setLevel(logging.INFO) 40 | console.setFormatter(formatter) 41 | log.addHandler(console) 42 | log.addHandler(fh) 43 | 44 | class MultiTaskICSL(Block): 45 | """Model for IC/SL task. 46 | 47 | The model feeds token ids into BERT to get the sequence 48 | representations, then apply two dense layers for IC/SL task. 
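    In addition to the two classifiers, the model ties an LM output layer
    to the BERT word embeddings and adds an attention map layer:
    ``translate_and_predict`` encodes the source utterance with BERT,
    attends to it with the target embeddings (a soft alignment), and
    predicts the translation from the decoder output and the target-side
    slot labels from the attention output, while the intent is predicted
    from the source [CLS] position.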
49 | """ 50 | 51 | def __init__(self, bert, vocab_size, num_slot_labels, num_intents, hidden_size=768, dropout=.1, attn_temperature=.1, prefix=None, params=None): 52 | super(MultiTaskICSL, self).__init__(prefix=prefix, params=params) 53 | self.bert = bert 54 | with self.name_scope(): 55 | self.dropout = nn.Dropout(rate=dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | in_units=hidden_size, 59 | flatten=False) 60 | self.intent_classifier = nn.Dense(units=num_intents, 61 | in_units=hidden_size, 62 | flatten=False) 63 | # LM output layer 64 | self.lm_output_layer = nn.Dense(units=vocab_size, 65 | in_units=hidden_size, 66 | params=self.bert.word_embed.params, 67 | flatten=False) 68 | # attention map layer 69 | self.attention_map_layer = AttentionMapCell(units=hidden_size, 70 | hidden_size=hidden_size * 2, 71 | attn_temperature=attn_temperature) 72 | 73 | def encode(self, inputs, valid_length): 74 | types = mx.nd.zeros_like(inputs) 75 | encoded = self.bert(inputs, types, valid_length) 76 | encoded = self.dropout(encoded) 77 | return encoded 78 | 79 | def forward(self, inputs, valid_length): # pylint: disable=arguments-differ 80 | """Generate unnormalized scores for the given input sequences. 81 | 82 | Parameters 83 | ---------- 84 | inputs : NDArray, shape (batch_size, seq_length) 85 | Input words for the sequences. 86 | valid_length : NDArray or None, shape (batch_size) 87 | Valid length of the sequence. This is used to mask the padded tokens. 88 | 89 | Returns 90 | ------- 91 | intent_prediction: NDArray 92 | Shape (batch_size, num_intents) 93 | slot_prediction : NDArray 94 | Shape (batch_size, seq_length, num_slot_labels) 95 | """ 96 | # hidden: (batch_size, seq_length, hidden_size) 97 | hidden = self.encode(inputs, valid_length) 98 | # get intent and slot label predictions 99 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 100 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 101 | return intent_prediction, slot_prediction 102 | 103 | def translate_and_predict(self, source, target, src_valid_length): 104 | """Generate unnormalized scores for the given input sequences. 105 | 106 | Parameters 107 | ---------- 108 | source : NDArray, shape (batch_size, src_seq_length) 109 | Input words for the source sequences. 110 | target : NDArray, shape (batch_size, tgt_seq_length) 111 | Input words for the target sequences. 112 | src_valid_length : NDArray or None, shape (batch_size) 113 | Valid length of the source sequence. This is used to mask the padded tokens. 
114 | 115 | Returns 116 | ------- 117 | translation : NDArray 118 | Shape (batch_size, tgt_seq_length, vocab_size) 119 | intent_prediction: NDArray 120 | Shape (batch_size, num_intents) 121 | slot_prediction : NDArray 122 | Shape (batch_size, tgt_seq_length, num_slot_labels) 123 | """ 124 | # src_len_mask: (batch_size, tgt_seq_length, src_seq_length) 125 | src_len_mask = None 126 | if src_valid_length is not None: 127 | dtype = src_valid_length.dtype 128 | ctx = src_valid_length.context 129 | src_len_mask = mx.nd.broadcast_lesser( 130 | mx.nd.arange(source.shape[1], ctx=ctx, dtype=dtype).reshape((1, -1)), 131 | src_valid_length.reshape((-1, 1))) 132 | src_len_mask = mx.nd.broadcast_axes(mx.nd.expand_dims(src_len_mask, axis=1), axis=1, size=target.shape[1]) 133 | # src_encoded: (batch_size, src_seq_length, hidden_size) 134 | src_encoded = self.encode(source, src_valid_length) 135 | # tgt_embed: (batch_size, tgt_seq_length, hidden_size) 136 | tgt_embed = self.bert.word_embed(target) 137 | # (batch_size, tgt_seq_length, hidden_size) 138 | decoded, attn_output = self.attention_map_layer(tgt_embed, src_encoded, src_len_mask) 139 | # translation: (batch_size, tgt_seq_length - 1, vocab_size) 140 | translation = self.lm_output_layer(decoded[:, 1:, :]) 141 | # get intent and slot label predictions 142 | intent_prediction = self.intent_classifier(src_encoded[:, 0, :]) 143 | slot_prediction = self.slot_classifier(attn_output[:, 1:, :]) 144 | return translation, intent_prediction, slot_prediction 145 | 146 | def train(model_name, train_input, para_input): 147 | """Training function.""" 148 | ## Arguments 149 | log_interval = 100 150 | batch_size = 32 151 | lr = 1e-5 152 | optimizer = 'adam' 153 | accumulate = None 154 | epochs = 20 155 | mt_batches_per_epoch = 200 156 | icsl_batches_per_epoch = 200 157 | 158 | ## Load BERT model and vocabulary 159 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 160 | dataset_name='wiki_multilingual_uncased', 161 | pretrained=True, 162 | ctx=ctx, 163 | use_pooler=False, 164 | use_decoder=False, 165 | use_classifier=False) 166 | 167 | model = MultiTaskICSL(bert, len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 168 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 169 | model.hybridize(static_alloc=True) 170 | 171 | icsl_loss_function = ICSLLoss() 172 | icsl_loss_function.hybridize(static_alloc=True) 173 | ce_loss_function = SoftmaxCELoss() 174 | ce_loss_function.hybridize(static_alloc=True) 175 | mce_loss_function = SoftmaxCEMaskedLoss() 176 | mce_loss_function.hybridize(static_alloc=True) 177 | 178 | ic_metric = mx.metric.Accuracy() 179 | sl_metric = mx.metric.Accuracy() 180 | 181 | ## Load labeled data 182 | field_separator = nlp.data.Splitter('\t') 183 | # fields to select from the file: utterance, slot labels, intent, uid 184 | field_indices = [1, 3, 4, 0] 185 | train_data = nlp.data.TSVDataset(filename=train_input, 186 | field_separator=field_separator, 187 | num_discard_samples=1, 188 | field_indices=field_indices) 189 | 190 | # use the vocabulary from pre-trained model for tokenization 191 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 192 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 193 | # create data loader 194 | pad_token_id = vocabulary[PAD] 195 | pad_label_id = label2idx[PAD] 196 | batchify_fn = nlp.data.batchify.Tuple( 197 | nlp.data.batchify.Stack(), 198 | nlp.data.batchify.Pad(axis=0, 
pad_val=pad_token_id), 199 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 200 | nlp.data.batchify.Stack('float32'), 201 | nlp.data.batchify.Stack('float32')) 202 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 203 | batch_size=batch_size, 204 | shuffle=True) 205 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 206 | batchify_fn=batchify_fn, 207 | batch_sampler=train_sampler) 208 | 209 | ## Load parallel data 210 | field_separator = nlp.data.Splitter('\t') 211 | # fields to select from the file: utterance, uid 212 | field_indices = [0, 1, 2, 3] 213 | para_data = nlp.data.TSVDataset(filename=para_input, 214 | field_separator=field_separator, 215 | num_discard_samples=0, 216 | field_indices=field_indices) 217 | 218 | # use the vocabulary from pre-trained model for tokenization 219 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 220 | para_data_transform = para_data.transform(fn=lambda x: parallel_icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)) 221 | # create data loader 222 | batchify_fn = nlp.data.batchify.Tuple( 223 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 224 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 225 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 226 | nlp.data.batchify.Stack('float32'), 227 | nlp.data.batchify.Stack('float32'), 228 | nlp.data.batchify.Stack('float32')) 229 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[0]) for item in para_data_transform], 230 | batch_size=batch_size, 231 | shuffle=True) 232 | para_dataloader = mx.gluon.data.DataLoader(para_data_transform, 233 | batchify_fn=batchify_fn, 234 | batch_sampler=train_sampler) 235 | 236 | optimizer_params = {'learning_rate': lr} 237 | trainer = gluon.Trainer(model.collect_params(), optimizer, 238 | optimizer_params, update_on_kvstore=False) 239 | optimizer_params = {'learning_rate': lr} 240 | mt_trainer = gluon.Trainer(model.collect_params(), optimizer, 241 | optimizer_params, update_on_kvstore=False) 242 | 243 | # Collect differentiable parameters 244 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 245 | # Set grad_req if gradient accumulation is required 246 | if accumulate: 247 | for p in params: 248 | p[1].grad_req = 'add' 249 | # Fix BERT embeddings if required 250 | for p in model.collect_params().items(): 251 | if 'embed' in p[0]: 252 | p[1].grad_req = 'null' 253 | 254 | epoch_tic = time.time() 255 | total_num = 0 256 | log_num = 0 257 | for epoch_id in range(epochs): 258 | mt_loss, icsl_loss, step_loss = 0, 0, 0 259 | tic = time.time() 260 | 261 | # train on parallel data 262 | para_data_iterator = iter(para_dataloader) 263 | num_batches = mt_batches_per_epoch if epoch_id > 0 else INF_INT 264 | for batch_id in range(num_batches): 265 | data = next(para_data_iterator, None) 266 | if data is None: 267 | break 268 | # forward and backward 269 | with mx.autograd.record(): 270 | if data[0].shape[0] < len(ctx): 271 | data = split_and_load(data, [ctx[0]]) 272 | else: 273 | data = split_and_load(data, ctx) 274 | for chunk in data: 275 | source, target, slot_label, intent_label, src_valid_len, tgt_valid_len = chunk 276 | 277 | # forward computation 278 | translation, intent_pred, slot_pred = model.translate_and_predict(source, target, src_valid_len) 279 | mt_ls = mce_loss_function(translation, target[:, 1:], tgt_valid_len - 1).mean() 280 | icsl_ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, 
tgt_valid_len - 2).mean() 281 | ls = mt_ls + icsl_ls 282 | 283 | if accumulate: 284 | ls = ls / accumulate 285 | ls.backward() 286 | mt_loss += mt_ls.asscalar() 287 | icsl_loss += icsl_ls.asscalar() 288 | 289 | # update 290 | if not accumulate or (batch_id + 1) % accumulate == 0: 291 | mt_trainer.allreduce_grads() 292 | nlp.utils.clip_grad_global_norm(params, 1) 293 | mt_trainer.update(1, ignore_stale_grad=True) 294 | if (batch_id + 1) % log_interval == 0: 295 | log.info('Epoch: {}, Batch: {}/{}, lr={:.7f}, mt_loss={:.4f}, icsl_loss={:.4f}' 296 | .format(epoch_id, 297 | batch_id, 298 | len(para_dataloader), 299 | mt_trainer.learning_rate, 300 | mt_loss / log_interval, 301 | icsl_loss / log_interval)) 302 | mt_loss = 0 303 | icsl_loss = 0 304 | 305 | # train on labeled data 306 | train_data_iterator = iter(train_dataloader) 307 | for batch_id in range(icsl_batches_per_epoch): 308 | data = next(train_data_iterator, None) 309 | if data is None: 310 | break 311 | # forward and backward 312 | with mx.autograd.record(): 313 | if data[0].shape[0] < len(ctx): 314 | data = split_and_load(data, [ctx[0]]) 315 | else: 316 | data = split_and_load(data, ctx) 317 | for chunk in data: 318 | _, token_ids, slot_label, intent_label, valid_length = chunk 319 | 320 | log_num += len(token_ids) 321 | total_num += len(token_ids) 322 | 323 | # forward computation 324 | intent_pred, slot_pred = model(token_ids, valid_length) 325 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 326 | 327 | if accumulate: 328 | ls = ls / accumulate 329 | ls.backward() 330 | step_loss += ls.asscalar() 331 | 332 | # update 333 | if not accumulate or (batch_id + 1) % accumulate == 0: 334 | trainer.allreduce_grads() 335 | nlp.utils.clip_grad_global_norm(params, 1) 336 | trainer.update(1, ignore_stale_grad=True) 337 | 338 | if (batch_id + 1) % log_interval == 0: 339 | toc = time.time() 340 | # update metrics 341 | ic_metric.update([intent_label], [intent_pred]) 342 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 343 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 344 | .format(epoch_id, 345 | batch_id, 346 | len(train_dataloader), 347 | log_num / (toc - tic), 348 | trainer.learning_rate, 349 | step_loss / log_interval, 350 | ic_metric.get()[1], 351 | sl_metric.get()[1])) 352 | tic = time.time() 353 | step_loss = 0 354 | log_num = 0 355 | 356 | mx.nd.waitall() 357 | epoch_toc = time.time() 358 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 359 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 360 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 361 | 362 | 363 | def evaluate(model=None, model_name='', eval_input=''): 364 | """Evaluate the model on validation dataset. 
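    Writes token-level predictions in CoNLL format to ``conll_prediction_file``,
    scores them with ``conlleval.pl``, and returns ``(intent_acc, slot_f1)``.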
365 | """ 366 | ## Load model 367 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 368 | dataset_name='wiki_multilingual_uncased', 369 | pretrained=True, 370 | ctx=ctx, 371 | use_pooler=False, 372 | use_decoder=False, 373 | use_classifier=False) 374 | if model is None: 375 | assert model_name != '' 376 | model = MultiTaskICSL(bert, len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 377 | model.initialize(ctx=ctx) 378 | model.hybridize(static_alloc=True) 379 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 380 | 381 | idx2label = {} 382 | for label, idx in label2idx.items(): 383 | idx2label[idx] = label 384 | ## Load dev dataset 385 | field_separator = nlp.data.Splitter('\t') 386 | field_indices = [1, 3, 4, 0] 387 | eval_data = nlp.data.TSVDataset(filename=eval_input, 388 | field_separator=field_separator, 389 | num_discard_samples=1, 390 | field_indices=field_indices) 391 | 392 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 393 | 394 | dev_alignment = {} 395 | eval_data_transform = [] 396 | for sample in eval_data: 397 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 398 | eval_data_transform += [sample] 399 | dev_alignment[sample[0]] = alignment 400 | log.info('The number of examples after preprocessing: {}' 401 | .format(len(eval_data_transform))) 402 | 403 | test_batch_size = 16 404 | pad_token_id = vocabulary[PAD] 405 | pad_label_id = label2idx[PAD] 406 | batchify_fn = nlp.data.batchify.Tuple( 407 | nlp.data.batchify.Stack(), 408 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 409 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 410 | nlp.data.batchify.Stack('float32'), 411 | nlp.data.batchify.Stack('float32')) 412 | eval_dataloader = mx.gluon.data.DataLoader( 413 | eval_data_transform, 414 | batchify_fn=batchify_fn, 415 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 416 | 417 | _Result = collections.namedtuple( 418 | '_Result', ['intent', 'slot_labels']) 419 | all_results = {} 420 | 421 | total_num = 0 422 | for data in eval_dataloader: 423 | example_ids, token_ids, _, _, valid_length = data 424 | total_num += len(token_ids) 425 | # load data to GPU 426 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 427 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 428 | 429 | # forward computation 430 | intent_pred, slot_pred = model(token_ids, valid_length) 431 | intent_pred = intent_pred.asnumpy() 432 | slot_pred = slot_pred.asnumpy() 433 | valid_length = valid_length.asnumpy() 434 | 435 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 436 | eid = eid.asscalar() 437 | length = int(length) - 2 438 | intent_id = y_intent.argmax(axis=-1) 439 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 440 | slot_names = [idx2label[idx] for idx in slot_ids] 441 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 442 | if eid not in all_results: 443 | all_results[eid] = _Result(intent_id, merged_slot_names) 444 | 445 | example_ids, utterances, labels, intents = load_tsv(eval_input) 446 | pred_intents = [] 447 | label_intents = [] 448 | for eid, intent in zip(example_ids, intents): 449 | label_intents.append(label2index(intent2idx, intent)) 450 | pred_intents.append(all_results[eid].intent) 451 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 452 | log.info("Intent Accuracy: %.4f" % intent_acc) 453 | 454 | 
pred_icsl = [] 455 | label_icsl = [] 456 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 457 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 458 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 459 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 460 | log.info("Exact Match: %.4f" % exact_match) 461 | 462 | with open(conll_prediction_file, "w") as fw: 463 | for eid, utterance, labels in zip(example_ids, utterances, labels): 464 | preds = all_results[eid].slot_labels 465 | for w, l, p in zip(utterance, labels, preds): 466 | fw.write(' '.join([w, l, p]) + '\n') 467 | fw.write('\n') 468 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 469 | with open(conll_prediction_file) as f: 470 | stdout = proc.communicate(f.read().encode())[0] 471 | result = stdout.decode('utf-8').split('\n')[1] 472 | slot_f1 = float(result.split()[-1].strip()) 473 | log.info("Slot Labeling: %s" % result) 474 | return intent_acc, slot_f1 475 | 476 | 477 | # extract labels 478 | train_input = data_dir + 'atis_train.tsv' 479 | intent2idx, label2idx = get_label_indices(train_input) 480 | 481 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 482 | log.info('Train on %s:' % lang) 483 | model_name = 'model_bert_align_' + lang + '.' + str(random_seed) 484 | train_input = data_dir + 'atis_train.tsv' 485 | para_input = data_dir + 'train_para_' + lang + '.tsv' 486 | train(model_name, train_input, para_input) 487 | 488 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 489 | log.info('Evaluate on %s:' % lang) 490 | model_name = 'model_bert_align_' + lang + '.' + str(random_seed) 491 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 492 | evaluate(model_name=model_name, eval_input=test_input) 493 | -------------------------------------------------------------------------------- /code/scripts/conlleval.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl -w 2 | # conlleval: evaluate result of processing CoNLL-2000 shared task 3 | # usage: conlleval [-l] [-r] [-d delimiterTag] [-o oTag] < file 4 | # README: http://cnts.uia.ac.be/conll2000/chunking/output.html 5 | # options: l: generate LaTeX output for tables like in 6 | # http://cnts.uia.ac.be/conll2003/ner/example.tex 7 | # r: accept raw result tags (without B- and I- prefix; 8 | # assumes one word per chunk) 9 | # d: alternative delimiter tag (default is single space) 10 | # o: alternative outside tag (default is O) 11 | # note: the file should contain lines with items separated 12 | # by $delimiter characters (default space). The final 13 | # two items should contain the correct tag and the 14 | # guessed tag in that order. Sentences should be 15 | # separated from each other by empty lines or lines 16 | # with $boundary fields (default -X-). 17 | # url: http://lcg-www.uia.ac.be/conll2000/chunking/ 18 | # started: 1998-09-25 19 | # version: 2004-01-26 20 | # author: Erik Tjong Kim Sang 21 | 22 | use strict; 23 | 24 | my $false = 0; 25 | my $true = 42; 26 | 27 | my $boundary = "-X-"; # sentence boundary 28 | my $correct; # current corpus chunk tag (I,O,B) 29 | my $correctChunk = 0; # number of correctly identified chunks 30 | my $correctTags = 0; # number of correct chunk tags 31 | my $correctType; # type of current corpus chunk tag (NP,VP,etc.) 
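# Note: the Python scripts in this repository pipe their prediction file
# (one "word gold-tag predicted-tag" triple per line, utterances separated
# by blank lines) into this script on stdin and read the overall FB1 score
# from the last field of the second output line. A hypothetical input
# fragment (ATIS-style labels, for illustration only):
#   show O O
#   flights O O
#   from O O
#   boston B-fromloc.city_name B-fromloc.city_name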
32 | my $delimiter = " "; # field delimiter 33 | my $FB1 = 0.0; # FB1 score (Van Rijsbergen 1979) 34 | my $firstItem; # first feature (for sentence boundary checks) 35 | my $foundCorrect = 0; # number of chunks in corpus 36 | my $foundGuessed = 0; # number of identified chunks 37 | my $guessed; # current guessed chunk tag 38 | my $guessedType; # type of current guessed chunk tag 39 | my $i; # miscellaneous counter 40 | my $inCorrect = $false; # currently processed chunk is correct until now 41 | my $lastCorrect = "O"; # previous chunk tag in corpus 42 | my $latex = 0; # generate LaTeX formatted output 43 | my $lastCorrectType = ""; # type of previously identified chunk tag 44 | my $lastGuessed = "O"; # previously identified chunk tag 45 | my $lastGuessedType = ""; # type of previous chunk tag in corpus 46 | my $lastType; # temporary storage for detecting duplicates 47 | my $line; # line 48 | my $nbrOfFeatures = -1; # number of features per line 49 | my $precision = 0.0; # precision score 50 | my $oTag = "O"; # outside tag, default O 51 | my $raw = 0; # raw input: add B to every token 52 | my $recall = 0.0; # recall score 53 | my $tokenCounter = 0; # token counter (ignores sentence breaks) 54 | 55 | my %correctChunk = (); # number of correctly identified chunks per type 56 | my %foundCorrect = (); # number of chunks in corpus per type 57 | my %foundGuessed = (); # number of identified chunks per type 58 | 59 | my @features; # features on line 60 | my @sortedTypes; # sorted list of chunk type names 61 | 62 | # sanity check 63 | while (@ARGV and $ARGV[0] =~ /^-/) { 64 | if ($ARGV[0] eq "-l") { $latex = 1; shift(@ARGV); } 65 | elsif ($ARGV[0] eq "-r") { $raw = 1; shift(@ARGV); } 66 | elsif ($ARGV[0] eq "-d") { 67 | shift(@ARGV); 68 | if (not defined $ARGV[0]) { 69 | die "conlleval: -d requires delimiter character"; 70 | } 71 | $delimiter = shift(@ARGV); 72 | } elsif ($ARGV[0] eq "-o") { 73 | shift(@ARGV); 74 | if (not defined $ARGV[0]) { 75 | die "conlleval: -o requires delimiter character"; 76 | } 77 | $oTag = shift(@ARGV); 78 | } else { die "conlleval: unknown argument $ARGV[0]\n"; } 79 | } 80 | if (@ARGV) { die "conlleval: unexpected command line argument\n"; } 81 | # process input 82 | while () { 83 | chomp($line = $_); 84 | @features = split(/$delimiter/,$line); 85 | if ($nbrOfFeatures < 0) { $nbrOfFeatures = $#features; } 86 | elsif ($nbrOfFeatures != $#features and @features != 0) { 87 | printf STDERR "unexpected number of features: %d (%d)\n", 88 | $#features+1,$nbrOfFeatures+1; 89 | exit(1); 90 | } 91 | if (@features == 0 or 92 | $features[0] eq $boundary) { @features = ($boundary,"O","O"); } 93 | if (@features < 2) { 94 | die "conlleval: unexpected number of features in line $line\n"; 95 | } 96 | if ($raw) { 97 | if ($features[$#features] eq $oTag) { $features[$#features] = "O"; } 98 | if ($features[$#features-1] eq $oTag) { $features[$#features-1] = "O"; } 99 | if ($features[$#features] ne "O") { 100 | $features[$#features] = "B-$features[$#features]"; 101 | } 102 | if ($features[$#features-1] ne "O") { 103 | $features[$#features-1] = "B-$features[$#features-1]"; 104 | } 105 | } 106 | # 20040126 ET code which allows hyphens in the types 107 | if ($features[$#features] =~ /^([^-]*)-(.*)$/) { 108 | $guessed = $1; 109 | $guessedType = $2; 110 | } else { 111 | $guessed = $features[$#features]; 112 | $guessedType = ""; 113 | } 114 | pop(@features); 115 | if ($features[$#features] =~ /^([^-]*)-(.*)$/) { 116 | $correct = $1; 117 | $correctType = $2; 118 | } else { 119 | $correct = 
$features[$#features]; 120 | $correctType = ""; 121 | } 122 | pop(@features); 123 | # ($guessed,$guessedType) = split(/-/,pop(@features)); 124 | # ($correct,$correctType) = split(/-/,pop(@features)); 125 | $guessedType = $guessedType ? $guessedType : ""; 126 | $correctType = $correctType ? $correctType : ""; 127 | $firstItem = shift(@features); 128 | 129 | # 1999-06-26 sentence breaks should always be counted as out of chunk 130 | if ( $firstItem eq $boundary ) { $guessed = "O"; } 131 | 132 | if ($inCorrect) { 133 | if ( &endOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) and 134 | &endOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) and 135 | $lastGuessedType eq $lastCorrectType) { 136 | $inCorrect=$false; 137 | $correctChunk++; 138 | $correctChunk{$lastCorrectType} = $correctChunk{$lastCorrectType} ? 139 | $correctChunk{$lastCorrectType}+1 : 1; 140 | } elsif ( 141 | &endOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) != 142 | &endOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) or 143 | $guessedType ne $correctType ) { 144 | $inCorrect=$false; 145 | } 146 | } 147 | 148 | if ( &startOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) and 149 | &startOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) and 150 | $guessedType eq $correctType) { $inCorrect = $true; } 151 | 152 | if ( &startOfChunk($lastCorrect,$correct,$lastCorrectType,$correctType) ) { 153 | $foundCorrect++; 154 | $foundCorrect{$correctType} = $foundCorrect{$correctType} ? 155 | $foundCorrect{$correctType}+1 : 1; 156 | } 157 | if ( &startOfChunk($lastGuessed,$guessed,$lastGuessedType,$guessedType) ) { 158 | $foundGuessed++; 159 | $foundGuessed{$guessedType} = $foundGuessed{$guessedType} ? 160 | $foundGuessed{$guessedType}+1 : 1; 161 | } 162 | if ( $firstItem ne $boundary ) { 163 | if ( $correct eq $guessed and $guessedType eq $correctType ) { 164 | $correctTags++; 165 | } 166 | $tokenCounter++; 167 | } 168 | 169 | $lastGuessed = $guessed; 170 | $lastCorrect = $correct; 171 | $lastGuessedType = $guessedType; 172 | $lastCorrectType = $correctType; 173 | } 174 | if ($inCorrect) { 175 | $correctChunk++; 176 | $correctChunk{$lastCorrectType} = $correctChunk{$lastCorrectType} ? 177 | $correctChunk{$lastCorrectType}+1 : 1; 178 | } 179 | 180 | if (not $latex) { 181 | # compute overall precision, recall and FB1 (default values are 0.0) 182 | $precision = 100*$correctChunk/$foundGuessed if ($foundGuessed > 0); 183 | $recall = 100*$correctChunk/$foundCorrect if ($foundCorrect > 0); 184 | $FB1 = 2*$precision*$recall/($precision+$recall) 185 | if ($precision+$recall > 0); 186 | 187 | # print overall performance 188 | printf "processed $tokenCounter tokens with $foundCorrect phrases; "; 189 | printf "found: $foundGuessed phrases; correct: $correctChunk.\n"; 190 | if ($tokenCounter>0) { 191 | printf "accuracy: %6.2f%%; ",100*$correctTags/$tokenCounter; 192 | printf "precision: %6.2f%%; ",$precision; 193 | printf "recall: %6.2f%%; ",$recall; 194 | printf "FB1: %6.2f\n",$FB1; 195 | } 196 | } 197 | 198 | # sort chunk type names 199 | undef($lastType); 200 | @sortedTypes = (); 201 | foreach $i (sort (keys %foundCorrect,keys %foundGuessed)) { 202 | if (not($lastType) or $lastType ne $i) { 203 | push(@sortedTypes,($i)); 204 | } 205 | $lastType = $i; 206 | } 207 | # print performance per chunk type 208 | if (not $latex) { 209 | for $i (@sortedTypes) { 210 | $correctChunk{$i} = $correctChunk{$i} ? 
$correctChunk{$i} : 0; 211 | if (not($foundGuessed{$i})) { $foundGuessed{$i} = 0; $precision = 0.0; } 212 | else { $precision = 100*$correctChunk{$i}/$foundGuessed{$i}; } 213 | if (not($foundCorrect{$i})) { $recall = 0.0; } 214 | else { $recall = 100*$correctChunk{$i}/$foundCorrect{$i}; } 215 | if ($precision+$recall == 0.0) { $FB1 = 0.0; } 216 | else { $FB1 = 2*$precision*$recall/($precision+$recall); } 217 | printf "%17s: ",$i; 218 | printf "precision: %6.2f%%; ",$precision; 219 | printf "recall: %6.2f%%; ",$recall; 220 | printf "FB1: %6.2f %d\n",$FB1,$foundGuessed{$i}; 221 | } 222 | } else { 223 | print " & Precision & Recall & F\$_{\\beta=1} \\\\\\hline"; 224 | for $i (@sortedTypes) { 225 | $correctChunk{$i} = $correctChunk{$i} ? $correctChunk{$i} : 0; 226 | if (not($foundGuessed{$i})) { $precision = 0.0; } 227 | else { $precision = 100*$correctChunk{$i}/$foundGuessed{$i}; } 228 | if (not($foundCorrect{$i})) { $recall = 0.0; } 229 | else { $recall = 100*$correctChunk{$i}/$foundCorrect{$i}; } 230 | if ($precision+$recall == 0.0) { $FB1 = 0.0; } 231 | else { $FB1 = 2*$precision*$recall/($precision+$recall); } 232 | printf "\n%-7s & %6.2f\\%% & %6.2f\\%% & %6.2f \\\\", 233 | $i,$precision,$recall,$FB1; 234 | } 235 | print "\\hline\n"; 236 | $precision = 0.0; 237 | $recall = 0; 238 | $FB1 = 0.0; 239 | $precision = 100*$correctChunk/$foundGuessed if ($foundGuessed > 0); 240 | $recall = 100*$correctChunk/$foundCorrect if ($foundCorrect > 0); 241 | $FB1 = 2*$precision*$recall/($precision+$recall) 242 | if ($precision+$recall > 0); 243 | printf "Overall & %6.2f\\%% & %6.2f\\%% & %6.2f \\\\\\hline\n", 244 | $precision,$recall,$FB1; 245 | } 246 | 247 | exit 0; 248 | 249 | # endOfChunk: checks if a chunk ended between the previous and current word 250 | # arguments: previous and current chunk tags, previous and current types 251 | # note: this code is capable of handling other chunk representations 252 | # than the default CoNLL-2000 ones, see EACL'99 paper of Tjong 253 | # Kim Sang and Veenstra http://xxx.lanl.gov/abs/cs.CL/9907006 254 | 255 | sub endOfChunk { 256 | my $prevTag = shift(@_); 257 | my $tag = shift(@_); 258 | my $prevType = shift(@_); 259 | my $type = shift(@_); 260 | my $chunkEnd = $false; 261 | 262 | if ( $prevTag eq "B" and $tag eq "B" ) { $chunkEnd = $true; } 263 | if ( $prevTag eq "B" and $tag eq "O" ) { $chunkEnd = $true; } 264 | if ( $prevTag eq "I" and $tag eq "B" ) { $chunkEnd = $true; } 265 | if ( $prevTag eq "I" and $tag eq "O" ) { $chunkEnd = $true; } 266 | 267 | if ( $prevTag eq "E" and $tag eq "E" ) { $chunkEnd = $true; } 268 | if ( $prevTag eq "E" and $tag eq "I" ) { $chunkEnd = $true; } 269 | if ( $prevTag eq "E" and $tag eq "O" ) { $chunkEnd = $true; } 270 | if ( $prevTag eq "I" and $tag eq "O" ) { $chunkEnd = $true; } 271 | 272 | if ($prevTag ne "O" and $prevTag ne "." 
and $prevType ne $type) { 273 | $chunkEnd = $true; 274 | } 275 | 276 | # corrected 1998-12-22: these chunks are assumed to have length 1 277 | if ( $prevTag eq "]" ) { $chunkEnd = $true; } 278 | if ( $prevTag eq "[" ) { $chunkEnd = $true; } 279 | 280 | return($chunkEnd); 281 | } 282 | 283 | # startOfChunk: checks if a chunk started between the previous and current word 284 | # arguments: previous and current chunk tags, previous and current types 285 | # note: this code is capable of handling other chunk representations 286 | # than the default CoNLL-2000 ones, see EACL'99 paper of Tjong 287 | # Kim Sang and Veenstra http://xxx.lanl.gov/abs/cs.CL/9907006 288 | 289 | sub startOfChunk { 290 | my $prevTag = shift(@_); 291 | my $tag = shift(@_); 292 | my $prevType = shift(@_); 293 | my $type = shift(@_); 294 | my $chunkStart = $false; 295 | 296 | if ( $prevTag eq "B" and $tag eq "B" ) { $chunkStart = $true; } 297 | if ( $prevTag eq "I" and $tag eq "B" ) { $chunkStart = $true; } 298 | if ( $prevTag eq "O" and $tag eq "B" ) { $chunkStart = $true; } 299 | if ( $prevTag eq "O" and $tag eq "I" ) { $chunkStart = $true; } 300 | 301 | if ( $prevTag eq "E" and $tag eq "E" ) { $chunkStart = $true; } 302 | if ( $prevTag eq "E" and $tag eq "I" ) { $chunkStart = $true; } 303 | if ( $prevTag eq "O" and $tag eq "E" ) { $chunkStart = $true; } 304 | if ( $prevTag eq "O" and $tag eq "I" ) { $chunkStart = $true; } 305 | 306 | if ($tag ne "O" and $tag ne "." and $prevType ne $type) { 307 | $chunkStart = $true; 308 | } 309 | 310 | # corrected 1998-12-22: these chunks are assumed to have length 1 311 | if ( $tag eq "[" ) { $chunkStart = $true; } 312 | if ( $tag eq "]" ) { $chunkStart = $true; } 313 | 314 | return($chunkStart); 315 | } 316 | -------------------------------------------------------------------------------- /code/scripts/layers.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | from gluonnlp.model.attention_cell import AttentionCell, _masked_softmax 4 | from gluonnlp.model.block import L2Normalization 5 | from gluonnlp.model.transformer import PositionwiseFFN 6 | from mxnet.gluon import nn, HybridBlock 7 | 8 | class ScaledDotProductAttentionCell(AttentionCell): 9 | """Dot product attention between the query and the key. 10 | 11 | Depending on parameters, defined as:: 12 | 13 | units is None: 14 | score = 15 | units is not None and luong_style is False: 16 | score = 17 | units is not None and luong_style is True: 18 | score = 19 | 20 | Parameters 21 | ---------- 22 | units: int or None, default None 23 | Project the query and key to vectors with `units` dimension 24 | before applying the attention. If set to None, 25 | the query vector and the key vector are directly used to compute the attention and 26 | should have the same dimension:: 27 | 28 | If the units is None, 29 | score = 30 | Else if the units is not None and luong_style is False: 31 | score = 32 | Else if the units is not None and luong_style is True: 33 | score = 34 | 35 | luong_style: bool, default False 36 | If turned on, the score will be:: 37 | 38 | score = 39 | 40 | `units` must be the same as the dimension of the key vector 41 | scaled: bool, default True 42 | Whether to divide the attention weights by the sqrt of the query dimension. 
43 | This is first proposed in "[NIPS2017] Attention is all you need.":: 44 | 45 | score = / sqrt(dim_q) 46 | 47 | normalized: bool, default False 48 | If turned on, the cosine distance is used, i.e:: 49 | 50 | score = 51 | 52 | use_bias : bool, default True 53 | Whether to use bias in the projection layers. 54 | dropout : float, default 0.0 55 | Attention dropout 56 | weight_initializer : str or `Initializer` or None, default None 57 | Initializer of the weights 58 | bias_initializer : str or `Initializer`, default 'zeros' 59 | Initializer of the bias 60 | prefix : str or None, default None 61 | See document of `Block`. 62 | params : str or None, default None 63 | See document of `Block`. 64 | """ 65 | 66 | def __init__(self, units=None, luong_style=False, scaled=True, normalized=False, use_bias=True, 67 | dropout=0.0, temperature=1.0, weight_initializer=None, bias_initializer='zeros', 68 | prefix=None, params=None): 69 | super(ScaledDotProductAttentionCell, self).__init__(prefix=prefix, params=params) 70 | self._units = units 71 | self._scaled = scaled 72 | self._normalized = normalized 73 | self._use_bias = use_bias 74 | self._luong_style = luong_style 75 | self._dropout = dropout 76 | self._temperature = temperature 77 | if self._luong_style: 78 | assert units is not None, 'Luong style attention is not available without explicitly ' \ 79 | 'setting the units' 80 | with self.name_scope(): 81 | self._dropout_layer = nn.Dropout(dropout) 82 | if units is not None: 83 | with self.name_scope(): 84 | self._proj_query = nn.Dense(units=self._units, use_bias=self._use_bias, 85 | flatten=False, weight_initializer=weight_initializer, 86 | bias_initializer=bias_initializer, prefix='query_') 87 | if not self._luong_style: 88 | self._proj_key = nn.Dense(units=self._units, use_bias=self._use_bias, 89 | flatten=False, weight_initializer=weight_initializer, 90 | bias_initializer=bias_initializer, prefix='key_') 91 | if self._normalized: 92 | with self.name_scope(): 93 | self._l2_norm = L2Normalization(axis=-1) 94 | 95 | def _compute_weight(self, F, query, key, mask=None): 96 | if self._units is not None: 97 | query = self._proj_query(query) 98 | if not self._luong_style: 99 | key = self._proj_key(key) 100 | elif F == mx.nd: 101 | assert query.shape[-1] == key.shape[-1], 'Luong style attention requires key to ' \ 102 | 'have the same dim as the projected ' \ 103 | 'query. Received key {}, query {}.'.format( 104 | key.shape, query.shape) 105 | if self._normalized: 106 | query = self._l2_norm(query) 107 | key = self._l2_norm(key) 108 | if self._scaled: 109 | query = F.contrib.div_sqrt_dim(query) 110 | 111 | att_score = F.batch_dot(query, key, transpose_b=True) / self._temperature 112 | 113 | att_weights = self._dropout_layer(_masked_softmax(F, att_score, mask, self._dtype)) 114 | return att_weights 115 | 116 | 117 | class AttentionMapCell(HybridBlock): 118 | """Structure of the Transformer Decoder Cell. 119 | 120 | Parameters 121 | ---------- 122 | attention_cell : AttentionCell or str, default 'multi_head' 123 | Arguments of the attention cell. 
124 | Can be 'multi_head', 'scaled_luong', 'scaled_dot', 'dot', 'cosine', 'normed_mlp', 'mlp' 125 | units : int 126 | Number of units for the output 127 | hidden_size : int 128 | number of units in the hidden layer of position-wise feed-forward networks 129 | num_heads : int 130 | Number of heads in multi-head attention 131 | scaled : bool 132 | Whether to scale the softmax input by the sqrt of the input dimension 133 | in multi-head attention 134 | dropout : float 135 | use_residual : bool 136 | output_attention: bool 137 | Whether to output the attention weights 138 | weight_initializer : str or Initializer 139 | Initializer for the input weights matrix, used for the linear 140 | transformation of the inputs. 141 | bias_initializer : str or Initializer 142 | Initializer for the bias vector. 143 | prefix : str, default 'rnn_' 144 | Prefix for name of `Block`s 145 | (and name of weight if params is `None`). 146 | params : Parameter or None 147 | Container for weight sharing between cells. 148 | Created if `None`. 149 | """ 150 | def __init__(self, units=128, hidden_size=512, dropout=0.0, use_residual=True, 151 | attn_temperature=1.0, weight_initializer=None, bias_initializer='zeros', 152 | prefix=None, params=None): 153 | super(AttentionMapCell, self).__init__(prefix=prefix, params=params) 154 | self._units = units 155 | self._dropout = dropout 156 | with self.name_scope(): 157 | if dropout: 158 | self.dropout_layer = nn.Dropout(rate=dropout) 159 | self.attention_cell = ScaledDotProductAttentionCell(temperature=attn_temperature, 160 | scaled=True, 161 | normalized=False) 162 | self.proj_layer = nn.Dense(units=units, flatten=False, 163 | use_bias=False, 164 | weight_initializer=weight_initializer, 165 | bias_initializer=bias_initializer, 166 | prefix='proj_inter_') 167 | self.ffn = PositionwiseFFN(hidden_size=hidden_size, 168 | units=units, 169 | use_residual=use_residual, 170 | dropout=dropout, 171 | weight_initializer=weight_initializer, 172 | bias_initializer=bias_initializer) 173 | 174 | self.layer_norm = nn.LayerNorm() 175 | 176 | def hybrid_forward(self, F, inputs, mem_value, mem_mask=None): #pylint: disable=unused-argument 177 | # pylint: disable=arguments-differ 178 | """Transformer Decoder Attention Cell. 179 | 180 | Parameters 181 | ---------- 182 | inputs : Symbol or NDArray 183 | Input sequence. Shape (batch_size, length, C_in) 184 | mem_value : Symbol or NDArrays 185 | Memory value, i.e. output of the encoder. Shape (batch_size, mem_length, C_in) 186 | mem_mask : Symbol or NDArray or None 187 | Mask for mem_value. Shape (batch_size, length, mem_length) 188 | 189 | Returns 190 | ------- 191 | decoder_cell_outputs: list 192 | Outputs of the decoder cell. Contains: 193 | 194 | - outputs of the transformer decoder cell. 
Shape (batch_size, length, C_out) 195 | - additional_outputs of all the transformer decoder cell 196 | """ 197 | attention_outputs, attention_weights = \ 198 | self.attention_cell(inputs, mem_value, mem_value, mem_mask) 199 | outputs = self.proj_layer(attention_outputs) 200 | if self._dropout: 201 | outputs = self.dropout_layer(outputs) 202 | outputs = self.layer_norm(outputs) 203 | outputs = self.ffn(outputs) 204 | return outputs, attention_outputs 205 | -------------------------------------------------------------------------------- /code/scripts/loss.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from mxnet.gluon.loss import Loss, SoftmaxCELoss 4 | 5 | class SoftmaxCEMaskedLoss(SoftmaxCELoss): 6 | """Wrapper of the SoftmaxCELoss that supports valid_length as the input 7 | 8 | """ 9 | def hybrid_forward(self, F, pred, label, valid_length): # pylint: disable=arguments-differ 10 | """ 11 | Parameters 12 | ---------- 13 | F 14 | pred : Symbol or NDArray 15 | Shape (batch_size, length, V) 16 | label : Symbol or NDArray 17 | Shape (batch_size, length) 18 | valid_length : Symbol or NDArray 19 | Shape (batch_size, ) 20 | Returns 21 | ------- 22 | loss : Symbol or NDArray 23 | Shape (batch_size) 24 | """ 25 | if self._sparse_label: 26 | sample_weight = F.cast(F.expand_dims(F.ones_like(label), axis=-1), dtype=np.float32) 27 | else: 28 | sample_weight = F.ones_like(label) 29 | 30 | sample_weight = F.SequenceMask(sample_weight, 31 | sequence_length=valid_length, 32 | use_sequence_length=True, 33 | axis=1) 34 | 35 | return super(SoftmaxCEMaskedLoss, self).hybrid_forward(F, pred, label, sample_weight) 36 | 37 | 38 | class ICSLLoss(Loss): 39 | """Loss for IC/SL task. 40 | 41 | """ 42 | 43 | def __init__(self, sparse_label=True, weight=None, batch_axis=0, **kwargs): # pylint: disable=unused-argument 44 | super(ICSLLoss, self).__init__( 45 | weight=weight, batch_axis=batch_axis, **kwargs) 46 | self.ce_loss = SoftmaxCELoss() 47 | self.masked_ce_loss = SoftmaxCEMaskedLoss(sparse_label=sparse_label) 48 | 49 | def hybrid_forward(self, F, intent_pred, slot_pred, intent_label, slot_label, valid_length): # pylint: disable=arguments-differ 50 | """ 51 | Parameters 52 | ---------- 53 | intent_pred : intent prediction, shape (batch_size, num_intents) 54 | slot_pred : slot prediction, shape (batch_size, seq_length, num_slot_labels) 55 | intent_label : intent label, shape (batch_size) 56 | slot_label: slot label, shape (batch_size, seq_length) 57 | 58 | Returns 59 | ------- 60 | outputs : NDArray 61 | Shape (batch_size) 62 | """ 63 | intent_loss = self.ce_loss(intent_pred, intent_label) 64 | slot_loss = self.masked_ce_loss(slot_pred, slot_label, valid_length) 65 | return intent_loss + slot_loss 66 | -------------------------------------------------------------------------------- /code/scripts/lstm_alone.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = 
'[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'lstm.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class ICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into a biLSTM to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, vocab_size, num_slot_labels, num_intents, embed_size=256, rnn_hidden_size=128, rnn_layers=1, rnn_dropout=.1, embed_dropout=.1, prefix=None, params=None): 49 | super(ICSL, self).__init__(prefix=prefix, params=params) 50 | with self.name_scope(): 51 | # Embedding 52 | self.word_embed = nn.Embedding(input_dim=vocab_size, output_dim=embed_size) 53 | self.embed_dropout = nn.Dropout(rate=embed_dropout) 54 | # RNN encoder 55 | self.rnn = rnn.LSTM(rnn_hidden_size, num_layers=rnn_layers, bidirectional=True, dropout=rnn_dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | flatten=False) 59 | self.intent_classifier = nn.Dense(units=num_intents, 60 | flatten=False) 61 | 62 | def forward(self, inputs): # pylint: disable=arguments-differ 63 | """Generate unnormalized scores for the given input sequences. 64 | 65 | Parameters 66 | ---------- 67 | inputs : NDArray, shape (batch_size, seq_length) 68 | Input words for the sequences. 
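            These are the token ids produced by ``icsl_transform`` with the
            BERT tokenizer (as in ``train``); position 0 feeds the intent
            classifier and slot scores are produced for positions 1 onward.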
69 | 70 | Returns 71 | ------- 72 | intent_prediction: NDArray 73 | Shape (batch_size, num_intents) 74 | slot_prediction : NDArray 75 | Shape (batch_size, seq_length, num_slot_labels) 76 | """ 77 | # embed: (batch_size, seq_length, embed_size) 78 | embed = self.word_embed(inputs) 79 | embed = self.embed_dropout(embed) 80 | # hidden: (seq_length, batch_size, rnn_hidden_size * 2) 81 | hidden = self.rnn(embed.swapaxes(0, 1)) 82 | # hidden: (batch_size, seq_length, rnn_hidden_size * 2) 83 | hidden = hidden.swapaxes(0, 1) 84 | # get intent and slot label predictions 85 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 86 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 87 | return intent_prediction, slot_prediction 88 | 89 | 90 | def train(model_name, train_input, dev_input): 91 | """Training function.""" 92 | ## Arguments 93 | log_interval = 100 94 | batch_size = 32 95 | lr = 1e-3 96 | optimizer = 'adam' 97 | accumulate = None 98 | epochs = 20 99 | 100 | ## Load BERT model and vocabulary 101 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 102 | dataset_name='wiki_multilingual_uncased', 103 | pretrained=True, 104 | ctx=ctx, 105 | use_pooler=False, 106 | use_decoder=False, 107 | use_classifier=False) 108 | 109 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 110 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 111 | model.hybridize(static_alloc=True) 112 | 113 | icsl_loss_function = ICSLLoss() 114 | icsl_loss_function.hybridize(static_alloc=True) 115 | 116 | ic_metric = mx.metric.Accuracy() 117 | sl_metric = mx.metric.Accuracy() 118 | 119 | ## Load labeled data 120 | field_separator = nlp.data.Splitter('\t') 121 | # fields to select from the file: utterance, slot labels, intent, uid 122 | field_indices = [1, 3, 4, 0] 123 | train_data = nlp.data.TSVDataset(filename=train_input, 124 | field_separator=field_separator, 125 | num_discard_samples=1, 126 | field_indices=field_indices) 127 | 128 | # use the vocabulary from pre-trained model for tokenization 129 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 130 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 131 | # create data loader 132 | pad_token_id = vocabulary[PAD] 133 | pad_label_id = label2idx[PAD] 134 | batchify_fn = nlp.data.batchify.Tuple( 135 | nlp.data.batchify.Stack(), 136 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 137 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 138 | nlp.data.batchify.Stack('float32'), 139 | nlp.data.batchify.Stack('float32')) 140 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 141 | batch_size=batch_size, 142 | shuffle=True) 143 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 144 | batchify_fn=batchify_fn, 145 | batch_sampler=train_sampler) 146 | 147 | optimizer_params = {'learning_rate': lr} 148 | trainer = gluon.Trainer(model.collect_params(), optimizer, 149 | optimizer_params, update_on_kvstore=False) 150 | 151 | # Collect differentiable parameters 152 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 153 | # Set grad_req if gradient accumulation is required 154 | if accumulate: 155 | for p in params: 156 | p[1].grad_req = 'add' 157 | 158 | epoch_tic = time.time() 159 | total_num = 0 160 | log_num = 0 161 | best_score = (0, 0) 162 | for epoch_id in range(epochs): 163 | step_loss = 0 164 | tic = time.time() 165 
| # train on labeled data 166 | for batch_id, data in enumerate(train_dataloader): 167 | # forward and backward 168 | with mx.autograd.record(): 169 | if data[0].shape[0] < len(ctx): 170 | data = split_and_load(data, [ctx[0]]) 171 | else: 172 | data = split_and_load(data, ctx) 173 | for chunk in data: 174 | _, token_ids, slot_label, intent_label, valid_length = chunk 175 | 176 | log_num += len(token_ids) 177 | total_num += len(token_ids) 178 | 179 | # forward computation 180 | intent_pred, slot_pred = model(token_ids) 181 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 182 | 183 | if accumulate: 184 | ls = ls / accumulate 185 | ls.backward() 186 | step_loss += ls.asscalar() 187 | 188 | # update 189 | if not accumulate or (batch_id + 1) % accumulate == 0: 190 | trainer.allreduce_grads() 191 | nlp.utils.clip_grad_global_norm(params, 1) 192 | trainer.update(1, ignore_stale_grad=True) 193 | 194 | if (batch_id + 1) % log_interval == 0: 195 | toc = time.time() 196 | # update metrics 197 | ic_metric.update([intent_label], [intent_pred]) 198 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 199 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 200 | .format(epoch_id, 201 | batch_id, 202 | len(train_dataloader), 203 | log_num / (toc - tic), 204 | trainer.learning_rate, 205 | step_loss / log_interval, 206 | ic_metric.get()[1], 207 | sl_metric.get()[1])) 208 | 209 | tic = time.time() 210 | step_loss = 0 211 | log_num = 0 212 | 213 | mx.nd.waitall() 214 | epoch_toc = time.time() 215 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 216 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 217 | 218 | # evaluate on development set 219 | log.info('Evaluate on development set:') 220 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 221 | if slot_f1 > best_score[1]: 222 | best_score = (intent_acc, slot_f1) 223 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 224 | 225 | 226 | def evaluate(model=None, model_name='', eval_input=''): 227 | """Evaluate the model on validation dataset. 
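    The pre-trained BERT model is loaded here only to reuse its vocabulary
    for tokenization; the biLSTM ``ICSL`` model itself does not use the BERT
    encoder. Returns ``(intent_acc, slot_f1)`` scored with ``conlleval.pl``.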
228 | """ 229 | ## Load model 230 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 231 | dataset_name='wiki_multilingual_uncased', 232 | pretrained=True, 233 | ctx=ctx, 234 | use_pooler=False, 235 | use_decoder=False, 236 | use_classifier=False) 237 | if model is None: 238 | assert model_name != '' 239 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 240 | model.initialize(ctx=ctx) 241 | model.hybridize(static_alloc=True) 242 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 243 | 244 | idx2label = {} 245 | for label, idx in label2idx.items(): 246 | idx2label[idx] = label 247 | ## Load dev dataset 248 | field_separator = nlp.data.Splitter('\t') 249 | field_indices = [1, 3, 4, 0] 250 | eval_data = nlp.data.TSVDataset(filename=eval_input, 251 | field_separator=field_separator, 252 | num_discard_samples=1, 253 | field_indices=field_indices) 254 | 255 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 256 | 257 | dev_alignment = {} 258 | eval_data_transform = [] 259 | for sample in eval_data: 260 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 261 | eval_data_transform += [sample] 262 | dev_alignment[sample[0]] = alignment 263 | log.info('The number of examples after preprocessing: {}' 264 | .format(len(eval_data_transform))) 265 | 266 | test_batch_size = 16 267 | pad_token_id = vocabulary[PAD] 268 | pad_label_id = label2idx[PAD] 269 | batchify_fn = nlp.data.batchify.Tuple( 270 | nlp.data.batchify.Stack(), 271 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 272 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 273 | nlp.data.batchify.Stack('float32'), 274 | nlp.data.batchify.Stack('float32')) 275 | eval_dataloader = mx.gluon.data.DataLoader( 276 | eval_data_transform, 277 | batchify_fn=batchify_fn, 278 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 279 | 280 | _Result = collections.namedtuple( 281 | '_Result', ['intent', 'slot_labels']) 282 | all_results = {} 283 | 284 | total_num = 0 285 | for data in eval_dataloader: 286 | example_ids, token_ids, _, _, valid_length = data 287 | total_num += len(token_ids) 288 | # load data to GPU 289 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 290 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 291 | 292 | # forward computation 293 | intent_pred, slot_pred = model(token_ids) 294 | intent_pred = intent_pred.asnumpy() 295 | slot_pred = slot_pred.asnumpy() 296 | valid_length = valid_length.asnumpy() 297 | 298 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 299 | eid = eid.asscalar() 300 | length = int(length) - 2 301 | intent_id = y_intent.argmax(axis=-1) 302 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 303 | slot_names = [idx2label[idx] for idx in slot_ids] 304 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 305 | if eid not in all_results: 306 | all_results[eid] = _Result(intent_id, merged_slot_names) 307 | 308 | example_ids, utterances, labels, intents = load_tsv(eval_input) 309 | pred_intents = [] 310 | label_intents = [] 311 | for eid, intent in zip(example_ids, intents): 312 | label_intents.append(label2index(intent2idx, intent)) 313 | pred_intents.append(all_results[eid].intent) 314 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 315 | log.info("Intent Accuracy: %.4f" % intent_acc) 316 | 317 | pred_icsl = [] 318 | 
label_icsl = [] 319 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 320 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 321 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 322 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 323 | log.info("Exact Match: %.4f" % exact_match) 324 | 325 | with open(conll_prediction_file, "w") as fw: 326 | for eid, utterance, labels in zip(example_ids, utterances, labels): 327 | preds = all_results[eid].slot_labels 328 | for w, l, p in zip(utterance, labels, preds): 329 | fw.write(' '.join([w, l, p]) + '\n') 330 | fw.write('\n') 331 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 332 | with open(conll_prediction_file) as f: 333 | stdout = proc.communicate(f.read().encode())[0] 334 | result = stdout.decode('utf-8').split('\n')[1] 335 | slot_f1 = float(result.split()[-1].strip()) 336 | log.info("Slot Labeling: %s" % result) 337 | return intent_acc, slot_f1 338 | 339 | 340 | # extract labels 341 | train_input = data_dir + 'atis_train.tsv' 342 | intent2idx, label2idx = get_label_indices(train_input) 343 | 344 | # Train 345 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 346 | log.info('Train on %s:' % lang) 347 | model_name = 'model_lstm_' + lang + '.' + str(random_seed) 348 | if lang == 'EN': 349 | train_input = data_dir + 'atis_train.tsv' 350 | dev_input = data_dir + 'atis_dev.tsv' 351 | else: 352 | train_input = data_dir + 'atis_train_' + lang + '.tsv' 353 | dev_input = data_dir + 'atis_dev_' + lang + '.tsv' 354 | train(model_name, train_input, dev_input) 355 | 356 | # Evaluate 357 | log.info('==========Supervised learning==========') 358 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 359 | log.info('Evaluate on %s:' % lang) 360 | model_name = 'model_lstm_' + lang + '.' + str(random_seed) 361 | if lang == 'EN': 362 | test_input = data_dir + 'atis_test.tsv' 363 | else: 364 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 365 | evaluate(model_name=model_name, eval_input=test_input) 366 | 367 | log.info('==========Transfer learning==========') 368 | src_lang = 'EN' 369 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 370 | log.info('Evaluate on %s:' % lang) 371 | model_name = 'model_lstm_' + src_lang + '.' 
+ str(random_seed) 372 | if lang == 'EN': 373 | test_input = data_dir + 'atis_test.tsv' 374 | else: 375 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 376 | evaluate(model_name=model_name, eval_input=test_input) 377 | -------------------------------------------------------------------------------- /code/scripts/lstm_joint.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(0)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'lstm_joint.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class ICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into a biLSTM to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 46 | """ 47 | 48 | def __init__(self, vocab_size, num_slot_labels, num_intents, embed_size=256, rnn_hidden_size=128, rnn_layers=1, rnn_dropout=.1, embed_dropout=.1, prefix=None, params=None): 49 | super(ICSL, self).__init__(prefix=prefix, params=params) 50 | with self.name_scope(): 51 | # Embedding 52 | self.word_embed = nn.Embedding(input_dim=vocab_size, output_dim=embed_size) 53 | self.embed_dropout = nn.Dropout(rate=embed_dropout) 54 | # RNN encoder 55 | self.rnn = rnn.LSTM(rnn_hidden_size, num_layers=rnn_layers, bidirectional=True, dropout=rnn_dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | flatten=False) 59 | self.intent_classifier = nn.Dense(units=num_intents, 60 | flatten=False) 61 | 62 | def forward(self, inputs): # pylint: disable=arguments-differ 63 | """Generate unnormalized scores for the given input sequences. 64 | 65 | Parameters 66 | ---------- 67 | inputs : NDArray, shape (batch_size, seq_length) 68 | Input words for the sequences. 
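            As built by icsl_transform, each sequence starts with [CLS] and
            ends with [SEP]; the hidden state at the [CLS] position is used
            for intent classification, while the remaining positions are fed
            to the slot classifier.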
69 | 70 | Returns 71 | ------- 72 | intent_prediction: NDArray 73 | Shape (batch_size, num_intents) 74 | slot_prediction : NDArray 75 | Shape (batch_size, seq_length, num_slot_labels) 76 | """ 77 | # embed: (batch_size, seq_length, embed_size) 78 | embed = self.word_embed(inputs) 79 | embed = self.embed_dropout(embed) 80 | # hidden: (seq_length, batch_size, rnn_hidden_size * 2) 81 | hidden = self.rnn(embed.swapaxes(0, 1)) 82 | # hidden: (batch_size, seq_length, rnn_hidden_size * 2) 83 | hidden = hidden.swapaxes(0, 1) 84 | # get intent and slot label predictions 85 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 86 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 87 | return intent_prediction, slot_prediction 88 | 89 | 90 | def train(model_name, train_input, dev_input): 91 | """Training function.""" 92 | ## Arguments 93 | log_interval = 100 94 | batch_size = 32 95 | lr = 1e-3 96 | optimizer = 'adam' 97 | accumulate = None 98 | epochs = 20 99 | 100 | ## Load BERT model and vocabulary 101 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 102 | dataset_name='wiki_multilingual_uncased', 103 | pretrained=True, 104 | ctx=ctx, 105 | use_pooler=False, 106 | use_decoder=False, 107 | use_classifier=False) 108 | 109 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 110 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 111 | model.hybridize(static_alloc=True) 112 | 113 | icsl_loss_function = ICSLLoss() 114 | icsl_loss_function.hybridize(static_alloc=True) 115 | 116 | ic_metric = mx.metric.Accuracy() 117 | sl_metric = mx.metric.Accuracy() 118 | 119 | ## Load labeled data 120 | field_separator = nlp.data.Splitter('\t') 121 | # fields to select from the file: utterance, slot labels, intent, uid 122 | field_indices = [1, 3, 4, 0] 123 | train_data = nlp.data.TSVDataset(filename=train_input, 124 | field_separator=field_separator, 125 | num_discard_samples=1, 126 | field_indices=field_indices) 127 | 128 | # use the vocabulary from pre-trained model for tokenization 129 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 130 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 131 | # create data loader 132 | pad_token_id = vocabulary[PAD] 133 | pad_label_id = label2idx[PAD] 134 | batchify_fn = nlp.data.batchify.Tuple( 135 | nlp.data.batchify.Stack(), 136 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 137 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 138 | nlp.data.batchify.Stack('float32'), 139 | nlp.data.batchify.Stack('float32')) 140 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 141 | batch_size=batch_size, 142 | shuffle=True) 143 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 144 | batchify_fn=batchify_fn, 145 | batch_sampler=train_sampler) 146 | 147 | optimizer_params = {'learning_rate': lr} 148 | trainer = gluon.Trainer(model.collect_params(), optimizer, 149 | optimizer_params, update_on_kvstore=False) 150 | 151 | # Collect differentiable parameters 152 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 153 | # Set grad_req if gradient accumulation is required 154 | if accumulate: 155 | for p in params: 156 | p[1].grad_req = 'add' 157 | 158 | epoch_tic = time.time() 159 | total_num = 0 160 | log_num = 0 161 | best_score = (0, 0) 162 | for epoch_id in range(epochs): 163 | step_loss = 0 164 | tic = time.time() 165 
| # train on labeled data 166 | for batch_id, data in enumerate(train_dataloader): 167 | # forward and backward 168 | with mx.autograd.record(): 169 | if data[0].shape[0] < len(ctx): 170 | data = split_and_load(data, [ctx[0]]) 171 | else: 172 | data = split_and_load(data, ctx) 173 | for chunk in data: 174 | _, token_ids, slot_label, intent_label, valid_length = chunk 175 | 176 | log_num += len(token_ids) 177 | total_num += len(token_ids) 178 | 179 | # forward computation 180 | intent_pred, slot_pred = model(token_ids) 181 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 182 | 183 | if accumulate: 184 | ls = ls / accumulate 185 | ls.backward() 186 | step_loss += ls.asscalar() 187 | 188 | # update 189 | if not accumulate or (batch_id + 1) % accumulate == 0: 190 | trainer.allreduce_grads() 191 | nlp.utils.clip_grad_global_norm(params, 1) 192 | trainer.update(1, ignore_stale_grad=True) 193 | 194 | if (batch_id + 1) % log_interval == 0: 195 | toc = time.time() 196 | # update metrics 197 | ic_metric.update([intent_label], [intent_pred]) 198 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 199 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 200 | .format(epoch_id, 201 | batch_id, 202 | len(train_dataloader), 203 | log_num / (toc - tic), 204 | trainer.learning_rate, 205 | step_loss / log_interval, 206 | ic_metric.get()[1], 207 | sl_metric.get()[1])) 208 | 209 | tic = time.time() 210 | step_loss = 0 211 | log_num = 0 212 | 213 | mx.nd.waitall() 214 | epoch_toc = time.time() 215 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 216 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 217 | 218 | # evaluate on development set 219 | log.info('Evaluate on development set:') 220 | intent_acc, slot_f1 = evaluate(model=model, eval_input=dev_input) 221 | if slot_f1 > best_score[1]: 222 | best_score = (intent_acc, slot_f1) 223 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 224 | 225 | 226 | def evaluate(model=None, model_name='', eval_input=''): 227 | """Evaluate the model on validation dataset. 
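
    Parameters
    ----------
    model : ICSL, optional
        Model instance to evaluate directly. If None, a fresh ICSL model is
        created and its parameters are loaded from model_dir using model_name.
    model_name : str
        Name of the saved parameter file (without the '.params' suffix);
        required when model is None.
    eval_input : str
        Path to the TSV file to evaluate on.

    Returns
    -------
    intent_acc : float
        Intent classification accuracy.
    slot_f1 : float
        Slot labeling F1 score reported by conlleval.pl.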
228 | """ 229 | ## Load model 230 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 231 | dataset_name='wiki_multilingual_uncased', 232 | pretrained=True, 233 | ctx=ctx, 234 | use_pooler=False, 235 | use_decoder=False, 236 | use_classifier=False) 237 | if model is None: 238 | assert model_name != '' 239 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 240 | model.initialize(ctx=ctx) 241 | model.hybridize(static_alloc=True) 242 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 243 | 244 | idx2label = {} 245 | for label, idx in label2idx.items(): 246 | idx2label[idx] = label 247 | ## Load dev dataset 248 | field_separator = nlp.data.Splitter('\t') 249 | field_indices = [1, 3, 4, 0] 250 | eval_data = nlp.data.TSVDataset(filename=eval_input, 251 | field_separator=field_separator, 252 | num_discard_samples=1, 253 | field_indices=field_indices) 254 | 255 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 256 | 257 | dev_alignment = {} 258 | eval_data_transform = [] 259 | for sample in eval_data: 260 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 261 | eval_data_transform += [sample] 262 | dev_alignment[sample[0]] = alignment 263 | log.info('The number of examples after preprocessing: {}' 264 | .format(len(eval_data_transform))) 265 | 266 | test_batch_size = 16 267 | pad_token_id = vocabulary[PAD] 268 | pad_label_id = label2idx[PAD] 269 | batchify_fn = nlp.data.batchify.Tuple( 270 | nlp.data.batchify.Stack(), 271 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 272 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 273 | nlp.data.batchify.Stack('float32'), 274 | nlp.data.batchify.Stack('float32')) 275 | eval_dataloader = mx.gluon.data.DataLoader( 276 | eval_data_transform, 277 | batchify_fn=batchify_fn, 278 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 279 | 280 | _Result = collections.namedtuple( 281 | '_Result', ['intent', 'slot_labels']) 282 | all_results = {} 283 | 284 | total_num = 0 285 | for data in eval_dataloader: 286 | example_ids, token_ids, _, _, valid_length = data 287 | total_num += len(token_ids) 288 | # load data to GPU 289 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 290 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 291 | 292 | # forward computation 293 | intent_pred, slot_pred = model(token_ids) 294 | intent_pred = intent_pred.asnumpy() 295 | slot_pred = slot_pred.asnumpy() 296 | valid_length = valid_length.asnumpy() 297 | 298 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 299 | eid = eid.asscalar() 300 | length = int(length) - 2 301 | intent_id = y_intent.argmax(axis=-1) 302 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 303 | slot_names = [idx2label[idx] for idx in slot_ids] 304 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 305 | if eid not in all_results: 306 | all_results[eid] = _Result(intent_id, merged_slot_names) 307 | 308 | example_ids, utterances, labels, intents = load_tsv(eval_input) 309 | pred_intents = [] 310 | label_intents = [] 311 | for eid, intent in zip(example_ids, intents): 312 | label_intents.append(label2index(intent2idx, intent)) 313 | pred_intents.append(all_results[eid].intent) 314 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 315 | log.info("Intent Accuracy: %.4f" % intent_acc) 316 | 317 | pred_icsl = [] 318 | 
label_icsl = [] 319 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 320 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 321 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 322 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 323 | log.info("Exact Match: %.4f" % exact_match) 324 | 325 | with open(conll_prediction_file, "w") as fw: 326 | for eid, utterance, labels in zip(example_ids, utterances, labels): 327 | preds = all_results[eid].slot_labels 328 | for w, l, p in zip(utterance, labels, preds): 329 | fw.write(' '.join([w, l, p]) + '\n') 330 | fw.write('\n') 331 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 332 | with open(conll_prediction_file) as f: 333 | stdout = proc.communicate(f.read().encode())[0] 334 | result = stdout.decode('utf-8').split('\n')[1] 335 | slot_f1 = float(result.split()[-1].strip()) 336 | log.info("Slot Labeling: %s" % result) 337 | return intent_acc, slot_f1 338 | 339 | 340 | # extract labels 341 | train_input = data_dir + 'atis_train.tsv' 342 | intent2idx, label2idx = get_label_indices(train_input) 343 | 344 | # Train 345 | log.info('Train on all languages:') 346 | model_name = 'model_lstm_joint.' + str(random_seed) 347 | train_input = data_dir + 'atis_train_all.tsv' 348 | dev_input = data_dir + 'atis_dev.tsv' 349 | train(model_name, train_input, dev_input) 350 | 351 | # Evaluate 352 | for lang in ['EN', 'ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 353 | log.info('Evaluate on %s:' % lang) 354 | model_name = 'model_lstm_joint.' + str(random_seed) 355 | if lang == 'EN': 356 | test_input = data_dir + 'atis_test.tsv' 357 | else: 358 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 359 | evaluate(model_name=model_name, eval_input=test_input) 360 | -------------------------------------------------------------------------------- /code/scripts/lstm_mt.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import gluonnlp as nlp 3 | import logging 4 | import mxnet as mx 5 | import os 6 | import sklearn.metrics 7 | import subprocess 8 | import sys 9 | import time 10 | import warnings 11 | 12 | from bert import * 13 | from mxnet import gluon 14 | from mxnet.gluon import Block, nn, rnn 15 | 16 | from loss import * 17 | from utils import * 18 | 19 | random_seed = int(sys.argv[1]) 20 | warnings.filterwarnings('ignore') 21 | data_dir = "../data/" 22 | model_dir = "../exp/" 23 | conll_prediction_file = data_dir + "conll.pred" 24 | 25 | PAD = '[PAD]' 26 | mx.random.seed(random_seed) 27 | ctx = [mx.gpu(2)] 28 | 29 | log = logging.getLogger('gluonnlp') 30 | log.setLevel(logging.DEBUG) 31 | formatter = logging.Formatter(fmt='[%(levelname)s] %(name)s:%(asctime)s %(message)s', datefmt='%H:%M:%S') 32 | fh = logging.FileHandler(os.path.join(model_dir, 'lstm_mt.' + str(random_seed) + '.log'), mode='w') 33 | fh.setLevel(logging.INFO) 34 | fh.setFormatter(formatter) 35 | console = logging.StreamHandler() 36 | console.setLevel(logging.INFO) 37 | console.setFormatter(formatter) 38 | log.addHandler(console) 39 | log.addHandler(fh) 40 | 41 | class ICSL(Block): 42 | """Model for IC/SL task. 43 | 44 | The model feeds token ids into a biLSTM to get the sequence 45 | representations, then apply two dense layers for IC/SL task. 
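
    Parameters
    ----------
    vocab_size : int
        Vocabulary size for the embedding layer (the scripts pass the size of
        the multilingual BERT vocabulary).
    num_slot_labels : int
        Number of slot label classes.
    num_intents : int
        Number of intent classes.
    embed_size : int, default 256
        Word embedding dimension.
    rnn_hidden_size : int, default 128
        Hidden size of each LSTM direction.
    rnn_layers : int, default 1
        Number of biLSTM layers.
    rnn_dropout : float, default .1
        Dropout rate for the LSTM.
    embed_dropout : float, default .1
        Dropout rate applied to the word embeddings.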
46 | """ 47 | 48 | def __init__(self, vocab_size, num_slot_labels, num_intents, embed_size=256, rnn_hidden_size=128, rnn_layers=1, rnn_dropout=.1, embed_dropout=.1, prefix=None, params=None): 49 | super(ICSL, self).__init__(prefix=prefix, params=params) 50 | with self.name_scope(): 51 | # Embedding 52 | self.word_embed = nn.Embedding(input_dim=vocab_size, output_dim=embed_size) 53 | self.embed_dropout = nn.Dropout(rate=embed_dropout) 54 | # RNN encoder 55 | self.rnn = rnn.LSTM(rnn_hidden_size, num_layers=rnn_layers, bidirectional=True, dropout=rnn_dropout) 56 | # IC/SL classifier 57 | self.slot_classifier = nn.Dense(units=num_slot_labels, 58 | flatten=False) 59 | self.intent_classifier = nn.Dense(units=num_intents, 60 | flatten=False) 61 | 62 | def forward(self, inputs): # pylint: disable=arguments-differ 63 | """Generate unnormalized scores for the given input sequences. 64 | 65 | Parameters 66 | ---------- 67 | inputs : NDArray, shape (batch_size, seq_length) 68 | Input words for the sequences. 69 | 70 | Returns 71 | ------- 72 | intent_prediction: NDArray 73 | Shape (batch_size, num_intents) 74 | slot_prediction : NDArray 75 | Shape (batch_size, seq_length, num_slot_labels) 76 | """ 77 | # embed: (batch_size, seq_length, embed_size) 78 | embed = self.word_embed(inputs) 79 | embed = self.embed_dropout(embed) 80 | # hidden: (seq_length, batch_size, rnn_hidden_size * 2) 81 | hidden = self.rnn(embed.swapaxes(0, 1)) 82 | # hidden: (batch_size, seq_length, rnn_hidden_size * 2) 83 | hidden = hidden.swapaxes(0, 1) 84 | # get intent and slot label predictions 85 | intent_prediction = self.intent_classifier(hidden[:, 0, :]) 86 | slot_prediction = self.slot_classifier(hidden[:, 1:, :]) 87 | return intent_prediction, slot_prediction 88 | 89 | 90 | def train(model_name, train_input): 91 | """Training function.""" 92 | ## Arguments 93 | log_interval = 100 94 | batch_size = 32 95 | lr = 1e-3 96 | optimizer = 'adam' 97 | accumulate = None 98 | epochs = 20 99 | 100 | ## Load BERT model and vocabulary 101 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 102 | dataset_name='wiki_multilingual_uncased', 103 | pretrained=True, 104 | ctx=ctx, 105 | use_pooler=False, 106 | use_decoder=False, 107 | use_classifier=False) 108 | 109 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 110 | model.initialize(init=mx.init.Uniform(0.1), ctx=ctx) 111 | model.hybridize(static_alloc=True) 112 | 113 | icsl_loss_function = ICSLLoss() 114 | icsl_loss_function.hybridize(static_alloc=True) 115 | 116 | ic_metric = mx.metric.Accuracy() 117 | sl_metric = mx.metric.Accuracy() 118 | 119 | ## Load labeled data 120 | field_separator = nlp.data.Splitter('\t') 121 | # fields to select from the file: utterance, slot labels, intent, uid 122 | field_indices = [1, 3, 4, 0] 123 | train_data = nlp.data.TSVDataset(filename=train_input, 124 | field_separator=field_separator, 125 | num_discard_samples=1, 126 | field_indices=field_indices) 127 | 128 | # use the vocabulary from pre-trained model for tokenization 129 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 130 | train_data_transform = train_data.transform(fn=lambda x: icsl_transform(x, vocabulary, label2idx, intent2idx, bert_tokenizer)[0]) 131 | # create data loader 132 | pad_token_id = vocabulary[PAD] 133 | pad_label_id = label2idx[PAD] 134 | batchify_fn = nlp.data.batchify.Tuple( 135 | nlp.data.batchify.Stack(), 136 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 137 | nlp.data.batchify.Pad(axis=0, 
pad_val=pad_label_id), 138 | nlp.data.batchify.Stack('float32'), 139 | nlp.data.batchify.Stack('float32')) 140 | train_sampler = nlp.data.FixedBucketSampler(lengths=[len(item[1]) for item in train_data_transform], 141 | batch_size=batch_size, 142 | shuffle=True) 143 | train_dataloader = mx.gluon.data.DataLoader(train_data_transform, 144 | batchify_fn=batchify_fn, 145 | batch_sampler=train_sampler) 146 | 147 | optimizer_params = {'learning_rate': lr} 148 | trainer = gluon.Trainer(model.collect_params(), optimizer, 149 | optimizer_params, update_on_kvstore=False) 150 | 151 | # Collect differentiable parameters 152 | params = [p for p in model.collect_params().values() if p.grad_req != 'null'] 153 | # Set grad_req if gradient accumulation is required 154 | if accumulate: 155 | for p in params: 156 | p[1].grad_req = 'add' 157 | 158 | epoch_tic = time.time() 159 | total_num = 0 160 | log_num = 0 161 | for epoch_id in range(epochs): 162 | step_loss = 0 163 | tic = time.time() 164 | # train on labeled data 165 | for batch_id, data in enumerate(train_dataloader): 166 | # forward and backward 167 | with mx.autograd.record(): 168 | if data[0].shape[0] < len(ctx): 169 | data = split_and_load(data, [ctx[0]]) 170 | else: 171 | data = split_and_load(data, ctx) 172 | for chunk in data: 173 | _, token_ids, slot_label, intent_label, valid_length = chunk 174 | 175 | log_num += len(token_ids) 176 | total_num += len(token_ids) 177 | 178 | # forward computation 179 | intent_pred, slot_pred = model(token_ids) 180 | ls = icsl_loss_function(intent_pred, slot_pred, intent_label, slot_label, valid_length - 2).mean() 181 | 182 | if accumulate: 183 | ls = ls / accumulate 184 | ls.backward() 185 | step_loss += ls.asscalar() 186 | 187 | # update 188 | if not accumulate or (batch_id + 1) % accumulate == 0: 189 | trainer.allreduce_grads() 190 | nlp.utils.clip_grad_global_norm(params, 1) 191 | trainer.update(1, ignore_stale_grad=True) 192 | 193 | if (batch_id + 1) % log_interval == 0: 194 | toc = time.time() 195 | # update metrics 196 | ic_metric.update([intent_label], [intent_pred]) 197 | sl_metric.update(*process_seq_labels(slot_label, slot_pred, ignore_id=pad_label_id)) 198 | log.info('Epoch: {}, Batch: {}/{}, speed: {:.2f} samples/s, lr={:.7f}, loss={:.4f}, intent acc={:.3f}, slot acc={:.3f}' 199 | .format(epoch_id, 200 | batch_id, 201 | len(train_dataloader), 202 | log_num / (toc - tic), 203 | trainer.learning_rate, 204 | step_loss / log_interval, 205 | ic_metric.get()[1], 206 | sl_metric.get()[1])) 207 | 208 | tic = time.time() 209 | step_loss = 0 210 | log_num = 0 211 | 212 | mx.nd.waitall() 213 | epoch_toc = time.time() 214 | log.info('Time cost: {:.2f} s, Speed: {:.2f} samples/s' 215 | .format(epoch_toc - epoch_tic, total_num/(epoch_toc - epoch_tic))) 216 | model.save_parameters(os.path.join(model_dir, model_name + '.params')) 217 | 218 | 219 | def evaluate(model=None, model_name='', eval_input=''): 220 | """Evaluate the model on validation dataset. 
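
    Returns
    -------
    intent_acc : float
        Intent classification accuracy on eval_input.
    slot_f1 : float
        Slot labeling F1 score computed by conlleval.pl.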
221 | """ 222 | ## Load model 223 | bert, vocabulary = nlp.model.get_model('bert_12_768_12', 224 | dataset_name='wiki_multilingual_uncased', 225 | pretrained=True, 226 | ctx=ctx, 227 | use_pooler=False, 228 | use_decoder=False, 229 | use_classifier=False) 230 | if model is None: 231 | assert model_name != '' 232 | model = ICSL(len(vocabulary), num_slot_labels=len(label2idx), num_intents=len(intent2idx)) 233 | model.initialize(ctx=ctx) 234 | model.hybridize(static_alloc=True) 235 | model.load_parameters(os.path.join(model_dir, model_name + '.params')) 236 | 237 | idx2label = {} 238 | for label, idx in label2idx.items(): 239 | idx2label[idx] = label 240 | ## Load dev dataset 241 | field_separator = nlp.data.Splitter('\t') 242 | field_indices = [1, 3, 4, 0] 243 | eval_data = nlp.data.TSVDataset(filename=eval_input, 244 | field_separator=field_separator, 245 | num_discard_samples=1, 246 | field_indices=field_indices) 247 | 248 | bert_tokenizer = nlp.data.BERTTokenizer(vocabulary, lower=True) 249 | 250 | dev_alignment = {} 251 | eval_data_transform = [] 252 | for sample in eval_data: 253 | sample, alignment = icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer) 254 | eval_data_transform += [sample] 255 | dev_alignment[sample[0]] = alignment 256 | log.info('The number of examples after preprocessing: {}' 257 | .format(len(eval_data_transform))) 258 | 259 | test_batch_size = 16 260 | pad_token_id = vocabulary[PAD] 261 | pad_label_id = label2idx[PAD] 262 | batchify_fn = nlp.data.batchify.Tuple( 263 | nlp.data.batchify.Stack(), 264 | nlp.data.batchify.Pad(axis=0, pad_val=pad_token_id), 265 | nlp.data.batchify.Pad(axis=0, pad_val=pad_label_id), 266 | nlp.data.batchify.Stack('float32'), 267 | nlp.data.batchify.Stack('float32')) 268 | eval_dataloader = mx.gluon.data.DataLoader( 269 | eval_data_transform, 270 | batchify_fn=batchify_fn, 271 | num_workers=4, batch_size=test_batch_size, shuffle=False, last_batch='keep') 272 | 273 | _Result = collections.namedtuple( 274 | '_Result', ['intent', 'slot_labels']) 275 | all_results = {} 276 | 277 | total_num = 0 278 | for data in eval_dataloader: 279 | example_ids, token_ids, _, _, valid_length = data 280 | total_num += len(token_ids) 281 | # load data to GPU 282 | token_ids = token_ids.astype('float32').as_in_context(ctx[0]) 283 | valid_length = valid_length.astype('float32').as_in_context(ctx[0]) 284 | 285 | # forward computation 286 | intent_pred, slot_pred = model(token_ids) 287 | intent_pred = intent_pred.asnumpy() 288 | slot_pred = slot_pred.asnumpy() 289 | valid_length = valid_length.asnumpy() 290 | 291 | for eid, y_intent, y_slot, length in zip(example_ids, intent_pred, slot_pred, valid_length): 292 | eid = eid.asscalar() 293 | length = int(length) - 2 294 | intent_id = y_intent.argmax(axis=-1) 295 | slot_ids = y_slot.argmax(axis=-1).tolist()[:length] 296 | slot_names = [idx2label[idx] for idx in slot_ids] 297 | merged_slot_names = merge_slots(slot_names, dev_alignment[eid] + [length]) 298 | if eid not in all_results: 299 | all_results[eid] = _Result(intent_id, merged_slot_names) 300 | 301 | example_ids, utterances, labels, intents = load_tsv(eval_input) 302 | pred_intents = [] 303 | label_intents = [] 304 | for eid, intent in zip(example_ids, intents): 305 | label_intents.append(label2index(intent2idx, intent)) 306 | pred_intents.append(all_results[eid].intent) 307 | intent_acc = sklearn.metrics.accuracy_score(label_intents, pred_intents) 308 | log.info("Intent Accuracy: %.4f" % intent_acc) 309 | 310 | pred_icsl = [] 311 | 
label_icsl = [] 312 | for eid, intent, slot_labels in zip(example_ids, intents, labels): 313 | label_icsl.append(str(label2index(intent2idx, intent)) + ' ' + ' '.join(slot_labels)) 314 | pred_icsl.append(str(all_results[eid].intent) + ' ' + ' '.join(all_results[eid].slot_labels)) 315 | exact_match = sklearn.metrics.accuracy_score(label_icsl, pred_icsl) 316 | log.info("Exact Match: %.4f" % exact_match) 317 | 318 | with open(conll_prediction_file, "w") as fw: 319 | for eid, utterance, labels in zip(example_ids, utterances, labels): 320 | preds = all_results[eid].slot_labels 321 | for w, l, p in zip(utterance, labels, preds): 322 | fw.write(' '.join([w, l, p]) + '\n') 323 | fw.write('\n') 324 | proc = subprocess.Popen(["perl", "conlleval.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE) 325 | with open(conll_prediction_file) as f: 326 | stdout = proc.communicate(f.read().encode())[0] 327 | result = stdout.decode('utf-8').split('\n')[1] 328 | slot_f1 = float(result.split()[-1].strip()) 329 | log.info("Slot Labeling: %s" % result) 330 | return intent_acc, slot_f1 331 | 332 | # extract labels 333 | train_input = data_dir + 'atis_train.tsv' 334 | intent2idx, label2idx = get_label_indices(train_input) 335 | 336 | # Train 337 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 338 | log.info('Train on %s:' % lang) 339 | model_name = 'model_lstm_mt_' + lang + '.' + str(random_seed) 340 | train_input = data_dir + 'train_translated_' + lang + '.tsv' 341 | train(model_name, train_input) 342 | 343 | # Evaluate 344 | for lang in ['ES', 'DE', 'ZH', 'JA', 'PT', 'FR', 'HI', 'TR']: 345 | log.info('Evaluate on %s:' % lang) 346 | model_name = 'model_lstm_mt_' + lang + '.' + str(random_seed) 347 | test_input = data_dir + 'atis_test_' + lang + '.tsv' 348 | evaluate(model_name=model_name, eval_input=test_input) 349 | -------------------------------------------------------------------------------- /code/scripts/translate_and_align.py: -------------------------------------------------------------------------------- 1 | import nlu_constants 2 | import boto3 3 | import csv 4 | import json 5 | import jieba 6 | import sys 7 | import subprocess 8 | 9 | lang = sys.argv[1] 10 | 11 | data_dir = "../data/" 12 | train_tsv = "atis_train.tsv" 13 | valid_tsv = "atis_dev.tsv" 14 | train_target = "train_translated_" + lang.upper() + ".tsv" 15 | valid_target = "dev_translated_" + lang.upper() + ".tsv" 16 | source_lang = "en" 17 | target_lang = lang 18 | 19 | 20 | def idx2label(sources, source_labels): 21 | # we remove the BI tags because the order of BI may be changed after translation. 
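    # The result is a list with one {token_index: slot_type} dict per utterance,
    # e.g. labels ['O', 'O', 'B-city', 'I-city'] become {2: 'city', 3: 'city'}
    # (illustrative label names); the B-/I- prefixes are restored later by
    # align_labels after projecting labels onto the translated tokens.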
22 | ret_token_sls = [] # return a list of dictionary where 23 | for ix, ex in enumerate(sources): 24 | lbs = source_labels[ix] 25 | assert len(ex) == len(lbs) 26 | sls = {} 27 | for jx, token in enumerate(ex): 28 | if lbs[jx] != 'O': 29 | sls[jx] = lbs[jx][2:] 30 | ret_token_sls.append(sls) 31 | return ret_token_sls 32 | 33 | 34 | def load_tsv(fn): 35 | sources = [] 36 | source_labels = [] 37 | with open(fn) as tsvFile: 38 | tsvReader = csv.DictReader(tsvFile, delimiter="\t") 39 | for ix, line in enumerate(tsvReader): 40 | sources.append(line[nlu_constants.UTTERANCE_SYMBOL].split(' ')) 41 | source_labels.append(line[nlu_constants.SLOT_LABEL_SYMBOL].split(' ')) 42 | return sources, source_labels 43 | 44 | 45 | for source_tsv, target_tsv in [(train_tsv, train_target), (valid_tsv, valid_target)]: 46 | sources, source_labels = load_tsv(data_dir + source_tsv) 47 | token_sls = idx2label(sources, source_labels) 48 | with open(data_dir + target_tsv[:-4] + "_idx2label" + ".json", "w") as fw: 49 | json.dump(token_sls, fw) 50 | translator = boto3.client(service_name='translate', use_ssl=True, region_name='us-east-1') 51 | targets = [] 52 | for source in sources: 53 | result = translator.translate_text(Text=" ".join(source), SourceLanguageCode=source_lang, 54 | TargetLanguageCode=target_lang) 55 | targets.append(result.get('TranslatedText')) 56 | with open(data_dir + target_tsv[:-4] + ".json", "w") as fw: 57 | json.dump(targets, fw) 58 | with open(data_dir + target_tsv[:-4] + ".json") as f: 59 | targets = json.load(f) 60 | tokenized_targets = [] 61 | for target in targets: 62 | if target_lang == 'zh': 63 | target = target.replace(",", "") 64 | segs = [t.strip() for t in list(jieba.cut(target)) if not t.isspace()] 65 | else: 66 | target = target.replace(",", " ") 67 | segs = target.strip().split(' ') 68 | tokenized_targets.append(segs) 69 | with open(data_dir + target_tsv[:-4] + "_token" + ".json", "w") as fw: 70 | json.dump(tokenized_targets, fw) 71 | 72 | with open(data_dir + "atis_train_valid", "w") as fw: 73 | for source_tsv, target_tsv in [(train_tsv, train_target), (valid_tsv, valid_target)]: 74 | source_utterances, _ = load_tsv(data_dir + source_tsv) 75 | with open(data_dir + target_tsv[:-4] + "_token" + ".json") as f: 76 | target_utterances = json.load(f) 77 | for ix in range(len(source_utterances)): 78 | fw.write((" ".join(source_utterances[ix]) + " ||| " + " ".join(target_utterances[ix]) + "\n")) 79 | 80 | with open(data_dir + "forward.align", "w") as f: 81 | command = ["./fast_align/build/fast_align", "-i", data_dir + "atis_train_valid", "-d", "-o", "-v"] 82 | process = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE) 83 | stdout = process.communicate()[0].decode('utf-8') 84 | f.write(stdout) 85 | 86 | lens = [0] 87 | for target_tsv in [train_target, valid_target]: 88 | with open(data_dir + target_tsv[:-4] + "_token" + ".json") as f: 89 | lens.append(len(json.load(f)) + lens[-1]) 90 | lens = lens[1:] 91 | 92 | s2t_set = [] 93 | with open(data_dir + "forward.align") as f: 94 | s2t_indexes = [] 95 | len_idx = 0 96 | for ix, l in enumerate(f): 97 | if ix == lens[len_idx]: 98 | s2t_set.append(s2t_indexes) 99 | s2t_indexes = [] 100 | len_idx += 1 101 | if ix >= lens[-1]: 102 | break 103 | segs = l.split() 104 | s2t_idx = {} 105 | for seg in segs: 106 | st = seg.split('-') 107 | s2t_idx[int(st[0])] = int(st[1]) 108 | s2t_indexes.append(s2t_idx) 109 | if len(s2t_indexes) > 0: 110 | s2t_set.append(s2t_indexes) 111 | 112 | 113 | def align_labels(source_utterances, 
target_utterances, s2t_indexes, source_idx2labels): 114 | # generate target labels. 115 | ret_target_labels = [] 116 | for ix, tokens in enumerate(source_utterances): 117 | template = ['O'] * len(target_utterances[ix]) # generate template labels 118 | for jx in range(len(source_utterances[ix])): 119 | if jx in s2t_indexes[ix] and str(jx) in source_idx2labels[ix]: 120 | template[s2t_indexes[ix][jx]] = source_idx2labels[ix][str(jx)] 121 | # add BI labels 122 | state = 'O' 123 | for jx in range(len(template)): 124 | if template[jx] != 'O' and (state == 'O' or state != template[jx]): 125 | state = template[jx] 126 | template[jx] = 'B-' + template[jx] 127 | elif template[jx] != 'O' and state == template[jx]: 128 | template[jx] = 'I-' + template[jx] 129 | elif template[jx] == 'O': 130 | state = 'O' 131 | ret_target_labels.append(template) 132 | return ret_target_labels 133 | 134 | 135 | for ix, (source_tsv, target_tsv) in enumerate([(train_tsv, train_target), (valid_tsv, valid_target)]): 136 | source_utterances, source_labels = load_tsv(data_dir + source_tsv) 137 | with open(data_dir + target_tsv[:-4] + "_token" + ".json") as f: 138 | target_tokens = json.load(f) 139 | s2t_indexes = s2t_set[ix] 140 | with open(data_dir + target_tsv[:-4] + "_idx2label" + ".json") as f: 141 | token_sls = json.load(f) 142 | target_labels = align_labels(source_utterances, target_tokens, s2t_indexes, token_sls) 143 | with open(data_dir + source_tsv) as tsvFile: # also gen tsv format for DiSAN 144 | tsvReader = csv.DictReader(tsvFile, delimiter="\t") 145 | with open(data_dir + target_tsv, "w") as tsvFileW: 146 | tsvWriter = csv.DictWriter(tsvFileW, fieldnames=tsvReader.fieldnames, delimiter="\t") 147 | tsvWriter.writeheader() 148 | for jx, line in enumerate(tsvReader): 149 | line[nlu_constants.UTTERANCE_SYMBOL] = ' '.join(target_tokens[jx]) 150 | line[nlu_constants.SLOT_LABEL_SYMBOL] = ' '.join(target_labels[jx]) 151 | tsvWriter.writerow(line) 152 | -------------------------------------------------------------------------------- /code/scripts/utils.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import mxnet as mx 3 | import numpy as np 4 | import random 5 | 6 | from mxnet import gluon 7 | 8 | PAD = '[PAD]' 9 | 10 | def process_seq_labels(label, pred, ignore_id=-1): 11 | # label: (batch_size * seq_length) 12 | label = label.reshape(-3).asnumpy() 13 | # pred: (batch_size * seq_length, num_labels) 14 | pred = pred.reshape(-3, 0).asnumpy() 15 | # ignore ignore_id 16 | keep_idx = np.where(label != ignore_id) 17 | label = mx.nd.array(label[keep_idx]) 18 | pred = mx.nd.array(pred[keep_idx]) 19 | return [label], [pred] 20 | 21 | def split_and_load(arrs, ctx): 22 | """split and load arrays to a list of contexts""" 23 | assert isinstance(arrs, (list, tuple)) 24 | if len(ctx) == 1: 25 | return [[arr.as_in_context(ctx[0]) for arr in arrs]] 26 | else: 27 | # split and load 28 | loaded_arrs = [gluon.utils.split_and_load(arr, ctx, even_split=False) for arr in arrs] 29 | return zip(*loaded_arrs) 30 | 31 | def label2index(map, key): 32 | return map[key] if key in map else len(map) 33 | 34 | def load_tsv(fn): 35 | example_ids = [] 36 | utterances = [] 37 | labels = [] 38 | intents = [] 39 | with open(fn) as tsvFile: 40 | tsvReader = csv.DictReader(tsvFile, delimiter="\t") 41 | for line in tsvReader: 42 | example_ids.append(int(line['u_id'])) 43 | utterances.append(line['utterance'].split(' ')) 44 | labels.append(line['slot-labels'].split(' ')) 45 | intents.append(line['intent']) 
46 | return example_ids, utterances, labels, intents 47 | 48 | def get_label_indices(input_file): 49 | _, _, train_labels, train_intents = load_tsv(input_file) 50 | 51 | intent2idx = {} 52 | for intent in train_intents: 53 | if intent not in intent2idx: 54 | intent2idx[intent] = len(intent2idx) 55 | 56 | label2idx = {} 57 | for labels in train_labels: 58 | for l in labels: 59 | if l not in label2idx: 60 | label2idx[l] = len(label2idx) 61 | 62 | new_labels = [] 63 | for label in label2idx.keys(): 64 | if label.startswith('B'): 65 | cont_label = 'I' + label[1:] 66 | if cont_label not in label2idx: 67 | new_labels.append(cont_label) 68 | for label in new_labels: 69 | label2idx[label] = len(label2idx) 70 | if PAD not in label2idx: 71 | label2idx[PAD] = len(label2idx) 72 | return intent2idx, label2idx 73 | 74 | def merge_slots(slots, alignment): 75 | merged_slots = [] 76 | start_idx = alignment[0] 77 | for end_idx in alignment[1:]: 78 | tag = slots[start_idx] 79 | for slot in slots[start_idx: end_idx]: 80 | if slot.startswith('B') and tag == 'O': 81 | tag = slot 82 | elif slot.startswith('I') and tag == 'O': 83 | tag = slot 84 | start_idx = end_idx 85 | merged_slots.append(tag) 86 | return merged_slots 87 | 88 | def icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer): 89 | eid = int(sample[3]) 90 | out_sample = [] 91 | tag_alignment = [] 92 | bert_tokens = ['[CLS]'] 93 | bert_tags = [] 94 | for w, tag in zip(sample[0].split(), sample[1].split()): 95 | tag_alignment.append(len(bert_tags)) 96 | bert_toks = bert_tokenizer(w) 97 | bert_tokens.extend(bert_toks) 98 | if tag.startswith('B'): 99 | cont_tag = 'I' + tag[1:] 100 | bert_tags.extend([tag] + [cont_tag] * (len(bert_toks) - 1)) 101 | else: 102 | bert_tags.extend([tag] * len(bert_toks)) 103 | bert_tokens += ['[SEP]'] 104 | bert_tags += [PAD] 105 | # add example id 106 | out_sample += [eid] 107 | # add token ids 108 | out_sample += [[vocabulary[tok] for tok in bert_tokens]] 109 | # add slot labels 110 | out_sample += [[label2index(label2idx, tag) for tag in bert_tags]] 111 | # add intent label 112 | out_sample += [label2index(intent2idx, sample[2])] 113 | # add valid length 114 | valid_len = len(bert_tokens) 115 | out_sample += [valid_len] 116 | return out_sample, tag_alignment 117 | 118 | def parallel_icsl_transform(sample, vocabulary, label2idx, intent2idx, bert_tokenizer): 119 | out_sample = [] 120 | target = ['[CLS]'] 121 | bert_tags = [] 122 | for w, tag in zip(sample[1].split(), sample[2].split()): 123 | bert_toks = bert_tokenizer(w) 124 | target.extend(bert_toks) 125 | if tag.startswith('B'): 126 | cont_tag = 'I' + tag[1:] 127 | bert_tags.extend([tag] + [cont_tag] * (len(bert_toks) - 1)) 128 | else: 129 | bert_tags.extend([tag] * len(bert_toks)) 130 | target += ['[SEP]'] 131 | bert_tags += [PAD] 132 | source = ['[CLS]'] + bert_tokenizer(sample[0]) + ['[SEP]'] 133 | # add source ids 134 | out_sample += [[vocabulary[tok] for tok in source]] 135 | # add target ids 136 | out_sample += [[vocabulary[tok] for tok in target]] 137 | # add slot labels 138 | out_sample += [[label2index(label2idx, tag) for tag in bert_tags]] 139 | # add intent label 140 | out_sample += [label2index(intent2idx, sample[3])] 141 | # add source valid length 142 | out_sample += [len(source)] 143 | # add target valid length 144 | out_sample += [len(target)] 145 | return out_sample 146 | --------------------------------------------------------------------------------
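Editor's note: the snippet below is a minimal usage sketch, not part of the repository. It shows how the word-to-subword alignment recorded by icsl_transform is combined with merge_slots (both defined in code/scripts/utils.py) to map subword-level slot predictions back to word-level labels, mirroring what the evaluate functions in the lstm_*.py scripts do. The utterance, tokenization, and slot names are invented for illustration, and the snippet assumes it is run from code/scripts/ with mxnet installed (utils.py imports it).

from utils import merge_slots

# Suppose the utterance words are ['fly', 'to', 'kolkata'] and the BERT tokenizer
# splits 'kolkata' into two subwords. icsl_transform then records the alignment
# [0, 1, 2] (index of each word's first subword, [CLS] excluded) and there are
# 4 label-bearing subword positions in total.
subword_slots = ['O', 'O', 'B-city', 'I-city']  # per-subword model predictions
alignment = [0, 1, 2]                           # first-subword index per word
valid_length = 4                                # subword count without [CLS]/[SEP]

word_slots = merge_slots(subword_slots, alignment + [valid_length])
print(word_slots)  # ['O', 'O', 'B-city']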