├── LICENSE.md ├── README.md ├── alexnet_distilled.py ├── alexnet_distilled2bits.py ├── alexnet_doublefilters.py ├── cifar100_test.py ├── cifar10_diff_quant_test.py ├── cifar10_test.py ├── cifar10_wideResNet.py ├── cnn_models ├── __init__.py ├── alexnet_kfilters.py ├── conv_forward_model.py ├── help_fun.py ├── resnet_kfilters.py ├── wide_resnet.py └── wide_resnet_imagenet.py ├── datasets ├── CIFAR10.py ├── CIFAR100.py ├── ImageNet12.py ├── MNIST.py ├── PennTreeBank.py ├── __init__.py ├── customs_datasets.py ├── torchvision_extension.py └── translation_datasets.py ├── helpers └── functions.py ├── imageNet_distilled.py ├── model_manager.py ├── onmt ├── Beam.py ├── IO.py ├── Loss.py ├── ModelConstructor.py ├── Models.py ├── Optim.py ├── Trainer.py ├── Translator.py ├── Utils.py ├── __init__.py ├── modules │ ├── Conv2Conv.py │ ├── ConvMultiStepAttention.py │ ├── CopyGenerator.py │ ├── Embeddings.py │ ├── Gate.py │ ├── GlobalAttention.py │ ├── ImageEncoder.py │ ├── MultiHeadedAttn.py │ ├── SRU.py │ ├── StackedRNN.py │ ├── StructuredAttention.py │ ├── Transformer.py │ ├── UtilClass.py │ ├── WeightNorm.py │ └── __init__.py └── standard_options.py ├── openNMT_WMT13.py ├── openNMT_integ_dataset.py ├── openNMT_multi30k.py ├── perl_scripts ├── README.md ├── multi-bleu.perl ├── nonbreaking_prefix.de ├── nonbreaking_prefix.en └── tokenizer.perl ├── quantization ├── __init__.py ├── help_functions.py └── quant_functions.py ├── requirements.txt ├── resnet34_doublefilters.py └── translation_models ├── __init__.py ├── help_fun.py └── model.py /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Antonio Polino - Dan Alistarh - Razvan Pascanu 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Model compression via distillation and quantization 2 | 3 | This code has been written to experiment with quantized distillation and differentiable quantization, techniques developed in our paper ["Model compression via distillation and quantization"](https://arxiv.org/abs/1802.05668). 4 | 5 | If you find this code useful in your research, please cite the paper: 6 | 7 | ``` 8 | @article{2018arXiv180205668P, 9 | author = {{Polino}, A. and {Pascanu}, R. 
and {Alistarh}, D.}, 10 | title = "{Model compression via distillation and quantization}", 11 | journal = {ArXiv e-prints}, 12 | archivePrefix = "arXiv", 13 | eprint = {1802.05668}, 14 | keywords = {Computer Science - Neural and Evolutionary Computing, Computer Science - Learning}, 15 | year = 2018, 16 | month = feb, 17 | } 18 | ``` 19 | 20 | 21 | The code is written in [Pytorch 0.3](http://pytorch.org/) using Python 3.6. It is not backward compatible with Python 2.x. 22 | 23 | *Note*: Pytorch 0.4 introduced some major breaking changes. To use this code, please use Pytorch 0.3. 24 | 25 | Make sure to use a compatible version of torchvision; to run the code, use torchvision 0.2.0. 26 | ``` 27 | pip install torchvision==0.2.0 28 | ``` 29 | This should be done after installing the [requirements](requirements.txt). 30 | 31 | # Getting started 32 | 33 | ### Prerequisites 34 | This code is mostly self-contained. Only a few additional libraries are required, specified in [requirements.txt](requirements.txt). The repository already contains a fork of the [openNMT-py project](https://github.com/OpenNMT/OpenNMT-py). Note that, due to the rapidly changing nature of the openNMT-py codebase and the substantial time and effort required to make it compatible with our code, it is unlikely that we will support newer versions of openNMT-py. 35 | 36 | ### Summary of folder's content 37 | This is a short explanation of the contents of each folder: 38 | 39 | - *datasets* is a package that automatically downloads and processes several datasets, including CIFAR10, PennTreeBank, WMT2013, etc. 40 | - *quantization* contains the quantization functions that are used. 41 | - *perl_scripts* contains some Perl scripts taken from the [moses project](https://github.com/moses-smt/mosesdecoder) to help with the translation task. 42 | - *onmt* contains the code from the [openNMT-py project](https://github.com/OpenNMT/OpenNMT-py). It is slightly modified to make it consistent with our codebase. 43 | - *helpers* contains some functions used across the whole project. 44 | - *model_manager.py* contains a useful class that implements common I/O operations on saved models. It is especially useful when training multiple similar models, as it keeps track of the options with which the models were trained and the results of each training run. *Note*: it does not support concurrent access to the same files. I am working on a version that does; if you are interested, drop me a line. 45 | - First-level files like [cifar10_test.py](cifar10_test.py) are the main files that implement the experiments using the rest of the codebase. 46 | - Other folders contain model definitions and training routines, depending on the task. 47 | 48 | ### Running the code 49 | 50 | The first thing to do is to import a dataset and create the train and test set loaders. 51 | Define a folder where you want to save all your datasets; they will be automatically downloaded and processed in the folder specified. The following example shows how to load the CIFAR10 dataset; the snippets that follow show how to create and train a model. 52 | ```python 53 | import datasets 54 | datasets.BASE_DATA_FOLDER = '/home/saved_datasets' 55 | 56 | batch_size = 50 57 | cifar10 = datasets.CIFAR10() #-> will be saved in /home/saved_datasets/cifar10 58 | train_loader, test_loader = cifar10.getTrainLoader(batch_size), cifar10.getTestLoader(batch_size) 59 | ``` 60 | 61 | Now we can use ```train_loader``` and ```test_loader``` as generators from which to get the train and test data as Pytorch tensors.
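For example, a single batch can be inspected like this (a minimal sketch; the shapes shown assume the CIFAR10 settings above, i.e. `batch_size = 50` and 3x32x32 images):

```python
# draw one batch of images and labels from the training loader
for images, labels in train_loader:
    print(images.size())  # torch.Size([50, 3, 32, 32])
    print(labels.size())  # torch.Size([50])
    break
```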
62 | 63 | At this point we just need to define a model and train it: 64 | 65 | ```python 66 | import os 67 | import cnn_models.conv_forward_model as convForwModel 68 | import cnn_models.help_fun as cnn_hf 69 | teacherModel = convForwModel.ConvolForwardNet(**convForwModel.teacherModelSpec, 70 | useBatchNorm=True, 71 | useAffineTransformInBatchNorm=True) 72 | convForwModel.train_model(teacherModel, train_loader, test_loader, epochs_to_train=200) 73 | ``` 74 | 75 | As mentioned before, it is often better to use the ModelManager class to be able to automatically save the results and retrieve them later. So we would typically write 76 | 77 | ```python 78 | import os 79 | import cnn_models.conv_forward_model as convForwModel 80 | import cnn_models.help_fun as cnn_hf 81 | import model_manager 82 | cifar10Manager = model_manager.ModelManager('model_manager_cifar10.tst', 83 | 'model_manager', create_new_model_manager=False)#the first time set this to True 84 | model_name = 'cifar10_teacher' 85 | cifar10modelsFolder = '~/quantized_distillation/' 86 | teacherModelPath = os.path.join(cifar10modelsFolder, model_name) 87 | teacherModel = convForwModel.ConvolForwardNet(**convForwModel.teacherModelSpec, 88 | useBatchNorm=True, 89 | useAffineTransformInBatchNorm=True) 90 | if not model_name in cifar10Manager.saved_models: 91 | cifar10Manager.add_new_model(model_name, teacherModelPath, 92 | arguments_creator_function={**convForwModel.teacherModelSpec, 93 | 'useBatchNorm':True, 94 | 'useAffineTransformInBatchNorm':True}) 95 | cifar10Manager.train_model(teacherModel, model_name=model_name, 96 | train_function=convForwModel.train_model, 97 | arguments_train_function={'epochs_to_train': 200}, 98 | train_loader=train_loader, test_loader=test_loader) 99 | ``` 100 | 101 | This is the general structure necessary to use the code. For more examples, please look at one of the main files that are used to run the experiments. 102 | 103 | # Authors 104 | 105 | - Antonio Polino 106 | - Razvan Pascanu 107 | - Dan Alistarh 108 | 109 | # License 110 | 111 | The code is licensed under the MIT Licence. See the [LICENSE.md](LICENSE.md) file for detail. 112 | 113 | # Acknowledgements 114 | 115 | We would like to thank Ce Zhang (ETH Zürich), Hantian Zhang (ETH Zürich) and Martin Jaggi (EPFL) for their support with experiments and valuable feedback. 116 | -------------------------------------------------------------------------------- /alexnet_distilled.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import torchvision 4 | import cnn_models.conv_forward_model as convForwModel 5 | import cnn_models.help_fun as cnn_hf 6 | import datasets 7 | import model_manager 8 | 9 | cuda_devices = os.environ['CUDA_VISIBLE_DEVICES'].split(',') 10 | print('CUDA_VISIBLE_DEVICES: {} for a total of {} GPUs'.format(cuda_devices, len(cuda_devices))) 11 | 12 | 13 | NUM_BITS = 4 14 | print('Number of bits in training: {}'.format(NUM_BITS)) 15 | 16 | datasets.BASE_DATA_FOLDER = '...' 17 | SAVED_MODELS_FOLDER = '...' 
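# NOTE: the '...' placeholders above are intentionally left unspecified; set them to real
# folders before running. Also, batch_size is used further down in this script (in the
# print statement and when building the ImageNet12 loaders) but is never defined here,
# so it must be set as well, e.g. batch_size = 256 (an illustrative value, not one
# prescribed by the authors).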
18 | 19 | 20 | USE_CUDA = torch.cuda.is_available() 21 | NUM_GPUS = len(cuda_devices) 22 | 23 | try: 24 | os.mkdir(datasets.BASE_DATA_FOLDER) 25 | except:pass 26 | try: 27 | os.mkdir(SAVED_MODELS_FOLDER) 28 | except:pass 29 | 30 | epochsToTrainImageNet = 90 31 | imageNet12modelsFolder = os.path.join(SAVED_MODELS_FOLDER, 'imagenet12_new') 32 | imagenet_manager = model_manager.ModelManager('model_manager_imagenet_Alexnet_distilled4bits.tst', 33 | 'model_manager', create_new_model_manager=False) 34 | 35 | for x in imagenet_manager.list_models(): 36 | if imagenet_manager.get_num_training_runs(x) >= 1: 37 | s = '{}; Last prediction acc: {}, Best prediction acc: {}'.format(x, 38 | imagenet_manager.load_metadata(x)[1]['predictionAccuracy'][-1], 39 | max(imagenet_manager.load_metadata(x)[1]['predictionAccuracy'])) 40 | print(s) 41 | 42 | try: 43 | os.mkdir(imageNet12modelsFolder) 44 | except:pass 45 | 46 | print('Batch size: {}'.format(batch_size)) 47 | 48 | if batch_size % NUM_GPUS != 0: 49 | raise ValueError('Batch size: {} must be a multiple of the number of gpus:{}'.format(batch_size, NUM_GPUS)) 50 | 51 | 52 | imageNet12 = datasets.ImageNet12('...', 53 | '...', 54 | type_of_data_augmentation='extended', already_scaled=False, 55 | pin_memory=True) 56 | 57 | 58 | train_loader = imageNet12.getTrainLoader(batch_size, shuffle=True) 59 | test_loader = imageNet12.getTestLoader(batch_size, shuffle=False) 60 | 61 | # # Teacher model 62 | alexnet_unquantized = torchvision.models.alexnet(pretrained=True) 63 | if USE_CUDA: 64 | alexnet_unquantized = alexnet_unquantized.cuda() 65 | if NUM_GPUS > 1: 66 | alexnet_unquantized = torch.nn.parallel.DataParallel(alexnet_unquantized) 67 | 68 | 69 | #Train a wide-resNet with quantized distillation 70 | quant_distilled_model_name = 'alexnet_quant_distilled{}bits'.format(NUM_BITS) 71 | quantDistilledModelPath = os.path.join(imageNet12modelsFolder, quant_distilled_model_name) 72 | quantDistilledOptions = {} 73 | quant_distilled_model = torchvision.models.alexnet(pretrained=False) 74 | 75 | if USE_CUDA: 76 | quant_distilled_model = quant_distilled_model.cuda() 77 | if NUM_GPUS > 1: 78 | quant_distilled_model = torch.nn.parallel.DataParallel(quant_distilled_model) 79 | 80 | if not quant_distilled_model_name in imagenet_manager.saved_models: 81 | imagenet_manager.add_new_model(quant_distilled_model_name, quantDistilledModelPath, 82 | arguments_creator_function=quantDistilledOptions) 83 | 84 | imagenet_manager.train_model(quant_distilled_model, model_name=quant_distilled_model_name, 85 | train_function=convForwModel.train_model, 86 | arguments_train_function={'epochs_to_train': epochsToTrainImageNet, 87 | 'learning_rate_style': 'imagenet', 88 | 'initial_learning_rate': 0.1, 89 | 'use_nesterov':True, 90 | 'initial_momentum':0.9, 91 | 'weight_decayL2':1e-4, 92 | 'start_epoch': 0, 93 | 'print_every':30, 94 | 'use_distillation_loss':True, 95 | 'teacher_model': alexnet_unquantized, 96 | 'quantizeWeights':True, 97 | 'numBits':NUM_BITS, 98 | 'bucket_size':256, 99 | 'quantize_first_and_last_layer': False}, 100 | train_loader=train_loader, test_loader=test_loader) 101 | -------------------------------------------------------------------------------- /alexnet_distilled2bits.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import torchvision 4 | import cnn_models.conv_forward_model as convForwModel 5 | import cnn_models.help_fun as cnn_hf 6 | import datasets 7 | import model_manager 8 | 9 | cuda_devices = 
os.environ['CUDA_VISIBLE_DEVICES'].split(',') 10 | print('CUDA_VISIBLE_DEVICES: {} for a total of {} GPUs'.format(cuda_devices, len(cuda_devices))) 11 | 12 | NUM_BITS = 2 13 | print('Number of bits in training: {}'.format(NUM_BITS)) 14 | 15 | datasets.BASE_DATA_FOLDER = '...' 16 | SAVED_MODELS_FOLDER = '...' 17 | 18 | USE_CUDA = torch.cuda.is_available() 19 | NUM_GPUS = len(cuda_devices) 20 | 21 | try: 22 | os.mkdir(datasets.BASE_DATA_FOLDER) 23 | except:pass 24 | try: 25 | os.mkdir(SAVED_MODELS_FOLDER) 26 | except:pass 27 | 28 | TRAIN_QUANTIZED_DISTILLED = False 29 | 30 | epochsToTrainImageNet = 90 31 | imageNet12modelsFolder = os.path.join(SAVED_MODELS_FOLDER, 'imagenet12_new') 32 | imagenet_manager = model_manager.ModelManager('model_manager_imagenet_Alexnet_distilled2bits.tst', 33 | 'model_manager', create_new_model_manager=False) 34 | 35 | for x in imagenet_manager.list_models(): 36 | if imagenet_manager.get_num_training_runs(x) >= 1: 37 | s = '{}; Last prediction acc: {}, Best prediction acc: {}'.format(x, 38 | imagenet_manager.load_metadata(x)[1]['predictionAccuracy'][-1], 39 | max(imagenet_manager.load_metadata(x)[1]['predictionAccuracy'])) 40 | print(s) 41 | 42 | try: 43 | os.mkdir(imageNet12modelsFolder) 44 | except:pass 45 | 46 | print('Batch size: {}'.format(batch_size)) 47 | 48 | if batch_size % NUM_GPUS != 0: 49 | raise ValueError('Batch size: {} must be a multiple of the number of gpus:{}'.format(batch_size, NUM_GPUS)) 50 | 51 | imageNet12 = datasets.ImageNet12('...', 52 | '...', 53 | type_of_data_augmentation='extended', already_scaled=False, 54 | pin_memory=True) 55 | 56 | 57 | 58 | train_loader = imageNet12.getTrainLoader(batch_size, shuffle=True) 59 | test_loader = imageNet12.getTestLoader(batch_size, shuffle=False) 60 | 61 | # # Teacher model 62 | alexnet_unquantized = torchvision.models.alexnet(pretrained=True) 63 | if USE_CUDA: 64 | alexnet_unquantized = alexnet_unquantized.cuda() 65 | if NUM_GPUS > 1: 66 | alexnet_unquantized = torch.nn.parallel.DataParallel(alexnet_unquantized) 67 | 68 | 69 | #Train a wide-resNet with quantized distillation 70 | quant_distilled_model_name = 'alexnet_quant_distilled{}bits'.format(NUM_BITS) 71 | quantDistilledModelPath = os.path.join(imageNet12modelsFolder, quant_distilled_model_name) 72 | quantDistilledOptions = {} 73 | quant_distilled_model = torchvision.models.alexnet(pretrained=False) 74 | 75 | if USE_CUDA: 76 | quant_distilled_model = quant_distilled_model.cuda() 77 | if NUM_GPUS > 1: 78 | quant_distilled_model = torch.nn.parallel.DataParallel(quant_distilled_model) 79 | 80 | if not quant_distilled_model_name in imagenet_manager.saved_models: 81 | imagenet_manager.add_new_model(quant_distilled_model_name, quantDistilledModelPath, 82 | arguments_creator_function=quantDistilledOptions) 83 | 84 | if TRAIN_QUANTIZED_DISTILLED: 85 | imagenet_manager.train_model(quant_distilled_model, model_name=quant_distilled_model_name, 86 | train_function=convForwModel.train_model, 87 | arguments_train_function={'epochs_to_train': epochsToTrainImageNet, 88 | 'learning_rate_style': 'imagenet', 89 | 'initial_learning_rate': 0.1, 90 | 'use_nesterov':True, 91 | 'initial_momentum':0.9, 92 | 'weight_decayL2':1e-4, 93 | 'start_epoch': 0, 94 | 'print_every':30, 95 | 'use_distillation_loss':True, 96 | 'teacher_model': alexnet_unquantized, 97 | 'quantizeWeights':True, 98 | 'numBits':NUM_BITS, 99 | 'bucket_size':256, 100 | 'quantize_first_and_last_layer': False}, 101 | train_loader=train_loader, test_loader=test_loader) 102 | 
quant_distilled_model.load_state_dict(imagenet_manager.load_model_state_dict(quant_distilled_model_name)) 103 | 104 | print(cnn_hf.evaluateModel(quant_distilled_model)) 105 | -------------------------------------------------------------------------------- /alexnet_doublefilters.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import torchvision 4 | import cnn_models.conv_forward_model as convForwModel 5 | import cnn_models.help_fun as cnn_hf 6 | import cnn_models.alexnet_kfilters as alexnet_kfilters 7 | import datasets 8 | import model_manager 9 | import functools 10 | import quantization 11 | import helpers.functions as mhf 12 | 13 | cuda_devices = os.environ['CUDA_VISIBLE_DEVICES'].split(',') 14 | print('CUDA_VISIBLE_DEVICES: {} for a total of {} GPUs'.format(cuda_devices, len(cuda_devices))) 15 | 16 | 17 | NUM_BITS = 4 18 | print('Server Name: {}'.format(server_name)) 19 | print('Number of bits in training: {}'.format(NUM_BITS)) 20 | 21 | datasets.BASE_DATA_FOLDER = '...' 22 | SAVED_MODELS_FOLDER = '...' 23 | 24 | USE_CUDA = torch.cuda.is_available() 25 | NUM_GPUS = len(cuda_devices) 26 | 27 | try: 28 | os.mkdir(datasets.BASE_DATA_FOLDER) 29 | except:pass 30 | try: 31 | os.mkdir(SAVED_MODELS_FOLDER) 32 | except:pass 33 | 34 | epochsToTrainImageNet = 90 35 | imageNet12modelsFolder = os.path.join(SAVED_MODELS_FOLDER, 'imagenet12_new') 36 | imagenet_manager = model_manager.ModelManager('model_manager_imagenet_AlexnetDoubleFilters_distilled4bits.tst', 37 | 'model_manager', create_new_model_manager=False) 38 | 39 | for x in imagenet_manager.list_models(): 40 | if imagenet_manager.get_num_training_runs(x) >= 1: 41 | s = '{}; Last prediction acc: {}, Best prediction acc: {}'.format(x, 42 | imagenet_manager.load_metadata(x)[1]['predictionAccuracy'][-1], 43 | max(imagenet_manager.load_metadata(x)[1]['predictionAccuracy'])) 44 | print(s) 45 | 46 | try: 47 | os.mkdir(imageNet12modelsFolder) 48 | except:pass 49 | 50 | 51 | TRAIN_QUANTIZED_DISTILLED = False 52 | 53 | print('Batch size: {}'.format(batch_size)) 54 | 55 | if batch_size % NUM_GPUS != 0: 56 | raise ValueError('Batch size: {} must be a multiple of the number of gpus:{}'.format(batch_size, NUM_GPUS)) 57 | 58 | imageNet12 = datasets.ImageNet12('...', 59 | '...', 60 | type_of_data_augmentation='extended', already_scaled=False, 61 | pin_memory=True) 62 | 63 | 64 | train_loader = imageNet12.getTrainLoader(batch_size, shuffle=True) 65 | test_loader = imageNet12.getTestLoader(batch_size, shuffle=False) 66 | 67 | # # Teacher model 68 | alexnet_unquantized = torchvision.models.alexnet(pretrained=True) 69 | if USE_CUDA: 70 | alexnet_unquantized = alexnet_unquantized.cuda() 71 | if NUM_GPUS > 1: 72 | alexnet_unquantized = torch.nn.parallel.DataParallel(alexnet_unquantized) 73 | 74 | 75 | #Train a wide-resNet with quantized distillation 76 | quant_distilled_model_name = 'alexnet_DoubleFiltersquant_distilled{}bits'.format(NUM_BITS) 77 | quantDistilledModelPath = os.path.join(imageNet12modelsFolder, quant_distilled_model_name) 78 | quantDistilledOptions = {} 79 | quant_distilled_model = alexnet_kfilters.AlexNet(k=2) 80 | 81 | if USE_CUDA: 82 | quant_distilled_model = quant_distilled_model.cuda() 83 | if NUM_GPUS > 1: 84 | quant_distilled_model = torch.nn.parallel.DataParallel(quant_distilled_model) 85 | 86 | if not quant_distilled_model_name in imagenet_manager.saved_models: 87 | imagenet_manager.add_new_model(quant_distilled_model_name, quantDistilledModelPath, 88 | 
arguments_creator_function=quantDistilledOptions) 89 | 90 | if TRAIN_QUANTIZED_DISTILLED: 91 | imagenet_manager.train_model(quant_distilled_model, model_name=quant_distilled_model_name, 92 | train_function=convForwModel.train_model, 93 | arguments_train_function={'epochs_to_train': epochsToTrainImageNet, 94 | 'learning_rate_style': 'imagenet', 95 | 'initial_learning_rate': 0.1, 96 | 'use_nesterov':True, 97 | 'initial_momentum':0.9, 98 | 'weight_decayL2':1e-4, 99 | 'start_epoch': 0, 100 | 'print_every':30, 101 | 'use_distillation_loss':True, 102 | 'teacher_model': alexnet_unquantized, 103 | 'quantizeWeights':True, 104 | 'numBits':NUM_BITS, 105 | 'bucket_size':256, 106 | 'quantize_first_and_last_layer': False}, 107 | train_loader=train_loader, test_loader=test_loader) 108 | quant_distilled_model.load_state_dict(imagenet_manager.load_model_state_dict(quant_distilled_model_name)) 109 | print(cnn_hf.evaluateModel(quant_distilled_model, test_loader, fastEvaluation=False)) 110 | print(cnn_hf.evaluateModel(quant_distilled_model, test_loader, fastEvaluation=False, k=5)) 111 | print(cnn_hf.evaluateModel(alexnet_unquantized, test_loader, fastEvaluation=False)) 112 | print(cnn_hf.evaluateModel(alexnet_unquantized, test_loader, fastEvaluation=False, k=5)) 113 | quant_fun = functools.partial(quantization.uniformQuantization, s=2**4, bucket_size=256) 114 | size_mb = mhf.get_size_quantized_model(quant_distilled_model, 4, quant_fun, 256, 115 | quantizeFirstLastLayer=False) 116 | print(size_mb) 117 | -------------------------------------------------------------------------------- /cifar10_diff_quant_test.py: -------------------------------------------------------------------------------- 1 | import model_manager 2 | import torch 3 | import os 4 | import datasets 5 | import cnn_models.conv_forward_model as convForwModel 6 | import cnn_models.help_fun as cnn_hf 7 | import quantization 8 | import pickle 9 | import copy 10 | import quantization.help_functions as qhf 11 | import functools 12 | import helpers.functions as mhf 13 | import itertools as it 14 | 15 | datasets.BASE_DATA_FOLDER = '...' 16 | SAVED_MODELS_FOLDER = '...' 
17 | USE_CUDA = torch.cuda.is_available() 18 | 19 | print('CUDA_VISIBLE_DEVICES: {}'.format(os.environ['CUDA_VISIBLE_DEVICES'])) 20 | 21 | try: 22 | os.mkdir(datasets.BASE_DATA_FOLDER) 23 | except:pass 24 | try: 25 | os.mkdir(SAVED_MODELS_FOLDER) 26 | except:pass 27 | 28 | 29 | cifar10Manager = model_manager.ModelManager('model_manager_cifar10.tst', 30 | 'model_manager', create_new_model_manager=False) 31 | cifar10modelsFolder = os.path.join(SAVED_MODELS_FOLDER, 'cifar10') 32 | 33 | for x in cifar10Manager.list_models(): 34 | if cifar10Manager.get_num_training_runs(x) >= 1: 35 | print(x, cifar10Manager.load_metadata(x)[1]['predictionAccuracy'][-1]) 36 | 37 | try: 38 | os.mkdir(cifar10modelsFolder) 39 | except:pass 40 | 41 | USE_BATCH_NORM = True 42 | AFFINE_BATCH_NORM = True 43 | 44 | 45 | COMPUTE_DIFFERENT_HEURISTICS = False 46 | 47 | batch_size = 25 48 | cifar10 = datasets.CIFAR10() 49 | train_loader, test_loader = cifar10.getTrainLoader(batch_size), cifar10.getTestLoader(batch_size) 50 | 51 | 52 | ## distilled model 53 | distilled_model_name = 'cifar10_distilled' 54 | distilledModelSpec = copy.deepcopy(convForwModel.smallerModelSpec) 55 | distilledModelSpec['spec_dropout_rates'] = [] #no dropout with distilled model 56 | 57 | 58 | def values_param_iter(n=3): 59 | return it.product(*([True, False],)*n) 60 | 61 | numBits = [2, 4] 62 | if COMPUTE_DIFFERENT_HEURISTICS: 63 | for numBit in numBits: 64 | for assign_bits_auto, use_distillation_loss, compute_initial_points in values_param_iter(n=3): 65 | if compute_initial_points is True: 66 | compute_initial_points = 'quantiles' 67 | else: 68 | compute_initial_points = 'uniform' 69 | str_identifier = 'quantpoints{}bits_auto{}_distill{}_initial"{}"'.format(numBit, assign_bits_auto, 70 | use_distillation_loss, 71 | compute_initial_points) 72 | distilled_quantized_model_name = distilled_model_name + str_identifier 73 | distilled_quantized_model = convForwModel.ConvolForwardNet(**distilledModelSpec, 74 | useBatchNorm=USE_BATCH_NORM, 75 | useAffineTransformInBatchNorm=AFFINE_BATCH_NORM) 76 | if USE_CUDA: distilled_quantized_model = distilled_quantized_model.cuda() 77 | distilled_quantized_model.load_state_dict(cifar10Manager.load_model_state_dict(distilled_model_name)) 78 | epochs_to_train = 50 79 | 80 | quantized_model_dict, quantization_points, infoDict = convForwModel.optimize_quantization_points( 81 | distilled_quantized_model, 82 | train_loader, test_loader, numPointsPerTensor=2**numBit, 83 | assignBitsAutomatically=assign_bits_auto, 84 | bucket_size=256, epochs_to_train=epochs_to_train, 85 | use_distillation_loss=use_distillation_loss, initial_learning_rate=1e-5, 86 | initialize_method=compute_initial_points) 87 | quantization_points = [x.data.view(1,-1).cpu().numpy().tolist()[0] for x in quantization_points] 88 | save_path = cifar10Manager.get_model_base_path(distilled_model_name) + str_identifier 89 | with open(save_path, 'wb') as p: 90 | pickle.dump((quantization_points, infoDict), p) 91 | torch.save(quantized_model_dict, save_path + '_model_state_dict') 92 | 93 | for numBit in numBits: 94 | for assign_bits_auto, use_distillation_loss, compute_initial_points in values_param_iter(n=3): 95 | if compute_initial_points is True: 96 | compute_initial_points = 'quantiles' 97 | else: 98 | compute_initial_points = 'uniform' 99 | str_identifier = 'quantpoints{}bits_auto{}_distill{}_initial"{}"'.format(numBit, assign_bits_auto, 100 | use_distillation_loss, 101 | compute_initial_points) 102 | distilled_quantized_model_name = distilled_model_name + 
str_identifier 103 | save_path = cifar10Manager.get_model_base_path(distilled_model_name) + str_identifier 104 | with open(save_path, 'rb') as p: 105 | quantization_points, infoDict = pickle.load(p) 106 | 107 | distilled_quantized_model = convForwModel.ConvolForwardNet(**distilledModelSpec, 108 | useBatchNorm=USE_BATCH_NORM, 109 | useAffineTransformInBatchNorm=AFFINE_BATCH_NORM) 110 | if USE_CUDA: distilled_quantized_model = distilled_quantized_model.cuda() 111 | distilled_quantized_model.load_state_dict(torch.load(save_path + '_model_state_dict')) 112 | reported_accuracy = max(infoDict['predictionAccuracy']) 113 | actual_accuracy = cnn_hf.evaluateModel(distilled_quantized_model, test_loader) #this corresponds to the last one 114 | #the only problem is that I don't save the model with the max accuracy, but the model at the last epoch 115 | print('Model "{}" => reported accuracy: {} - actual accuracy: {}'.format(distilled_quantized_model_name, 116 | reported_accuracy, actual_accuracy)) 117 | -------------------------------------------------------------------------------- /cnn_models/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /cnn_models/alexnet_kfilters.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.utils.model_zoo as model_zoo 3 | import math 4 | 5 | class AlexNet(nn.Module): 6 | 7 | def __init__(self, num_classes=1000, k=1): 8 | super(AlexNet, self).__init__() 9 | self.features = nn.Sequential( 10 | nn.Conv2d(3, math.floor(64*k), kernel_size=11, stride=4, padding=2), #originally 64 filters 11 | nn.ReLU(inplace=True), 12 | nn.MaxPool2d(kernel_size=3, stride=2), 13 | nn.Conv2d(math.floor(64*k), math.floor(192*k), kernel_size=5, padding=2), #originally 192 14 | nn.ReLU(inplace=True), 15 | nn.MaxPool2d(kernel_size=3, stride=2), 16 | nn.Conv2d(math.floor(192*k), math.floor(384*k), kernel_size=3, padding=1), #originally 384 17 | nn.ReLU(inplace=True), 18 | nn.Conv2d(math.floor(384*k), math.floor(256*k), kernel_size=3, padding=1), #originally 256 19 | nn.ReLU(inplace=True), 20 | nn.Conv2d(math.floor(256*k), math.floor(256*k), kernel_size=3, padding=1), #originally 256 21 | nn.ReLU(inplace=True), 22 | nn.MaxPool2d(kernel_size=3, stride=2), 23 | ) 24 | self.classifier = nn.Sequential( 25 | nn.Dropout(), 26 | nn.Linear(math.floor(256*k) * 6 * 6, 4096), #originally 256 * 6 * 6 27 | nn.ReLU(inplace=True), 28 | nn.Dropout(), 29 | nn.Linear(4096, 4096), 30 | nn.ReLU(inplace=True), 31 | nn.Linear(4096, num_classes), 32 | ) 33 | 34 | def forward(self, x): 35 | x = self.features(x) 36 | x = x.view(x.size(0), 512 * 6 * 6) #originally 256 * 6 * 6 37 | x = self.classifier(x) 38 | return x -------------------------------------------------------------------------------- /cnn_models/resnet_kfilters.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import math 3 | 4 | 5 | def conv3x3(in_planes, out_planes, stride=1): 6 | "3x3 convolution with padding" 7 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 8 | padding=1, bias=False) 9 | 10 | 11 | class BasicBlock(nn.Module): 12 | expansion = 1 13 | 14 | def __init__(self, inplanes, planes, stride=1, downsample=None): 15 | super(BasicBlock, self).__init__() 16 | self.conv1 = conv3x3(inplanes, planes, stride) 17 | self.bn1 = nn.BatchNorm2d(planes) 18 | self.relu = 
nn.ReLU(inplace=True) 19 | self.conv2 = conv3x3(planes, planes) 20 | self.bn2 = nn.BatchNorm2d(planes) 21 | self.downsample = downsample 22 | self.stride = stride 23 | 24 | def forward(self, x): 25 | residual = x 26 | 27 | out = self.conv1(x) 28 | out = self.bn1(out) 29 | out = self.relu(out) 30 | 31 | out = self.conv2(out) 32 | out = self.bn2(out) 33 | 34 | if self.downsample is not None: 35 | residual = self.downsample(x) 36 | 37 | out += residual 38 | out = self.relu(out) 39 | 40 | return out 41 | 42 | 43 | class Bottleneck(nn.Module): 44 | expansion = 4 45 | 46 | def __init__(self, inplanes, planes, stride=1, downsample=None): 47 | super(Bottleneck, self).__init__() 48 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 49 | self.bn1 = nn.BatchNorm2d(planes) 50 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 51 | padding=1, bias=False) 52 | self.bn2 = nn.BatchNorm2d(planes) 53 | self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 54 | self.bn3 = nn.BatchNorm2d(planes * 4) 55 | self.relu = nn.ReLU(inplace=True) 56 | self.downsample = downsample 57 | self.stride = stride 58 | 59 | def forward(self, x): 60 | residual = x 61 | 62 | out = self.conv1(x) 63 | out = self.bn1(out) 64 | out = self.relu(out) 65 | 66 | out = self.conv2(out) 67 | out = self.bn2(out) 68 | out = self.relu(out) 69 | 70 | out = self.conv3(out) 71 | out = self.bn3(out) 72 | 73 | if self.downsample is not None: 74 | residual = self.downsample(x) 75 | 76 | out += residual 77 | out = self.relu(out) 78 | 79 | return out 80 | 81 | 82 | class ResNet(nn.Module): 83 | 84 | def __init__(self, block, layers, num_classes=1000, k=1): 85 | self.inplanes = math.floor(64*k) #originally 64 86 | super(ResNet, self).__init__() 87 | self.conv1 = nn.Conv2d(3, math.floor(64*k), kernel_size=7, stride=2, padding=3, #originally 64 88 | bias=False) 89 | self.bn1 = nn.BatchNorm2d(math.floor(64*k)) #originally 64 90 | self.relu = nn.ReLU(inplace=True) 91 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 92 | self.layer1 = self._make_layer(block, math.floor(64*k), layers[0]) #originally 64 93 | self.layer2 = self._make_layer(block, math.floor(128*k), layers[1], stride=2) #originally 128 94 | self.layer3 = self._make_layer(block, math.floor(256*k), layers[2], stride=2) #originally 256 95 | self.layer4 = self._make_layer(block, math.floor(512*k), layers[3], stride=2) #originally 512 96 | self.avgpool = nn.AvgPool2d(7, stride=1) 97 | self.fc = nn.Linear(math.floor(512*k) * block.expansion, num_classes) #originally 512 * block.expansion 98 | 99 | #initialization 100 | for m in self.modules(): 101 | if isinstance(m, nn.Conv2d): 102 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 103 | m.weight.data.normal_(0, math.sqrt(2. 
/ n)) 104 | elif isinstance(m, nn.BatchNorm2d): 105 | m.weight.data.fill_(1) 106 | m.bias.data.zero_() 107 | 108 | def _make_layer(self, block, planes, blocks, stride=1): 109 | downsample = None 110 | if stride != 1 or self.inplanes != planes * block.expansion: 111 | downsample = nn.Sequential( 112 | nn.Conv2d(self.inplanes, planes * block.expansion, 113 | kernel_size=1, stride=stride, bias=False), 114 | nn.BatchNorm2d(planes * block.expansion), 115 | ) 116 | 117 | layers = [] 118 | layers.append(block(self.inplanes, planes, stride, downsample)) 119 | self.inplanes = planes * block.expansion 120 | for i in range(1, blocks): 121 | layers.append(block(self.inplanes, planes)) 122 | 123 | return nn.Sequential(*layers) 124 | 125 | def forward(self, x): 126 | x = self.conv1(x) 127 | x = self.bn1(x) 128 | x = self.relu(x) 129 | x = self.maxpool(x) 130 | 131 | x = self.layer1(x) 132 | x = self.layer2(x) 133 | x = self.layer3(x) 134 | x = self.layer4(x) 135 | 136 | x = self.avgpool(x) 137 | x = x.view(x.size(0), -1) 138 | x = self.fc(x) 139 | 140 | return x 141 | 142 | 143 | def resnet18(**kwargs): 144 | """Constructs a ResNet-18 model. 145 | 146 | Args: 147 | pretrained (bool): If True, returns a model pre-trained on ImageNet 148 | """ 149 | model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs) 150 | return model 151 | 152 | 153 | def resnet34(**kwargs): 154 | """Constructs a ResNet-34 model. 155 | 156 | Args: 157 | pretrained (bool): If True, returns a model pre-trained on ImageNet 158 | """ 159 | model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs) 160 | return model 161 | 162 | 163 | def resnet50(**kwargs): 164 | """Constructs a ResNet-50 model. 165 | 166 | Args: 167 | pretrained (bool): If True, returns a model pre-trained on ImageNet 168 | """ 169 | model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs) 170 | return model 171 | 172 | 173 | def resnet101(**kwargs): 174 | """Constructs a ResNet-101 model. 175 | 176 | Args: 177 | pretrained (bool): If True, returns a model pre-trained on ImageNet 178 | """ 179 | model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs) 180 | return model 181 | 182 | 183 | def resnet152(**kwargs): 184 | """Constructs a ResNet-152 model. 
185 | 186 | Args: 187 | pretrained (bool): If True, returns a model pre-trained on ImageNet 188 | """ 189 | model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs) 190 | return model 191 | -------------------------------------------------------------------------------- /cnn_models/wide_resnet.py: -------------------------------------------------------------------------------- 1 | #code taken from https://github.com/meliketoy/wide-resnet.pytorch 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.init as init 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | 8 | import sys 9 | import numpy as np 10 | 11 | #TODO: Some of the things are not equal to the model definition (from the authors) 12 | # which is here: https://github.com/szagoruyko/functional-zoo/blob/master/wide-resnet-50-2-export.ipynb 13 | 14 | 15 | def conv3x3(in_planes, out_planes, stride=1): 16 | #TODO: Authors use, in their conv2d a padding=0 by default if I am not mistaken 17 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=True) 18 | 19 | def conv_init(m): 20 | classname = m.__class__.__name__ 21 | if classname.find('Conv') != -1: 22 | init.xavier_uniform(m.weight, gain=np.sqrt(2)) 23 | init.constant(m.bias, 0) 24 | elif classname.find('BatchNorm') != -1: 25 | init.constant(m.weight, 1) 26 | init.constant(m.bias, 0) 27 | 28 | class wide_basic(nn.Module): 29 | def __init__(self, in_planes, planes, dropout_rate, stride=1): 30 | super(wide_basic, self).__init__() 31 | self.bn1 = nn.BatchNorm2d(in_planes) 32 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, bias=True) 33 | self.dropout = nn.Dropout(p=dropout_rate) 34 | self.bn2 = nn.BatchNorm2d(planes) 35 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=True) 36 | 37 | self.shortcut = nn.Sequential() 38 | if stride != 1 or in_planes != planes: 39 | self.shortcut = nn.Sequential( 40 | nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, bias=True), 41 | ) 42 | 43 | def forward(self, x): 44 | out = self.dropout(self.conv1(F.relu(self.bn1(x)))) 45 | out = self.conv2(F.relu(self.bn2(out))) 46 | out += self.shortcut(x) 47 | 48 | return out 49 | 50 | class Wide_ResNet(nn.Module): 51 | def __init__(self, depth, widen_factor, dropout_rate, num_classes): 52 | super(Wide_ResNet, self).__init__() 53 | self.in_planes = 16 54 | 55 | assert ((depth-4)%6 ==0), 'Wide-resnet depth should be 6n+4' 56 | n = int((depth-4)/6) 57 | k = widen_factor 58 | 59 | nStages = [16, 16*k, 32*k, 64*k] 60 | 61 | self.conv1 = conv3x3(3,nStages[0]) #TODO: authors use stride=2, padding=3 in first convolution 62 | self.layer1 = self._wide_layer(wide_basic, nStages[1], n, dropout_rate, stride=1) 63 | self.layer2 = self._wide_layer(wide_basic, nStages[2], n, dropout_rate, stride=2) 64 | self.layer3 = self._wide_layer(wide_basic, nStages[3], n, dropout_rate, stride=2) 65 | self.bn1 = nn.BatchNorm2d(nStages[3], momentum=0.9) 66 | self.linear = nn.Linear(nStages[3], num_classes) 67 | self.apply(conv_init) 68 | 69 | def _wide_layer(self, block, planes, num_blocks, dropout_rate, stride): 70 | strides = [stride] + [1]*(num_blocks-1) 71 | layers = [] 72 | 73 | for stride in strides: 74 | layers.append(block(self.in_planes, planes, dropout_rate, stride)) 75 | self.in_planes = planes 76 | 77 | return nn.Sequential(*layers) 78 | 79 | def forward(self, x): 80 | out = self.conv1(x) #TODO: after first layer they use relu and maxpool2d with parameters 3, 2, 1 81 | out = self.layer1(out) 82 | out 
= self.layer2(out) 83 | out = self.layer3(out) 84 | out = F.relu(self.bn1(out)) 85 | out = F.avg_pool2d(out, 8) 86 | out = out.view(out.size(0), -1) 87 | out = self.linear(out) 88 | 89 | return out -------------------------------------------------------------------------------- /cnn_models/wide_resnet_imagenet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.init as init 4 | import torch.nn.functional as F 5 | from torch.autograd import Variable 6 | 7 | import sys 8 | import numpy as np 9 | 10 | #TODO: Some of the things are not equal to the model definition (from the authors) 11 | # which is here: https://github.com/szagoruyko/functional-zoo/blob/master/wide-resnet-50-2-export.ipynb 12 | 13 | #code taken from https://github.com/meliketoy/wide-resnet.pytorch 14 | 15 | def conv_init(m): 16 | classname = m.__class__.__name__ 17 | if classname.find('Conv') != -1: 18 | init.xavier_uniform(m.weight, gain=np.sqrt(2)) 19 | init.constant(m.bias, 0) 20 | elif classname.find('BatchNorm') != -1: 21 | init.constant(m.weight, 1) 22 | init.constant(m.bias, 0) 23 | 24 | class wide_bottleneck_straightned(nn.Module): 25 | #Bottlebeck because has the structure 1x1 conv, 3x3 conv, 1x1 conv 26 | #straightned because the dimensions are the same for all convolutions 27 | def __init__(self, in_planes, planes, dropout_rate, stride=1, use_residual=True): 28 | super(wide_bottleneck_straightned, self).__init__() 29 | self.bn1 = nn.BatchNorm2d(in_planes) 30 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=True) 31 | self.dropout1 = nn.Dropout(p=dropout_rate) 32 | self.bn2 = nn.BatchNorm2d(planes) 33 | stride3conv = (not use_residual) and stride or 1 34 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride3conv, padding=1, bias=True) 35 | self.dropout2 = nn.Dropout(p=dropout_rate) 36 | self.bn3 = nn.BatchNorm2d(planes) 37 | self.conv3 = nn.Conv2d(planes, planes, kernel_size=1, stride=1, padding=0, bias=True) 38 | self.use_residual = use_residual 39 | if not self.use_residual: 40 | self.conv_dim = nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, padding=0, bias=True) 41 | 42 | def forward(self, input): 43 | x = input 44 | x = self.dropout1(self.conv1(F.relu(self.bn1(x)))) 45 | x = self.dropout2(self.conv2(F.relu(self.bn2(x)))) 46 | x = self.conv3(F.relu((self.bn3(x)))) 47 | if self.use_residual: 48 | x += input 49 | else: 50 | x += self.conv_dim(input) 51 | x = F.relu(x) 52 | return x 53 | 54 | class Wide_ResNet_imagenet(nn.Module): 55 | def __init__(self, depth, widen_factor, dropout_rate, num_classes): 56 | super(Wide_ResNet_imagenet, self).__init__() 57 | 58 | assert ((depth-5)%12 ==0), 'Wide-resnet depth should be 12n+5' 59 | n = int((depth-5)/12) 60 | k = widen_factor 61 | 62 | nStages = [16*k, 32*k, 64*k, 128*k, 256*k] 63 | self.in_planes = nStages[0] 64 | 65 | self.conv1 = nn.Conv2d(3, nStages[0], 7, stride=2, padding=3, bias=True) 66 | self.layer1 = self._wide_layer(wide_bottleneck_straightned, nStages[1], n, dropout_rate, stride=1) 67 | self.layer2 = self._wide_layer(wide_bottleneck_straightned, nStages[2], n, dropout_rate, stride=2) 68 | self.layer3 = self._wide_layer(wide_bottleneck_straightned, nStages[3], n, dropout_rate, stride=2) 69 | self.layer4 = self._wide_layer(wide_bottleneck_straightned, nStages[4], n, dropout_rate, stride=2) 70 | self.bn1 = nn.BatchNorm2d(nStages[4], momentum=0.9) 71 | self.linear = nn.Linear(nStages[4], num_classes) 72 | 
self.apply(conv_init) 73 | 74 | def _wide_layer(self, block, planes, num_blocks, dropout_rate, stride): 75 | layers = [] 76 | 77 | for idx_block in range(num_blocks): 78 | use_residual = idx_block != 0 79 | layers.append(block(self.in_planes, planes, dropout_rate, stride, use_residual)) 80 | self.in_planes = planes 81 | 82 | return nn.Sequential(*layers) 83 | 84 | def forward(self, x): 85 | out = F.relu(self.conv1(x)) 86 | out = F.max_pool2d(out, kernel_size=3, stride=2, padding=1) 87 | 88 | out = self.layer1(out) 89 | out = self.layer2(out) 90 | out = self.layer3(out) 91 | out = self.layer4(out) 92 | out = F.relu(self.bn1(out)) 93 | out = F.avg_pool2d(out, kernel_size=7, stride=1, padding=0) 94 | out = out.view(out.size(0), -1) 95 | out = self.linear(out) 96 | 97 | return out -------------------------------------------------------------------------------- /datasets/CIFAR10.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torchvision 3 | import torchvision.transforms as vision_transforms 4 | import datasets 5 | import torch 6 | import datasets.torchvision_extension as vision_transforms_extension 7 | import numpy as np 8 | 9 | try: 10 | import matplotlib.pyplot as plt 11 | except: 12 | print('Cannot import matplotlib. CIFAR10.save_img method will crash if used') 13 | 14 | meanstd = { 15 | 'mean':[0.5, 0.5, 0.5], 16 | 'std': [0.5, 0.5, 0.5], 17 | } 18 | 19 | class CIFAR10(object): 20 | def __init__(self, dataFolder=None, pin_memory=False): 21 | 22 | self.dataFolder = dataFolder if dataFolder is not None else os.path.join(datasets.BASE_DATA_FOLDER, 'CIFAR10') 23 | self.pin_memory = pin_memory 24 | self.meanStd = meanstd 25 | 26 | #download the dataset 27 | torchvision.datasets.CIFAR10(self.dataFolder, download=True) 28 | 29 | #add some useful metadata 30 | self.classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck') 31 | 32 | def getTrainLoader(self, batch_size, shuffle=True, num_workers=1, checkFileIntegrity=False): 33 | 34 | #first we define the training transform we will apply to the dataset 35 | listOfTransoforms = [] 36 | listOfTransoforms.append(vision_transforms.RandomCrop((32, 32), padding=4)) 37 | listOfTransoforms.append(vision_transforms.RandomHorizontalFlip()) 38 | # 39 | # listOfTransoforms.append(vision_transforms.ColorJitter(brightness=0.4, 40 | # contrast=0.4, 41 | # saturation=0.4)) 42 | listOfTransoforms.append(vision_transforms.ToTensor()) 43 | # TODO: TO make this work I need the pca values, i.e. eigenvalues and eigenvectors 44 | # of the RGB colors, computed on a subset of the cifar10 dataset. 
45 | # try to implement this at some point 46 | # listOfTransoforms.append(vision_transforms_extension.Lighting(alphastd=0.1, 47 | # eigval=self.pca['eigval'], 48 | # eigvec=self.pca['eigvec'])) 49 | listOfTransoforms.append(vision_transforms.Normalize(mean=self.meanStd['mean'], 50 | std=self.meanStd['std'])) 51 | 52 | train_transform = vision_transforms.Compose(listOfTransoforms) 53 | 54 | #define the trainset 55 | trainset = torchvision.datasets.CIFAR10(root=self.dataFolder, train=True, 56 | download=checkFileIntegrity, transform=train_transform) 57 | trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=shuffle, 58 | num_workers=num_workers, pin_memory=self.pin_memory) 59 | 60 | return trainloader 61 | 62 | def getTestLoader(self, batch_size, shuffle=True, num_workers=1, checkFileIntegrity=False): 63 | 64 | listOfTransoforms = [vision_transforms.ToTensor()] 65 | listOfTransoforms.append(vision_transforms.Normalize(mean=self.meanStd['mean'], 66 | std=self.meanStd['std'])) 67 | 68 | test_transform = vision_transforms.Compose(listOfTransoforms) 69 | 70 | testset = torchvision.datasets.CIFAR10(root=self.dataFolder, train=False, 71 | download=checkFileIntegrity, transform=test_transform) 72 | testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=shuffle, 73 | num_workers=num_workers, pin_memory=self.pin_memory) 74 | 75 | return testloader 76 | 77 | @staticmethod 78 | def save_img(img, path_file): 79 | try: 80 | img = img.data #in case a variable is passed 81 | except:pass 82 | mean_ = meanstd['mean'] 83 | std_ = meanstd['std'] 84 | meanDivStd = [-mean_[idx]/std_[idx] for idx in range(len(mean_))] 85 | inv_std = [1/std_[idx] for idx in range(len(std_))] 86 | img = vision_transforms.Normalize(meanDivStd, inv_std)(img) 87 | npimg = img.cpu().numpy() 88 | plt.imsave(path_file, np.transpose(npimg, (1, 2, 0))) -------------------------------------------------------------------------------- /datasets/CIFAR100.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torchvision 3 | import torchvision.transforms as vision_transforms 4 | import datasets 5 | import torch 6 | import datasets.torchvision_extension as vision_transforms_extension 7 | 8 | 9 | meanstd = { 10 | 'mean': [0.5071, 0.4867, 0.4408], 11 | 'std': [0.2675, 0.2565, 0.2761], 12 | } 13 | 14 | class CIFAR100(object): 15 | def __init__(self, dataFolder=None, pin_memory=False): 16 | 17 | self.dataFolder = dataFolder if dataFolder is not None else os.path.join(datasets.BASE_DATA_FOLDER, 'CIFAR100') 18 | self.pin_memory = pin_memory 19 | self.meanStd = meanstd 20 | 21 | #download the dataset 22 | torchvision.datasets.CIFAR100(self.dataFolder, download=True) 23 | 24 | def getTrainLoader(self, batch_size, shuffle=True, num_workers=1, checkFileIntegrity=False): 25 | 26 | #first we define the training transform we will apply to the dataset 27 | listOfTransoforms = [] 28 | listOfTransoforms.append(vision_transforms.RandomCrop((32, 32), padding=4)) 29 | listOfTransoforms.append(vision_transforms.RandomHorizontalFlip()) 30 | # listOfTransoforms.append(vision_transforms.ColorJitter(brightness=0.4, 31 | # contrast=0.4, 32 | # saturation=0.4)) 33 | listOfTransoforms.append(vision_transforms.ToTensor()) 34 | listOfTransoforms.append(vision_transforms.Normalize(mean=self.meanStd['mean'], 35 | std=self.meanStd['std'])) 36 | 37 | train_transform = vision_transforms.Compose(listOfTransoforms) 38 | 39 | #define the trainset 40 | trainset = 
torchvision.datasets.CIFAR100(root=self.dataFolder, train=True, 41 | download=checkFileIntegrity, transform=train_transform) 42 | trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=shuffle, 43 | num_workers=num_workers, pin_memory=self.pin_memory) 44 | 45 | return trainloader 46 | 47 | def getTestLoader(self, batch_size, shuffle=True, num_workers=1, checkFileIntegrity=False): 48 | 49 | listOfTransoforms = [vision_transforms.ToTensor()] 50 | listOfTransoforms.append(vision_transforms.Normalize(mean=self.meanStd['mean'], 51 | std=self.meanStd['std'])) 52 | 53 | test_transform = vision_transforms.Compose(listOfTransoforms) 54 | 55 | testset = torchvision.datasets.CIFAR100(root=self.dataFolder, train=False, 56 | download=checkFileIntegrity, transform=test_transform) 57 | testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=shuffle, 58 | num_workers=num_workers, pin_memory=self.pin_memory) 59 | 60 | return testloader -------------------------------------------------------------------------------- /datasets/ImageNet12.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision 3 | import torchvision.transforms as vision_transforms 4 | import torch.utils.data 5 | import datasets.torchvision_extension as vision_transforms_extension 6 | 7 | #For this dataset, automatic download has not been implemented. You have to provide path to the train and test folders 8 | #formatted as described here: http://pytorch.org/docs/master/torchvision/datasets.html#imagefolder 9 | #Essentially images in the same class must be in the same folder with the name of the class, like so: 10 | # root/dog/xxx.png 11 | # root/dog/xxy.png 12 | # root/dog/xxz.png 13 | # 14 | # root/cat/123.png 15 | # root/cat/nsdf3.png 16 | # root/cat/asd932_.png 17 | 18 | #To prepare the imagenet2012 dataset in such a way, follow the instructions at 19 | # https://github.com/soumith/imagenet-multiGPU.torch 20 | # It says: 21 | # The training images for imagenet are already in appropriate subfolders (like n07579787, n07880968). 22 | # You need to get the validation groundtruth and move the validation images into appropriate subfolders. 23 | # To do this, download ILSVRC2012_img_train.tar ILSVRC2012_img_val.tar and use the following commands: 24 | # extract train data 25 | # mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train 26 | # tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar 27 | # find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done 28 | # # extract validation data 29 | # cd ../ && mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar 30 | # wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash 31 | 32 | #Now you're all set! 
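# A minimal usage sketch (the paths below are placeholders; in this repository the class
# is normally instantiated from the top-level experiment scripts):
#
#   imageNet12 = ImageNet12('/data/imagenet12/train', '/data/imagenet12/val',
#                           type_of_data_augmentation='extended')
#   train_loader = imageNet12.getTrainLoader(batch_size=256)
#   test_loader = imageNet12.getTestLoader(batch_size=256, shuffle=False)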
33 | 34 | 35 | #Computed from random subset of ImageNet training images 36 | #This values were taken from the fb.resnet github project linked above 37 | meanstd = { 38 | 'mean':[0.485, 0.456, 0.406], 39 | 'std': [0.229, 0.224, 0.225], 40 | } 41 | 42 | pca = { 43 | 'eigval': torch.Tensor([0.2175, 0.0188, 0.0045]), 44 | 'eigvec': torch.Tensor([ 45 | [-0.5675, 0.7192, 0.4009], 46 | [-0.5808, -0.0045, -0.8140], 47 | [-0.5836, -0.6948, 0.4203], 48 | ]) 49 | } 50 | 51 | class ImageNet12(object): 52 | def __init__(self, trainFolder, testFolder, pin_memory=False, size_images=224, 53 | scaled_size=256, type_of_data_augmentation='basic', already_scaled=False): 54 | self.trainFolder = trainFolder 55 | self.testFolder = testFolder 56 | self.pin_memory = pin_memory 57 | self.meanstd = meanstd 58 | self.pca = pca 59 | #images will be rescaled to match this size 60 | if not isinstance(size_images, int): 61 | raise ValueError('size_images must be an int. It will be scaled to a square image') 62 | self.size_images = size_images 63 | self.scaled_size = scaled_size 64 | type_of_data_augmentation = type_of_data_augmentation.lower() 65 | if type_of_data_augmentation not in ('basic', 'extended'): 66 | raise ValueError('type_of_data_augmentation must be either basic or extended') 67 | self.type_of_data_augmentation = type_of_data_augmentation 68 | self.already_scaled = already_scaled # if you scaled all the images before training (see link above) 69 | # then set this to True 70 | 71 | def getTrainLoader(self, batch_size, shuffle=True, num_workers=4): 72 | 73 | # first we define the training transform we will apply to the dataset 74 | list_of_transforms = [] 75 | list_of_transforms.append(vision_transforms.RandomSizedCrop(self.size_images)) 76 | list_of_transforms.append(vision_transforms.RandomHorizontalFlip()) 77 | 78 | if self.type_of_data_augmentation == 'extended': 79 | list_of_transforms.append(vision_transforms.ColorJitter(brightness=0.4, 80 | contrast=0.4, 81 | saturation=0.4)) 82 | list_of_transforms.append(vision_transforms.ToTensor()) 83 | if self.type_of_data_augmentation == 'extended': 84 | list_of_transforms.append(vision_transforms_extension.Lighting(alphastd=0.1, 85 | eigval=self.pca['eigval'], 86 | eigvec=self.pca['eigvec'])) 87 | 88 | list_of_transforms.append(vision_transforms.Normalize(mean=self.meanstd['mean'], 89 | std=self.meanstd['std'])) 90 | train_transform = vision_transforms.Compose(list_of_transforms) 91 | train_set = torchvision.datasets.ImageFolder(self.trainFolder, train_transform) 92 | train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=shuffle, 93 | num_workers=num_workers, pin_memory=self.pin_memory) 94 | 95 | return train_loader 96 | 97 | def getTestLoader(self, batch_size, shuffle=True, num_workers=4): 98 | # first we define the training transform we will apply to the dataset 99 | list_of_transforms = [] 100 | if self.already_scaled is False: 101 | list_of_transforms.append(vision_transforms.Resize(self.scaled_size)) 102 | list_of_transforms.append(vision_transforms.CenterCrop(self.size_images)) 103 | list_of_transforms.append(vision_transforms.ToTensor()) 104 | list_of_transforms.append(vision_transforms.Normalize(mean=self.meanstd['mean'], 105 | std=self.meanstd['std'])) 106 | 107 | test_transform = vision_transforms.Compose(list_of_transforms) 108 | 109 | test_set = torchvision.datasets.ImageFolder(self.testFolder, test_transform) 110 | test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=shuffle, 111 | 
num_workers=num_workers, pin_memory=self.pin_memory) 112 | 113 | return test_loader -------------------------------------------------------------------------------- /datasets/MNIST.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torchvision 3 | import torchvision.transforms as vision_transforms 4 | import datasets 5 | import torch 6 | import datasets.torchvision_extension as vision_transforms_extension 7 | 8 | 9 | meanstd = { 10 | 'mean':(0.1307,), 11 | 'std': (0.3081,), 12 | } 13 | 14 | class MNIST(object): 15 | def __init__(self, dataFolder=None, pin_memory=False): 16 | 17 | self.dataFolder = dataFolder if dataFolder is not None else os.path.join(datasets.BASE_DATA_FOLDER, 'MNIST') 18 | self.pin_memory = pin_memory 19 | self.meanStd = meanstd 20 | 21 | #download the dataset 22 | torchvision.datasets.MNIST(self.dataFolder, download=True) 23 | 24 | def getTrainLoader(self, batch_size, shuffle=True, num_workers=1, checkFileIntegrity=False): 25 | 26 | #first we define the training transform we will apply to the dataset 27 | listOfTransoforms = [] 28 | listOfTransoforms.append(vision_transforms.ToTensor()) 29 | listOfTransoforms.append(vision_transforms.Normalize(mean=self.meanStd['mean'], 30 | std=self.meanStd['std'])) 31 | train_transform = vision_transforms.Compose(listOfTransoforms) 32 | 33 | #define the trainset 34 | trainset = torchvision.datasets.MNIST(root=self.dataFolder, train=True, 35 | download=checkFileIntegrity, transform=train_transform) 36 | trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=shuffle, 37 | num_workers=num_workers, pin_memory=self.pin_memory) 38 | 39 | return trainloader 40 | 41 | def getTestLoader(self, batch_size, shuffle=True, num_workers=1, checkFileIntegrity=False): 42 | 43 | listOfTransoforms = [vision_transforms.ToTensor()] 44 | listOfTransoforms.append(vision_transforms.Normalize(mean=self.meanStd['mean'], 45 | std=self.meanStd['std'])) 46 | 47 | test_transform = vision_transforms.Compose(listOfTransoforms) 48 | testset = torchvision.datasets.MNIST(root=self.dataFolder, train=False, 49 | download=checkFileIntegrity, transform=test_transform) 50 | testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=shuffle, 51 | num_workers=num_workers, pin_memory=self.pin_memory) 52 | 53 | return testloader -------------------------------------------------------------------------------- /datasets/PennTreeBank.py: -------------------------------------------------------------------------------- 1 | import os 2 | import datasets 3 | import torch 4 | import pickle 5 | import urllib 6 | import shutil 7 | import numpy as np 8 | import helpers.functions as mhf 9 | 10 | class PennTreeBank(object): 11 | def __init__(self, dataFolder=None): 12 | self.dataFolder = dataFolder if dataFolder is not None else os.path.join(datasets.BASE_DATA_FOLDER, 'PennTreeBank') 13 | self.dictionary = Dictionary() 14 | try: 15 | os.mkdir(self.dataFolder) 16 | except:pass 17 | 18 | self.trainFilePath = os.path.join(self.dataFolder, 'train.txt') 19 | self.testFilePath = os.path.join(self.dataFolder, 'test.txt') 20 | self.validFilePath = os.path.join(self.dataFolder, 'valid.txt') 21 | 22 | self.trainSetPath = os.path.join(self.dataFolder, 'trainSet') 23 | self.testSetPath = os.path.join(self.dataFolder, 'testSet') 24 | self.validSetPath = os.path.join(self.dataFolder, 'validSet') 25 | self.dictionaryPath = os.path.join(self.dataFolder, 'dictionary') 26 | 27 | checkProcessedFiles = 
self.checkProcessedFiles() 28 | 29 | if (not self.checkDataFiles()) and (not checkProcessedFiles): 30 | #download the files from pytorch example folder, but only if the data are not there and not even the 31 | #processed files 32 | baseUrl = 'https://raw.githubusercontent.com/pytorch/examples/master/word_language_model/data/penn/' 33 | trainUrl = baseUrl + 'train.txt' 34 | testUrl = baseUrl + 'test.txt' 35 | validUrl = baseUrl + 'valid.txt' 36 | 37 | for pathToSave, urlDownload in zip([self.trainFilePath, self.testFilePath, self.validFilePath], 38 | [trainUrl, testUrl, validUrl]): 39 | print('Downloading {} to {}'.format(urlDownload, pathToSave)) 40 | with urllib.request.urlopen(urlDownload) as response, open(pathToSave, 'wb') as out_file: 41 | shutil.copyfileobj(response, out_file) 42 | 43 | print('Files downloaded') 44 | else: 45 | print('Files already downloaded') 46 | 47 | if not checkProcessedFiles: 48 | print('Processing files') 49 | 50 | self.trainSet = self.tokenize(self.trainFilePath) 51 | self.testSet = self.tokenize(self.testFilePath) 52 | self.validSet = self.tokenize(self.validFilePath) 53 | 54 | for pathToSave, dataset in zip([self.trainSetPath, self.testSetPath, self.validSetPath], 55 | [self.trainSet, self.testSet, self.validSet]): 56 | with open(pathToSave, 'wb') as f: 57 | torch.save(dataset, f) 58 | 59 | with open(self.dictionaryPath, 'wb') as f: 60 | pickle.dump(self.dictionary, f) 61 | 62 | print('Files processed') 63 | else: 64 | with open(self.dictionaryPath, 'rb') as f: 65 | self.dictionary = pickle.load(f) 66 | 67 | with open(self.trainSetPath, 'rb') as f: 68 | self.trainSet = torch.load(f) 69 | 70 | with open(self.testSetPath, 'rb') as f: 71 | self.testSet = torch.load(f) 72 | 73 | with open(self.validSetPath, 'rb') as f: 74 | self.validSet = torch.load(f) 75 | 76 | print('Files already processed') 77 | 78 | def getTrainLoader(self, batch_size, length_sequence, force_same_size_batch=False): 79 | return self.getDataLoader('train', batch_size, length_sequence, shuffle=True, 80 | force_same_size_batch=force_same_size_batch) 81 | 82 | def getTestLoader(self, batch_size, length_sequence, force_same_size_batch=False): 83 | return self.getDataLoader('test', batch_size,length_sequence, shuffle=False, 84 | force_same_size_batch=force_same_size_batch) 85 | 86 | def getValidLoader(self, batch_size, length_sequence, force_same_size_batch=False): 87 | return self.getDataLoader('valid', batch_size, length_sequence, shuffle=False, 88 | force_same_size_batch=force_same_size_batch) 89 | 90 | def getDataLoader(self, type, batch_size, length_sequence, shuffle=True, force_same_size_batch=False): 91 | 92 | if type == 'train': 93 | data = self.trainSet 94 | elif type == 'test': 95 | data = self.testSet 96 | elif type == 'valid': 97 | data = self.validSet 98 | else: 99 | raise ValueError('Invalid type. 
It must be "train", "test" or "valid"') 100 | 101 | if data.ndimension() != 1: 102 | raise ValueError('Data in input must be a vector') 103 | 104 | length_data = data.size(0) 105 | total_amount_data = length_data - length_sequence - 1 106 | 107 | def loadIter(): 108 | 109 | if shuffle: 110 | allIndices = list(range(length_data - length_sequence)) 111 | np.random.shuffle(allIndices) 112 | 113 | countNumData = 0 114 | while True: 115 | if countNumData + batch_size < total_amount_data: 116 | dimCurrBatch = batch_size 117 | else: 118 | if force_same_size_batch is True: 119 | break 120 | dimCurrBatch = total_amount_data - countNumData 121 | 122 | currBatchData = torch.LongTensor(dimCurrBatch, length_sequence).zero_() 123 | if currBatchData.type() != data.type(): 124 | currBatchData.type_as(data) 125 | 126 | currBatchTarget = torch.LongTensor(dimCurrBatch, length_sequence).zero_() 127 | if currBatchTarget.type() != data.type(): 128 | currBatchTarget.type_as(data) 129 | 130 | for j in range(countNumData, countNumData + dimCurrBatch): 131 | idxToUse = allIndices[j] if shuffle is True else j 132 | currBatchData[j-countNumData, :] = data[idxToUse:idxToUse+length_sequence] 133 | currBatchTarget[j-countNumData, :] = data[idxToUse+1:idxToUse+length_sequence+1] 134 | 135 | yield currBatchData, currBatchTarget 136 | 137 | countNumData = countNumData + dimCurrBatch 138 | if countNumData >= total_amount_data: 139 | break 140 | 141 | dataLoader = mhf.DataLoader(loadIter, total_amount_data, batch_size, shuffled=shuffle, 142 | length_sequence=length_sequence, 143 | force_same_size_batch=force_same_size_batch) 144 | return dataLoader 145 | 146 | def checkDataFiles(self): 147 | return all(os.path.isfile(x) for x in [self.trainFilePath, self.testFilePath, 148 | self.validFilePath]) 149 | 150 | def checkProcessedFiles(self): 151 | return all(os.path.isfile(x) for x in [self.trainSetPath, self.testSetPath, 152 | self.validSetPath, self.dictionaryPath]) 153 | 154 | def tokenize(self, path): 155 | 156 | ''' Tokenizes a text file ''' 157 | 158 | assert os.path.exists(path) 159 | 160 | # Add words to the dictionary 161 | with open(path, 'r') as f: 162 | tokens = 0 163 | for line in f: 164 | words = line.split() + [''] 165 | tokens += len(words) 166 | for word in words: 167 | self.dictionary.add_word(word) 168 | 169 | # Tokenize file content 170 | with open(path, 'r') as f: 171 | ids = torch.LongTensor(tokens) 172 | token = 0 173 | for line in f: 174 | words = line.split() + [''] 175 | for word in words: 176 | ids[token] = self.dictionary.word2idx[word] 177 | token += 1 178 | 179 | return ids 180 | 181 | class Dictionary(object): 182 | 183 | 'Helper class for PennTreeBank dataset ' 184 | 185 | def __init__(self): 186 | self.word2idx = {} 187 | self.idx2word = [] 188 | 189 | def add_word(self, word): 190 | if word not in self.word2idx: 191 | self.idx2word.append(word) 192 | self.word2idx[word] = len(self.idx2word) - 1 193 | return self.word2idx[word] 194 | 195 | def __len__(self): 196 | return len(self.idx2word) 197 | 198 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | _currDir = os.path.dirname(os.path.abspath(__file__)) 4 | BASE_DATA_FOLDER = os.path.join(_currDir, 'saved_datasets') 5 | PATH_PERL_SCRIPTS_FOLDER = os.path.abspath(os.path.join(_currDir, '..', 'perl_scripts')) 6 | 7 | try: 8 | os.mkdir(BASE_DATA_FOLDER) 9 | except:pass 10 | 11 | from 
.CIFAR10 import CIFAR10 12 | from .CIFAR100 import CIFAR100 13 | from .PennTreeBank import PennTreeBank 14 | from .ImageNet12 import ImageNet12 15 | from .translation_datasets import multi30k_DE_EN, onmt_integ_dataset, WMT13_DE_EN 16 | from .MNIST import MNIST 17 | from .customs_datasets import LoadingTensorsDataset 18 | 19 | __all__ = ('CIFAR10', 'PennTreeBank', 'WMT13_DE_EN', 'ImageNet12', 'multi30k_DE_EN', 20 | 'onmt_integ_dataset', 'CIFAR100', 'MNIST', 'LoadingTensorsDataset') -------------------------------------------------------------------------------- /datasets/customs_datasets.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import helpers.functions as mhf 3 | import numpy as np 4 | 5 | class LoadingTensorsDataset: 6 | 7 | 'A simple loading dataset - loads the tensor that are passed in input' 8 | 9 | def __init__(self, path_train_data, path_test_data): 10 | 11 | self.trainData = torch.load(path_train_data) 12 | self.testData = torch.load(path_test_data) 13 | 14 | def get_train_loader(self, batch_size): 15 | return self.get_data_loader('train', batch_size, shuffle=True) 16 | 17 | def get_test_loader(self, batch_size): 18 | return self.get_data_loader('test', batch_size, shuffle=False) 19 | 20 | def get_data_loader(self, type, batch_size, shuffle=False): 21 | if batch_size <= 0: 22 | raise ValueError('batch size must be bigger than zero') 23 | 24 | if type == 'train': 25 | dataset, labels = self.trainData 26 | elif type == 'test': 27 | dataset, labels = self.testData 28 | else: raise ValueError('Invalid value for type') 29 | 30 | total_amount_data = dataset.size(0) 31 | 32 | def loadIter(): 33 | 34 | if shuffle: 35 | allIndices = list(range(total_amount_data)) #TODO: This is stupidily inefficient. Change when have time 36 | np.random.shuffle(allIndices) 37 | 38 | currIdx = 0 39 | while True: 40 | 41 | if currIdx + batch_size > total_amount_data: 42 | currData = dataset[currIdx:total_amount_data, :] 43 | currLabels = labels[currIdx:total_amount_data] 44 | yield currData, currLabels 45 | break 46 | 47 | currData = dataset[currIdx:currIdx+batch_size, :] 48 | currLabels = labels[currIdx:currIdx+batch_size] 49 | yield currData, currLabels 50 | 51 | currIdx += batch_size 52 | 53 | dataLoader = mhf.DataLoader(loadIter, total_amount_data, batch_size, shuffled=shuffle) 54 | 55 | return dataLoader -------------------------------------------------------------------------------- /datasets/torchvision_extension.py: -------------------------------------------------------------------------------- 1 | #In this file some more transformations (apart from the ones defined in torchvision.transform) 2 | #are added. Particularly helpful to train imagenet, and in the style of the transforms 3 | #used by fb.resnet https://github.com/facebook/fb.resnet.torch/blob/master/datasets/imagenet.lua 4 | 5 | #This file is taken from a proposed pull request on the torchvision github project. 6 | #At the moment this pull request has not been accepted yet, that is why I report it here. 
7 | #Link to the pull request: https://github.com/pytorch/vision/pull/27/files 8 | 9 | class Lighting(object): 10 | 11 | """Lighting noise(AlexNet - style PCA - based noise)""" 12 | 13 | def __init__(self, alphastd, eigval, eigvec): 14 | self.alphastd = alphastd 15 | self.eigval = eigval 16 | self.eigvec = eigvec 17 | 18 | def __call__(self, img): 19 | # img is supposed go be a torch tensor 20 | 21 | if self.alphastd == 0: 22 | return img 23 | 24 | alpha = img.new().resize_(3).normal_(0, self.alphastd) 25 | rgb = self.eigvec.type_as(img).clone()\ 26 | .mul(alpha.view(1, 3).expand(3, 3))\ 27 | .mul(self.eigval.view(1, 3).expand(3, 3))\ 28 | .sum(1).squeeze() 29 | 30 | return img.add(rgb.view(3, 1, 1).expand_as(img)) 31 | -------------------------------------------------------------------------------- /helpers/functions.py: -------------------------------------------------------------------------------- 1 | import time 2 | import smtplib 3 | import torch 4 | import pickle 5 | import os 6 | import tarfile 7 | from email.mime.text import MIMEText 8 | from collections import namedtuple 9 | from collections import OrderedDict 10 | import functools 11 | import numpy as np 12 | import torch.nn as nn 13 | from torch.autograd import Variable 14 | import math 15 | import quantization.help_functions as qhf 16 | 17 | 18 | USE_CUDA = torch.cuda.is_available() 19 | 20 | 21 | def rsetattr(obj, attr, val): 22 | 'recurrent setattr' 23 | 24 | pre, _, post = attr.rpartition('.') 25 | return setattr(rgetattr(obj, pre) if pre else obj, post, val) 26 | 27 | 28 | sentinel = object() 29 | 30 | 31 | def rgetattr(obj, attr, default=sentinel): 32 | 'recurrent getattr' 33 | 34 | if default is sentinel: 35 | _getattr = getattr 36 | else: 37 | def _getattr(obj, name): 38 | return getattr(obj, name, default) 39 | return functools.reduce(_getattr, [obj] + attr.split('.')) 40 | 41 | def read_email_info_from_file(filepath): 42 | 43 | ''' 44 | read email username and password from a file. The format is simply 45 | 46 | email_account::: emailAccount@account.com 47 | password::: password_of_email_account 48 | 49 | ''' 50 | email_account_flag = 'email_account::: ' 51 | password_flag = 'password::: ' 52 | 53 | with open(filepath, 'r') as p: 54 | for line in p.readlines(): 55 | line = line.rstrip() 56 | if line.startswith(email_account_flag): 57 | email_account = line[len(email_account_flag):] 58 | if '@' not in email_account: 59 | raise ValueError('File badly formatted; missing "@" in email account') 60 | elif line.startswith('password::: '): 61 | password = line[len(password_flag):] 62 | else: 63 | raise ValueError('File badly formatted; wrong line identificators. 
' 64 | 'Lines should start with "{}" and "{}"'.format(email_account_flag, password_flag)) 65 | 66 | try: 67 | return email_account, password 68 | except: 69 | raise ValueError('File badly formatted, missing email or password information') 70 | 71 | def send_email_yandex(username, password, targets, subject, message, verbose=True): 72 | try: 73 | smtp_ssl_host = 'smtp.yandex.com' 74 | smtp_ssl_port = 465 75 | email_suffix = '@yandex.com' 76 | if email_suffix in username: 77 | username = username 78 | else: 79 | if '@' in username: 80 | raise ValueError('This does not appear to be a yandex email account') 81 | username = username + email_suffix 82 | 83 | if isinstance(targets, str): 84 | targets = [targets] 85 | 86 | msg = MIMEText(message) 87 | msg['Subject'] = subject 88 | msg['From'] = username 89 | msg['To'] = ', '.join(targets) 90 | 91 | server = smtplib.SMTP_SSL(smtp_ssl_host, smtp_ssl_port) 92 | server.login(username, password) 93 | server.sendmail(username, targets, msg.as_string()) 94 | server.quit() 95 | errMsg = '' 96 | if verbose: 97 | print('email sent') 98 | return True, errMsg 99 | except Exception as e: 100 | errMsg = 'Unable to send email: {}'.format(e) 101 | if verbose: 102 | print(errMsg) 103 | return False, errMsg 104 | 105 | 106 | def asMinutesHours(s): 107 | h = s // 3600 108 | s -= h*3600 109 | m = s // 60 110 | s -= m * 60 111 | if h == 0: 112 | if m == 0: 113 | return '%ds' % s 114 | else: 115 | return '%dm %ds' % (m, s) 116 | else: 117 | return '%dh %dm %ds' %(h,m,s) 118 | 119 | 120 | def timeSince(since): 121 | now = time.time() 122 | s = now - since 123 | return '{}'.format(asMinutesHours(s)) 124 | 125 | 126 | def getNumberOfParameters(model): 127 | res = 0 128 | for x in model.parameters(): 129 | res = res + x.data.cpu().numpy().size 130 | 131 | return res 132 | 133 | def convertToNamedTuple(dictionary): 134 | 135 | ''' 136 | 137 | :param dictionary: converts a dictionary to a named tuple (if possible), so that you can access with the .attribute 138 | syntax. This is necessary to use openNMT-py code base 139 | :return: the namedtuple 140 | ''' 141 | 142 | return namedtuple('GenericDict', dictionary.keys())(**dictionary) 143 | 144 | def convertToDictionary(named_tuple): 145 | ''' 146 | :param namedTuple: converts a named tuple into a dictionary 147 | :return: the dictionary 148 | ''' 149 | 150 | return named_tuple._asdict() 151 | 152 | def extractTarFile(tar_path, extract_path=None): 153 | if extract_path is None: 154 | extract_path = os.path.splitext(tar_path)[0] 155 | try: 156 | os.mkdir(extract_path) 157 | except:pass 158 | 159 | tar = tarfile.open(tar_path, 'r') 160 | for item in tar: 161 | tar.extract(item, extract_path) 162 | if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1: 163 | extractTarFile(item.name, "./" + item.name[:item.name.rfind('/')]) 164 | 165 | def countLinesFile(filepath): 166 | with open(filepath, 'r') as f: 167 | count = sum(1 for line in f) 168 | return count 169 | 170 | def remove_files_list(list_filepath): 171 | infoMsg = '' 172 | for x in list_filepath: 173 | try: 174 | os.remove(x) 175 | except Exception as e: 176 | infoMsg += repr(e) + '\n' 177 | return infoMsg 178 | 179 | def convert_state_dict_to_data_parallel(state_dict): 180 | 181 | ''' 182 | Converts a state dict that was saved without data parallel to one tha can be loaded 183 | by a data parallel module 184 | ''' 185 | new_state_dict = OrderedDict() 186 | for k, v in state_dict.items(): 187 | name = 'module.' 
+ k 188 | new_state_dict[name] = v 189 | return new_state_dict 190 | 191 | def convert_state_dict_from_data_parallel(state_dict): 192 | 193 | ''' 194 | Converts a state dict that was saved with data parallel to one tha can be loaded 195 | by a non-data parallel module 196 | ''' 197 | 198 | new_state_dict = OrderedDict() 199 | for k, v in state_dict.items(): 200 | if k.startswith('module.'): 201 | name = k[7:] # remove `module.` 202 | else: 203 | raise ValueError('The state_dict passed was not saved by a data parallel instance') 204 | new_state_dict[name] = v 205 | return new_state_dict 206 | 207 | def num_distinct_elements(numpy_array, tol=1e-8): 208 | 209 | '''returns the number of distinct elements, considering elements closer than tol as the same 210 | numpy_array must be one dimensional!''' 211 | 212 | aux = numpy_array[~(np.triu(np.abs(numpy_array[:, None] - numpy_array) <= tol, 1)).any(0)] 213 | #maybe this is better: np.unique(numpy_array.round(decimals=5)).size 214 | return aux.size 215 | 216 | def get_size_reduction(effective_number_bits, bucket_size=256, full_precision_bits=32): 217 | 218 | if bucket_size is None: 219 | return full_precision_bits/effective_number_bits 220 | 221 | f = full_precision_bits 222 | k = bucket_size 223 | b = effective_number_bits 224 | return (k*f)/(k*b+2*f) 225 | 226 | def get_size_quantized_model(model, numBits, quantization_functions, bucket_size=256, 227 | type_quantization='uniform', quantizeFirstLastLayer=True): 228 | 229 | 'Returns size in MB' 230 | 231 | if numBits is None: 232 | return sum(p.numel() for p in model.parameters()) * 4 / 1000000 233 | 234 | 235 | numTensors = sum(1 for _ in model.parameters()) 236 | if quantizeFirstLastLayer is True: 237 | def get_quantized_params(): 238 | return model.parameters() 239 | def get_unquantized_params(): 240 | return iter(()) 241 | else: 242 | def get_quantized_params(): 243 | return (p for idx, p in enumerate(model.parameters()) if idx not in (0, numTensors - 1)) 244 | def get_unquantized_params(): 245 | return (p for idx, p in enumerate(model.parameters()) if idx in (0, numTensors - 1)) 246 | 247 | count_quantized_parameters = sum(p.numel() for p in get_quantized_params()) 248 | count_unquantized_parameters = sum(p.numel() for p in get_unquantized_params()) 249 | 250 | #Now get the best huffmann bit length for the quantized parameters 251 | actual_bit_huffmman = qhf.get_huffman_encoding_mean_bit_length(get_quantized_params(), quantization_functions, 252 | type_quantization, s=2**numBits) 253 | 254 | #Now we can compute the size. 255 | size_mb = 0 256 | size_mb += count_unquantized_parameters*4 #32 bits / 8 = 4 byte per parameter 257 | size_mb += actual_bit_huffmman*count_quantized_parameters/8 #For the quantized parameters we use the mean huffman length 258 | if bucket_size is not None: 259 | size_mb += count_quantized_parameters/bucket_size*8 #for every bucket size, we have to save 2 parameters. 
260 | #so we multiply the number of buckets by 2*32/8 = 8 261 | size_mb = size_mb / 1000000 #to bring it in MB 262 | return size_mb 263 | 264 | 265 | def get_entropy(probabilities): 266 | 267 | natural_log = torch.log(1/probabilities) 268 | natural_log[natural_log == float('inf')] = 0 #this puts all inf in the tensor to 0, so they don't matter for the entropy 269 | log_2 = natural_log / np.log(2) 270 | entropy = (probabilities * log_2).sum() 271 | return entropy 272 | 273 | def compute_entropy_layer(layer_out, normalize=False): 274 | prob_out = torch.nn.functional.softmax(Variable(layer_out), dim=1).data 275 | curr_entropy = [get_entropy(prob_out[idx_b, :]) for idx_b in range(prob_out.size(0))] 276 | curr_entropy = torch.FloatTensor(curr_entropy).view(-1, 1) 277 | if USE_CUDA: curr_entropy = curr_entropy.cuda() 278 | if normalize: 279 | # Normalize them with the max possible entropy value, so divide by log_2(n) 280 | N = layer_out.size(1) 281 | curr_entropy = curr_entropy / math.log2(N) 282 | return curr_entropy 283 | 284 | class DataLoader(object): 285 | 286 | """ 287 | Simple data loader that wraps the one-epoch generator 288 | """ 289 | 290 | #TODO: Shouldn't this inherit from some torch.utils class? 291 | 292 | #TODO: This probably belongs to the dataset package 293 | 294 | def __init__(self, dataLoaderIterator, length_dataset, batch_size, shuffled, **kwargs): 295 | self.dataLoaderIterator = dataLoaderIterator 296 | self.batch_size = batch_size 297 | self.shuffled = shuffled 298 | self.length_dataset = length_dataset 299 | 300 | for key, val in kwargs.items(): 301 | setattr(self, key, val) 302 | 303 | def __iter__(self): 304 | return self.dataLoaderIterator() 305 | 306 | def __len__(self): 307 | #TODO: shouldn't it rather be int(self.length_dataset/batch_size)? 308 | return self.length_dataset 309 | 310 | class EnsembleModel(nn.Module): 311 | def __init__(self, modules): 312 | super(EnsembleModel, self).__init__() 313 | 314 | self.modules_list = nn.ModuleList(modules) 315 | 316 | def forward(self, input): 317 | 318 | num_modules = len(self.modules_list) 319 | output = self.modules_list[0](input) 320 | for idx in range(1, num_modules): 321 | output += self.modules_list[idx](input) 322 | return output / num_modules 323 | -------------------------------------------------------------------------------- /imageNet_distilled.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import torchvision 4 | import cnn_models.conv_forward_model as convForwModel 5 | import cnn_models.help_fun as cnn_hf 6 | import datasets 7 | import model_manager 8 | 9 | cuda_devices = os.environ['CUDA_VISIBLE_DEVICES'].split(',') 10 | print('CUDA_VISIBLE_DEVICES: {} for a total of {} GPUs'.format(cuda_devices, len(cuda_devices))) 11 | 12 | 13 | if 'NUM_BITS' in os.environ: 14 | NUM_BITS = int(os.environ['NUM_BITS']) 15 | else: 16 | NUM_BITS = 4 17 | 18 | print('Number of bits in training: {}'.format(NUM_BITS)) 19 | 20 | datasets.BASE_DATA_FOLDER = '...' 21 | SAVED_MODELS_FOLDER = '...' 
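# NOTE: the '...' values above (and the ImageNet12 folder arguments further below) are
# placeholders left in the script; point them to real directories before running,
# e.g. (hypothetical paths):
#   datasets.BASE_DATA_FOLDER = '/data/saved_datasets'
#   SAVED_MODELS_FOLDER = '/data/saved_models'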
22 | 23 | USE_CUDA = torch.cuda.is_available() 24 | NUM_GPUS = len(cuda_devices) 25 | 26 | try: 27 | os.mkdir(datasets.BASE_DATA_FOLDER) 28 | except:pass 29 | try: 30 | os.mkdir(SAVED_MODELS_FOLDER) 31 | except:pass 32 | 33 | epochsToTrainImageNet = 90 34 | imageNet12modelsFolder = os.path.join(SAVED_MODELS_FOLDER, 'imagenet12_new') 35 | imagenet_manager = model_manager.ModelManager('model_manager_imagenet_distilled_New{}bits.tst'.format(NUM_BITS), 36 | 'model_manager', create_new_model_manager=False) 37 | 38 | for x in imagenet_manager.list_models(): 39 | if imagenet_manager.get_num_training_runs(x) >= 1: 40 | s = '{}; Last prediction acc: {}, Best prediction acc: {}'.format(x, 41 | imagenet_manager.load_metadata(x)[1]['predictionAccuracy'][-1], 42 | max(imagenet_manager.load_metadata(x)[1]['predictionAccuracy'])) 43 | print(s) 44 | 45 | try: 46 | os.mkdir(imageNet12modelsFolder) 47 | except:pass 48 | 49 | print('Batch size: {}'.format(batch_size)) 50 | 51 | if batch_size % NUM_GPUS != 0: 52 | raise ValueError('Batch size: {} must be a multiple of the number of gpus:{}'.format(batch_size, NUM_GPUS)) 53 | 54 | imageNet12 = datasets.ImageNet12('...', 55 | '...', 56 | type_of_data_augmentation='extended', already_scaled=False, 57 | pin_memory=True) 58 | 59 | 60 | train_loader = imageNet12.getTrainLoader(batch_size, shuffle=True) 61 | test_loader = imageNet12.getTestLoader(batch_size, shuffle=False) 62 | 63 | # # Teacher model 64 | # resnet152 = torchvision.models.resnet152(True) #already trained 65 | # if USE_CUDA: 66 | # resnet152 = resnet152.cuda() 67 | # if NUM_GPUS > 1: 68 | # resnet152 = torch.nn.parallel.DataParallel(resnet152) 69 | 70 | 71 | #normal resnet18 training 72 | resnet18 = torchvision.models.resnet18(False) #not pre-trained, 11.7 million parameters 73 | if USE_CUDA: 74 | resnet18 = resnet18.cuda() 75 | if NUM_GPUS > 1: 76 | resnet18 = torch.nn.parallel.DataParallel(resnet18) 77 | model_name = 'resnet18_normal_fullprecision' 78 | model_path = os.path.join(imageNet12modelsFolder, model_name) 79 | 80 | if not model_name in imagenet_manager.saved_models: 81 | imagenet_manager.add_new_model(model_name, model_path, 82 | arguments_creator_function={'loaded_from':'torchvision_models'}) 83 | 84 | imagenet_manager.train_model(resnet18, model_name=model_name, 85 | train_function=convForwModel.train_model, 86 | arguments_train_function={'epochs_to_train': epochsToTrainImageNet, 87 | 'learning_rate_style': 'imagenet', 88 | 'initial_learning_rate': 0.1, 89 | 'weight_decayL2':1e-4, 90 | 'start_epoch':0, 91 | 'print_every':30}, 92 | train_loader=train_loader, test_loader=test_loader) 93 | 94 | #distilled 95 | # resnet18_distilled = torchvision.models.resnet18(False) #not pre-trained, 11.7 million parameters 96 | # if USE_CUDA: 97 | # resnet18_distilled = resnet18_distilled.cuda() 98 | # if NUM_GPUS > 1: 99 | # resnet18_distilled = torch.nn.parallel.DataParallel(resnet18_distilled) 100 | # model_name = 'resnet18_distilled' 101 | # model_path = os.path.join(imageNet12modelsFolder, model_name) 102 | # 103 | # if not model_name in imagenet_manager.saved_models: 104 | # imagenet_manager.add_new_model(model_name, model_path, 105 | # arguments_creator_function={'loaded_from':'torchvision_models'}) 106 | 107 | # imagenet_manager.train_model(resnet18_distilled, model_name=model_name, 108 | # train_function=convForwModel.train_model, 109 | # arguments_train_function={'epochs_to_train': epochsToTrainImageNet, 110 | # 'teacher_model': resnet34, 111 | # 'learning_rate_style': 'imagenet', 112 | # 
'initial_learning_rate': initial_lr, 113 | # 'weight_decayL2':1e-4, 114 | # 'use_distillation_loss':True, 115 | # 'start_epoch':start_epoch, 116 | # 'print_every':100}, 117 | # train_loader=train_loader, test_loader=test_loader) 118 | 119 | #quantized distilled 120 | # bits_to_try = [NUM_BITS] 121 | # 122 | # for numBit in bits_to_try: 123 | # resnet18_quant_distilled = torchvision.models.resnet18(False) #not pre-trained, 11.7 million parameters 124 | # if USE_CUDA: 125 | # resnet18_quant_distilled = resnet18_quant_distilled.cuda() 126 | # if NUM_GPUS > 1: 127 | # resnet18_quant_distilled = torch.nn.parallel.DataParallel(resnet18_quant_distilled) 128 | # model_name = 'resnet18_quant_distilled_{}bits'.format(numBit) 129 | # model_path = os.path.join(imageNet12modelsFolder, model_name) 130 | # 131 | # if not model_name in imagenet_manager.saved_models: 132 | # imagenet_manager.add_new_model(model_name, model_path, 133 | # arguments_creator_function={'loaded_from':'torchvision_models'}) 134 | # 135 | # imagenet_manager.train_model(resnet18_quant_distilled, model_name=model_name, 136 | # train_function=convForwModel.train_model, 137 | # arguments_train_function={'epochs_to_train': epochsToTrainImageNet, 138 | # 'learning_rate_style': 'imagenet', 139 | # 'initial_learning_rate': 0.1, 140 | # 'use_nesterov':True, 141 | # 'initial_momentum':0.9, 142 | # 'weight_decayL2':1e-4, 143 | # 'start_epoch': 0, 144 | # 'print_every':30, 145 | # 'use_distillation_loss':True, 146 | # 'teacher_model': resnet152, 147 | # 'quantizeWeights':True, 148 | # 'numBits':numBit, 149 | # 'bucket_size':256, 150 | # 'quantize_first_and_last_layer': False}, 151 | # train_loader=train_loader, test_loader=test_loader) 152 | -------------------------------------------------------------------------------- /onmt/Beam.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import torch 3 | import onmt 4 | 5 | """ 6 | Class for managing the internals of the beam search process. 7 | 8 | Takes care of beams, back pointers, and scores. 9 | """ 10 | 11 | 12 | class Beam(object): 13 | def __init__(self, size, n_best=1, cuda=False, vocab=None, 14 | global_scorer=None): 15 | 16 | self.size = size 17 | self.tt = torch.cuda if cuda else torch 18 | 19 | # The score for each translation on the beam. 20 | self.scores = self.tt.FloatTensor(size).zero_() 21 | self.allScores = [] 22 | 23 | # The backpointers at each time-step. 24 | self.prevKs = [] 25 | 26 | # The outputs at each time-step. 27 | self.nextYs = [self.tt.LongTensor(size) 28 | .fill_(vocab.stoi[onmt.IO.PAD_WORD])] 29 | self.nextYs[0][0] = vocab.stoi[onmt.IO.BOS_WORD] 30 | self.vocab = vocab 31 | 32 | # Has EOS topped the beam yet. 33 | self._eos = self.vocab.stoi[onmt.IO.EOS_WORD] 34 | self.eosTop = False 35 | 36 | # The attentions (matrix) for each time. 37 | self.attn = [] 38 | 39 | # Time and k pair for finished. 40 | self.finished = [] 41 | self.n_best = n_best 42 | 43 | # Information for global scoring. 44 | self.globalScorer = global_scorer 45 | self.globalState = {} 46 | 47 | def getCurrentState(self): 48 | "Get the outputs for the current timestep." 49 | return self.nextYs[-1] 50 | 51 | def getCurrentOrigin(self): 52 | "Get the backpointers for the current timestep." 53 | return self.prevKs[-1] 54 | 55 | def advance(self, wordLk, attnOut): 56 | """ 57 | Given prob over words for every last beam `wordLk` and attention 58 | `attnOut`: Compute and update the beam search. 
59 | 60 | Parameters: 61 | 62 | * `wordLk`- probs of advancing from the last step (K x words) 63 | * `attnOut`- attention at the last step 64 | 65 | Returns: True if beam search is complete. 66 | """ 67 | numWords = wordLk.size(1) 68 | 69 | # Sum the previous scores. 70 | if len(self.prevKs) > 0: 71 | beamLk = wordLk + self.scores.unsqueeze(1).expand_as(wordLk) 72 | 73 | # Don't let EOS have children. 74 | for i in range(self.nextYs[-1].size(0)): 75 | if self.nextYs[-1][i] == self._eos: 76 | beamLk[i] = -1e20 77 | else: 78 | beamLk = wordLk[0] 79 | flatBeamLk = beamLk.view(-1) 80 | bestScores, bestScoresId = flatBeamLk.topk(self.size, 0, True, True) 81 | 82 | self.allScores.append(self.scores) 83 | self.scores = bestScores 84 | 85 | # bestScoresId is flattened beam x word array, so calculate which 86 | # word and beam each score came from 87 | prevK = bestScoresId / numWords 88 | self.prevKs.append(prevK) 89 | self.nextYs.append((bestScoresId - prevK * numWords)) 90 | self.attn.append(attnOut.index_select(0, prevK)) 91 | 92 | if self.globalScorer is not None: 93 | self.globalScorer.updateGlobalState(self) 94 | 95 | for i in range(self.nextYs[-1].size(0)): 96 | if self.nextYs[-1][i] == self._eos: 97 | s = self.scores[i] 98 | if self.globalScorer is not None: 99 | globalScores = self.globalScorer.score(self, self.scores) 100 | s = globalScores[i] 101 | self.finished.append((s, len(self.nextYs) - 1, i)) 102 | 103 | # End condition is when top-of-beam is EOS and no global score. 104 | if self.nextYs[-1][0] == self.vocab.stoi[onmt.IO.EOS_WORD]: 105 | # self.allScores.append(self.scores) 106 | self.eosTop = True 107 | 108 | def done(self): 109 | return self.eosTop and len(self.finished) >= self.n_best 110 | 111 | def sortFinished(self, minimum=None): 112 | if minimum is not None: 113 | i = 0 114 | # Add from beam until we have minimum outputs. 115 | while len(self.finished) < minimum: 116 | s = self.scores[i] 117 | if self.globalScorer is not None: 118 | globalScores = self.globalScorer.score(self, self.scores) 119 | s = globalScores[i] 120 | self.finished.append((s, len(self.nextYs) - 1, i)) 121 | 122 | self.finished.sort(key=lambda a: -a[0]) 123 | scores = [sc for sc, _, _ in self.finished] 124 | ks = [(t, k) for _, t, k in self.finished] 125 | return scores, ks 126 | 127 | def getHyp(self, timestep, k): 128 | """ 129 | Walk back to construct the full hypothesis. 130 | """ 131 | hyp, attn = [], [] 132 | for j in range(len(self.prevKs[:timestep]) - 1, -1, -1): 133 | hyp.append(self.nextYs[j+1][k]) 134 | attn.append(self.attn[j][k]) 135 | k = self.prevKs[j][k] 136 | return hyp[::-1], torch.stack(attn[::-1]) 137 | 138 | 139 | class GNMTGlobalScorer(object): 140 | """ 141 | Google NMT ranking score from Wu et al. 
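As implemented in score() below, a hypothesis is ranked by
log P(Y|X) / length_penalty + coverage_penalty, where
length_penalty = ((5 + |Y|)^alpha) / (5 + 1)^alpha and
coverage_penalty = beta * sum(log(min(attention_coverage, 1.0))).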
142 | """ 143 | def __init__(self, alpha, beta): 144 | self.alpha = alpha 145 | self.beta = beta 146 | 147 | def score(self, beam, logprobs): 148 | "Additional term add to log probability" 149 | cov = beam.globalState["coverage"] 150 | pen = self.beta * torch.min(cov, cov.clone().fill_(1.0)).log().sum(1) 151 | l_term = (((5 + len(beam.nextYs)) ** self.alpha) / 152 | ((5 + 1) ** self.alpha)) 153 | return (logprobs / l_term) + pen 154 | 155 | def updateGlobalState(self, beam): 156 | "Keeps the coverage vector as sum of attens" 157 | if len(beam.prevKs) == 1: 158 | beam.globalState["coverage"] = beam.attn[-1] 159 | else: 160 | beam.globalState["coverage"] = beam.globalState["coverage"] \ 161 | .index_select(0, beam.prevKs[-1]).add(beam.attn[-1]) 162 | -------------------------------------------------------------------------------- /onmt/Loss.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file handles the details of the loss function during training. 3 | 4 | This includes: LossComputeBase and the standard NMTLossCompute, and 5 | sharded loss compute stuff. 6 | """ 7 | from __future__ import division 8 | import torch 9 | import torch.nn as nn 10 | from torch.autograd import Variable 11 | 12 | import onmt 13 | 14 | 15 | class LossComputeBase(nn.Module): 16 | """ 17 | This is the loss criterion base class. Users can implement their own 18 | loss computation strategy by making subclass of this one. 19 | Users need to implement the compute_loss() method. 20 | We inherits from nn.Module to leverage the cuda behavior. 21 | """ 22 | def __init__(self, generator, tgt_vocab): 23 | super(LossComputeBase, self).__init__() 24 | self.generator = generator 25 | self.tgt_vocab = tgt_vocab 26 | self.padding_idx = tgt_vocab.stoi[onmt.IO.PAD_WORD] 27 | 28 | def forward(self, batch, output, target, **kwargs): 29 | """ 30 | Compute the loss. Subclass must define the compute_loss(). 31 | Args: 32 | batch: the current batch. 33 | output: the predict output from the model. 34 | target: the validate target to compare output with. 35 | **kwargs: additional info for computing loss. 36 | """ 37 | # Need to simplify this interface. 38 | return self.compute_loss(batch, output, target, **kwargs) 39 | 40 | def sharded_compute_loss(self, batch, output, attns, 41 | cur_trunc, trunc_size, shard_size, teacher_outputs=None): 42 | """ 43 | Compute the loss in shards for efficiency. 44 | """ 45 | batch_stats = onmt.Statistics() 46 | range_ = (cur_trunc, cur_trunc + trunc_size) 47 | gen_state = make_gen_state(output, batch, attns, range_, 48 | self.copy_attn, teacher_outputs) 49 | 50 | for shard in shards(gen_state, shard_size): 51 | loss, stats = self.compute_loss(batch, **shard) 52 | loss.div(batch.batch_size).backward() 53 | batch_stats.update(stats) 54 | 55 | return batch_stats 56 | 57 | def stats(self, loss, scores, target): 58 | """ 59 | Compute and return a Statistics object. 60 | 61 | Args: 62 | loss(Tensor): the loss computed by the loss criterion. 63 | scores(Tensor): a sequence of predict output with scores. 
64 | """ 65 | pred = scores.max(1)[1] 66 | non_padding = target.ne(self.padding_idx) 67 | num_correct = pred.eq(target) \ 68 | .masked_select(non_padding) \ 69 | .sum() 70 | return onmt.Statistics(loss[0], non_padding.sum(), num_correct) 71 | 72 | def bottle(self, v): 73 | return v.view(-1, v.size(2)) 74 | 75 | def unbottle(self, v, batch_size): 76 | return v.view(-1, batch_size, v.size(1)) 77 | 78 | 79 | class NMTLossCompute(LossComputeBase): 80 | """ 81 | Standard NMT Loss Computation. 82 | """ 83 | def __init__(self, generator, tgt_vocab, use_distillation_loss=False, teacher_generator=None): 84 | 85 | if use_distillation_loss is True and teacher_generator is None: 86 | raise ValueError('to use distillation loss you have to pass the teacher generator') 87 | 88 | super(NMTLossCompute, self).__init__(generator, tgt_vocab) 89 | 90 | self.copy_attn = False 91 | weight = torch.ones(len(tgt_vocab)) 92 | weight[self.padding_idx] = 0 93 | self.criterion = nn.NLLLoss(weight, size_average=False) 94 | self.use_distillation_loss = use_distillation_loss 95 | self.teacher_generator = teacher_generator 96 | 97 | def compute_loss(self, batch, output, target, **kwargs): 98 | """ See base class for args description. """ 99 | scores = self.generator(self.bottle(output)) 100 | scores_data = scores.data.clone() 101 | 102 | target = target.view(-1) 103 | target_data = target.data.clone() 104 | 105 | loss = self.criterion(scores, target) 106 | if self.use_distillation_loss: 107 | weight_teacher_loss = 0.7 108 | teacher_outputs = kwargs['teacher_outputs'] 109 | scores_teacher = self.teacher_generator(self.bottle(teacher_outputs)) 110 | prob_teacher = scores_teacher.exp().detach() 111 | # Here we use a temperature of 1.. 112 | loss_distilled = nn.functional.kl_div(scores, prob_teacher, 113 | weight=self.criterion.weight, 114 | size_average=self.criterion.size_average) 115 | loss = (1-weight_teacher_loss)*loss + weight_teacher_loss*loss_distilled 116 | 117 | loss_data = loss.data.clone() 118 | stats = self.stats(loss_data, scores_data, target_data) 119 | 120 | return loss, stats 121 | 122 | 123 | def make_gen_state(output, batch, attns, range_, copy_attn=None, teacher_outputs=None): 124 | """ 125 | Create generator state for use in sharded loss computation. 126 | This needs to match compute_loss exactly. 127 | """ 128 | if copy_attn and getattr(batch, 'alignment', None) is None: 129 | raise AssertionError("using -copy_attn you need to pass in " 130 | "-dynamic_dict during preprocess stage.") 131 | 132 | res_dict = {} 133 | res_dict["output"] = output 134 | if teacher_outputs is not None: 135 | res_dict['teacher_outputs'] = teacher_outputs 136 | res_dict["target"] = batch.tgt[range_[0] + 1: range_[1]] 137 | res_dict["copy_attn"] = attns.get("copy") 138 | res_dict["align"] = None if not copy_attn else batch.alignment[range_[0] + 1: range_[1]] 139 | res_dict["coverage"] = attns.get("coverage") 140 | 141 | return res_dict 142 | 143 | 144 | def filter_gen_state(state): 145 | for k, v in state.items(): 146 | if v is not None: 147 | if isinstance(v, Variable) and v.requires_grad: 148 | v = Variable(v.data, requires_grad=True, volatile=False) 149 | yield k, v 150 | 151 | 152 | def shards(state, shard_size, eval=False): 153 | """ 154 | Args: 155 | state: A dictionary which corresponds to the output of 156 | make_gen_state(). The values for those keys are 157 | Tensor-like or None. 158 | shard_size: The maximum size of the shards yielded by the model. 159 | eval: If True, only yield the state, nothing else. 
160 | Otherwise, yield shards. 161 | 162 | yields: 163 | Each yielded shard is a dict. 164 | side effect: 165 | After the last shard, this function does back-propagation. 166 | """ 167 | if eval: 168 | yield state 169 | else: 170 | # non_none: the subdict of the state dictionary where the values 171 | # are not None. 172 | non_none = dict(filter_gen_state(state)) 173 | 174 | # Now, the iteration: 175 | # split_state is a dictionary of sequences of tensor-like but we 176 | # want a sequence of dictionaries of tensors. 177 | # First, unzip the dictionary into a sequence of keys and a 178 | # sequence of tensor-like sequences. 179 | keys, values = zip(*((k, torch.split(v, shard_size)) 180 | for k, v in non_none.items())) 181 | 182 | # Now, yield a dictionary for each shard. The keys are always 183 | # the same. values is a sequence of length #keys where each 184 | # element is a sequence of length #shards. We want to iterate 185 | # over the shards, not over the keys: therefore, the values need 186 | # to be re-zipped by shard and then each shard can be paired 187 | # with the keys. 188 | for shard_tensors in zip(*values): 189 | yield dict(zip(keys, shard_tensors)) 190 | 191 | # Assumed backprop'd 192 | variables = ((state[k], v.grad.data) for k, v in non_none.items() 193 | if isinstance(v, Variable) and v.grad is not None) 194 | inputs, grads = zip(*variables) 195 | torch.autograd.backward(inputs, grads) 196 | -------------------------------------------------------------------------------- /onmt/ModelConstructor.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is for models creation, which consults options 3 | and creates each encoder and decoder accordingly. 4 | """ 5 | import torch.nn as nn 6 | 7 | import onmt 8 | import onmt.Models 9 | import onmt.modules 10 | from onmt.IO import ONMTDataset 11 | from onmt.Models import NMTModel, MeanEncoder, RNNEncoder, \ 12 | StdRNNDecoder, InputFeedRNNDecoder 13 | from onmt.modules import Embeddings, ImageEncoder, CopyGenerator, \ 14 | TransformerEncoder, TransformerDecoder, \ 15 | CNNEncoder, CNNDecoder 16 | 17 | 18 | def make_embeddings(opt, word_dict, feature_dicts, for_encoder=True): 19 | """ 20 | Make an Embeddings instance. 21 | Args: 22 | opt: the option in current environment. 23 | word_dict(Vocab): words dictionary. 24 | feature_dicts([Vocab], optional): a list of feature dictionary. 25 | for_encoder(bool): make Embeddings for encoder or decoder? 26 | """ 27 | if for_encoder: 28 | embedding_dim = opt.src_word_vec_size 29 | else: 30 | embedding_dim = opt.tgt_word_vec_size 31 | 32 | word_padding_idx = word_dict.stoi[onmt.IO.PAD_WORD] 33 | num_word_embeddings = len(word_dict) 34 | 35 | feats_padding_idx = [feat_dict.stoi[onmt.IO.PAD_WORD] 36 | for feat_dict in feature_dicts] 37 | num_feat_embeddings = [len(feat_dict) for feat_dict in 38 | feature_dicts] 39 | 40 | return Embeddings(embedding_dim, 41 | opt.position_encoding, 42 | opt.feat_merge, 43 | opt.feat_vec_exponent, 44 | opt.feat_vec_size, 45 | opt.dropout, 46 | word_padding_idx, 47 | feats_padding_idx, 48 | num_word_embeddings, 49 | num_feat_embeddings) 50 | 51 | 52 | def make_encoder(opt, embeddings): 53 | """ 54 | Various encoder dispatcher function. 55 | Args: 56 | opt: the option in current environment. 57 | embeddings (Embeddings): vocab embeddings for this encoder. 
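Returns:
    the encoder instance (TransformerEncoder, CNNEncoder, MeanEncoder or RNNEncoder,
    depending on opt.encoder_type).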
58 | """ 59 | if opt.encoder_type == "transformer": 60 | return TransformerEncoder(opt.enc_layers, opt.rnn_size, 61 | opt.dropout, embeddings) 62 | elif opt.encoder_type == "cnn": 63 | return CNNEncoder(opt.enc_layers, opt.rnn_size, 64 | opt.cnn_kernel_width, 65 | opt.dropout, embeddings) 66 | elif opt.encoder_type == "mean": 67 | return MeanEncoder(opt.enc_layers, embeddings) 68 | else: 69 | # "rnn" or "brnn" 70 | return RNNEncoder(opt.rnn_type, opt.brnn, opt.dec_layers, 71 | opt.rnn_size, opt.dropout, embeddings) 72 | 73 | 74 | def make_decoder(opt, embeddings): 75 | """ 76 | Various decoder dispatcher function. 77 | Args: 78 | opt: the option in current environment. 79 | embeddings (Embeddings): vocab embeddings for this decoder. 80 | """ 81 | if opt.decoder_type == "transformer": 82 | return TransformerDecoder(opt.dec_layers, opt.rnn_size, 83 | opt.global_attention, opt.copy_attn, 84 | opt.dropout, embeddings) 85 | elif opt.decoder_type == "cnn": 86 | return CNNDecoder(opt.dec_layers, opt.rnn_size, 87 | opt.global_attention, opt.copy_attn, 88 | opt.cnn_kernel_width, opt.dropout, 89 | embeddings) 90 | elif opt.input_feed: 91 | return InputFeedRNNDecoder(opt.rnn_type, opt.brnn, 92 | opt.dec_layers, opt.rnn_size, 93 | opt.global_attention, 94 | opt.coverage_attn, 95 | opt.context_gate, 96 | opt.copy_attn, 97 | opt.dropout, 98 | embeddings) 99 | else: 100 | return StdRNNDecoder(opt.rnn_type, opt.brnn, 101 | opt.dec_layers, opt.rnn_size, 102 | opt.global_attention, 103 | opt.coverage_attn, 104 | opt.context_gate, 105 | opt.copy_attn, 106 | opt.dropout, 107 | embeddings) 108 | 109 | 110 | def make_base_model(model_opt, fields, gpu, checkpoint=None): 111 | """ 112 | Args: 113 | model_opt: the option loaded from checkpoint. 114 | fields: `Field` objects for the model. 115 | gpu(bool): whether to use gpu. 116 | checkpoint: the model gnerated by train phase, or a resumed snapshot 117 | model from a stopped training. 118 | Returns: 119 | the NMTModel. 120 | """ 121 | assert model_opt.model_type in ["text", "img"], \ 122 | ("Unsupported model type %s" % (model_opt.model_type)) 123 | 124 | # Make encoder. 125 | if model_opt.model_type == "text": 126 | src_dict = fields["src"].vocab 127 | feature_dicts = ONMTDataset.collect_feature_dicts(fields) 128 | src_embeddings = make_embeddings(model_opt, src_dict, 129 | feature_dicts) 130 | encoder = make_encoder(model_opt, src_embeddings) 131 | else: 132 | encoder = ImageEncoder(model_opt.layers, 133 | model_opt.brnn, 134 | model_opt.rnn_size, 135 | model_opt.dropout) 136 | 137 | # Make decoder. 138 | tgt_dict = fields["tgt"].vocab 139 | # TODO: prepare for a future where tgt features are possible. 140 | feature_dicts = [] 141 | tgt_embeddings = make_embeddings(model_opt, tgt_dict, 142 | feature_dicts, for_encoder=False) 143 | decoder = make_decoder(model_opt, tgt_embeddings) 144 | 145 | # Make NMTModel(= encoder + decoder). 146 | model = NMTModel(encoder, decoder) 147 | 148 | # Make Generator. 149 | if not model_opt.copy_attn: 150 | generator = nn.Sequential( 151 | nn.Linear(model_opt.rnn_size, len(fields["tgt"].vocab)), 152 | nn.LogSoftmax()) 153 | if model_opt.share_decoder_embeddings: 154 | generator[0].weight = decoder.embeddings.word_lut.weight 155 | else: 156 | generator = CopyGenerator(model_opt, fields["src"].vocab, 157 | fields["tgt"].vocab) 158 | 159 | # Load the model states from checkpoint or initialize them. 
160 | if checkpoint is not None: 161 | #print('Loading model parameters.') 162 | model.load_state_dict(checkpoint['model']) 163 | generator.load_state_dict(checkpoint['generator']) 164 | else: 165 | if model_opt.param_init != 0.0: 166 | #print('Intializing parameters.') 167 | for p in model.parameters(): 168 | p.data.uniform_(-model_opt.param_init, model_opt.param_init) 169 | model.encoder.embeddings.load_pretrained_vectors( 170 | model_opt.pre_word_vecs_enc, model_opt.fix_word_vecs_enc) 171 | model.decoder.embeddings.load_pretrained_vectors( 172 | model_opt.pre_word_vecs_dec, model_opt.fix_word_vecs_dec) 173 | 174 | # add the generator to the module (does this register the parameter?) 175 | model.generator = generator 176 | 177 | # Make the whole model leverage GPU if indicated to do so. 178 | if gpu: 179 | model.cuda() 180 | else: 181 | model.cpu() 182 | 183 | return model 184 | -------------------------------------------------------------------------------- /onmt/Optim.py: -------------------------------------------------------------------------------- 1 | import torch.optim as optim 2 | from torch.nn.utils import clip_grad_norm 3 | 4 | 5 | class Optim(object): 6 | 7 | def set_parameters(self, params): 8 | self.params = [p for p in params if p.requires_grad] 9 | if self.method == 'sgd': 10 | self.optimizer = optim.SGD(self.params, lr=self.lr) 11 | elif self.method == 'adagrad': 12 | self.optimizer = optim.Adagrad(self.params, lr=self.lr) 13 | elif self.method == 'adadelta': 14 | self.optimizer = optim.Adadelta(self.params, lr=self.lr) 15 | elif self.method == 'adam': 16 | self.optimizer = optim.Adam(self.params, lr=self.lr, 17 | betas=self.betas, eps=1e-9) 18 | else: 19 | raise RuntimeError("Invalid optim method: " + self.method) 20 | 21 | def __init__(self, method, lr, max_grad_norm, 22 | lr_decay=1, start_decay_at=None, 23 | beta1=0.9, beta2=0.98, 24 | opt=None): 25 | self.last_ppl = None 26 | self.lr = lr 27 | self.max_grad_norm = max_grad_norm 28 | self.method = method 29 | self.lr_decay = lr_decay 30 | self.start_decay_at = start_decay_at 31 | self.start_decay = False 32 | self._step = 0 33 | self.betas = [beta1, beta2] 34 | self.opt = opt 35 | 36 | def _setRate(self, lr): 37 | self.lr = lr 38 | self.optimizer.param_groups[0]['lr'] = self.lr 39 | 40 | def step(self): 41 | "Compute gradients norm." 42 | self._step += 1 43 | 44 | # Decay method used in tensor2tensor. 45 | #Changed here, since NamedTuple does not have a __dict__ attribute 46 | if self.opt.decay_method == "noam": 47 | self._setRate( 48 | self.opt.learning_rate * 49 | (self.opt.rnn_size ** (-0.5) * 50 | min(self._step ** (-0.5), 51 | self._step * self.opt.warmup_steps**(-1.5)))) 52 | 53 | if self.max_grad_norm: 54 | clip_grad_norm(self.params, self.max_grad_norm) 55 | self.optimizer.step() 56 | 57 | def updateLearningRate(self, ppl, epoch): 58 | """ 59 | Decay learning rate if val perf does not improve 60 | or we hit the start_decay_at limit. 
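Once decay has been triggered, the rate is multiplied by lr_decay on every subsequent call.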
61 | """ 62 | 63 | if self.start_decay_at is not None and epoch >= self.start_decay_at: 64 | self.start_decay = True 65 | if self.last_ppl is not None and ppl > self.last_ppl: 66 | self.start_decay = True 67 | 68 | if self.start_decay: 69 | self.lr = self.lr * self.lr_decay 70 | print("Decaying learning rate to %g" % self.lr) 71 | 72 | self.last_ppl = ppl 73 | self.optimizer.param_groups[0]['lr'] = self.lr 74 | -------------------------------------------------------------------------------- /onmt/Trainer.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | """ 3 | This is the loadable seq2seq trainer library that is 4 | in charge of training details, loss compute, and statistics. 5 | See train.py for a use case of this library. 6 | 7 | Note!!! To make this a general library, we implement *only* 8 | mechanism things here(i.e. what to do), and leave the strategy 9 | things to users(i.e. how to do it). Also see train.py(one of the 10 | users of this library) for the strategy things we do. 11 | """ 12 | import time 13 | import sys 14 | import math 15 | import torch 16 | import torch.nn as nn 17 | 18 | import onmt 19 | import onmt.modules 20 | 21 | 22 | class Statistics(object): 23 | """ 24 | Train/validate loss statistics. 25 | """ 26 | def __init__(self, loss=0, n_words=0, n_correct=0): 27 | self.loss = loss 28 | self.n_words = n_words 29 | self.n_correct = n_correct 30 | self.n_src_words = 0 31 | self.start_time = time.time() 32 | 33 | def update(self, stat): 34 | self.loss += stat.loss 35 | self.n_words += stat.n_words 36 | self.n_correct += stat.n_correct 37 | 38 | def accuracy(self): 39 | return 100 * (self.n_correct / self.n_words) 40 | 41 | def ppl(self): 42 | return math.exp(min(self.loss / self.n_words, 100)) 43 | 44 | def elapsed_time(self): 45 | return time.time() - self.start_time 46 | 47 | def output(self, epoch, batch, n_batches, start): 48 | t = self.elapsed_time() 49 | print(("Epoch %2d, %5d/%5d; acc: %6.2f; ppl: %6.2f; " + 50 | "%3.0f src tok/s; %3.0f tgt tok/s; %6.0f s elapsed") % 51 | (epoch, batch, n_batches, 52 | self.accuracy(), 53 | self.ppl(), 54 | self.n_src_words / (t + 1e-5), 55 | self.n_words / (t + 1e-5), 56 | time.time() - start)) 57 | sys.stdout.flush() 58 | 59 | def log(self, prefix, experiment, optim): 60 | t = self.elapsed_time() 61 | experiment.add_scalar_value(prefix + "_ppl", self.ppl()) 62 | experiment.add_scalar_value(prefix + "_accuracy", self.accuracy()) 63 | experiment.add_scalar_value(prefix + "_tgtper", self.n_words / t) 64 | experiment.add_scalar_value(prefix + "_lr", optim.lr) 65 | 66 | 67 | class Trainer(object): 68 | def __init__(self, model, train_iter, valid_iter, 69 | train_loss, valid_loss, optim, 70 | trunc_size, shard_size): 71 | """ 72 | Args: 73 | model: the seq2seq model. 74 | train_iter: the train data iterator. 75 | valid_iter: the validate data iterator. 76 | train_loss: the train side LossCompute object for computing loss. 77 | valid_loss: the valid side LossCompute object for computing loss. 78 | optim: the optimizer responsible for lr update. 79 | trunc_size: a batch is divided by several truncs of this size. 80 | shard_size: compute loss in shards of this size for efficiency. 81 | """ 82 | # Basic attributes. 
83 | self.model = model 84 | self.train_iter = train_iter 85 | self.valid_iter = valid_iter 86 | self.train_loss = train_loss 87 | self.valid_loss = valid_loss 88 | self.optim = optim 89 | self.trunc_size = trunc_size 90 | self.shard_size = shard_size 91 | 92 | # Set model in training mode. 93 | self.model.train() 94 | 95 | def train(self, epoch, report_func=None): 96 | """ Called for each epoch to train. """ 97 | total_stats = Statistics() 98 | report_stats = Statistics() 99 | 100 | for i, batch in enumerate(self.train_iter): 101 | target_size = batch.tgt.size(0) 102 | # Truncated BPTT 103 | trunc_size = self.trunc_size if self.trunc_size else target_size 104 | 105 | dec_state = None 106 | _, src_lengths = batch.src 107 | 108 | src = onmt.IO.make_features(batch, 'src') 109 | tgt_outer = onmt.IO.make_features(batch, 'tgt') 110 | report_stats.n_src_words += src_lengths.sum() 111 | 112 | for j in range(0, target_size-1, trunc_size): 113 | # 1. Create truncated target. 114 | tgt = tgt_outer[j: j + trunc_size] 115 | 116 | # 2. F-prop all but generator. 117 | self.model.zero_grad() 118 | outputs, attns, dec_state = \ 119 | self.model(src, tgt, src_lengths, dec_state) 120 | 121 | # 3. Compute loss in shards for memory efficiency. 122 | batch_stats = self.train_loss.sharded_compute_loss( 123 | batch, outputs, attns, j, 124 | trunc_size, self.shard_size) 125 | 126 | # 4. Update the parameters and statistics. 127 | self.optim.step() 128 | total_stats.update(batch_stats) 129 | report_stats.update(batch_stats) 130 | 131 | # If truncated, don't backprop fully. 132 | if dec_state is not None: 133 | dec_state.detach() 134 | 135 | if report_func is not None: 136 | report_func(epoch, i, len(self.train_iter), 137 | total_stats.start_time, self.optim.lr, 138 | report_stats) 139 | report_stats = Statistics() 140 | 141 | return total_stats 142 | 143 | def validate(self): 144 | """ Called for each epoch to validate. """ 145 | # Set model in validating mode. 146 | self.model.eval() 147 | 148 | stats = Statistics() 149 | 150 | for batch in self.valid_iter: 151 | _, src_lengths = batch.src 152 | src = onmt.IO.make_features(batch, 'src') 153 | tgt = onmt.IO.make_features(batch, 'tgt') 154 | 155 | # F-prop through the model. 156 | outputs, attns, _ = self.model(src, tgt, src_lengths) 157 | 158 | # Compute loss. 159 | gen_state = onmt.Loss.make_gen_state( 160 | outputs, batch, attns, (0, batch.tgt.size(0))) 161 | _, batch_stats = self.valid_loss(batch, **gen_state) 162 | 163 | # Update statistics. 164 | stats.update(batch_stats) 165 | 166 | # Set model back to training mode. 167 | self.model.train() 168 | 169 | return stats 170 | 171 | def epoch_step(self, ppl, epoch): 172 | """ Called for each epoch to update learning rate. """ 173 | return self.optim.updateLearningRate(ppl, epoch) 174 | 175 | def drop_checkpoint(self, opt, epoch, fields, valid_stats): 176 | """ Called conditionally each epoch to save a snapshot. 
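The checkpoint stores the model and generator state dicts (unwrapped from DataParallel),
the vocabulary fields, the training options, the epoch number and the optimizer.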
""" 177 | real_model = (self.model.module 178 | if isinstance(self.model, nn.DataParallel) 179 | else self.model) 180 | real_generator = (real_model.generator.module 181 | if isinstance(real_model.generator, nn.DataParallel) 182 | else real_model.generator) 183 | 184 | model_state_dict = real_model.state_dict() 185 | model_state_dict = {k: v for k, v in model_state_dict.items() 186 | if 'generator' not in k} 187 | generator_state_dict = real_generator.state_dict() 188 | checkpoint = { 189 | 'model': model_state_dict, 190 | 'generator': generator_state_dict, 191 | 'vocab': onmt.IO.ONMTDataset.save_vocab(fields), 192 | 'opt': opt, 193 | 'epoch': epoch, 194 | 'optim': self.optim 195 | } 196 | torch.save(checkpoint, 197 | '%s_acc_%.2f_ppl_%.2f_e%d.pt' 198 | % (opt.save_model, valid_stats.accuracy(), 199 | valid_stats.ppl(), epoch)) 200 | -------------------------------------------------------------------------------- /onmt/Translator.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.autograd import Variable 3 | 4 | import onmt 5 | import onmt.Models 6 | import onmt.ModelConstructor 7 | import onmt.modules 8 | import onmt.IO 9 | from onmt.Utils import use_gpu 10 | 11 | 12 | class Translator(object): 13 | def __init__(self, opt, dummy_opt={}): 14 | # Add in default model arguments, possibly added since training. 15 | self.opt = opt 16 | checkpoint = torch.load(opt.model, 17 | map_location=lambda storage, loc: storage) 18 | self.fields = onmt.IO.ONMTDataset.load_fields(checkpoint['vocab']) 19 | 20 | model_opt = checkpoint['opt'] 21 | for arg in dummy_opt: 22 | if arg not in model_opt: 23 | model_opt.__dict__[arg] = dummy_opt[arg] 24 | 25 | self._type = model_opt.encoder_type 26 | self.copy_attn = model_opt.copy_attn 27 | 28 | self.model = onmt.ModelConstructor.make_base_model( 29 | model_opt, self.fields, use_gpu(opt), checkpoint) 30 | self.model.eval() 31 | self.model.generator.eval() 32 | 33 | # for debugging 34 | self.beam_accum = None 35 | 36 | def initBeamAccum(self): 37 | self.beam_accum = { 38 | "predicted_ids": [], 39 | "beam_parent_ids": [], 40 | "scores": [], 41 | "log_probs": []} 42 | 43 | def buildTargetTokens(self, pred, src, attn, copy_vocab): 44 | vocab = self.fields["tgt"].vocab 45 | tokens = [] 46 | for tok in pred: 47 | if tok < len(vocab): 48 | tokens.append(vocab.itos[tok]) 49 | else: 50 | tokens.append(copy_vocab.itos[tok - len(vocab)]) 51 | if tokens[-1] == onmt.IO.EOS_WORD: 52 | tokens = tokens[:-1] 53 | break 54 | 55 | if self.opt.replace_unk and attn is not None: 56 | for i in range(len(tokens)): 57 | if tokens[i] == vocab.itos[onmt.IO.UNK]: 58 | _, maxIndex = attn[i].max(0) 59 | tokens[i] = self.fields["src"].vocab.itos[src[maxIndex[0]]] 60 | return tokens 61 | 62 | def _runTarget(self, batch, data): 63 | 64 | _, src_lengths = batch.src 65 | src = onmt.IO.make_features(batch, 'src') 66 | tgt_in = onmt.IO.make_features(batch, 'tgt')[:-1] 67 | 68 | # (1) run the encoder on the src 69 | encStates, context = self.model.encoder(src, src_lengths) 70 | decStates = self.model.decoder.init_decoder_state( 71 | src, context, encStates) 72 | 73 | # (2) if a target is specified, compute the 'goldScore' 74 | # (i.e. 
log likelihood) of the target under the model 75 | tt = torch.cuda if self.opt.cuda else torch 76 | goldScores = tt.FloatTensor(batch.batch_size).fill_(0) 77 | decOut, decStates, attn = self.model.decoder( 78 | tgt_in, context, decStates) 79 | 80 | tgt_pad = self.fields["tgt"].vocab.stoi[onmt.IO.PAD_WORD] 81 | for dec, tgt in zip(decOut, batch.tgt[1:].data): 82 | # Log prob of each word. 83 | out = self.model.generator.forward(dec) 84 | tgt = tgt.unsqueeze(1) 85 | scores = out.data.gather(1, tgt) 86 | scores.masked_fill_(tgt.eq(tgt_pad), 0) 87 | goldScores += scores 88 | return goldScores 89 | 90 | def translateBatch(self, batch, dataset): 91 | beam_size = self.opt.beam_size 92 | batch_size = batch.batch_size 93 | 94 | # (1) Run the encoder on the src. 95 | _, src_lengths = batch.src 96 | src = onmt.IO.make_features(batch, 'src') 97 | encStates, context = self.model.encoder(src, src_lengths) 98 | decStates = self.model.decoder.init_decoder_state( 99 | src, context, encStates) 100 | 101 | # (1b) Initialize for the decoder. 102 | def var(a): return Variable(a, volatile=True) 103 | 104 | def rvar(a): return var(a.repeat(1, beam_size, 1)) 105 | 106 | # Repeat everything beam_size times. 107 | context = rvar(context.data) 108 | src = rvar(src.data) 109 | srcMap = rvar(batch.src_map.data) 110 | decStates.repeat_beam_size_times(beam_size) 111 | scorer = None 112 | # scorer=onmt.GNMTGlobalScorer(0.3, 0.4) 113 | beam = [onmt.Beam(beam_size, n_best=self.opt.n_best, 114 | cuda=self.opt.cuda, 115 | vocab=self.fields["tgt"].vocab, 116 | global_scorer=scorer) 117 | for __ in range(batch_size)] 118 | 119 | # (2) run the decoder to generate sentences, using beam search. 120 | 121 | def bottle(m): 122 | return m.view(batch_size * beam_size, -1) 123 | 124 | def unbottle(m): 125 | return m.view(beam_size, batch_size, -1) 126 | 127 | for i in range(self.opt.max_sent_length): 128 | 129 | if all((b.done() for b in beam)): 130 | break 131 | 132 | # Construct batch x beam_size nxt words. 133 | # Get all the pending current beam words and arrange for forward. 134 | inp = var(torch.stack([b.getCurrentState() for b in beam]) 135 | .t().contiguous().view(1, -1)) 136 | 137 | # Turn any copied words to UNKs 138 | # 0 is unk 139 | if self.copy_attn: 140 | inp = inp.masked_fill( 141 | inp.gt(len(self.fields["tgt"].vocab) - 1), 0) 142 | 143 | # Temporary kludge solution to handle changed dim expectation 144 | # in the decoder 145 | inp = inp.unsqueeze(2) 146 | 147 | # Run one step. 148 | decOut, decStates, attn = \ 149 | self.model.decoder(inp, context, decStates) 150 | decOut = decOut.squeeze(0) 151 | # decOut: beam x rnn_size 152 | 153 | # (b) Compute a vector of batch*beam word scores. 154 | if not self.copy_attn: 155 | out = self.model.generator.forward(decOut).data 156 | out = unbottle(out) 157 | # beam x tgt_vocab 158 | else: 159 | out = self.model.generator.forward(decOut, 160 | attn["copy"].squeeze(0), 161 | srcMap) 162 | # beam x (tgt_vocab + extra_vocab) 163 | out = dataset.collapse_copy_scores( 164 | unbottle(out.data), 165 | batch, self.fields["tgt"].vocab) 166 | # beam x tgt_vocab 167 | out = out.log() 168 | 169 | # (c) Advance each beam. 170 | for j, b in enumerate(beam): 171 | b.advance(out[:, j], unbottle(attn["std"]).data[:, j]) 172 | decStates.beam_update(j, b.getCurrentOrigin(), beam_size) 173 | 174 | if "tgt" in batch.__dict__: 175 | allGold = self._runTarget(batch, dataset) 176 | else: 177 | allGold = [0] * batch_size 178 | 179 | # (3) Package everything up. 
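# For every sentence in the batch, sort the finished beam hypotheses by score
# and keep the n_best of them, together with their attention weights;
# translate() below maps these back to tokens.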
180 | allHyps, allScores, allAttn = [], [], [] 181 | for b in beam: 182 | n_best = self.opt.n_best 183 | scores, ks = b.sortFinished(minimum=n_best) 184 | hyps, attn = [], [] 185 | for i, (times, k) in enumerate(ks[:n_best]): 186 | hyp, att = b.getHyp(times, k) 187 | hyps.append(hyp) 188 | attn.append(att) 189 | allHyps.append(hyps) 190 | allScores.append(scores) 191 | allAttn.append(attn) 192 | 193 | return allHyps, allScores, allAttn, allGold 194 | 195 | def translate(self, batch, data): 196 | # (1) convert words to indexes 197 | batch_size = batch.batch_size 198 | 199 | # (2) translate 200 | pred, predScore, attn, goldScore = self.translateBatch(batch, data) 201 | assert(len(goldScore) == len(pred)) 202 | pred, predScore, attn, goldScore, i = list(zip( 203 | *sorted(zip(pred, predScore, attn, goldScore, 204 | batch.indices.data), 205 | key=lambda x: x[-1]))) 206 | inds, perm = torch.sort(batch.indices.data) 207 | 208 | # (3) convert indexes to words 209 | predBatch, goldBatch = [], [] 210 | src = batch.src[0].data.index_select(1, perm) 211 | if self.opt.tgt: 212 | tgt = batch.tgt.data.index_select(1, perm) 213 | for b in range(batch_size): 214 | src_vocab = data.src_vocabs[inds[b]] 215 | predBatch.append( 216 | [self.buildTargetTokens(pred[b][n], src[:, b], 217 | attn[b][n], src_vocab) 218 | for n in range(self.opt.n_best)]) 219 | if self.opt.tgt: 220 | goldBatch.append( 221 | self.buildTargetTokens(tgt[1:, b], src[:, b], 222 | None, None)) 223 | return predBatch, goldBatch, predScore, goldScore, attn, src 224 | -------------------------------------------------------------------------------- /onmt/Utils.py: -------------------------------------------------------------------------------- 1 | def aeq(*args): 2 | """ 3 | Assert all arguments have the same value 4 | """ 5 | arguments = (arg for arg in args) 6 | first = next(arguments) 7 | assert all(arg == first for arg in arguments), \ 8 | "Not all arguments have the same value: " + str(args) 9 | 10 | 11 | def use_gpu(opt): 12 | return (hasattr(opt, 'gpuid') and len(opt.gpuid) > 0) or \ 13 | (hasattr(opt, 'gpu') and opt.gpu > -1) 14 | -------------------------------------------------------------------------------- /onmt/__init__.py: -------------------------------------------------------------------------------- 1 | import onmt.IO 2 | import onmt.Models 3 | import onmt.Loss 4 | from onmt.Trainer import Trainer, Statistics 5 | from onmt.Translator import Translator 6 | from onmt.Optim import Optim 7 | from onmt.Beam import Beam, GNMTGlobalScorer 8 | import onmt.standard_options 9 | 10 | # For flake8 compatibility 11 | __all__ = [onmt.Loss, onmt.IO, onmt.Models, Trainer, Translator, 12 | Optim, Beam, Statistics, GNMTGlobalScorer, onmt.standard_options] 13 | -------------------------------------------------------------------------------- /onmt/modules/Conv2Conv.py: -------------------------------------------------------------------------------- 1 | """ 2 | Implementation of "Convolutional Sequence to Sequence Learning" 3 | """ 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.init as init 7 | import torch.nn.functional as F 8 | from torch.autograd import Variable 9 | 10 | import onmt.modules 11 | from onmt.modules.WeightNorm import WeightNormConv2d 12 | from onmt.Models import EncoderBase 13 | from onmt.Models import DecoderState 14 | from onmt.Utils import aeq 15 | 16 | 17 | SCALE_WEIGHT = 0.5 ** 0.5 18 | 19 | 20 | def shape_transform(x): 21 | """ Tranform the size of the tensors to fit for conv input. 
""" 22 | return torch.unsqueeze(torch.transpose(x, 1, 2), 3) 23 | 24 | 25 | class GatedConv(nn.Module): 26 | def __init__(self, input_size, width=3, dropout=0.2, nopad=False): 27 | super(GatedConv, self).__init__() 28 | self.conv = WeightNormConv2d(input_size, 2 * input_size, 29 | kernel_size=(width, 1), stride=(1, 1), 30 | padding=(width // 2 * (1 - nopad), 0)) 31 | init.xavier_uniform(self.conv.weight, gain=(4 * (1 - dropout))**0.5) 32 | self.dropout = nn.Dropout(dropout) 33 | 34 | def forward(self, x_var, hidden=None): 35 | x_var = self.dropout(x_var) 36 | x_var = self.conv(x_var) 37 | out, gate = x_var.split(int(x_var.size(1) / 2), 1) 38 | out = out * F.sigmoid(gate) 39 | return out 40 | 41 | 42 | class StackedCNN(nn.Module): 43 | def __init__(self, num_layers, input_size, cnn_kernel_width=3, 44 | dropout=0.2): 45 | super(StackedCNN, self).__init__() 46 | self.dropout = dropout 47 | self.num_layers = num_layers 48 | self.layers = nn.ModuleList() 49 | for i in range(num_layers): 50 | self.layers.append( 51 | GatedConv(input_size, cnn_kernel_width, dropout)) 52 | 53 | def forward(self, x, hidden=None): 54 | for conv in self.layers: 55 | x = x + conv(x) 56 | x *= SCALE_WEIGHT 57 | return x 58 | 59 | 60 | class CNNEncoder(EncoderBase): 61 | """ 62 | Encoder built on CNN. 63 | """ 64 | def __init__(self, num_layers, hidden_size, 65 | cnn_kernel_width, dropout, embeddings): 66 | super(CNNEncoder, self).__init__() 67 | 68 | self.embeddings = embeddings 69 | input_size = embeddings.embedding_size 70 | self.linear = nn.Linear(input_size, hidden_size) 71 | self.cnn = StackedCNN(num_layers, hidden_size, 72 | cnn_kernel_width, dropout) 73 | 74 | def forward(self, input, lengths=None, hidden=None): 75 | """ See EncoderBase.forward() for description of args and returns.""" 76 | self._check_args(input, lengths, hidden) 77 | 78 | emb = self.embeddings(input) 79 | s_len, batch, emb_dim = emb.size() 80 | 81 | emb = emb.transpose(0, 1).contiguous() 82 | emb_reshape = emb.view(emb.size(0) * emb.size(1), -1) 83 | emb_remap = self.linear(emb_reshape) 84 | emb_remap = emb_remap.view(emb.size(0), emb.size(1), -1) 85 | emb_remap = shape_transform(emb_remap) 86 | out = self.cnn(emb_remap) 87 | 88 | return emb_remap.squeeze(3).transpose(0, 1).contiguous(),\ 89 | out.squeeze(3).transpose(0, 1).contiguous() 90 | 91 | 92 | class CNNDecoder(nn.Module): 93 | """ 94 | Decoder built on CNN, which consists of resduial convolutional layers, 95 | with ConvMultiStepAttention. 96 | """ 97 | def __init__(self, num_layers, hidden_size, attn_type, 98 | copy_attn, cnn_kernel_width, dropout, embeddings): 99 | super(CNNDecoder, self).__init__() 100 | 101 | # Basic attributes. 102 | self.decoder_type = 'cnn' 103 | self.num_layers = num_layers 104 | self.hidden_size = hidden_size 105 | self.cnn_kernel_width = cnn_kernel_width 106 | self.embeddings = embeddings 107 | self.dropout = dropout 108 | 109 | # Build the CNN. 110 | input_size = self.embeddings.embedding_size 111 | self.linear = nn.Linear(input_size, self.hidden_size) 112 | self.conv_layers = nn.ModuleList() 113 | for i in range(self.num_layers): 114 | self.conv_layers.append( 115 | GatedConv(self.hidden_size, self.cnn_kernel_width, 116 | self.dropout, True)) 117 | 118 | self.attn_layers = nn.ModuleList() 119 | for i in range(self.num_layers): 120 | self.attn_layers.append( 121 | onmt.modules.ConvMultiStepAttention(self.hidden_size)) 122 | 123 | # CNNDecoder has its own attention mechanism. 124 | # Set up a separated copy attention layer, if needed. 
125 | self._copy = False 126 | if copy_attn: 127 | self.copy_attn = onmt.modules.GlobalAttention( 128 | hidden_size, attn_type=attn_type) 129 | self._copy = True 130 | 131 | def forward(self, input, context, state): 132 | """ 133 | Forward through the CNNDecoder. 134 | Args: 135 | input (LongTensor): a sequence of input tokens tensors 136 | of size (len x batch x nfeats). 137 | context (FloatTensor): output(tensor sequence) from the encoder 138 | CNN of size (src_len x batch x hidden_size). 139 | state (FloatTensor): hidden state from the encoder CNN for 140 | initializing the decoder. 141 | Returns: 142 | outputs (FloatTensor): a Tensor sequence of output from the decoder 143 | of shape (len x batch x hidden_size). 144 | state (FloatTensor): final hidden state from the decoder. 145 | attns (dict of (str, FloatTensor)): a dictionary of different 146 | type of attention Tensor from the decoder 147 | of shape (src_len x batch). 148 | """ 149 | # CHECKS 150 | assert isinstance(state, CNNDecoderState) 151 | input_len, input_batch, _ = input.size() 152 | contxt_len, contxt_batch, _ = context.size() 153 | aeq(input_batch, contxt_batch) 154 | # END CHECKS 155 | 156 | if state.previous_input is not None: 157 | input = torch.cat([state.previous_input, input], 0) 158 | 159 | # Initialize return variables. 160 | outputs = [] 161 | attns = {"std": []} 162 | assert not self._copy, "Copy mechanism not yet tested in conv2conv" 163 | if self._copy: 164 | attns["copy"] = [] 165 | 166 | emb = self.embeddings(input) 167 | assert emb.dim() == 3 # len x batch x embedding_dim 168 | 169 | tgt_emb = emb.transpose(0, 1).contiguous() 170 | # The output of CNNEncoder. 171 | src_context_t = context.transpose(0, 1).contiguous() 172 | # The combination of output of CNNEncoder and source embeddings. 173 | src_context_c = state.init_src.transpose(0, 1).contiguous() 174 | 175 | # Run the forward pass of the CNNDecoder. 176 | emb_reshape = tgt_emb.contiguous().view( 177 | tgt_emb.size(0) * tgt_emb.size(1), -1) 178 | linear_out = self.linear(emb_reshape) 179 | x = linear_out.view(tgt_emb.size(0), tgt_emb.size(1), -1) 180 | x = shape_transform(x) 181 | 182 | pad = Variable(torch.zeros(x.size(0), x.size(1), 183 | self.cnn_kernel_width - 1, 1)) 184 | pad = pad.type_as(x) 185 | base_target_emb = x 186 | 187 | for conv, attention in zip(self.conv_layers, self.attn_layers): 188 | new_target_input = torch.cat([pad, x], 2) 189 | out = conv(new_target_input) 190 | c, attn = attention(base_target_emb, out, 191 | src_context_t, src_context_c) 192 | x = (x + (c + out) * SCALE_WEIGHT) * SCALE_WEIGHT 193 | output = x.squeeze(3).transpose(1, 2) 194 | 195 | # Process the result and update the attentions. 196 | outputs = output.transpose(0, 1).contiguous() 197 | if state.previous_input is not None: 198 | outputs = outputs[state.previous_input.size(0):] 199 | attn = attn[:, state.previous_input.size(0):].squeeze() 200 | attn = torch.stack([attn]) 201 | attns["std"] = attn 202 | if self._copy: 203 | attns["copy"] = attn 204 | 205 | # Update the state. 
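# Cache the full (possibly concatenated) target input on the decoder state so
# the next incremental decoding step can prepend it (see the torch.cat at the
# top of forward()).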
206 | state.update_state(input) 207 | 208 | return outputs, state, attns 209 | 210 | def init_decoder_state(self, src, context, enc_hidden): 211 | return CNNDecoderState(context, enc_hidden) 212 | 213 | 214 | class CNNDecoderState(DecoderState): 215 | def __init__(self, context, enc_hidden): 216 | self.init_src = (context + enc_hidden) * SCALE_WEIGHT 217 | self.previous_input = None 218 | 219 | @property 220 | def _all(self): 221 | """ 222 | Contains attributes that need to be updated in self.beam_update(). 223 | """ 224 | return (self.previous_input,) 225 | 226 | def update_state(self, input): 227 | """ Called for every decoder forward pass. """ 228 | self.previous_input = input 229 | 230 | def repeat_beam_size_times(self, beam_size): 231 | """ Repeat beam_size times along batch dimension. """ 232 | self.init_src = Variable( 233 | self.init_src.data.repeat(1, beam_size, 1), volatile=True) 234 | -------------------------------------------------------------------------------- /onmt/modules/ConvMultiStepAttention.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from onmt.Utils import aeq 5 | 6 | 7 | SCALE_WEIGHT = 0.5 ** 0.5 8 | 9 | 10 | def seq_linear(linear, x): 11 | # linear transform for 3-d tensor 12 | batch, hidden_size, length, _ = x.size() 13 | h = linear(torch.transpose(x, 1, 2).contiguous().view( 14 | batch * length, hidden_size)) 15 | return torch.transpose(h.view(batch, length, hidden_size, 1), 1, 2) 16 | 17 | 18 | class ConvMultiStepAttention(nn.Module): 19 | def __init__(self, input_size): 20 | super(ConvMultiStepAttention, self).__init__() 21 | self.linear_in = nn.Linear(input_size, input_size) 22 | self.mask = None 23 | 24 | def applyMask(self, mask): 25 | self.mask = mask 26 | 27 | def forward(self, base_target_emb, input, encoder_out_top, 28 | encoder_out_combine): 29 | """ 30 | It's like Luong Attetion. 31 | Conv attention takes a key matrix, a value matrix and a query vector. 32 | Attention weight is calculated by key matrix with the query vector 33 | and sum on the value matrix. And the same operation is applied 34 | in each decode conv layer. 
35 | Args: 36 | base_target_emb: target emb tensor 37 | input: output of decode conv 38 | encoder_out_t: the key matrix for calculation of attetion weight, 39 | which is the top output of encode conv 40 | encoder_out_c: the value matrix for the attention-weighted sum, 41 | which is the combination of base emb and top output of encode 42 | 43 | """ 44 | # checks 45 | batch, channel, height, width = base_target_emb.size() 46 | batch_, channel_, height_, width_ = input.size() 47 | aeq(batch, batch_) 48 | aeq(height, height_) 49 | 50 | enc_batch, enc_channel, enc_height = encoder_out_top.size() 51 | enc_batch_, enc_channel_, enc_height_ = encoder_out_combine.size() 52 | 53 | aeq(enc_batch, enc_batch_) 54 | aeq(enc_height, enc_height_) 55 | 56 | preatt = seq_linear(self.linear_in, input) 57 | target = (base_target_emb + preatt) * SCALE_WEIGHT 58 | target = torch.squeeze(target, 3) 59 | target = torch.transpose(target, 1, 2) 60 | pre_attn = torch.bmm(target, encoder_out_top) 61 | 62 | if self.mask is not None: 63 | pre_attn.data.masked_fill_(self.mask, -float('inf')) 64 | 65 | attn = F.softmax(pre_attn) 66 | context_output = torch.bmm( 67 | attn, torch.transpose(encoder_out_combine, 1, 2)) 68 | context_output = torch.transpose( 69 | torch.unsqueeze(context_output, 3), 1, 2) 70 | return context_output, attn 71 | -------------------------------------------------------------------------------- /onmt/modules/CopyGenerator.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | import torch 4 | import torch.cuda 5 | 6 | import onmt 7 | from onmt.Utils import aeq 8 | 9 | 10 | class CopyGenerator(nn.Module): 11 | """ 12 | Generator module that additionally considers copying 13 | words directly from the source. 14 | """ 15 | def __init__(self, opt, src_dict, tgt_dict): 16 | super(CopyGenerator, self).__init__() 17 | self.linear = nn.Linear(opt.rnn_size, len(tgt_dict)) 18 | self.linear_copy = nn.Linear(opt.rnn_size, 1) 19 | self.src_dict = src_dict 20 | self.tgt_dict = tgt_dict 21 | 22 | def forward(self, hidden, attn, src_map): 23 | """ 24 | Computes p(w) = p(z=1) p_{copy}(w|z=0) + p(z=0) * p_{softmax}(w|z=0) 25 | """ 26 | # CHECKS 27 | batch_by_tlen, _ = hidden.size() 28 | batch_by_tlen_, slen = attn.size() 29 | slen_, batch, cvocab = src_map.size() 30 | aeq(batch_by_tlen, batch_by_tlen_) 31 | aeq(slen, slen_) 32 | 33 | # Original probabilities. 34 | logits = self.linear(hidden) 35 | logits[:, self.tgt_dict.stoi[onmt.IO.PAD_WORD]] = -float('inf') 36 | prob = F.softmax(logits) 37 | 38 | # Probability of copying p(z=1) batch. 39 | copy = F.sigmoid(self.linear_copy(hidden)) 40 | 41 | # Probibility of not copying: p_{word}(w) * (1 - p(z)) 42 | out_prob = torch.mul(prob, 1 - copy.expand_as(prob)) 43 | mul_attn = torch.mul(attn, copy.expand_as(attn)) 44 | copy_prob = torch.bmm(mul_attn.view(-1, batch, slen) 45 | .transpose(0, 1), 46 | src_map.transpose(0, 1)).transpose(0, 1) 47 | copy_prob = copy_prob.contiguous().view(-1, cvocab) 48 | return torch.cat([out_prob, copy_prob], 1) 49 | 50 | 51 | class CopyGeneratorCriterion(object): 52 | def __init__(self, vocab_size, force_copy, pad, eps=1e-20): 53 | self.force_copy = force_copy 54 | self.eps = eps 55 | self.offset = vocab_size 56 | self.pad = pad 57 | 58 | def __call__(self, scores, align, target): 59 | align = align.view(-1) 60 | 61 | # Copy prob. 
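# Copied-token scores are stored after the regular target vocabulary (see
# CopyGenerator.forward above), so the copy index is shifted by the vocabulary
# size; align == 0 marks positions with no source word to copy, and their
# copy probability is zeroed out.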
62 | out = scores.gather(1, align.view(-1, 1) + self.offset) \ 63 | .view(-1).mul(align.ne(0).float()) 64 | tmp = scores.gather(1, target.view(-1, 1)).view(-1) 65 | 66 | # Regular prob (no unks and unks that can't be copied) 67 | if not self.force_copy: 68 | out = out + self.eps + tmp.mul(target.ne(0).float()) + \ 69 | tmp.mul(align.eq(0).float()).mul(target.eq(0).float()) 70 | else: 71 | # Forced copy. 72 | out = out + self.eps + tmp.mul(align.eq(0).float()) 73 | 74 | # Drop padding. 75 | loss = -out.log().mul(target.ne(self.pad).float()).sum() 76 | return loss 77 | 78 | 79 | class CopyGeneratorLossCompute(onmt.Loss.LossComputeBase): 80 | """ 81 | Copy Generator Loss Computation. 82 | """ 83 | def __init__(self, generator, tgt_vocab, dataset, 84 | force_copy, eps=1e-20): 85 | super(CopyGeneratorLossCompute, self).__init__(generator, tgt_vocab) 86 | 87 | self.dataset = dataset 88 | self.copy_attn = True 89 | self.force_copy = force_copy 90 | self.criterion = CopyGeneratorCriterion(len(tgt_vocab), force_copy, 91 | self.padding_idx) 92 | 93 | def compute_loss(self, batch, output, target, copy_attn, align): 94 | """ 95 | Compute the loss. The args must match Loss.make_gen_state(). 96 | Args: 97 | batch: the current batch. 98 | output: the predict output from the model. 99 | target: the validate target to compare output with. 100 | copy_attn: the copy attention value. 101 | align: the align info. 102 | """ 103 | target = target.view(-1) 104 | align = align.view(-1) 105 | scores = self.generator(self.bottle(output), 106 | self.bottle(copy_attn), 107 | batch.src_map) 108 | 109 | loss = self.criterion(scores, align, target) 110 | 111 | scores_data = scores.data.clone() 112 | scores_data = self.dataset.collapse_copy_scores( 113 | self.unbottle(scores_data, batch.batch_size), 114 | batch, self.tgt_vocab) 115 | scores_data = self.bottle(scores_data) 116 | 117 | # Correct target is copy when only option. 118 | # TODO: replace for loop with masking or boolean indexing 119 | target_data = target.data.clone() 120 | for i in range(target_data.size(0)): 121 | if target_data[i] == 0 and align.data[i] != 0: 122 | target_data[i] = align.data[i] + len(self.tgt_vocab) 123 | 124 | # Coverage loss term. 125 | loss_data = loss.data.clone() 126 | 127 | stats = self.stats(loss_data, scores_data, target_data) 128 | 129 | return loss, stats 130 | -------------------------------------------------------------------------------- /onmt/modules/Embeddings.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.autograd import Variable 4 | 5 | from onmt.modules import BottleLinear, Elementwise 6 | from onmt.Utils import aeq 7 | 8 | 9 | class PositionalEncoding(nn.Module): 10 | 11 | def __init__(self, dropout, dim, max_len=5000): 12 | pe = torch.arange(0, max_len).unsqueeze(1).expand(max_len, dim) 13 | div_term = 1 / torch.pow(10000, torch.arange(0, dim * 2, 2) / dim) 14 | pe = pe * div_term.expand_as(pe) 15 | pe[:, 0::2] = torch.sin(pe[:, 0::2]) 16 | pe[:, 1::2] = torch.cos(pe[:, 1::2]) 17 | pe = pe.unsqueeze(1) 18 | super(PositionalEncoding, self).__init__() 19 | self.register_buffer('pe', pe) 20 | self.dropout = nn.Dropout(p=dropout) 21 | 22 | def forward(self, emb): 23 | # We must wrap the self.pe in Variable to compute, not the other 24 | # way - unwrap emb(i.e. emb.data). Otherwise the computation 25 | # wouldn't be watched to build the compute graph. 
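# Add the fixed sinusoidal position encodings elementwise to the word
# embeddings, then apply dropout.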
26 | emb = emb + Variable(self.pe[:emb.size(0), :1, :emb.size(2)] 27 | .expand_as(emb), requires_grad=False) 28 | emb = self.dropout(emb) 29 | return emb 30 | 31 | 32 | class Embeddings(nn.Module): 33 | """ 34 | Words embeddings dictionary for encoder/decoder. 35 | 36 | Args: 37 | word_vec_size (int): size of the dictionary of embeddings. 38 | position_encoding (bool): use a sin to mark relative words positions. 39 | feat_merge (string): merge action for the features embeddings: 40 | concat, sum or mlp. 41 | feat_vec_exponent (float): when using '-feat_merge concat', feature 42 | embedding size is N^feat_dim_exponent, where N is the 43 | number of values of feature takes. 44 | feat_vec_size (int): embedding dimension for features when using 45 | '-feat_merge mlp' 46 | dropout (float): dropout probability. 47 | word_padding_idx (int): padding index for words in the embeddings. 48 | feats_padding_idx ([int]): padding index for a list of features 49 | in the embeddings. 50 | word_vocab_size (int): size of dictionary of embeddings for words. 51 | feat_vocab_sizes ([int], optional): list of size of dictionary 52 | of embeddings for each feature. 53 | """ 54 | def __init__(self, word_vec_size, position_encoding, feat_merge, 55 | feat_vec_exponent, feat_vec_size, dropout, 56 | word_padding_idx, feat_padding_idx, 57 | word_vocab_size, feat_vocab_sizes=[]): 58 | 59 | self.word_padding_idx = word_padding_idx 60 | 61 | # Dimensions and padding for constructing the word embedding matrix 62 | vocab_sizes = [word_vocab_size] 63 | emb_dims = [word_vec_size] 64 | pad_indices = [word_padding_idx] 65 | 66 | # Dimensions and padding for feature embedding matrices 67 | # (these have no effect if feat_vocab_sizes is empty) 68 | if feat_merge == 'sum': 69 | feat_dims = [word_vec_size] * len(feat_vocab_sizes) 70 | elif feat_vec_size > 0: 71 | feat_dims = [feat_vec_size] * len(feat_vocab_sizes) 72 | else: 73 | feat_dims = [int(vocab ** feat_vec_exponent) 74 | for vocab in feat_vocab_sizes] 75 | vocab_sizes.extend(feat_vocab_sizes) 76 | emb_dims.extend(feat_dims) 77 | pad_indices.extend(feat_padding_idx) 78 | 79 | # The embedding matrix look-up tables. The first look-up table 80 | # is for words. Subsequent ones are for features, if any exist. 81 | emb_params = zip(vocab_sizes, emb_dims, pad_indices) 82 | embeddings = [nn.Embedding(vocab, dim, padding_idx=pad) 83 | for vocab, dim, pad in emb_params] 84 | emb_luts = Elementwise(feat_merge, embeddings) 85 | 86 | # The final output size of word + feature vectors. This can vary 87 | # from the word vector size if and only if features are defined. 88 | # This is the attribute you should access if you need to know 89 | # how big your embeddings are going to be. 90 | self.embedding_size = (sum(emb_dims) if feat_merge == 'concat' 91 | else word_vec_size) 92 | 93 | # The sequence of operations that converts the input sequence 94 | # into a sequence of embeddings. At minimum this consists of 95 | # looking up the embeddings for each word and feature in the 96 | # input. Model parameters may require the sequence to contain 97 | # additional operations as well. 
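# nn.Module.__init__ must run before any submodules are assigned to self,
# which is why it is called here, after the plain attributes above.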
98 | super(Embeddings, self).__init__() 99 | self.make_embedding = nn.Sequential() 100 | self.make_embedding.add_module('emb_luts', emb_luts) 101 | 102 | if feat_merge == 'mlp': 103 | in_dim = sum(emb_dims) 104 | out_dim = word_vec_size 105 | mlp = nn.Sequential(BottleLinear(in_dim, out_dim), nn.ReLU()) 106 | self.make_embedding.add_module('mlp', mlp) 107 | 108 | if position_encoding: 109 | pe = PositionalEncoding(dropout, self.embedding_size) 110 | self.make_embedding.add_module('pe', pe) 111 | 112 | @property 113 | def word_lut(self): 114 | return self.make_embedding[0][0] 115 | 116 | @property 117 | def emb_luts(self): 118 | return self.make_embedding[0] 119 | 120 | def load_pretrained_vectors(self, emb_file, fixed): 121 | if emb_file: 122 | pretrained = torch.load(emb_file) 123 | self.word_lut.weight.data.copy_(pretrained) 124 | if fixed: 125 | self.word_lut.weight.requires_grad = False 126 | 127 | def forward(self, input): 128 | """ 129 | Return the embeddings for words, and features if there are any. 130 | Args: 131 | input (LongTensor): len x batch x nfeat 132 | Return: 133 | emb (FloatTensor): len x batch x self.embedding_size 134 | """ 135 | in_length, in_batch, nfeat = input.size() 136 | aeq(nfeat, len(self.emb_luts)) 137 | 138 | emb = self.make_embedding(input) 139 | 140 | out_length, out_batch, emb_size = emb.size() 141 | aeq(in_length, out_length) 142 | aeq(in_batch, out_batch) 143 | aeq(emb_size, self.embedding_size) 144 | 145 | return emb 146 | -------------------------------------------------------------------------------- /onmt/modules/Gate.py: -------------------------------------------------------------------------------- 1 | """ 2 | Context gate is a decoder module that takes as input the previous word 3 | embedding, the current decoder state and the attention state, and produces a 4 | gate. 5 | The gate can be used to select the input from the target side context 6 | (decoder state), from the source context (attention state) or both. 
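The three gate variants below differ only in how the gate z is applied:
SourceContextGate scales the source projection, TargetContextGate scales the
target projection, and BothContextGate interpolates between the two.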
7 | """ 8 | import torch 9 | import torch.nn as nn 10 | 11 | 12 | def ContextGateFactory(type, embeddings_size, decoder_size, 13 | attention_size, output_size): 14 | """Returns the correct ContextGate class""" 15 | 16 | gate_types = {'source': SourceContextGate, 17 | 'target': TargetContextGate, 18 | 'both': BothContextGate} 19 | 20 | assert type in gate_types, "Not valid ContextGate type: {0}".format(type) 21 | return gate_types[type](embeddings_size, decoder_size, attention_size, 22 | output_size) 23 | 24 | 25 | class ContextGate(nn.Module): 26 | """Implement up to the computation of the gate""" 27 | 28 | def __init__(self, embeddings_size, decoder_size, 29 | attention_size, output_size): 30 | super(ContextGate, self).__init__() 31 | input_size = embeddings_size + decoder_size + attention_size 32 | self.gate = nn.Linear(input_size, output_size, bias=True) 33 | self.sig = nn.Sigmoid() 34 | self.source_proj = nn.Linear(attention_size, output_size) 35 | self.target_proj = nn.Linear(embeddings_size + decoder_size, 36 | output_size) 37 | 38 | def forward(self, prev_emb, dec_state, attn_state): 39 | input_tensor = torch.cat((prev_emb, dec_state, attn_state), dim=1) 40 | z = self.sig(self.gate(input_tensor)) 41 | proj_source = self.source_proj(attn_state) 42 | proj_target = self.target_proj( 43 | torch.cat((prev_emb, dec_state), dim=1)) 44 | return z, proj_source, proj_target 45 | 46 | 47 | class SourceContextGate(nn.Module): 48 | """Apply the context gate only to the source context""" 49 | 50 | def __init__(self, embeddings_size, decoder_size, 51 | attention_size, output_size): 52 | super(SourceContextGate, self).__init__() 53 | self.context_gate = ContextGate(embeddings_size, decoder_size, 54 | attention_size, output_size) 55 | self.tanh = nn.Tanh() 56 | 57 | def forward(self, prev_emb, dec_state, attn_state): 58 | z, source, target = self.context_gate( 59 | prev_emb, dec_state, attn_state) 60 | return self.tanh(target + z * source) 61 | 62 | 63 | class TargetContextGate(nn.Module): 64 | """Apply the context gate only to the target context""" 65 | 66 | def __init__(self, embeddings_size, decoder_size, 67 | attention_size, output_size): 68 | super(TargetContextGate, self).__init__() 69 | self.context_gate = ContextGate(embeddings_size, decoder_size, 70 | attention_size, output_size) 71 | self.tanh = nn.Tanh() 72 | 73 | def forward(self, prev_emb, dec_state, attn_state): 74 | z, source, target = self.context_gate(prev_emb, dec_state, attn_state) 75 | return self.tanh(z * target + source) 76 | 77 | 78 | class BothContextGate(nn.Module): 79 | """Apply the context gate to both contexts""" 80 | 81 | def __init__(self, embeddings_size, decoder_size, 82 | attention_size, output_size): 83 | super(BothContextGate, self).__init__() 84 | self.context_gate = ContextGate(embeddings_size, decoder_size, 85 | attention_size, output_size) 86 | self.tanh = nn.Tanh() 87 | 88 | def forward(self, prev_emb, dec_state, attn_state): 89 | z, source, target = self.context_gate(prev_emb, dec_state, attn_state) 90 | return self.tanh((1. - z) * target + z * source) 91 | -------------------------------------------------------------------------------- /onmt/modules/GlobalAttention.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from onmt.modules.UtilClass import BottleLinear 5 | from onmt.Utils import aeq 6 | 7 | 8 | class GlobalAttention(nn.Module): 9 | """ 10 | Luong Attention. 
11 | 12 | Global attention takes a matrix and a query vector. It 13 | then computes a parameterized convex combination of the matrix 14 | based on the input query. 15 | 16 | 17 | H_1 H_2 H_3 ... H_n 18 | q q q q 19 | | | | | 20 | \ | | / 21 | ..... 22 | \ | / 23 | a 24 | 25 | Constructs a unit mapping. 26 | $$(H_1 + H_n, q) => (a)$$ 27 | Where H is of `batch x n x dim` and q is of `batch x dim`. 28 | 29 | Luong Attention (dot, general): 30 | The full function is 31 | $$\tanh(W_2 [(softmax((W_1 q + b_1) H) H), q] + b_2)$$. 32 | 33 | * dot: $$score(h_t,{\overline{h}}_s) = h_t^T{\overline{h}}_s$$ 34 | * general: $$score(h_t,{\overline{h}}_s) = h_t^T W_a {\overline{h}}_s$$ 35 | 36 | Bahdanau Attention (mlp): 37 | $$c = \sum_{j=1}^{SeqLength}\a_jh_j$$. 38 | The Alignment-function $$a$$ computes an alignment as: 39 | $$a_j = softmax(v_a^T \tanh(W_a q + U_a h_j) )$$. 40 | 41 | """ 42 | def __init__(self, dim, coverage=False, attn_type="dot"): 43 | super(GlobalAttention, self).__init__() 44 | 45 | self.dim = dim 46 | self.attn_type = attn_type 47 | assert (self.attn_type in ["dot", "general", "mlp"]), ( 48 | "Please select a valid attention type.") 49 | 50 | if self.attn_type == "general": 51 | self.linear_in = nn.Linear(dim, dim, bias=False) 52 | elif self.attn_type == "mlp": 53 | self.linear_context = BottleLinear(dim, dim, bias=False) 54 | self.linear_query = nn.Linear(dim, dim, bias=True) 55 | self.v = BottleLinear(dim, 1, bias=False) 56 | # mlp wants it with bias 57 | out_bias = self.attn_type == "mlp" 58 | self.linear_out = nn.Linear(dim*2, dim, bias=out_bias) 59 | 60 | self.sm = nn.Softmax() 61 | self.tanh = nn.Tanh() 62 | self.mask = None 63 | 64 | if coverage: 65 | self.linear_cover = nn.Linear(1, dim, bias=False) 66 | 67 | def applyMask(self, mask): 68 | self.mask = mask 69 | 70 | def score(self, h_t, h_s): 71 | """ 72 | h_t (FloatTensor): batch x tgt_len x dim 73 | h_s (FloatTensor): batch x src_len x dim 74 | returns scores (FloatTensor): batch x tgt_len x src_len: 75 | raw attention scores for each src index 76 | """ 77 | 78 | # Check input sizes 79 | src_batch, src_len, src_dim = h_s.size() 80 | tgt_batch, tgt_len, tgt_dim = h_t.size() 81 | aeq(src_batch, tgt_batch) 82 | aeq(src_dim, tgt_dim) 83 | aeq(self.dim, src_dim) 84 | 85 | if self.attn_type in ["general", "dot"]: 86 | if self.attn_type == "general": 87 | h_t_ = h_t.view(tgt_batch*tgt_len, tgt_dim) 88 | h_t_ = self.linear_in(h_t_) 89 | h_t = h_t_.view(tgt_batch, tgt_len, tgt_dim) 90 | h_s_ = h_s.transpose(1, 2) 91 | # (batch, t_len, d) x (batch, d, s_len) --> (batch, t_len, s_len) 92 | return torch.bmm(h_t, h_s_) 93 | else: 94 | dim = self.dim 95 | wq = self.linear_query(h_t.view(-1, dim)) 96 | wq = wq.view(tgt_batch, tgt_len, 1, dim) 97 | wq = wq.expand(tgt_batch, tgt_len, src_len, dim) 98 | 99 | uh = self.linear_context(h_s.contiguous().view(-1, dim)) 100 | uh = uh.view(src_batch, 1, src_len, dim) 101 | uh = uh.expand(src_batch, tgt_len, src_len, dim) 102 | 103 | # (batch, t_len, s_len, d) 104 | wquh = self.tanh(wq + uh) 105 | 106 | return self.v(wquh.view(-1, dim)).view(tgt_batch, tgt_len, src_len) 107 | 108 | def forward(self, input, context, coverage=None): 109 | """ 110 | input (FloatTensor): batch x tgt_len x dim: decoder's rnn's output. 
111 | context (FloatTensor): batch x src_len x dim: src hidden states 112 | coverage (FloatTensor): None (not supported yet) 113 | """ 114 | 115 | # one step input 116 | if input.dim() == 2: 117 | one_step = True 118 | input = input.unsqueeze(1) 119 | else: 120 | one_step = False 121 | 122 | batch, sourceL, dim = context.size() 123 | batch_, targetL, dim_ = input.size() 124 | aeq(batch, batch_) 125 | aeq(dim, dim_) 126 | aeq(self.dim, dim) 127 | if coverage is not None: 128 | batch_, sourceL_ = coverage.size() 129 | aeq(batch, batch_) 130 | aeq(sourceL, sourceL_) 131 | 132 | if self.mask is not None: 133 | beam_, batch_, sourceL_ = self.mask.size() 134 | aeq(batch, batch_*beam_) 135 | aeq(sourceL, sourceL_) 136 | 137 | if coverage is not None: 138 | cover = coverage.view(-1).unsqueeze(1) 139 | context += self.linear_cover(cover).view_as(context) 140 | context = self.tanh(context) 141 | 142 | # compute attention scores, as in Luong et al. 143 | align = self.score(input, context) 144 | 145 | if self.mask is not None: 146 | mask_ = self.mask.view(batch, 1, sourceL) # make it broardcastable 147 | align.data.masked_fill_(mask_, -float('inf')) 148 | 149 | # Softmax to normalize attention weights 150 | align_vectors = self.sm(align.view(batch*targetL, sourceL)) 151 | align_vectors = align_vectors.view(batch, targetL, sourceL) 152 | 153 | # each context vector c_t is the weighted average 154 | # over all the source hidden states 155 | c = torch.bmm(align_vectors, context) 156 | 157 | # concatenate 158 | concat_c = torch.cat([c, input], 2).view(batch*targetL, dim*2) 159 | attn_h = self.linear_out(concat_c).view(batch, targetL, dim) 160 | if self.attn_type in ["general", "dot"]: 161 | attn_h = self.tanh(attn_h) 162 | 163 | if one_step: 164 | attn_h = attn_h.squeeze(1) 165 | align_vectors = align_vectors.squeeze(1) 166 | 167 | # Check output sizes 168 | batch_, dim_ = attn_h.size() 169 | aeq(batch, batch_) 170 | aeq(dim, dim_) 171 | batch_, sourceL_ = align_vectors.size() 172 | aeq(batch, batch_) 173 | aeq(sourceL, sourceL_) 174 | else: 175 | attn_h = attn_h.transpose(0, 1).contiguous() 176 | align_vectors = align_vectors.transpose(0, 1).contiguous() 177 | 178 | # Check output sizes 179 | targetL_, batch_, dim_ = attn_h.size() 180 | aeq(targetL, targetL_) 181 | aeq(batch, batch_) 182 | aeq(dim, dim_) 183 | targetL_, batch_, sourceL_ = align_vectors.size() 184 | aeq(targetL, targetL_) 185 | aeq(batch, batch_) 186 | aeq(sourceL, sourceL_) 187 | 188 | return attn_h, align_vectors 189 | -------------------------------------------------------------------------------- /onmt/modules/ImageEncoder.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | import torch 4 | import torch.cuda 5 | from torch.autograd import Variable 6 | 7 | 8 | class ImageEncoder(nn.Module): 9 | """ 10 | Encoder recurrent neural network for Images. 11 | """ 12 | def __init__(self, num_layers, bidirectional, rnn_size, dropout): 13 | """ 14 | Args: 15 | num_layers (int): number of encoder layers. 16 | bidirectional (bool): bidirectional encoder. 17 | rnn_size (int): size of hidden states of the rnn. 18 | dropout (float): dropout probablity. 
19 | """ 20 | super(ImageEncoder, self).__init__() 21 | self.num_layers = num_layers 22 | self.num_directions = 2 if bidirectional else 1 23 | self.hidden_size = rnn_size 24 | 25 | self.layer1 = nn.Conv2d(3, 64, kernel_size=(3, 3), 26 | padding=(1, 1), stride=(1, 1)) 27 | self.layer2 = nn.Conv2d(64, 128, kernel_size=(3, 3), 28 | padding=(1, 1), stride=(1, 1)) 29 | self.layer3 = nn.Conv2d(128, 256, kernel_size=(3, 3), 30 | padding=(1, 1), stride=(1, 1)) 31 | self.layer4 = nn.Conv2d(256, 256, kernel_size=(3, 3), 32 | padding=(1, 1), stride=(1, 1)) 33 | self.layer5 = nn.Conv2d(256, 512, kernel_size=(3, 3), 34 | padding=(1, 1), stride=(1, 1)) 35 | self.layer6 = nn.Conv2d(512, 512, kernel_size=(3, 3), 36 | padding=(1, 1), stride=(1, 1)) 37 | 38 | self.batch_norm1 = nn.BatchNorm2d(256) 39 | self.batch_norm2 = nn.BatchNorm2d(512) 40 | self.batch_norm3 = nn.BatchNorm2d(512) 41 | 42 | input_size = 512 43 | self.rnn = nn.LSTM(input_size, rnn_size, 44 | num_layers=num_layers, 45 | dropout=dropout, 46 | bidirectional=bidirectional) 47 | self.pos_lut = nn.Embedding(1000, input_size) 48 | 49 | def load_pretrained_vectors(self, opt): 50 | # Pass in needed options only when modify function definition. 51 | pass 52 | 53 | def forward(self, input, lengths=None): 54 | batchSize = input.size(0) 55 | # (batch_size, 64, imgH, imgW) 56 | # layer 1 57 | input = F.relu(self.layer1(input[:, :, :, :]-0.5), True) 58 | 59 | # (batch_size, 64, imgH/2, imgW/2) 60 | input = F.max_pool2d(input, kernel_size=(2, 2), stride=(2, 2)) 61 | 62 | # (batch_size, 128, imgH/2, imgW/2) 63 | # layer 2 64 | input = F.relu(self.layer2(input), True) 65 | 66 | # (batch_size, 128, imgH/2/2, imgW/2/2) 67 | input = F.max_pool2d(input, kernel_size=(2, 2), stride=(2, 2)) 68 | 69 | # (batch_size, 256, imgH/2/2, imgW/2/2) 70 | # layer 3 71 | # batch norm 1 72 | input = F.relu(self.batch_norm1(self.layer3(input)), True) 73 | 74 | # (batch_size, 256, imgH/2/2, imgW/2/2) 75 | # layer4 76 | input = F.relu(self.layer4(input), True) 77 | 78 | # (batch_size, 256, imgH/2/2/2, imgW/2/2) 79 | input = F.max_pool2d(input, kernel_size=(1, 2), stride=(1, 2)) 80 | 81 | # (batch_size, 512, imgH/2/2/2, imgW/2/2) 82 | # layer 5 83 | # batch norm 2 84 | input = F.relu(self.batch_norm2(self.layer5(input)), True) 85 | 86 | # (batch_size, 512, imgH/2/2/2, imgW/2/2/2) 87 | input = F.max_pool2d(input, kernel_size=(2, 1), stride=(2, 1)) 88 | 89 | # (batch_size, 512, imgH/2/2/2, imgW/2/2/2) 90 | input = F.relu(self.batch_norm3(self.layer6(input)), True) 91 | 92 | # # (batch_size, 512, H, W) 93 | # # (batch_size, H, W, 512) 94 | all_outputs = [] 95 | for row in range(input.size(2)): 96 | inp = input[:, :, row, :].transpose(0, 2)\ 97 | .transpose(1, 2) 98 | pos_emb = self.pos_lut( 99 | Variable(torch.cuda.LongTensor(batchSize).fill_(row))) 100 | with_pos = torch.cat( 101 | (pos_emb.view(1, pos_emb.size(0), pos_emb.size(1)), inp), 0) 102 | outputs, hidden_t = self.rnn(with_pos) 103 | all_outputs.append(outputs) 104 | out = torch.cat(all_outputs, 0) 105 | 106 | return hidden_t, out 107 | -------------------------------------------------------------------------------- /onmt/modules/MultiHeadedAttn.py: -------------------------------------------------------------------------------- 1 | import math 2 | import torch 3 | import torch.nn as nn 4 | from torch.autograd import Variable 5 | 6 | from onmt.Utils import aeq 7 | from onmt.modules.UtilClass import BottleLinear, \ 8 | BottleLayerNorm, BottleSoftmax 9 | 10 | 11 | class MultiHeadedAttention(nn.Module): 12 | ''' Multi-Head 
Attention module from 13 | "Attention is All You Need". 14 | ''' 15 | def __init__(self, head_count, model_dim, p=0.1): 16 | """ 17 | Args: 18 | head_count(int): number of parallel heads. 19 | model_dim(int): the dimension of keys/values/queries in this 20 | MultiHeadedAttention, must be divisible by head_count. 21 | """ 22 | assert model_dim % head_count == 0 23 | self.dim_per_head = model_dim // head_count 24 | self.model_dim = model_dim 25 | 26 | super(MultiHeadedAttention, self).__init__() 27 | self.head_count = head_count 28 | 29 | self.linear_keys = BottleLinear(model_dim, 30 | head_count * self.dim_per_head, 31 | bias=False) 32 | self.linear_values = BottleLinear(model_dim, 33 | head_count * self.dim_per_head, 34 | bias=False) 35 | self.linear_query = BottleLinear(model_dim, 36 | head_count * self.dim_per_head, 37 | bias=False) 38 | self.sm = BottleSoftmax() 39 | self.activation = nn.ReLU() 40 | self.layer_norm = BottleLayerNorm(model_dim) 41 | self.dropout = nn.Dropout(p) 42 | self.res_dropout = nn.Dropout(p) 43 | 44 | def forward(self, key, value, query, mask=None): 45 | # CHECKS 46 | batch, k_len, d = key.size() 47 | batch_, k_len_, d_ = value.size() 48 | aeq(batch, batch_) 49 | aeq(k_len, k_len_) 50 | aeq(d, d_) 51 | batch_, q_len, d_ = query.size() 52 | aeq(batch, batch_) 53 | aeq(d, d_) 54 | aeq(self.model_dim % 8, 0) 55 | if mask is not None: 56 | batch_, q_len_, k_len_ = mask.size() 57 | aeq(batch_, batch) 58 | aeq(k_len_, k_len) 59 | aeq(q_len_ == q_len) 60 | # END CHECKS 61 | 62 | def shape_projection(x): 63 | b, l, d = x.size() 64 | return x.view(b, l, self.head_count, self.dim_per_head) \ 65 | .transpose(1, 2).contiguous() \ 66 | .view(b * self.head_count, l, self.dim_per_head) 67 | 68 | def unshape_projection(x, q): 69 | b, l, d = q.size() 70 | return x.view(b, self.head_count, l, self.dim_per_head) \ 71 | .transpose(1, 2).contiguous() \ 72 | .view(b, l, self.head_count * self.dim_per_head) 73 | 74 | residual = query 75 | key_up = shape_projection(self.linear_keys(key)) 76 | value_up = shape_projection(self.linear_values(value)) 77 | query_up = shape_projection(self.linear_query(query)) 78 | 79 | scaled = torch.bmm(query_up, key_up.transpose(1, 2)) 80 | scaled = scaled / math.sqrt(self.dim_per_head) 81 | bh, l, dim_per_head = scaled.size() 82 | b = bh // self.head_count 83 | if mask is not None: 84 | 85 | scaled = scaled.view(b, self.head_count, l, dim_per_head) 86 | mask = mask.unsqueeze(1).expand_as(scaled) 87 | scaled = scaled.masked_fill(Variable(mask), -float('inf')) \ 88 | .view(bh, l, dim_per_head) 89 | attn = self.sm(scaled) 90 | # Return one attn 91 | top_attn = attn \ 92 | .view(b, self.head_count, l, dim_per_head)[:, 0, :, :] \ 93 | .contiguous() 94 | 95 | drop_attn = self.dropout(self.sm(scaled)) 96 | 97 | # values : (batch * 8) x qlen x dim 98 | out = unshape_projection(torch.bmm(drop_attn, value_up), residual) 99 | 100 | # Residual and layer norm 101 | res = self.res_dropout(out) + residual 102 | ret = self.layer_norm(res) 103 | 104 | # CHECK 105 | batch_, q_len_, d_ = ret.size() 106 | aeq(q_len, q_len_) 107 | aeq(batch, batch_) 108 | aeq(d, d_) 109 | # END CHECK 110 | return ret, top_attn 111 | -------------------------------------------------------------------------------- /onmt/modules/StackedRNN.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class StackedLSTM(nn.Module): 6 | """ 7 | Our own implementation of stacked LSTM. 
8 | Needed for the decoder, because we do input feeding. 9 | """ 10 | def __init__(self, num_layers, input_size, rnn_size, dropout): 11 | super(StackedLSTM, self).__init__() 12 | self.dropout = nn.Dropout(dropout) 13 | self.num_layers = num_layers 14 | self.layers = nn.ModuleList() 15 | 16 | for i in range(num_layers): 17 | self.layers.append(nn.LSTMCell(input_size, rnn_size)) 18 | input_size = rnn_size 19 | 20 | def forward(self, input, hidden): 21 | h_0, c_0 = hidden 22 | h_1, c_1 = [], [] 23 | for i, layer in enumerate(self.layers): 24 | h_1_i, c_1_i = layer(input, (h_0[i], c_0[i])) 25 | input = h_1_i 26 | if i + 1 != self.num_layers: 27 | input = self.dropout(input) 28 | h_1 += [h_1_i] 29 | c_1 += [c_1_i] 30 | 31 | h_1 = torch.stack(h_1) 32 | c_1 = torch.stack(c_1) 33 | 34 | return input, (h_1, c_1) 35 | 36 | 37 | class StackedGRU(nn.Module): 38 | 39 | def __init__(self, num_layers, input_size, rnn_size, dropout): 40 | super(StackedGRU, self).__init__() 41 | self.dropout = nn.Dropout(dropout) 42 | self.num_layers = num_layers 43 | self.layers = nn.ModuleList() 44 | 45 | for i in range(num_layers): 46 | self.layers.append(nn.GRUCell(input_size, rnn_size)) 47 | input_size = rnn_size 48 | 49 | def forward(self, input, hidden): 50 | h_1 = [] 51 | for i, layer in enumerate(self.layers): 52 | h_1_i = layer(input, hidden[0][i]) 53 | input = h_1_i 54 | if i + 1 != self.num_layers: 55 | input = self.dropout(input) 56 | h_1 += [h_1_i] 57 | 58 | h_1 = torch.stack(h_1) 59 | return input, (h_1,) 60 | -------------------------------------------------------------------------------- /onmt/modules/StructuredAttention.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch 3 | import torch.cuda 4 | from torch.autograd import Variable 5 | 6 | 7 | class MatrixTree(nn.Module): 8 | """Implementation of the matrix-tree theorem for computing marginals 9 | of non-projective dependency parsing. This attention layer is used 10 | in the paper "Learning Structured Text Representations." 
11 | """ 12 | def __init__(self, eps=1e-5): 13 | self.eps = eps 14 | super(MatrixTree, self).__init__() 15 | 16 | def forward(self, input): 17 | laplacian = input.exp() + self.eps 18 | output = input.clone() 19 | for b in range(input.size(0)): 20 | lap = laplacian[b].masked_fill( 21 | Variable(torch.eye(input.size(1)).cuda().ne(0)), 0) 22 | lap = -lap + torch.diag(lap.sum(0)) 23 | # store roots on diagonal 24 | lap[0] = input[b].diag().exp() 25 | inv_laplacian = lap.inverse() 26 | 27 | factor = inv_laplacian.diag().unsqueeze(1)\ 28 | .expand_as(input[b]).transpose(0, 1) 29 | term1 = input[b].exp().mul(factor).clone() 30 | term2 = input[b].exp().mul(inv_laplacian.transpose(0, 1)).clone() 31 | term1[:, 0] = 0 32 | term2[0] = 0 33 | output[b] = term1 - term2 34 | roots_output = input[b].diag().exp().mul( 35 | inv_laplacian.transpose(0, 1)[0]) 36 | output[b] = output[b] + torch.diag(roots_output) 37 | return output 38 | 39 | 40 | if __name__ == "__main__": 41 | dtree = MatrixTree() 42 | q = torch.rand(1, 5, 5).cuda() 43 | marg = dtree.forward(Variable(q)) 44 | print(marg.sum(1)) 45 | -------------------------------------------------------------------------------- /onmt/modules/UtilClass.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class Bottle(nn.Module): 6 | def forward(self, input): 7 | if len(input.size()) <= 2: 8 | return super(Bottle, self).forward(input) 9 | size = input.size()[:2] 10 | out = super(Bottle, self).forward(input.view(size[0]*size[1], -1)) 11 | return out.contiguous().view(size[0], size[1], -1) 12 | 13 | 14 | class Bottle2(nn.Module): 15 | def forward(self, input): 16 | if len(input.size()) <= 3: 17 | return super(Bottle2, self).forward(input) 18 | size = input.size() 19 | out = super(Bottle2, self).forward(input.view(size[0]*size[1], 20 | size[2], size[3])) 21 | return out.contiguous().view(size[0], size[1], size[2], size[3]) 22 | 23 | 24 | class LayerNorm(nn.Module): 25 | ''' Layer normalization module ''' 26 | 27 | def __init__(self, d_hid, eps=1e-3): 28 | super(LayerNorm, self).__init__() 29 | 30 | self.eps = eps 31 | self.a_2 = nn.Parameter(torch.ones(d_hid), requires_grad=True) 32 | self.b_2 = nn.Parameter(torch.zeros(d_hid), requires_grad=True) 33 | 34 | def forward(self, z): 35 | if z.size(1) == 1: 36 | return z 37 | mu = torch.mean(z, dim=1) 38 | sigma = torch.std(z, dim=1) 39 | # HACK. PyTorch is changing behavior 40 | if mu.dim() == 1: 41 | mu = mu.unsqueeze(1) 42 | sigma = sigma.unsqueeze(1) 43 | ln_out = (z - mu.expand_as(z)) / (sigma.expand_as(z) + self.eps) 44 | ln_out = ln_out.mul(self.a_2.expand_as(ln_out)) \ 45 | + self.b_2.expand_as(ln_out) 46 | return ln_out 47 | 48 | 49 | class BottleLinear(Bottle, nn.Linear): 50 | pass 51 | 52 | 53 | class BottleLayerNorm(Bottle, LayerNorm): 54 | pass 55 | 56 | 57 | class BottleSoftmax(Bottle, nn.Softmax): 58 | pass 59 | 60 | 61 | class Elementwise(nn.ModuleList): 62 | """ 63 | A simple network container. 64 | Parameters are a list of modules. 65 | Inputs are a 3d Variable whose last dimension is the same length 66 | as the list. 67 | Outputs are the result of applying modules to inputs elementwise. 68 | An optional merge parameter allows the outputs to be reduced to a 69 | single Variable. 
70 | """ 71 | 72 | def __init__(self, merge=None, *args): 73 | assert merge in [None, 'first', 'concat', 'sum', 'mlp'] 74 | self.merge = merge 75 | super(Elementwise, self).__init__(*args) 76 | 77 | def forward(self, input): 78 | inputs = [feat.squeeze(2) for feat in input.split(1, dim=2)] 79 | assert len(self) == len(inputs) 80 | outputs = [f(x) for f, x in zip(self, inputs)] 81 | if self.merge == 'first': 82 | return outputs[0] 83 | elif self.merge == 'concat' or self.merge == 'mlp': 84 | return torch.cat(outputs, 2) 85 | elif self.merge == 'sum': 86 | return sum(outputs) 87 | else: 88 | return outputs 89 | -------------------------------------------------------------------------------- /onmt/modules/WeightNorm.py: -------------------------------------------------------------------------------- 1 | """ 2 | Implementation of "Weight Normalization: A Simple Reparameterization 3 | to Accelerate Training of Deep Neural Networks" 4 | As a reparameterization method, weight normalization is same 5 | as BatchNormalization, but it doesn't depend on minibatch. 6 | """ 7 | import torch 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | from torch.nn import Parameter 11 | from torch.autograd import Variable 12 | 13 | 14 | def get_var_maybe_avg(namespace, var_name, training, polyak_decay): 15 | # utility for retrieving polyak averaged params 16 | # Update average 17 | v = getattr(namespace, var_name) 18 | v_avg = getattr(namespace, var_name + '_avg') 19 | v_avg -= (1 - polyak_decay) * (v_avg - v.data) 20 | 21 | if training: 22 | return v 23 | else: 24 | return Variable(v_avg) 25 | 26 | 27 | def get_vars_maybe_avg(namespace, var_names, training, polyak_decay): 28 | # utility for retrieving polyak averaged params 29 | vars = [] 30 | for vn in var_names: 31 | vars.append(get_var_maybe_avg( 32 | namespace, vn, training, polyak_decay)) 33 | return vars 34 | 35 | 36 | class WeightNormLinear(nn.Linear): 37 | def __init__(self, in_features, out_features, 38 | init_scale=1., polyak_decay=0.9995): 39 | super(WeightNormLinear, self).__init__( 40 | in_features, out_features, bias=True) 41 | 42 | self.V = self.weight 43 | self.g = Parameter(torch.Tensor(out_features)) 44 | self.b = self.bias 45 | 46 | self.register_buffer( 47 | 'V_avg', torch.zeros(out_features, in_features)) 48 | self.register_buffer('g_avg', torch.zeros(out_features)) 49 | self.register_buffer('b_avg', torch.zeros(out_features)) 50 | 51 | self.init_scale = init_scale 52 | self.polyak_decay = polyak_decay 53 | self.reset_parameters() 54 | 55 | def reset_parameters(self): 56 | return 57 | 58 | def forward(self, x, init=False): 59 | if init is True: 60 | # out_features * in_features 61 | self.V.data.copy_(torch.randn(self.V.data.size()).type_as( 62 | self.V.data) * 0.05) 63 | # norm is out_features * 1 64 | V_norm = self.V.data / \ 65 | self.V.data.norm(2, 1).expand_as(self.V.data) 66 | # batch_size * out_features 67 | x_init = F.linear(x, Variable(V_norm)).data 68 | # out_features 69 | m_init, v_init = x_init.mean(0).squeeze( 70 | 0), x_init.var(0).squeeze(0) 71 | # out_features 72 | scale_init = self.init_scale / \ 73 | torch.sqrt(v_init + 1e-10) 74 | self.g.data.copy_(scale_init) 75 | self.b.data.copy_(-m_init * scale_init) 76 | x_init = scale_init.view(1, -1).expand_as(x_init) \ 77 | * (x_init - m_init.view(1, -1).expand_as(x_init)) 78 | self.V_avg.copy_(self.V.data) 79 | self.g_avg.copy_(self.g.data) 80 | self.b_avg.copy_(self.b.data) 81 | return Variable(x_init) 82 | else: 83 | V, g, b = get_vars_maybe_avg(self, ['V', 'g', 
'b'], 84 | self.training, 85 | polyak_decay=self.polyak_decay) 86 | # batch_size * out_features 87 | x = F.linear(x, V) 88 | scalar = g / torch.norm(V, 2, 1).squeeze(1) 89 | x = scalar.view(1, -1).expand_as(x) * x + \ 90 | b.view(1, -1).expand_as(x) 91 | return x 92 | 93 | 94 | class WeightNormConv2d(nn.Conv2d): 95 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 96 | padding=0, dilation=1, groups=1, init_scale=1., 97 | polyak_decay=0.9995): 98 | super(WeightNormConv2d, self).__init__(in_channels, out_channels, 99 | kernel_size, stride, padding, 100 | dilation, groups) 101 | 102 | self.V = self.weight 103 | self.g = Parameter(torch.Tensor(out_channels)) 104 | self.b = self.bias 105 | 106 | self.register_buffer('V_avg', torch.zeros(self.V.size())) 107 | self.register_buffer('g_avg', torch.zeros(out_channels)) 108 | self.register_buffer('b_avg', torch.zeros(out_channels)) 109 | 110 | self.init_scale = init_scale 111 | self.polyak_decay = polyak_decay 112 | self.reset_parameters() 113 | 114 | def reset_parameters(self): 115 | return 116 | 117 | def forward(self, x, init=False): 118 | if init is True: 119 | # out_channels, in_channels // groups, * kernel_size 120 | self.V.data.copy_(torch.randn(self.V.data.size() 121 | ).type_as(self.V.data) * 0.05) 122 | V_norm = self.V.data / self.V.data.view(self.out_channels, -1)\ 123 | .norm(2, 1).view(self.out_channels, *( 124 | [1] * (len(self.kernel_size) + 1))).expand_as(self.V.data) 125 | x_init = F.conv2d(x, Variable(V_norm), None, self.stride, 126 | self.padding, self.dilation, self.groups).data 127 | t_x_init = x_init.transpose(0, 1).contiguous().view( 128 | self.out_channels, -1) 129 | m_init, v_init = t_x_init.mean(1).squeeze( 130 | 1), t_x_init.var(1).squeeze(1) 131 | # out_features 132 | scale_init = self.init_scale / \ 133 | torch.sqrt(v_init + 1e-10) 134 | self.g.data.copy_(scale_init) 135 | self.b.data.copy_(-m_init * scale_init) 136 | scale_init_shape = scale_init.view( 137 | 1, self.out_channels, *([1] * (len(x_init.size()) - 2))) 138 | m_init_shape = m_init.view( 139 | 1, self.out_channels, *([1] * (len(x_init.size()) - 2))) 140 | x_init = scale_init_shape.expand_as( 141 | x_init) * (x_init - m_init_shape.expand_as(x_init)) 142 | self.V_avg.copy_(self.V.data) 143 | self.g_avg.copy_(self.g.data) 144 | self.b_avg.copy_(self.b.data) 145 | return Variable(x_init) 146 | else: 147 | V, g, b = get_vars_maybe_avg( 148 | self, ['V', 'g', 'b'], self.training, 149 | polyak_decay=self.polyak_decay) 150 | 151 | scalar = torch.norm(V.view(self.out_channels, -1), 2, 1) 152 | if len(scalar.size()) == 2: 153 | scalar = g / scalar.squeeze(1) 154 | else: 155 | scalar = g / scalar 156 | 157 | W = scalar.view(self.out_channels, * 158 | ([1] * (len(V.size()) - 1))).expand_as(V) * V 159 | 160 | x = F.conv2d(x, W, b, self.stride, 161 | self.padding, self.dilation, self.groups) 162 | return x 163 | 164 | 165 | class WeightNormConvTranspose2d(nn.ConvTranspose2d): 166 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 167 | padding=0, output_padding=0, groups=1, init_scale=1., 168 | polyak_decay=0.9995): 169 | super(WeightNormConvTranspose2d, self).__init__( 170 | in_channels, out_channels, 171 | kernel_size, stride, 172 | padding, output_padding, 173 | groups) 174 | # in_channels, out_channels, *kernel_size 175 | self.V = self.weight 176 | self.g = Parameter(torch.Tensor(out_channels)) 177 | self.b = self.bias 178 | 179 | self.register_buffer('V_avg', torch.zeros(self.V.size())) 180 | self.register_buffer('g_avg', 
torch.zeros(out_channels)) 181 | self.register_buffer('b_avg', torch.zeros(out_channels)) 182 | 183 | self.init_scale = init_scale 184 | self.polyak_decay = polyak_decay 185 | self.reset_parameters() 186 | 187 | def reset_parameters(self): 188 | return 189 | 190 | def forward(self, x, init=False): 191 | if init is True: 192 | # in_channels, out_channels, *kernel_size 193 | self.V.data.copy_(torch.randn(self.V.data.size()).type_as( 194 | self.V.data) * 0.05) 195 | V_norm = self.V.data / self.V.data.transpose(0, 1).contiguous() \ 196 | .view(self.out_channels, -1).norm(2, 1).view( 197 | self.in_channels, self.out_channels, 198 | *([1] * len(self.kernel_size))).expand_as(self.V.data) 199 | x_init = F.conv_transpose2d( 200 | x, Variable(V_norm), None, self.stride, 201 | self.padding, self.output_padding, self.groups).data 202 | # self.out_channels, 1 203 | t_x_init = x_init.transpose(0, 1).contiguous().view( 204 | self.out_channels, -1) 205 | # out_features 206 | m_init, v_init = t_x_init.mean(1).squeeze( 207 | 1), t_x_init.var(1).squeeze(1) 208 | # out_features 209 | scale_init = self.init_scale / \ 210 | torch.sqrt(v_init + 1e-10) 211 | self.g.data.copy_(scale_init) 212 | self.b.data.copy_(-m_init * scale_init) 213 | scale_init_shape = scale_init.view( 214 | 1, self.out_channels, *([1] * (len(x_init.size()) - 2))) 215 | m_init_shape = m_init.view( 216 | 1, self.out_channels, *([1] * (len(x_init.size()) - 2))) 217 | 218 | x_init = scale_init_shape.expand_as(x_init)\ 219 | * (x_init - m_init_shape.expand_as(x_init)) 220 | self.V_avg.copy_(self.V.data) 221 | self.g_avg.copy_(self.g.data) 222 | self.b_avg.copy_(self.b.data) 223 | return Variable(x_init) 224 | else: 225 | V, g, b = get_vars_maybe_avg( 226 | self, ['V', 'g', 'b'], self.training, 227 | polyak_decay=self.polyak_decay) 228 | scalar = g / \ 229 | torch.norm(V.transpose(0, 1).contiguous().view( 230 | self.out_channels, -1), 2, 1).squeeze(1) 231 | W = scalar.view(self.in_channels, self.out_channels, 232 | *([1] * (len(V.size()) - 2))).expand_as(V) * V 233 | 234 | x = F.conv_transpose2d(x, W, b, self.stride, 235 | self.padding, self.output_padding, 236 | self.groups) 237 | return x 238 | -------------------------------------------------------------------------------- /onmt/modules/__init__.py: -------------------------------------------------------------------------------- 1 | from onmt.modules.UtilClass import LayerNorm, Bottle, BottleLinear, \ 2 | BottleLayerNorm, BottleSoftmax, Elementwise 3 | from onmt.modules.Gate import ContextGateFactory 4 | from onmt.modules.GlobalAttention import GlobalAttention 5 | from onmt.modules.ConvMultiStepAttention import ConvMultiStepAttention 6 | from onmt.modules.ImageEncoder import ImageEncoder 7 | from onmt.modules.CopyGenerator import CopyGenerator, CopyGeneratorLossCompute 8 | from onmt.modules.StructuredAttention import MatrixTree 9 | from onmt.modules.Transformer import TransformerEncoder, TransformerDecoder 10 | from onmt.modules.Conv2Conv import CNNEncoder, CNNDecoder 11 | from onmt.modules.MultiHeadedAttn import MultiHeadedAttention 12 | from onmt.modules.StackedRNN import StackedLSTM, StackedGRU 13 | from onmt.modules.Embeddings import Embeddings 14 | from onmt.modules.WeightNorm import WeightNormConv2d 15 | 16 | from onmt.modules.SRU import check_sru_requirement 17 | can_use_sru = check_sru_requirement() 18 | if can_use_sru: 19 | from onmt.modules.SRU import SRU 20 | 21 | 22 | # For flake8 compatibility.
23 | __all__ = [GlobalAttention, ImageEncoder, CopyGenerator, MultiHeadedAttention, 24 | LayerNorm, Bottle, BottleLinear, BottleLayerNorm, BottleSoftmax, 25 | TransformerEncoder, TransformerDecoder, Embeddings, Elementwise, 26 | MatrixTree, WeightNormConv2d, ConvMultiStepAttention, 27 | CNNEncoder, CNNDecoder, StackedLSTM, StackedGRU, ContextGateFactory, 28 | CopyGeneratorLossCompute] 29 | 30 | if can_use_sru: 31 | __all__.extend([SRU, check_sru_requirement]) 32 | -------------------------------------------------------------------------------- /onmt/standard_options.py: -------------------------------------------------------------------------------- 1 | from collections import namedtuple 2 | import torch 3 | 4 | USE_CUDA = torch.cuda.is_available() 5 | 6 | 7 | ''' 8 | This file will store the standard options used by openNMT-py in a dictionary 9 | 10 | ''' 11 | 12 | stdOptions = { 13 | 14 | # Model options 15 | 'model_type':'text', # Type of encoder to use. Options are [text|img] 16 | 17 | # Embedding Options 18 | 'word_vec_size':-1, # Word embedding for both. 19 | 'src_word_vec_size':500, # Src word embedding sizes 20 | 'tgt_word_vec_size':500, # Tgt word embedding sizes 21 | 'feat_vec_size':-1, # If specified, feature embedding sizes will be set to this. Otherwise, feat_vec_exponent 22 | # will be used. 23 | 'feat_merge':'concat', # Merge action for the features embeddings 24 | 'feat_vec_exponent':0.7, # When using -feat_merge concat, feature embedding sizes will be set to 25 | # N^feat_vec_exponent where N is the number of values the feature takes. 26 | 'position_encoding':False, #Use a sin to mark relative words positions. 27 | 'share_decoder_embeddings':False, #Share the word and out embeddings for decoder. 28 | 29 | # RNN Options 30 | 'encoder_type':'rnn', # Type of encoder layer to use. Choices: ['rnn', 'brnn', 'mean', 'transformer'] 31 | 'decoder_type':'rnn', # Type of decoder layer to use. Choices: ['rnn', 'transformer'] 32 | 'layers':-1, # Number of layers in enc/dec. 33 | 'enc_layers':2, # Number of layers in the encoder 34 | 'dec_layers':2, # Number of layers in the decoder 35 | 'rnn_size':500, # Size of LSTM hidden states 36 | 'input_feed':1, # Feed the context vector at each time step as additional input (via concatenation 37 | # with the word embeddings) to the decoder. 38 | 'rnn_type':'LSTM', # The gate type to use in the RNNs. Choices: ['LSTM', 'GRU'] 39 | 'brnn':False, # Deprecated, use `encoder_type`. 40 | 'brnn_merge':'concat',# Merge action for the bidir hidden states 41 | 'context_gate':None, # Type of context gate to use. Do not select for no context gate. 42 | # Choices: ['concat', 'sum'] 43 | 44 | # Attention options 45 | 46 | 'global_attention':'general', # The attention type to use: dotprot or general (Luong) or MLP (Bahdanau) 47 | # Choices: ['dot', 'general', 'mlp'] 48 | 49 | # Genenerator and loss options. 50 | 51 | 'copy_attn':False, # Train copy attention layer 52 | 'copy_attn_force':False, # When available, train to copy. 53 | 'coverage_attn':False, # Train a coverage attention layer. 54 | 'lambda_coverage':1, # Lambda value for coverage. 55 | 56 | # Training Options 57 | 58 | 'save_model':'model', # Model filename (the model will be saved as _epochN_PPL.pt where PPL is the 59 | # validation perplexity 60 | 'train_from':'', # If training from a checkpoint then this is the path to the pretrained model's state_dict. 61 | 62 | # GPU options 63 | 64 | 'gpuid':[0], # Use CUDA on the listed devices. 
65 | 'seed':-1, # Random seed used for the experiments reproducibility. 66 | 67 | # Init options 68 | 69 | 'start_epoch':1, # The epoch from which to start 70 | 'param_init':0.1, # Parameters are initialized over uniform distribution with support (-param_init, param_init). 71 | # Use 0 to not use initialization 72 | 73 | # Pretrained word vectors 74 | 75 | 'pre_word_vecs_enc':None, # If a valid path is specified, then this will load pretrained word embeddings 76 | # on the encoder side. See README for specific formatting instructions. 77 | 'pre_word_vecs_dec':None, # If a valid path is specified, then this will load pretrained word embeddings 78 | # on the decoder side. See README for specific formatting instructions. 79 | 80 | # Fixed word vectors 81 | 82 | 'fix_word_vecs_enc':False, # Fix word embeddings on the encoder side 83 | 'fix_word_vecs_dec':False, # Fix word embeddings on the decoder side 84 | 85 | # Optimization options 86 | 87 | 'batch_size':64, 88 | 'max_generator_batches':32, # Maximum batches of words in a sequence to run the generator on in parallel. 89 | # Higher is faster, but uses more memory. 90 | 'epochs':13, # Number of training epochs 91 | 'optim':'sgd', # Optimization method. Choices: ['sgd', 'adagrad', 'adadelta', 'adam'] 92 | 'max_grad_norm':5.0, # If the norm of the gradient vector exceeds this, renormalize it to have the norm 93 | # equal to max_grad_norm 94 | 'dropout':0.3, # Dropout probability; applied in LSTM stacks. 95 | 'truncated_decoder':0, # Truncated bptt 96 | 97 | #Learning rate 98 | 99 | 'learning_rate':1.0, # Starting learning rate. If adagrad/adadelta/adam is used, then this is the global 100 | # learning rate. Recommended settings: sgd = 1, adagrad = 0.1, 101 | 'learning_rate_decay':0.5, # If update_learning_rate, decay learning rate by this much if (i) perplexity does 102 | # not decrease on the validation set or (ii) epoch has gone past start_decay_at 103 | 'start_decay_at':8, # Start decaying every epoch after and including this epoch 104 | 'start_checkpoint_at':0, # Start checkpointing every epoch after and including this epoch 105 | 'decay_method': '', # Use a custom decay rate. Choices: ['noam'] 106 | 'warmup_steps': 4000, # Number of warmup steps for custom decay. 107 | 'report_every': 50, # Print stats at this interval. 108 | 'exp_host':'', # Send logs to this crayon server. 109 | 'exp': '' # Name of the experiment for logging. 110 | 111 | } 112 | 113 | # the standard preprocessing options to use with openNMT-py dataset parser 114 | standardPreProcessingOptions = { 115 | 116 | # Dictionary options 117 | 'src_vocab_size': 50000, # Size of the source vocabulary 118 | 'tgt_vocab_size': 50000, # Size of the target vocabulary 119 | 'src_words_min_frequency': 0, 120 | 'tgt_words_min_frequency': 0, 121 | 122 | # Truncation options 123 | 'src_seq_length': 50, # Maximum source sequence length 124 | 'src_seq_length_trunc': 0, # Truncate source sequence length. 125 | 'tgt_seq_length': 50, # Maximum target sequence length to keep 126 | 'tgt_seq_length_trunc': 0, # Truncate target sequence length. 
127 | 128 | # Data processing options 129 | 130 | 'shuffle': 1, # Shuffle data 131 | 'lower': True, # lowercase data 132 | 133 | # Options most relevant to summarization 134 | 135 | 'dynamic_dict': False, # Create dynamic dictionaries 136 | 'share_vocab': False # Share source and target vocabulary 137 | } 138 | 139 | standardTranslationOptions = { 140 | 141 | 'tgt':None, # True target sequence (optional) 142 | 'output':'pred.txt', # Path to output the predictions (each line will be the decoded sequence) 143 | 'beam_size':5, 144 | 'batch_size':1, 145 | 'max_sent_length':100, 146 | 'replace_unk':True, # Replace the generated UNK tokens with the source token that had highest attention weight. 147 | # If phrase_table is provided, it will lookup the identified source token and 148 | # give the corresponding target token. If it is not provided (or the identified source token 149 | # does not exist in the table) then it will copy the source token 150 | 'verbose':False, # Print scores and predictions for each sentence' 151 | 'attn_debug':False, # Print best attn for each word 152 | 'dump_beam':'', # File to dump beam information to. 153 | 'n_best':1, # If verbose is set, will output the n_best decoded sentences 154 | 'gpu':0, # Device to run on 155 | 'dynamic_dict':False, # Create dynamic dictionaries 156 | 'share_vocab':False, # Share source and target vocabulary 157 | 'cuda':USE_CUDA 158 | } 159 | 160 | 161 | 162 | 163 | -------------------------------------------------------------------------------- /perl_scripts/README.md: -------------------------------------------------------------------------------- 1 | Helpful files taken from the moses project: https://github.com/moses-smt/mosesdecoder 2 | 3 | -------------------------------------------------------------------------------- /perl_scripts/multi-bleu.perl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | # 3 | # This file is part of moses. Its use is licensed under the GNU Lesser General 4 | # Public License version 2.1 or, at your option, any later version. 
5 | 6 | # $Id$ 7 | use warnings; 8 | use strict; 9 | 10 | my $lowercase = 0; 11 | if ($ARGV[0] eq "-lc") { 12 | $lowercase = 1; 13 | shift; 14 | } 15 | 16 | my $stem = $ARGV[0]; 17 | if (!defined $stem) { 18 | print STDERR "usage: multi-bleu.pl [-lc] reference < hypothesis\n"; 19 | print STDERR "Reads the references from reference or reference0, reference1, ...\n"; 20 | exit(1); 21 | } 22 | 23 | $stem .= ".ref" if !-e $stem && !-e $stem."0" && -e $stem.".ref0"; 24 | 25 | my @REF; 26 | my $ref=0; 27 | while(-e "$stem$ref") { 28 | &add_to_ref("$stem$ref",\@REF); 29 | $ref++; 30 | } 31 | &add_to_ref($stem,\@REF) if -e $stem; 32 | die("ERROR: could not find reference file $stem") unless scalar @REF; 33 | 34 | # add additional references explicitly specified on the command line 35 | shift; 36 | foreach my $stem (@ARGV) { 37 | &add_to_ref($stem,\@REF) if -e $stem; 38 | } 39 | 40 | 41 | 42 | sub add_to_ref { 43 | my ($file,$REF) = @_; 44 | my $s=0; 45 | if ($file =~ /.gz$/) { 46 | open(REF,"gzip -dc $file|") or die "Can't read $file"; 47 | } else { 48 | open(REF,$file) or die "Can't read $file"; 49 | } 50 | while() { 51 | chop; 52 | push @{$$REF[$s++]}, $_; 53 | } 54 | close(REF); 55 | } 56 | 57 | my(@CORRECT,@TOTAL,$length_translation,$length_reference); 58 | my $s=0; 59 | while() { 60 | chop; 61 | $_ = lc if $lowercase; 62 | my @WORD = split; 63 | my %REF_NGRAM = (); 64 | my $length_translation_this_sentence = scalar(@WORD); 65 | my ($closest_diff,$closest_length) = (9999,9999); 66 | foreach my $reference (@{$REF[$s]}) { 67 | # print "$s $_ <=> $reference\n"; 68 | $reference = lc($reference) if $lowercase; 69 | my @WORD = split(' ',$reference); 70 | my $length = scalar(@WORD); 71 | my $diff = abs($length_translation_this_sentence-$length); 72 | if ($diff < $closest_diff) { 73 | $closest_diff = $diff; 74 | $closest_length = $length; 75 | # print STDERR "$s: closest diff ".abs($length_translation_this_sentence-$length)." = abs($length_translation_this_sentence-$length), setting len: $closest_length\n"; 76 | } elsif ($diff == $closest_diff) { 77 | $closest_length = $length if $length < $closest_length; 78 | # from two references with the same closeness to me 79 | # take the *shorter* into account, not the "first" one. 80 | } 81 | for(my $n=1;$n<=4;$n++) { 82 | my %REF_NGRAM_N = (); 83 | for(my $start=0;$start<=$#WORD-($n-1);$start++) { 84 | my $ngram = "$n"; 85 | for(my $w=0;$w<$n;$w++) { 86 | $ngram .= " ".$WORD[$start+$w]; 87 | } 88 | $REF_NGRAM_N{$ngram}++; 89 | } 90 | foreach my $ngram (keys %REF_NGRAM_N) { 91 | if (!defined($REF_NGRAM{$ngram}) || 92 | $REF_NGRAM{$ngram} < $REF_NGRAM_N{$ngram}) { 93 | $REF_NGRAM{$ngram} = $REF_NGRAM_N{$ngram}; 94 | # print "$i: REF_NGRAM{$ngram} = $REF_NGRAM{$ngram}
\n"; 95 | } 96 | } 97 | } 98 | } 99 | $length_translation += $length_translation_this_sentence; 100 | $length_reference += $closest_length; 101 | for(my $n=1;$n<=4;$n++) { 102 | my %T_NGRAM = (); 103 | for(my $start=0;$start<=$#WORD-($n-1);$start++) { 104 | my $ngram = "$n"; 105 | for(my $w=0;$w<$n;$w++) { 106 | $ngram .= " ".$WORD[$start+$w]; 107 | } 108 | $T_NGRAM{$ngram}++; 109 | } 110 | foreach my $ngram (keys %T_NGRAM) { 111 | $ngram =~ /^(\d+) /; 112 | my $n = $1; 113 | # my $corr = 0; 114 | # print "$i e $ngram $T_NGRAM{$ngram}
\n"; 115 | $TOTAL[$n] += $T_NGRAM{$ngram}; 116 | if (defined($REF_NGRAM{$ngram})) { 117 | if ($REF_NGRAM{$ngram} >= $T_NGRAM{$ngram}) { 118 | $CORRECT[$n] += $T_NGRAM{$ngram}; 119 | # $corr = $T_NGRAM{$ngram}; 120 | # print "$i e correct1 $T_NGRAM{$ngram}
\n"; 121 | } 122 | else { 123 | $CORRECT[$n] += $REF_NGRAM{$ngram}; 124 | # $corr = $REF_NGRAM{$ngram}; 125 | # print "$i e correct2 $REF_NGRAM{$ngram}
\n"; 126 | } 127 | } 128 | # $REF_NGRAM{$ngram} = 0 if !defined $REF_NGRAM{$ngram}; 129 | # print STDERR "$ngram: {$s, $REF_NGRAM{$ngram}, $T_NGRAM{$ngram}, $corr}\n" 130 | } 131 | } 132 | $s++; 133 | } 134 | my $brevity_penalty = 1; 135 | my $bleu = 0; 136 | 137 | my @bleu=(); 138 | 139 | for(my $n=1;$n<=4;$n++) { 140 | if (defined ($TOTAL[$n])){ 141 | $bleu[$n]=($TOTAL[$n])?$CORRECT[$n]/$TOTAL[$n]:0; 142 | # print STDERR "CORRECT[$n]:$CORRECT[$n] TOTAL[$n]:$TOTAL[$n]\n"; 143 | }else{ 144 | $bleu[$n]=0; 145 | } 146 | } 147 | 148 | if ($length_reference==0){ 149 | printf "BLEU = 0, 0/0/0/0 (BP=0, ratio=0, hyp_len=0, ref_len=0)\n"; 150 | exit(1); 151 | } 152 | 153 | if ($length_translation<$length_reference) { 154 | $brevity_penalty = exp(1-$length_reference/$length_translation); 155 | } 156 | $bleu = $brevity_penalty * exp((my_log( $bleu[1] ) + 157 | my_log( $bleu[2] ) + 158 | my_log( $bleu[3] ) + 159 | my_log( $bleu[4] ) ) / 4) ; 160 | printf "BLEU = %.2f, %.1f/%.1f/%.1f/%.1f (BP=%.3f, ratio=%.3f, hyp_len=%d, ref_len=%d)\n", 161 | 100*$bleu, 162 | 100*$bleu[1], 163 | 100*$bleu[2], 164 | 100*$bleu[3], 165 | 100*$bleu[4], 166 | $brevity_penalty, 167 | $length_translation / $length_reference, 168 | $length_translation, 169 | $length_reference; 170 | 171 | sub my_log { 172 | return -9999999999 unless $_[0]; 173 | return log($_[0]); 174 | } 175 | -------------------------------------------------------------------------------- /perl_scripts/nonbreaking_prefix.de: -------------------------------------------------------------------------------- 1 | #Anything in this file, followed by a period (and an upper-case word), does NOT indicate an end-of-sentence marker. 2 | #Special cases are included for prefixes that ONLY appear before 0-9 numbers. 3 | 4 | #any single upper case letter followed by a period is not a sentence ender (excluding I occasionally, but we leave it in) 5 | #usually upper case letters are initials in a name 6 | #no german words end in single lower-case letters, so we throw those in too. 7 | A 8 | B 9 | C 10 | D 11 | E 12 | F 13 | G 14 | H 15 | I 16 | J 17 | K 18 | L 19 | M 20 | N 21 | O 22 | P 23 | Q 24 | R 25 | S 26 | T 27 | U 28 | V 29 | W 30 | X 31 | Y 32 | Z 33 | a 34 | b 35 | c 36 | d 37 | e 38 | f 39 | g 40 | h 41 | i 42 | j 43 | k 44 | l 45 | m 46 | n 47 | o 48 | p 49 | q 50 | r 51 | s 52 | t 53 | u 54 | v 55 | w 56 | x 57 | y 58 | z 59 | 60 | 61 | #Roman Numerals. A dot after one of these is not a sentence break in German. 
62 | I 63 | II 64 | III 65 | IV 66 | V 67 | VI 68 | VII 69 | VIII 70 | IX 71 | X 72 | XI 73 | XII 74 | XIII 75 | XIV 76 | XV 77 | XVI 78 | XVII 79 | XVIII 80 | XIX 81 | XX 82 | i 83 | ii 84 | iii 85 | iv 86 | v 87 | vi 88 | vii 89 | viii 90 | ix 91 | x 92 | xi 93 | xii 94 | xiii 95 | xiv 96 | xv 97 | xvi 98 | xvii 99 | xviii 100 | xix 101 | xx 102 | 103 | #Titles and Honorifics 104 | Adj 105 | Adm 106 | Adv 107 | Asst 108 | Bart 109 | Bldg 110 | Brig 111 | Bros 112 | Capt 113 | Cmdr 114 | Col 115 | Comdr 116 | Con 117 | Corp 118 | Cpl 119 | DR 120 | Dr 121 | Ens 122 | Gen 123 | Gov 124 | Hon 125 | Hosp 126 | Insp 127 | Lt 128 | MM 129 | MR 130 | MRS 131 | MS 132 | Maj 133 | Messrs 134 | Mlle 135 | Mme 136 | Mr 137 | Mrs 138 | Ms 139 | Msgr 140 | Op 141 | Ord 142 | Pfc 143 | Ph 144 | Prof 145 | Pvt 146 | Rep 147 | Reps 148 | Res 149 | Rev 150 | Rt 151 | Sen 152 | Sens 153 | Sfc 154 | Sgt 155 | Sr 156 | St 157 | Supt 158 | Surg 159 | 160 | #Misc symbols 161 | Mio 162 | Mrd 163 | bzw 164 | v 165 | vs 166 | usw 167 | d.h 168 | z.B 169 | u.a 170 | etc 171 | Mrd 172 | MwSt 173 | ggf 174 | d.J 175 | D.h 176 | m.E 177 | vgl 178 | I.F 179 | z.T 180 | sogen 181 | ff 182 | u.E 183 | g.U 184 | g.g.A 185 | c.-à-d 186 | Buchst 187 | u.s.w 188 | sog 189 | u.ä 190 | Std 191 | evtl 192 | Zt 193 | Chr 194 | u.U 195 | o.ä 196 | Ltd 197 | b.A 198 | z.Zt 199 | spp 200 | sen 201 | SA 202 | k.o 203 | jun 204 | i.H.v 205 | dgl 206 | dergl 207 | Co 208 | zzt 209 | usf 210 | s.p.a 211 | Dkr 212 | Corp 213 | bzgl 214 | BSE 215 | 216 | #Number indicators 217 | # add #NUMERIC_ONLY# after the word if it should ONLY be non-breaking when a 0-9 digit follows it 218 | No 219 | Nos 220 | Art 221 | Nr 222 | pp 223 | ca 224 | Ca 225 | 226 | #Ordinals are done with . in German - "1." = "1st" in English 227 | 1 228 | 2 229 | 3 230 | 4 231 | 5 232 | 6 233 | 7 234 | 8 235 | 9 236 | 10 237 | 11 238 | 12 239 | 13 240 | 14 241 | 15 242 | 16 243 | 17 244 | 18 245 | 19 246 | 20 247 | 21 248 | 22 249 | 23 250 | 24 251 | 25 252 | 26 253 | 27 254 | 28 255 | 29 256 | 30 257 | 31 258 | 32 259 | 33 260 | 34 261 | 35 262 | 36 263 | 37 264 | 38 265 | 39 266 | 40 267 | 41 268 | 42 269 | 43 270 | 44 271 | 45 272 | 46 273 | 47 274 | 48 275 | 49 276 | 50 277 | 51 278 | 52 279 | 53 280 | 54 281 | 55 282 | 56 283 | 57 284 | 58 285 | 59 286 | 60 287 | 61 288 | 62 289 | 63 290 | 64 291 | 65 292 | 66 293 | 67 294 | 68 295 | 69 296 | 70 297 | 71 298 | 72 299 | 73 300 | 74 301 | 75 302 | 76 303 | 77 304 | 78 305 | 79 306 | 80 307 | 81 308 | 82 309 | 83 310 | 84 311 | 85 312 | 86 313 | 87 314 | 88 315 | 89 316 | 90 317 | 91 318 | 92 319 | 93 320 | 94 321 | 95 322 | 96 323 | 97 324 | 98 325 | 99 326 | -------------------------------------------------------------------------------- /perl_scripts/nonbreaking_prefix.en: -------------------------------------------------------------------------------- 1 | #Anything in this file, followed by a period (and an upper-case word), does NOT indicate an end-of-sentence marker. 2 | #Special cases are included for prefixes that ONLY appear before 0-9 numbers. 3 | 4 | #any single upper case letter followed by a period is not a sentence ender (excluding I occasionally, but we leave it in) 5 | #usually upper case letters are initials in a name 6 | A 7 | B 8 | C 9 | D 10 | E 11 | F 12 | G 13 | H 14 | I 15 | J 16 | K 17 | L 18 | M 19 | N 20 | O 21 | P 22 | Q 23 | R 24 | S 25 | T 26 | U 27 | V 28 | W 29 | X 30 | Y 31 | Z 32 | 33 | #List of titles. 
These are often followed by upper-case names, but do not indicate sentence breaks 34 | Adj 35 | Adm 36 | Adv 37 | Asst 38 | Bart 39 | Bldg 40 | Brig 41 | Bros 42 | Capt 43 | Cmdr 44 | Col 45 | Comdr 46 | Con 47 | Corp 48 | Cpl 49 | DR 50 | Dr 51 | Drs 52 | Ens 53 | Gen 54 | Gov 55 | Hon 56 | Hr 57 | Hosp 58 | Insp 59 | Lt 60 | MM 61 | MR 62 | MRS 63 | MS 64 | Maj 65 | Messrs 66 | Mlle 67 | Mme 68 | Mr 69 | Mrs 70 | Ms 71 | Msgr 72 | Op 73 | Ord 74 | Pfc 75 | Ph 76 | Prof 77 | Pvt 78 | Rep 79 | Reps 80 | Res 81 | Rev 82 | Rt 83 | Sen 84 | Sens 85 | Sfc 86 | Sgt 87 | Sr 88 | St 89 | Supt 90 | Surg 91 | 92 | #misc - odd period-ending items that NEVER indicate breaks (p.m. does NOT fall into this category - it sometimes ends a sentence) 93 | v 94 | vs 95 | i.e 96 | rev 97 | e.g 98 | 99 | #Numbers only. These should only induce breaks when followed by a numeric sequence 100 | # add NUMERIC_ONLY after the word for this function 101 | #This case is mostly for the english "No." which can either be a sentence of its own, or 102 | #if followed by a number, a non-breaking prefix 103 | No #NUMERIC_ONLY# 104 | Nos 105 | Art #NUMERIC_ONLY# 106 | Nr 107 | pp #NUMERIC_ONLY# 108 | 109 | #month abbreviations 110 | Jan 111 | Feb 112 | Mar 113 | Apr 114 | #May is a full word 115 | Jun 116 | Jul 117 | Aug 118 | Sep 119 | Oct 120 | Nov 121 | Dec 122 | -------------------------------------------------------------------------------- /quantization/__init__.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | USE_CUDA = torch.cuda.is_available() 4 | from .quant_functions import uniformQuantization, nonUniformQuantization, ScalingFunction, \ 5 | uniformQuantization_variable, nonUniformQuantization_variable 6 | 7 | __all__ = ('uniformQuantization', 'ScalingFunction', 'nonUniformQuantization', 8 | 'uniformQuantization_variable', 'nonUniformQuantization_variable') -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch==0.3.1 2 | torchvision 3 | numpy 4 | scipy 5 | torchtext==0.1.1 6 | -------------------------------------------------------------------------------- /resnet34_doublefilters.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import torchvision 4 | import cnn_models.conv_forward_model as convForwModel 5 | import cnn_models.help_fun as cnn_hf 6 | import datasets 7 | import model_manager 8 | from cnn_models.wide_resnet_imagenet import Wide_ResNet_imagenet 9 | import cnn_models.resnet_kfilters as resnet_kfilters 10 | import functools 11 | import quantization 12 | import helpers.functions as mhf 13 | 14 | cuda_devices = os.environ['CUDA_VISIBLE_DEVICES'].split(',') 15 | print('CUDA_VISIBLE_DEVICES: {} for a total of {} GPUs'.format(cuda_devices, len(cuda_devices))) 16 | 17 | print('Number of bits in training: {}'.format(4)) 18 | 19 | datasets.BASE_DATA_FOLDER = '...' 20 | SAVED_MODELS_FOLDER = '...' 
21 | 22 | USE_CUDA = torch.cuda.is_available() 23 | NUM_GPUS = len(cuda_devices) 24 | 25 | try: 26 | os.mkdir(datasets.BASE_DATA_FOLDER) 27 | except:pass 28 | try: 29 | os.mkdir(SAVED_MODELS_FOLDER) 30 | except:pass 31 | 32 | epochsToTrainImageNet = 90 33 | imageNet12modelsFolder = os.path.join(SAVED_MODELS_FOLDER, 'imagenet12_new') 34 | imagenet_manager = model_manager.ModelManager('model_manager_resnet34double.tst', 35 | 'model_manager', create_new_model_manager=False) 36 | 37 | for x in imagenet_manager.list_models(): 38 | if imagenet_manager.get_num_training_runs(x) >= 1: 39 | s = '{}; Last prediction acc: {}, Best prediction acc: {}'.format(x, 40 | imagenet_manager.load_metadata(x)[1]['predictionAccuracy'][-1], 41 | max(imagenet_manager.load_metadata(x)[1]['predictionAccuracy'])) 42 | print(s) 43 | 44 | try: 45 | os.mkdir(imageNet12modelsFolder) 46 | except:pass 47 | 48 | TRAIN_QUANTIZED_DISTILLED = True 49 | batch_size = 256  # assumed example value; adjust to your setup (must be a multiple of NUM_GPUS) 50 | print('Batch size: {}'.format(batch_size)) 51 | 52 | if batch_size % NUM_GPUS != 0: 53 | raise ValueError('Batch size: {} must be a multiple of the number of gpus:{}'.format(batch_size, NUM_GPUS)) 54 | 55 | 56 | imageNet12 = datasets.ImageNet12('...', 57 | '...', 58 | type_of_data_augmentation='extended', already_scaled=False, 59 | pin_memory=True) 60 | 61 | 62 | train_loader = imageNet12.getTrainLoader(batch_size, shuffle=True) 63 | test_loader = imageNet12.getTestLoader(batch_size, shuffle=False) 64 | 65 | # # Teacher model 66 | resnet34 = torchvision.models.resnet34(True) #already trained 67 | if USE_CUDA: 68 | resnet34 = resnet34.cuda() 69 | if NUM_GPUS > 1: 70 | resnet34 = torch.nn.parallel.DataParallel(resnet34) 71 | 72 | 73 | # Train a ResNet18 with 1.5x filters using quantized distillation 74 | quant_distilled_model_name = 'resnet18_1.5xfilters_quant_distilled4bits' 75 | quantDistilledModelPath = os.path.join(imageNet12modelsFolder, quant_distilled_model_name) 76 | quantDistilledOptions = {} 77 | quant_distilled_model = resnet_kfilters.resnet18(k=1.5) 78 | 79 | if USE_CUDA: 80 | quant_distilled_model = quant_distilled_model.cuda() 81 | if NUM_GPUS > 1: 82 | quant_distilled_model = torch.nn.parallel.DataParallel(quant_distilled_model) 83 | 84 | if not quant_distilled_model_name in imagenet_manager.saved_models: 85 | imagenet_manager.add_new_model(quant_distilled_model_name, quantDistilledModelPath, 86 | arguments_creator_function=quantDistilledOptions) 87 | 88 | if TRAIN_QUANTIZED_DISTILLED: 89 | imagenet_manager.train_model(quant_distilled_model, model_name=quant_distilled_model_name, 90 | train_function=convForwModel.train_model, 91 | arguments_train_function={'epochs_to_train': epochsToTrainImageNet, 92 | 'learning_rate_style': 'imagenet', 93 | 'initial_learning_rate': 0.1, 94 | 'use_nesterov':True, 95 | 'initial_momentum':0.9, 96 | 'weight_decayL2':1e-4, 97 | 'start_epoch': 0, 98 | 'print_every':30, 99 | 'use_distillation_loss':True, 100 | 'teacher_model': resnet34, 101 | 'quantizeWeights':True, 102 | 'numBits':4, 103 | 'bucket_size':256, 104 | 'quantize_first_and_last_layer': False}, 105 | train_loader=train_loader, test_loader=test_loader) 106 | quant_distilled_model.load_state_dict(imagenet_manager.load_model_state_dict(quant_distilled_model_name)) 107 | 108 | # print(cnn_hf.evaluateModel(quant_distilled_model, test_loader, fastEvaluation=False)) 109 | # print(cnn_hf.evaluateModel(quant_distilled_model, test_loader, fastEvaluation=False, k=5)) 110 | # print(cnn_hf.evaluateModel(resnet34, test_loader, fastEvaluation=False)) 111 | # print(cnn_hf.evaluateModel(resnet34,
test_loader, fastEvaluation=False, k=5)) 112 | # quant_fun = functools.partial(quantization.uniformQuantization, s=2**4, bucket_size=256) 113 | # size_mb = mhf.get_size_quantized_model(quant_distilled_model, 4, quant_fun, 256, 114 | # quantizeFirstLastLayer=False) 115 | # print(size_mb) 116 | # print(mhf.getNumberOfParameters(quant_distilled_model)/1000000) 117 | # print(mhf.getNumberOfParameters(resnet34) / 1000000) 118 | -------------------------------------------------------------------------------- /translation_models/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | _currDir = os.path.dirname(os.path.abspath(__file__)) 3 | PATH_PERL_SCRIPTS_FOLDER = os.path.abspath(os.path.join(_currDir, '..', 'perl_scripts')) --------------------------------------------------------------------------------
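
The docstring at the top of `onmt/modules/WeightNorm.py` describes weight normalization only in passing. As a quick illustration of the reparameterization those layers implement, here is a small self-contained sketch in plain PyTorch (no repository code involved): each output row keeps a direction `V` and a separate learnable magnitude `g`, and the effective weight is `g * V / ||V||`.

```python
import torch

# Weight normalization reparameterizes a weight matrix row-wise as
#   w = g * V / ||V||
# so the direction (V) and the scale (g) are learned as separate parameters.
V = torch.randn(4, 10)        # direction parameters, one row per output unit
g = torch.ones(4)             # per-output magnitude parameters
row_norms = V.norm(2, 1)      # Euclidean norm of each row of V
W = (g / row_norms).view(-1, 1) * V
print(W.norm(2, 1))           # every row of the effective weight now has norm g (here 1.0)
```

The `WeightNormLinear`, `WeightNormConv2d` and `WeightNormConvTranspose2d` classes above additionally support a data-dependent initialization pass (`forward(x, init=True)`) and keep polyak-averaged copies of `V`, `g` and `b`, which `get_vars_maybe_avg` returns whenever the module is not in training mode.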
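The dictionaries in `onmt/standard_options.py` mirror the command-line flags of openNMT-py. Below is a minimal sketch of one possible way to consume them, assuming the repository root is on the Python path; the overridden values are purely illustrative.

```python
from collections import namedtuple

from onmt.standard_options import stdOptions

# Copy the defaults, override a few fields, and expose them as an
# attribute-style namespace (similar to what argparse would produce).
opts = dict(stdOptions)
opts.update({'epochs': 5, 'rnn_size': 256, 'optim': 'adam', 'learning_rate': 0.001})
Options = namedtuple('Options', sorted(opts.keys()))
opt = Options(**opts)

print(opt.encoder_type, opt.rnn_size, opt.epochs, opt.optim)
```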
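`perl_scripts/multi-bleu.perl` expects to be called as `multi-bleu.perl [-lc] reference < hypothesis`. The following is a hedged sketch of invoking it from Python via the `PATH_PERL_SCRIPTS_FOLDER` constant defined in `translation_models/__init__.py`; it assumes a `perl` interpreter is on the PATH, uses `pred.txt` (the default output file name from `standardTranslationOptions`), and treats `reference.txt` as a placeholder for a tokenized reference file.

```python
import os
import subprocess

from translation_models import PATH_PERL_SCRIPTS_FOLDER

# Score a plain-text hypothesis file against a reference with the bundled
# moses script. The file names below are placeholders.
multi_bleu = os.path.join(PATH_PERL_SCRIPTS_FOLDER, 'multi-bleu.perl')
with open('pred.txt', 'rb') as hypothesis:
    output = subprocess.check_output(['perl', multi_bleu, 'reference.txt'],
                                     stdin=hypothesis)
# Prints a line of the form
# "BLEU = ..., ../../../.. (BP=..., ratio=..., hyp_len=..., ref_len=...)"
print(output.decode())
```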
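The commented-out lines at the end of `resnet34_doublefilters.py` show how the size of a bucketed, uniformly quantized model is estimated. Here is the same computation as a standalone sketch, with torchvision's ResNet-18 standing in for the distilled student; it assumes the helper functions accept any `nn.Module` in the same way.

```python
import functools

import torchvision

import quantization
import helpers.functions as mhf

# Estimate the on-disk size of the student once its weights are uniformly
# quantized to 4 bits (s = 2**4 levels) with a bucket size of 256, skipping
# the first and last layer, and compare raw parameter counts.
student = torchvision.models.resnet18()  # stand-in for the distilled student

quant_fun = functools.partial(quantization.uniformQuantization, s=2**4, bucket_size=256)
size_mb = mhf.get_size_quantized_model(student, 4, quant_fun, 256,
                                       quantizeFirstLastLayer=False)

print('Quantized size (MB):', size_mb)
print('Student parameters (millions):', mhf.getNumberOfParameters(student) / 1e6)
```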