├── .gitignore
├── LICENSE
├── README.md
├── libtorch
│   ├── CMakeLists.txt
│   └── predict.cpp
└── pytorch
    ├── predict.py
    ├── test.py
    ├── to_torch_script.py
    ├── train.py
    ├── utils.py
    └── vgg.py

/.gitignore:
--------------------------------------------------------------------------------
# vscode stuff
.vscode/

# python stuff
__pycache__/

# don't commit heavy data
data/

# cpp
build/

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 Guillaume Lagrange

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# pytorch-cpp
In this repo I experiment with PyTorch 1.0, its new JIT compiler, and its C++ API, Libtorch.

Currently, the repo contains a VGG16-based network implementation in PyTorch for CIFAR-10 classification (based on my [previous experiment](https://github.com/laggui/NN_compress)), and the C++ source for inference.

**Note:** timings may vary. In my previous experiments, I found that the traced TorchScript model does not bring any significant speed-up when used from the Python API, but inference with Libtorch in C++ was much faster. This is pretty cool because it means you can easily run your experiments and training in Python, and then bring your models over to your C++ project for serving.

## pytorch/
This subdirectory includes the network's [architecture definition](pytorch/vgg.py), the [training script](pytorch/train.py), the [test script](pytorch/test.py) on the CIFAR-10 dataset, a [prediction script](pytorch/predict.py) for inference and, most importantly, the [script to convert the model to Torch Script](pytorch/to_torch_script.py).
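
The conversion itself boils down to tracing the model with an example input. Here is a condensed sketch of what [to_torch_script.py](pytorch/to_torch_script.py) does in eval mode (paths are examples):

```py
import torch
from vgg import VGGNet
from utils import try_load

model = VGGNet('D-DSM', num_classes=10, input_size=32)
model.load_state_dict(try_load('../data/VGG16model.pth'))
model.eval()

# Tracing records the operations executed for this example input
traced = torch.jit.trace(model, torch.rand(1, 3, 32, 32))
traced.save('../data/VGG16-traced-eval.pt')
```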

## libtorch/
This is where you'll find the source for the network's inference in C++. In [predict.cpp](libtorch/predict.cpp), we load the Torch Script module generated in PyTorch, read the input image and pre-process it in order to feed it to our network for inference.
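
The C++ pre-processing mirrors the Python side. As a point of reference, here is a sketch of the equivalent steps in Python (`preprocess` is a hypothetical helper for illustration, assuming OpenCV):

```py
import cv2
import torch

def preprocess(path):
    """Python equivalent of imageToTensor() + Normalize in libtorch/predict.cpp."""
    image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)  # BGR -> RGB
    x = torch.from_numpy(image).float().div_(255)              # [0, 255] -> [0, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                        # 1 x C x H x W
    mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(1, 3, 1, 1)
    std = torch.tensor([0.2023, 0.1994, 0.2010]).view(1, 3, 1, 1)
    return (x - mean) / std                                    # CIFAR-10 statistics
```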

## Example Usage

### PyTorch Predict

```sh
pytorch$ python predict.py pytorch --model=../data/VGG16model.pth --image=../data/dog.png
==> Building model...
==> Loading PyTorch model...
Predicted: dog | 10.056212425231934
Forward pass time: 0.0043811798095703125 seconds
Total time: 0.0052343260031193495 seconds
```

```sh
pytorch$ python predict.py torch-script --model=../data/VGG16-traced-eval.pt --image=../data/dog.png
==> Building model...
==> Loading Torch Script model...
Predicted: dog | 10.056212425231934
Forward pass time: 0.01126241683959961 seconds
Total time: 0.012680109008215368 seconds
```

Predictions were made on a 1080 Ti GPU. Interestingly, the traced (static) network has a slower inference time here. Further investigation on a more realistic application is needed, since this example uses CIFAR-10 images (32x32 RGB, a very small input size) and predicts on a single sample instead of continuously predicting in real time.

#### Further Testing

In order to test the traced (static) network more realistically against its standard (dynamic) PyTorch counterpart, I trained the same VGG16 network (with depthwise separable convolutions) for a single epoch, and used the saved model to predict multiple times on the same input (an upscaled 224x224 image of a dog from CIFAR-10).
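
The repeated timings below come from the `--test_timing` mode of [predict.py](pytorch/predict.py). Its core is a loop of this shape (condensed here into a helper; note the `torch.cuda.synchronize()` call, since CUDA kernels run asynchronously):

```py
import time
import torch

def time_forward(model, img_tensor, iters=15):
    """Average forward pass time over `iters` runs, excluding the first."""
    ttime = 0.0
    for i in range(iters):
        t0 = time.time()
        with torch.no_grad():
            outputs = model(img_tensor)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for the GPU before reading the clock
        tf = time.time() - t0
        ttime += tf if i > 0 else 0.0  # the first pass carries one-time costs
    return ttime / (iters - 1)
```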

**Standard Model (Dynamic)**

```sh
pytorch$ python predict.py pytorch --model=../data/VGG16model-224.pth --image=../data/dog-224.png --input=224 --test_timing=1
==> Building model...
==> Loading PyTorch model...
Predicted: dog | 1.722057580947876
Forward pass time: 0.005976676940917969 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004324197769165039 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.00431060791015625 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0046079158782958984 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043218135833740234 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004750728607177734 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.00461125373840332 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0052700042724609375 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004312992095947266 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004832744598388672 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004314422607421875 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004302263259887695 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0047190189361572266 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.005443096160888672 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004314899444580078 seconds
Avg forward pass time (excluding first): 0.00460256849016462 seconds
Total time: 0.0730239039985463 seconds
```

**Torch Script Model (Static)**

```sh
pytorch$ python predict.py torch-script --model=../data/VGG16model-224-traced-eval.pt --image=../data/dog-224.png --input=224 --test_timing=1
==> Building model...
==> Loading Torch Script model...
Predicted: dog | 1.722057580947876
Forward pass time: 0.014840841293334961 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043413639068603516 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043256282806396484 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.005699634552001953 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004336118698120117 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004330635070800781 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0050067901611328125 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.00433039665222168 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043239593505859375 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0047681331634521484 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004338264465332031 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004318952560424805 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004320621490478516 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004678487777709961 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004454374313354492 seconds
Avg forward pass time (excluding first): 0.004540954317365374 seconds
Total time: 0.08327161299530417 seconds
```

As you can see, the difference in the averaged timings is very slim. In both cases, the first forward pass takes longer than the following ones, and the Torch Script model's first pass takes a lot longer.
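
Most of that first-pass gap is one-time overhead: CUDA context initialization and allocator warm-up on first GPU use, and presumably some initial JIT optimization work for the traced module. When benchmarking, a common mitigation is a discarded warm-up pass before timing; a minimal sketch (`warm_up` is a hypothetical helper), assuming the `model` and `img_tensor` prepared in [predict.py](pytorch/predict.py):

```py
import torch

def warm_up(model, img_tensor):
    """One discarded forward pass so later timings exclude one-time costs."""
    with torch.no_grad():
        model(img_tensor)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure the warm-up actually finished
```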

### Libtorch
Before running our prediction, we need to compile the source. In your `libtorch` directory, create a build directory, then configure and build the application from source.

```sh
libtorch$ mkdir build
libtorch$ cd build
libtorch/build$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
.
.
.
-- Configuring done
-- Generating done
-- Build files have been written to: libtorch/build
libtorch/build$ make
Scanning dependencies of target vgg-predict
[ 50%] Building CXX object CMakeFiles/vgg-predict.dir/predict.cpp.o
[100%] Linking CXX executable vgg-predict
[100%] Built target vgg-predict
```

You're now ready to run the application.

```sh
libtorch/build$ ./vgg-predict ../../data/VGG16model.pth ../../data/dog.png
Model loaded
Moving model to GPU
Predicted: dog | 10.0562
Time: 0.009481 seconds
```

### TO-DO

- Update experiment timings for the latest PyTorch release (1.3)

--------------------------------------------------------------------------------
/libtorch/CMakeLists.txt:
--------------------------------------------------------------------------------
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(vgg-predict)

find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)
include_directories( ${OpenCV_INCLUDE_DIRS} )

add_executable(vgg-predict ${PROJECT_SOURCE_DIR}/predict.cpp)
target_link_libraries(vgg-predict "${TORCH_LIBRARIES}" "${OpenCV_LIBS}")
set_property(TARGET vgg-predict PROPERTY CXX_STANDARD 11)
--------------------------------------------------------------------------------
/libtorch/predict.cpp:
--------------------------------------------------------------------------------
#include <torch/torch.h>
#include <torch/script.h> // One-stop header

#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <list>
#include <ctime>
#include <opencv2/imgcodecs.hpp> // opencv input/output
#include <opencv2/imgproc.hpp> // cvtColor

// CIFAR-10 classes
const std::vector<std::string> classes{"plane", "car", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"};

at::Tensor imageToTensor(cv::Mat & image);
void predict(torch::jit::script::Module & module, cv::Mat & image);

// Adapted from https://github.com/goldsborough/examples/blob/cpp/cpp/mnist/mnist.cpp#L106
// Parameters: the means and stddevs lists must match the number of channels of the input Tensor
// e.g., means and stddevs must be of size C for a Tensor of shape 1 x C x H x W
struct Normalize : public torch::data::transforms::TensorTransform<> {
    Normalize(const std::initializer_list<float> & means, const std::initializer_list<float> & stddevs)
        : means_(insertValues(means)), stddevs_(insertValues(stddevs)) {}
    std::list<torch::Tensor> insertValues(const std::initializer_list<float> & values) {
        std::list<torch::Tensor> tensorList;
        for (auto val : values) {
            tensorList.push_back(torch::tensor(val));
        }
        return tensorList;
    }
    torch::Tensor operator()(torch::Tensor input) {
        std::list<torch::Tensor>::iterator meanIter = means_.begin();
        std::list<torch::Tensor>::iterator stddevIter = stddevs_.begin();
        // Subtract each channel's mean and divide by its stddev in place
        for (int i{0}; meanIter != means_.end() && stddevIter != stddevs_.end(); ++i, ++meanIter, ++stddevIter){
            //std::cout << "Mean: " << *meanIter << " Stddev: " << *stddevIter << std::endl;
            //std::cout << input[0][i] << std::endl;
            input[0][i].sub_(*meanIter).div_(*stddevIter);
        }
        return input;
    }

    std::list<torch::Tensor> means_, stddevs_;
};

int main(int argc, const char* argv[]) {
    if (argc != 3) {
        std::cerr << "usage: vgg-predict <path-to-exported-script-module> <path-to-image>" << std::endl;
        return -1;
    }
    // Deserialize the ScriptModule from a file using torch::jit::load()
    torch::jit::script::Module module;
    try {
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "Error loading the model\n";
        return -1;
    }

    std::cout << "Model loaded" << std::endl;

    // Read the image file
    cv::Mat image;
    image = cv::imread(argv[2], cv::IMREAD_COLOR);

    // Check for invalid input
    if (!image.data) {
        std::cout << "Could not open or find the image" << std::endl;
        return -1;
    }

    // Check for cuda
    if (torch::cuda::is_available()) {
        std::cout << "Moving model to GPU" << std::endl;
        module.to(at::kCUDA);
    }
    std::clock_t start{std::clock()};
    predict(module, image);
    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC) << " seconds" << std::endl;

    return 0;
}

at::Tensor imageToTensor(cv::Mat & image) {
    // BGR to RGB, which is what our network was trained on
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);

    // Convert Mat image to tensor 1 x H x W x C
    at::Tensor tensorImage = torch::from_blob(image.data, {1, image.rows, image.cols, image.channels()}, at::kByte);

    // Normalize tensor values from [0, 255] to [0, 1]
    tensorImage = tensorImage.toType(at::kFloat);
    tensorImage = tensorImage.div_(255);

    // Transpose the image for the [channels, rows, columns] layout of torch tensors
    tensorImage = at::transpose(tensorImage, 1, 2);
    tensorImage = at::transpose(tensorImage, 1, 3);
    return tensorImage; // 1 x C x H x W
}

void predict(torch::jit::script::Module & module, cv::Mat & image) {
    at::Tensor tensorImage{imageToTensor(image)};

    // Normalize with the CIFAR-10 channel means and stddevs
    struct Normalize normalizeChannels({0.4914, 0.4822, 0.4465}, {0.2023, 0.1994, 0.2010});
    tensorImage = normalizeChannels(tensorImage);
    //std::cout << "Image tensor shape: " << tensorImage.sizes() << std::endl;

    // Move the input to CUDA memory only if the model was moved to the GPU
    if (torch::cuda::is_available()) {
        tensorImage = tensorImage.to(at::kCUDA);
    }
    // Forward pass
    at::Tensor result = module.forward({tensorImage}).toTensor();
    auto maxResult = result.max(1);
    auto maxIndex = std::get<1>(maxResult).item<int64_t>();
    auto maxOut = std::get<0>(maxResult).item<float>();
    std::cout << "Predicted: " << classes[maxIndex] << " | " << maxOut << std::endl;
}
--------------------------------------------------------------------------------
/pytorch/predict.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn

import torchvision.transforms as transforms

from torch import jit

import io
import time
import argparse
import cv2

from vgg import VGGNet
from utils import try_load

# Check device
use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')
# CIFAR-10 classes
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def predict(model, image, test=False):
    # apply transform and convert BGR -> RGB
    x = image[:, :, (2, 1, 0)]
    #print('Image shape: {}'.format(x.shape))
    # H x W x C -> C x H x W for conv input
    x = torch.from_numpy(x).permute(2, 0, 1).to(device)
    torch.set_printoptions(threshold=5000)

    to_norm_tensor = transforms.Compose([
        #transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    img_tensor = to_norm_tensor(x.float().div_(255))
    #print('Image tensor: {}'.format(img_tensor))
    #print('Image tensor shape: {}'.format(img_tensor.shape))
    img_tensor.unsqueeze_(0) # add a dimension for the batch
    #print('New shape: {}'.format(img_tensor.shape))

    if test:
        ttime = 0
        for i in range(15):
            t0 = time.time()
            with torch.no_grad():
                # forward pass
                outputs = model(img_tensor)
            if use_cuda:
                torch.cuda.synchronize() # wait for operations to be complete
            tf = time.time() - t0
            ttime += tf if i > 0 else 0
            score, predicted = outputs.max(1)
            #print(outputs)
            print(f'Predicted: {classes[predicted.item()]} | {score.item()}')
            print(f'Forward pass time: {tf} seconds')
        print(f'Avg forward pass time (excluding first): {ttime/14} seconds')
    else:
        t0 = time.time()
        with torch.no_grad():
            # forward pass
            outputs = model(img_tensor)
        if use_cuda:
            torch.cuda.synchronize()
        tf = time.time() - t0
        score, predicted = outputs.max(1)
        #print(outputs)
        print(f'Predicted: {classes[predicted.item()]} | {score.item()}')
        print(f'Forward pass time: {tf} seconds')


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='VGGNet Predict Tool')
    parser.add_argument('mtype', type=str, choices=['pytorch', 'torch-script'], help='Model type')
    parser.add_argument('--model', type=str, default='../data/VGG16model.pth', help='Pre-trained model')
    parser.add_argument('--classes', type=int, default=10, help='Number of classes')
    parser.add_argument('--input', type=int, default=32, help='Network input size')
    parser.add_argument('--image', type=str, default='../data/dog.png', help='Input image')
    parser.add_argument('--test_timing', type=int, default=0, help='Test timing with multiple forward pass iterations')
    args = parser.parse_args()

    # Model
    print('==> Building model...')
    if args.mtype == 'pytorch':
        model = VGGNet('D-DSM', num_classes=args.classes, input_size=args.input) # depthwise separable
        # Load model
        print('==> Loading PyTorch model...')
        model.load_state_dict(try_load(args.model))
        model.eval()
        model.to(device)
    else:
        print('==> Loading Torch Script model...')
        # Load ScriptModule from io.BytesIO object
        with open(args.model, 'rb') as f:
            buffer = io.BytesIO(f.read())
        model = torch.jit.load(buffer, map_location=device)
        #print('[WARNING] ScriptModules cannot be moved to a GPU device yet. Running strictly on CPU for now.')
        #device = torch.device('cpu') # 'to' is not supported on TracedModules (yet)

    # if device.type == 'cuda':
    #     cudnn.benchmark = True
    #     model = torch.nn.DataParallel(model)

    t0 = time.perf_counter()
    predict(model, cv2.imread(args.image), test=args.test_timing)
    print(f'Total time: {time.perf_counter()-t0} seconds')
--------------------------------------------------------------------------------
/pytorch/test.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn

import torchvision.datasets as datasets
import torchvision.transforms as transforms

from torch.utils.data import DataLoader
from torch import jit

import io
import time
import argparse

from vgg import VGGNet

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#device = torch.device('cpu') # 'to' is not supported on TracedModules, ref: https://github.com/pytorch/pytorch/issues/6008

def test(model, test_loader):
    #model.eval() # TracedModule objects do not inherit .eval(); the traced model was already saved in eval mode
    print_freq = 10 # print every 10 batches
    correct = 0
    total = 0

    with torch.no_grad(): # no need to track history
        for batch_idx, (inputs, targets) in enumerate(test_loader):
            inputs, targets = inputs.to(device), targets.to(device)

            # compute output
            outputs = model(inputs)

            # record prediction accuracy
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            if batch_idx % print_freq == 0:
                print('Batch: %d, Acc: %.3f%% (%d/%d)' % (batch_idx+1, 100.*correct/total, correct, total))
    return correct, total

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='VGGNet Test Tool')
    parser.add_argument('mtype', type=str, choices=['pytorch', 'torch-script'], help='Model type')
    args = parser.parse_args()

    # Model
    print('==> Building model...')
    if args.mtype == 'pytorch':
        model = VGGNet('D-DSM', num_classes=10, input_size=32) # depthwise separable
        # Load model
        print('==> Loading PyTorch model...')
        model.load_state_dict(torch.load('VGG16model.pth'))
        model.to(device)
    else:
        print('==> Loading Torch Script model...')
        # Load ScriptModule from io.BytesIO object
        with open('VGG16-traced-eval.pt', 'rb') as f:
            buffer = io.BytesIO(f.read())
        model = torch.jit.load(buffer)
        print('[WARNING] ScriptModules cannot be moved to a GPU device yet. Running strictly on CPU for now.')
        device = torch.device('cpu') # 'to' is not supported on TracedModules (yet)

    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
    test_loader = DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

    if device.type == 'cuda':
        cudnn.benchmark = True # let cuDNN autotune conv algorithms for the fixed input size
        model = torch.nn.DataParallel(model)

    t0 = time.time()
    correct, total = test(model, test_loader)
    t1 = time.time()
    print('Accuracy of the network on the test dataset: %f (%d/%d)' % (100.*correct/total, correct, total))
    print('Elapsed time: {} seconds'.format(t1-t0))
--------------------------------------------------------------------------------
/pytorch/to_torch_script.py:
--------------------------------------------------------------------------------
import torch
import argparse

from torch.jit import trace

from vgg import VGGNet
from utils import try_load

# Check device
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('[Device] {}'.format(device))

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='PyTorch Model to Torch Script')
    parser.add_argument('mode', type=str, choices=['train', 'eval'], help='Model mode')
    parser.add_argument('--classes', type=int, default=10, help='Number of classes')
    parser.add_argument('--input', type=int, default=32, help='Network input size')
    parser.add_argument('--model', type=str, default='../data/VGG16model.pth', help='Model to trace')
    parser.add_argument('--save', type=str, default='../data/VGG16', help='Traced model save path')
    args = parser.parse_args()

    example_input = torch.rand(1, 3, args.input, args.input)
    # TracedModule objects do not inherit the .to() or .eval() methods

    if args.mode == 'train':
        print('==> Building model...')
        model = VGGNet('D-DSM', num_classes=args.classes, input_size=args.input)
        #model.to(device)
        model.train()

        # convert to Torch Script
        print('==> Tracing model...')
        traced_model = trace(model, example_input)

        # save model for training
        traced_model.save(args.save + '-traced-train.pt')
    else:
        # load "normal" pytorch trained model
        print('==> Building model...')
        model = VGGNet('D-DSM', num_classes=args.classes, input_size=args.input)
        print('==> Loading pre-trained model...')
        model.load_state_dict(try_load(args.model))
        #model = model.to(device)
        model.eval()

        # convert to Torch Script
        print('==> Tracing model...')
        traced_model = trace(model, example_input)

        # save model for eval
        traced_model.save(args.save + '-traced-eval.pt')
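
        # Illustrative addition (not in the original script): sanity-check that
        # the traced module reproduces the eager model's outputs in eval mode.
        check = torch.rand(1, 3, args.input, args.input)
        with torch.no_grad():
            same = torch.allclose(model(check), traced_model(check), atol=1e-5)
        print('==> Traced output matches eager output: {}'.format(same))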
--------------------------------------------------------------------------------
/pytorch/train.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torch.backends.cudnn as cudnn
import numpy as np
import argparse
import time
import io

from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau, StepLR

from torch import jit

from vgg import VGGNet

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#device = torch.device('cpu')

def train(model, train_loader, criterion, optimizer, epoch):
    model.train()
    print_freq = 10 # print every 10 batches
    train_loss = 0
    correct = 0
    total = 0
    print('\nEpoch: %d' % epoch)

    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()

        # compute output
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # compute gradient and do SGD step
        loss.backward()
        optimizer.step()

        # record loss and accuracy
        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        if batch_idx % print_freq == 0:
            print('Batch: %d, Loss: %.3f | Acc: %.3f%% (%d/%d)' % (batch_idx+1, train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def validate(model, val_loader, criterion):
    model.eval()
    print_freq = 10 # print every 10 batches
    val_loss = 0.0

    with torch.no_grad(): # no need to track history
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            inputs, targets = inputs.to(device), targets.to(device)

            # compute output
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            # record loss
            val_loss += loss.item()

            if batch_idx % print_freq == 0:
                print('Validation on Batch: %d, Loss: %f' % (batch_idx+1, val_loss/(batch_idx+1)))
    return val_loss

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='VGGNet Training Tool')
    parser.add_argument('dataset', type=str, choices=['cifar10'], help='Dataset') # only cifar10 supported for now
    parser.add_argument('--upscale', type=int, default=0, help='Upscale to 224x224 for test purposes')
    parser.add_argument('--output', type=str, default='VGG16model.pth', help='Model output name')
    args = parser.parse_args()

    #cifar10 = True if args.dataset == 'cifar10' else False
    num_classes = 10
    input_size = 224 if args.upscale else 32
    # Load CIFAR10 dataset
    print('==> Preparing data...')
    transform_train = transforms.Compose([
        transforms.Resize(input_size), # for testing purposes
        transforms.RandomCrop(input_size, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
    train_loader = DataLoader(trainset, batch_size=32 if args.upscale else 128, shuffle=True, num_workers=4)

    # Model
    print('==> Building model...')
    #model = VGGNet('D', num_classes=10, input_size=32) # VGG16 is configuration D (refer to the paper)
    model = VGGNet('D-DSM', num_classes=num_classes, input_size=input_size) # depthwise separable
    model = model.to(device)

    if device.type == 'cuda':
        cudnn.benchmark = True # let cuDNN autotune conv algorithms for the fixed input size
        model = torch.nn.DataParallel(model)

    # Training
    num_epochs = 200
    lr = 0.1
    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr, momentum=0.9, weight_decay=5e-4)

    print('==> Training...')
    train_time = 0
    #scheduler = ReduceLROnPlateau(optimizer, 'min')
    scheduler = StepLR(optimizer, step_size=100, gamma=0.1) # adjust lr by a factor of 10 every 100 epochs
    for epoch in range(num_epochs):
        t0 = time.time()
        # train one epoch
        train(model, train_loader, criterion, optimizer, epoch)
        t1 = time.time() - t0
        print('{} seconds'.format(t1))
        train_time += t1

        # validate
        #val_loss = validate(model, val_loader, criterion)
        # adjust learning rate with scheduler
        #scheduler.step(val_loss)
        scheduler.step()

    print('==> Finished Training: {} seconds'.format(train_time))
    # Save trained model
    torch.save(model.state_dict(), args.output)

--------------------------------------------------------------------------------
/pytorch/utils.py:
--------------------------------------------------------------------------------
import torch

def try_load(load_path):
    # load weights onto CPU storage regardless of where they were saved
    state_dict = torch.load(load_path, map_location=lambda storage, loc: storage)
    if next(iter(state_dict.keys()))[0:6] == 'module':
        # checkpoint was saved from a DataParallel-wrapped model (see train.py)
        from collections import OrderedDict
        new_state_dict = OrderedDict()
        for k, v in state_dict.items():
            name = k[7:] # remove `module.` prefix
            new_state_dict[name] = v
        return new_state_dict
    else:
        return state_dict
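
# Illustrative sketch (not part of the original file): nn.DataParallel registers
# the wrapped network under a .module attribute, so checkpoints saved from a
# DataParallel model (as in train.py) carry keys like 'module.features.0.weight'.
# try_load() above strips that prefix so the plain model can load them.
#
# Example, assuming the VGGNet definition from vgg.py:
#
#   import torch
#   from vgg import VGGNet
#   wrapped = torch.nn.DataParallel(VGGNet('D-DSM', num_classes=10, input_size=32))
#   print(next(iter(wrapped.state_dict())))  # -> 'module.features.0.weight'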
--------------------------------------------------------------------------------
/pytorch/vgg.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

class VGGNet(nn.Module):
    """
    Base VGG model
    """
    def __init__(self, vgg_cfg, num_classes=1000, input_size=224):
        super(VGGNet, self).__init__()
        self.features = self._make_layers(vgg_cfg)
        self.classifier = nn.Sequential(
            #nn.Linear(int((input_size/(2**5))*(input_size/(2**5))*512), 4096),
            #nn.ReLU(inplace=True),
            #nn.Dropout(), # Dropout of 0.5 is the default, as in the paper
            #nn.Linear(4096, 4096),
            #nn.ReLU(inplace=True),
            #nn.Dropout(),
            #nn.Linear(4096, num_classes) # For input_size = 224
            nn.Linear(int((input_size/(2**5))*(input_size/(2**5))*512), num_classes) # For input_size = 32 (CIFAR-10)
        )

    def forward(self, x): # computation performed at every call
        x = self.features(x)
        x = x.view(x.size(0), -1) # flatten
        x = self.classifier(x)
        return x

    def _make_layers(self, vgg_cfg):
        """
        For portability, if other configurations of the network
        need to be defined (e.g., A, B, C or E from the VGG paper)

        D: standard D config as per the VGG paper
        D-DSM: compact D config with depthwise separable (DS) convolutions with maxpool layers (as per the classic VGGNet)
        D-DS: compact D config with depthwise separable (DS) convolutions, using conv stride=2 instead of maxpool layers to reduce spatial size
        """
        cfg = {
            'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
            'D-DSM': [64, (64, 1), 'M', (128, 1), (128, 1), 'M', (256, 1), (256, 1), (256, 1), 'M', (512, 1), (512, 1), (512, 1), 'M', (512, 1), (512, 1), (512, 1), 'M'],
            'D-DS': [64, (64, 2), (128, 2), (128, 1), (256, 2), (256, 1), (256, 1), (512, 2), (512, 1), (512, 1), (512, 2), (512, 1), (512, 1)]
        }

        in_channels = 3 # RGB images
        layers = []

        for x in cfg[vgg_cfg]:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                if isinstance(x, int):
                    layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1), nn.BatchNorm2d(x), nn.ReLU(inplace=True)]
                    in_channels = x # Next input size is the current output size
                else: # Depthwise separable: depthwise 3x3 conv followed by pointwise 1x1 conv
                    layers += [nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=x[1], padding=1, groups=in_channels, bias=True),
                               nn.BatchNorm2d(in_channels),
                               nn.ReLU(inplace=True),
                               nn.Conv2d(in_channels, x[0], kernel_size=1, padding=0, bias=True),
                               nn.BatchNorm2d(x[0]),
                               nn.ReLU(inplace=True)]
                    in_channels = x[0] # Next input size is the current output size

        return nn.Sequential(*layers)
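
# Illustrative sketch (not part of the original file): the point of the depthwise
# separable blocks above is parameter/compute savings. For a 256 -> 256 channel
# 3x3 layer:
#
#   import torch.nn as nn
#   standard = nn.Conv2d(256, 256, kernel_size=3, padding=1)
#   separable = nn.Sequential(
#       nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=256),  # depthwise
#       nn.Conv2d(256, 256, kernel_size=1),                         # pointwise
#   )
#   count = lambda m: sum(p.numel() for p in m.parameters())
#   print(count(standard), count(separable))  # 590080 vs 68352 parameters
--------------------------------------------------------------------------------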