├── .gitignore
├── LICENSE
├── README.md
├── libtorch
│   ├── CMakeLists.txt
│   └── predict.cpp
└── pytorch
    ├── predict.py
    ├── test.py
    ├── to_torch_script.py
    ├── train.py
    ├── utils.py
    └── vgg.py

/.gitignore:
--------------------------------------------------------------------------------
# vscode stuff
.vscode/

# python stuff
__pycache__/

# don't commit heavy data
data/

# cpp
build/

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 Guillaume Lagrange

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# pytorch-cpp
In this repo I experiment with PyTorch 1.0, its new JIT compiler, and its C++ API, Libtorch.

Currently, the repo contains a VGG16-based network implementation in PyTorch for CIFAR-10 classification (based on my [previous experiment](https://github.com/laggui/NN_compress)), and the C++ source for inference.

**Note:** timings may vary. In my previous experiments, I found that the traced TorchScript model does not bring any significant speed-up when used from the Python API, but inference with Libtorch in C++ was much faster. This is pretty cool because it means you can easily run your experiments and training in Python, and then bring your models over to your C++ project for serving.

## pytorch/
This subdirectory includes the network's [architecture definition](pytorch/vgg.py), the [training script](pytorch/train.py), the [test script](pytorch/test.py) on the CIFAR-10 dataset, a [prediction script](pytorch/predict.py) for inference and, most importantly, the [script to convert the model to Torch Script](pytorch/to_torch_script.py).
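
The conversion itself boils down to tracing the model with an example input. Here is a condensed sketch of what [to_torch_script.py](pytorch/to_torch_script.py) does in eval mode (paths are examples):

```py
import torch
from vgg import VGGNet
from utils import try_load

model = VGGNet('D-DSM', num_classes=10, input_size=32)
model.load_state_dict(try_load('../data/VGG16model.pth'))
model.eval()

# Tracing records the operations executed for this example input
traced = torch.jit.trace(model, torch.rand(1, 3, 32, 32))
traced.save('../data/VGG16-traced-eval.pt')
```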

## libtorch/
This is where you'll find the source for the network's inference in C++. In [predict.cpp](libtorch/predict.cpp), we load the Torch Script module generated in PyTorch, read the input image and pre-process it in order to feed it to our network for inference.
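
The C++ pre-processing mirrors the Python side. As a point of reference, here is a sketch of the equivalent steps in Python (`preprocess` is a hypothetical helper for illustration, assuming OpenCV):

```py
import cv2
import torch

def preprocess(path):
    """Python equivalent of imageToTensor() + Normalize in libtorch/predict.cpp."""
    image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)  # BGR -> RGB
    x = torch.from_numpy(image).float().div_(255)              # [0, 255] -> [0, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                        # 1 x C x H x W
    mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(1, 3, 1, 1)
    std = torch.tensor([0.2023, 0.1994, 0.2010]).view(1, 3, 1, 1)
    return (x - mean) / std                                    # CIFAR-10 statistics
```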

## Example Usage

### PyTorch Predict

```sh
pytorch$ python predict.py pytorch --model=../data/VGG16model.pth --image=../data/dog.png
==> Building model...
==> Loading PyTorch model...
Predicted: dog | 10.056212425231934
Forward pass time: 0.0043811798095703125 seconds
Total time: 0.0052343260031193495 seconds
```

```sh
pytorch$ python predict.py torch-script --model=../data/VGG16-traced-eval.pt --image=../data/dog.png
==> Building model...
==> Loading Torch Script model...
Predicted: dog | 10.056212425231934
Forward pass time: 0.01126241683959961 seconds
Total time: 0.012680109008215368 seconds
```

Predictions were made on a 1080 Ti GPU. Interestingly, the traced (static) network has a slower inference time here. Further investigation on a more realistic application is needed, since this example uses CIFAR-10 images (32x32 RGB, a very small input size) and predicts on a single sample instead of continuously predicting in real time.

#### Further Testing

In order to test the traced (static) network more realistically against its standard (dynamic) PyTorch counterpart, I trained the same VGG16 network (with depthwise separable convolutions) for a single epoch, and used the saved model to predict multiple times on the same input (an upscaled 224x224 image of a dog from CIFAR-10).
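
The repeated timings below come from the `--test_timing` mode of [predict.py](pytorch/predict.py). Its core is a loop of this shape (condensed here into a helper; note the `torch.cuda.synchronize()` call, since CUDA kernels run asynchronously):

```py
import time
import torch

def time_forward(model, img_tensor, iters=15):
    """Average forward pass time over `iters` runs, excluding the first."""
    ttime = 0.0
    for i in range(iters):
        t0 = time.time()
        with torch.no_grad():
            outputs = model(img_tensor)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for the GPU before reading the clock
        tf = time.time() - t0
        ttime += tf if i > 0 else 0.0  # the first pass carries one-time costs
    return ttime / (iters - 1)
```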

**Standard Model (Dynamic)**

```sh
pytorch$ python predict.py pytorch --model=../data/VGG16model-224.pth --image=../data/dog-224.png --input=224 --test_timing=1
==> Building model...
==> Loading PyTorch model...
Predicted: dog | 1.722057580947876
Forward pass time: 0.005976676940917969 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004324197769165039 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.00431060791015625 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0046079158782958984 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043218135833740234 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004750728607177734 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.00461125373840332 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0052700042724609375 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004312992095947266 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004832744598388672 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004314422607421875 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004302263259887695 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0047190189361572266 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.005443096160888672 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004314899444580078 seconds
Avg forward pass time (excluding first): 0.00460256849016462 seconds
Total time: 0.0730239039985463 seconds
```

**Torch Script Model (Static)**

```sh
pytorch$ python predict.py torch-script --model=../data/VGG16model-224-traced-eval.pt --image=../data/dog-224.png --input=224 --test_timing=1
==> Building model...
==> Loading Torch Script model...
Predicted: dog | 1.722057580947876
Forward pass time: 0.014840841293334961 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043413639068603516 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043256282806396484 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.005699634552001953 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004336118698120117 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004330635070800781 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0050067901611328125 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.00433039665222168 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0043239593505859375 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.0047681331634521484 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004338264465332031 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004318952560424805 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004320621490478516 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004678487777709961 seconds
Predicted: dog | 1.722057580947876
Forward pass time: 0.004454374313354492 seconds
Avg forward pass time (excluding first): 0.004540954317365374 seconds
Total time: 0.08327161299530417 seconds
```

As you can see, the difference in the averaged timings is very slim. In both cases, the first forward pass takes longer than the following ones, and the Torch Script model's first pass takes a lot longer.
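
Most of that first-pass gap is one-time overhead: CUDA context initialization and allocator warm-up on first GPU use, and presumably some initial JIT optimization work for the traced module. When benchmarking, a common mitigation is a discarded warm-up pass before timing; a minimal sketch (`warm_up` is a hypothetical helper), assuming the `model` and `img_tensor` prepared in [predict.py](pytorch/predict.py):

```py
import torch

def warm_up(model, img_tensor):
    """One discarded forward pass so later timings exclude one-time costs."""
    with torch.no_grad():
        model(img_tensor)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure the warm-up actually finished
```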

### Libtorch
Before running our prediction, we need to compile the source. In your `libtorch` directory, create a build directory, then configure and build the application from source.

```sh
libtorch$ mkdir build
libtorch$ cd build
libtorch/build$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
.
.
.
-- Configuring done
-- Generating done
-- Build files have been written to: libtorch/build
libtorch/build$ make
Scanning dependencies of target vgg-predict
[ 50%] Building CXX object CMakeFiles/vgg-predict.dir/predict.cpp.o
[100%] Linking CXX executable vgg-predict
[100%] Built target vgg-predict
```

You're now ready to run the application.

```sh
libtorch/build$ ./vgg-predict ../../data/VGG16model.pth ../../data/dog.png
Model loaded
Moving model to GPU
Predicted: dog | 10.0562
Time: 0.009481 seconds
```

### TO-DO

- Update experiment timings for the latest PyTorch release (1.3)

--------------------------------------------------------------------------------
/libtorch/CMakeLists.txt:
--------------------------------------------------------------------------------
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(vgg-predict)

find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)
include_directories( ${OpenCV_INCLUDE_DIRS} )

add_executable(vgg-predict ${PROJECT_SOURCE_DIR}/predict.cpp)
target_link_libraries(vgg-predict "${TORCH_LIBRARIES}" "${OpenCV_LIBS}")
set_property(TARGET vgg-predict PROPERTY CXX_STANDARD 11)
--------------------------------------------------------------------------------
/libtorch/predict.cpp:
--------------------------------------------------------------------------------
#include <torch/torch.h>
#include <torch/script.h> // One-stop header

#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <list>
#include <ctime>
#include <opencv2/imgcodecs.hpp> // opencv input/output
#include <opencv2/imgproc.hpp> // cvtColor

// CIFAR-10 classes
const std::vector<std::string> classes{"plane", "car", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"};

at::Tensor imageToTensor(cv::Mat & image);
void predict(torch::jit::script::Module & module, cv::Mat & image);

// Adapted from https://github.com/goldsborough/examples/blob/cpp/cpp/mnist/mnist.cpp#L106
// Parameters: the means and stddevs lists must match the number of channels of the input Tensor
// e.g., means and stddevs must be of size C for a Tensor of shape 1 x C x H x W
struct Normalize : public torch::data::transforms::TensorTransform<> {
    Normalize(const std::initializer_list<float> & means, const std::initializer_list<float> & stddevs)
        : means_(insertValues(means)), stddevs_(insertValues(stddevs)) {}
    std::list<torch::Tensor> insertValues(const std::initializer_list<float> & values) {
        std::list<torch::Tensor> tensorList;
        for (auto val : values) {
            tensorList.push_back(torch::tensor(val));
        }
        return tensorList;
    }
    torch::Tensor operator()(torch::Tensor input) {
        std::list<torch::Tensor>::iterator meanIter = means_.begin();
        std::list<torch::Tensor>::iterator stddevIter = stddevs_.begin();
        // Subtract each channel's mean and divide by its stddev in place
        for (int i{0}; meanIter != means_.end() && stddevIter != stddevs_.end(); ++i, ++meanIter, ++stddevIter){
            //std::cout << "Mean: " << *meanIter << " Stddev: " << *stddevIter << std::endl;
            //std::cout << input[0][i] << std::endl;
            input[0][i].sub_(*meanIter).div_(*stddevIter);
        }
        return input;
    }

    std::list<torch::Tensor> means_, stddevs_;
};

int main(int argc, const char* argv[]) {
    if (argc != 3) {
        std::cerr << "usage: vgg-predict <path-to-exported-script-module> <path-to-image>" << std::endl;
        return -1;
    }
    // Deserialize the ScriptModule from a file using torch::jit::load()
    torch::jit::script::Module module;
    try {
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "Error loading the model\n";
        return -1;
    }

    std::cout << "Model loaded" << std::endl;

    // Read the image file
    cv::Mat image;
    image = cv::imread(argv[2], cv::IMREAD_COLOR);

    // Check for invalid input
    if (!image.data) {
        std::cout << "Could not open or find the image" << std::endl;
        return -1;
    }

    // Check for cuda
    if (torch::cuda::is_available()) {
        std::cout << "Moving model to GPU" << std::endl;
        module.to(at::kCUDA);
    }
    std::clock_t start{std::clock()};
    predict(module, image);
    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC) << " seconds" << std::endl;

    return 0;
}

at::Tensor imageToTensor(cv::Mat & image) {
    // BGR to RGB, which is what our network was trained on
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);

    // Convert Mat image to tensor 1 x H x W x C
    at::Tensor tensorImage = torch::from_blob(image.data, {1, image.rows, image.cols, image.channels()}, at::kByte);

    // Normalize tensor values from [0, 255] to [0, 1]
    tensorImage = tensorImage.toType(at::kFloat);
    tensorImage = tensorImage.div_(255);

    // Transpose the image for the [channels, rows, columns] layout of torch tensors
    tensorImage = at::transpose(tensorImage, 1, 2);
    tensorImage = at::transpose(tensorImage, 1, 3);
    return tensorImage; // 1 x C x H x W
}

void predict(torch::jit::script::Module & module, cv::Mat & image) {
    at::Tensor tensorImage{imageToTensor(image)};

    // Normalize with the CIFAR-10 channel means and stddevs
    struct Normalize normalizeChannels({0.4914, 0.4822, 0.4465}, {0.2023, 0.1994, 0.2010});
    tensorImage = normalizeChannels(tensorImage);
    //std::cout << "Image tensor shape: " << tensorImage.sizes() << std::endl;

    // Move the input to CUDA memory only if the model was moved to the GPU
    if (torch::cuda::is_available()) {
        tensorImage = tensorImage.to(at::kCUDA);
    }
    // Forward pass
    at::Tensor result = module.forward({tensorImage}).toTensor();
    auto maxResult = result.max(1);
    auto maxIndex = std::get<1>(maxResult).item<int64_t>();
    auto maxOut = std::get<0>(maxResult).item<float>();
    std::cout << "Predicted: " << classes[maxIndex] << " | " << maxOut << std::endl;
}
--------------------------------------------------------------------------------
/pytorch/predict.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn

import torchvision.transforms as transforms

from torch import jit

import io
import time
import argparse
import cv2

from vgg import VGGNet
from utils import try_load

# Check device
use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')
# CIFAR-10 classes
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def predict(model, image, test=False):
    # apply transform and convert BGR -> RGB
    x = image[:, :, (2, 1, 0)]
    #print('Image shape: {}'.format(x.shape))
    # H x W x C -> C x H x W for conv input
    x = torch.from_numpy(x).permute(2, 0, 1).to(device)
    torch.set_printoptions(threshold=5000)

    to_norm_tensor = transforms.Compose([
        #transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    img_tensor = to_norm_tensor(x.float().div_(255))
    #print('Image tensor: {}'.format(img_tensor))
    #print('Image tensor shape: {}'.format(img_tensor.shape))
    img_tensor.unsqueeze_(0) # add a dimension for the batch
    #print('New shape: {}'.format(img_tensor.shape))

    if test:
        ttime = 0
        for i in range(15):
            t0 = time.time()
            with torch.no_grad():
                # forward pass
                outputs = model(img_tensor)
            if use_cuda:
                torch.cuda.synchronize() # wait for operations to be complete
            tf = time.time() - t0
            ttime += tf if i > 0 else 0
            score, predicted = outputs.max(1)
            #print(outputs)
            print(f'Predicted: {classes[predicted.item()]} | {score.item()}')
            print(f'Forward pass time: {tf} seconds')
        print(f'Avg forward pass time (excluding first): {ttime/14} seconds')
    else:
        t0 = time.time()
        with torch.no_grad():
            # forward pass
            outputs = model(img_tensor)
        if use_cuda:
            torch.cuda.synchronize()
        tf = time.time() - t0
        score, predicted = outputs.max(1)
        #print(outputs)
        print(f'Predicted: {classes[predicted.item()]} | {score.item()}')
        print(f'Forward pass time: {tf} seconds')


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='VGGNet Predict Tool')
    parser.add_argument('mtype', type=str, choices=['pytorch', 'torch-script'], help='Model type')
    parser.add_argument('--model', type=str, default='../data/VGG16model.pth', help='Pre-trained model')
    parser.add_argument('--classes', type=int, default=10, help='Number of classes')
    parser.add_argument('--input', type=int, default=32, help='Network input size')
    parser.add_argument('--image', type=str, default='../data/dog.png', help='Input image')
    parser.add_argument('--test_timing', type=int, default=0, help='Test timing with multiple forward pass iterations')
    args = parser.parse_args()

    # Model
    print('==> Building model...')
    if args.mtype == 'pytorch':
        model = VGGNet('D-DSM', num_classes=args.classes, input_size=args.input) # depthwise separable
        # Load model
        print('==> Loading PyTorch model...')
        model.load_state_dict(try_load(args.model))
        model.eval()
        model.to(device)
    else:
        print('==> Loading Torch Script model...')
        # Load ScriptModule from io.BytesIO object
        with open(args.model, 'rb') as f:
            buffer = io.BytesIO(f.read())
        model = torch.jit.load(buffer, map_location=device)
        #print('[WARNING] ScriptModules cannot be moved to a GPU device yet. Running strictly on CPU for now.')
        #device = torch.device('cpu') # 'to' is not supported on TracedModules (yet)

    # if device.type == 'cuda':
    #     cudnn.benchmark = True
    #     model = torch.nn.DataParallel(model)

    t0 = time.perf_counter()
    predict(model, cv2.imread(args.image), test=args.test_timing)
    print(f'Total time: {time.perf_counter()-t0} seconds')
--------------------------------------------------------------------------------
/pytorch/test.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn

import torchvision.datasets as datasets
import torchvision.transforms as transforms

from torch.utils.data import DataLoader
from torch import jit

import io
import time
import argparse

from vgg import VGGNet

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#device = torch.device('cpu') # 'to' is not supported on TracedModules, ref: https://github.com/pytorch/pytorch/issues/6008

def test(model, test_loader):
    #model.eval() # TracedModule objects do not inherit .eval(); the traced model was already saved in eval mode
    print_freq = 10 # print every 10 batches
    correct = 0
    total = 0

    with torch.no_grad(): # no need to track history
        for batch_idx, (inputs, targets) in enumerate(test_loader):
            inputs, targets = inputs.to(device), targets.to(device)

            # compute output
            outputs = model(inputs)

            # record prediction accuracy
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            if batch_idx % print_freq == 0:
                print('Batch: %d, Acc: %.3f%% (%d/%d)' % (batch_idx+1, 100.*correct/total, correct, total))
    return correct, total

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='VGGNet Test Tool')
    parser.add_argument('mtype', type=str, choices=['pytorch', 'torch-script'], help='Model type')
    args = parser.parse_args()

    # Model
    print('==> Building model...')
    if args.mtype == 'pytorch':
        model = VGGNet('D-DSM', num_classes=10, input_size=32) # depthwise separable
        # Load model
        print('==> Loading PyTorch model...')
        model.load_state_dict(torch.load('VGG16model.pth'))
        model.to(device)
    else:
        print('==> Loading Torch Script model...')
        # Load ScriptModule from io.BytesIO object
        with open('VGG16-traced-eval.pt', 'rb') as f:
            buffer = io.BytesIO(f.read())
        model = torch.jit.load(buffer)
        print('[WARNING] ScriptModules cannot be moved to a GPU device yet. Running strictly on CPU for now.')
        device = torch.device('cpu') # 'to' is not supported on TracedModules (yet)

    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
    test_loader = DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

    if device.type == 'cuda':
        cudnn.benchmark = True # let cuDNN autotune conv algorithms for the fixed input size
        model = torch.nn.DataParallel(model)

    t0 = time.time()
    correct, total = test(model, test_loader)
    t1 = time.time()
    print('Accuracy of the network on the test dataset: %f (%d/%d)' % (100.*correct/total, correct, total))
    print('Elapsed time: {} seconds'.format(t1-t0))
--------------------------------------------------------------------------------
/pytorch/to_torch_script.py:
--------------------------------------------------------------------------------
import torch
import argparse

from torch.jit import trace

from vgg import VGGNet
from utils import try_load

# Check device
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('[Device] {}'.format(device))

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='PyTorch Model to Torch Script')
    parser.add_argument('mode', type=str, choices=['train', 'eval'], help='Model mode')
    parser.add_argument('--classes', type=int, default=10, help='Number of classes')
    parser.add_argument('--input', type=int, default=32, help='Network input size')
    parser.add_argument('--model', type=str, default='../data/VGG16model.pth', help='Model to trace')
    parser.add_argument('--save', type=str, default='../data/VGG16', help='Traced model save path')
    args = parser.parse_args()

    example_input = torch.rand(1, 3, args.input, args.input)
    # TracedModule objects do not inherit the .to() or .eval() methods

    if args.mode == 'train':
        print('==> Building model...')
        model = VGGNet('D-DSM', num_classes=args.classes, input_size=args.input)
        #model.to(device)
        model.train()

        # convert to Torch Script
        print('==> Tracing model...')
        traced_model = trace(model, example_input)

        # save model for training
        traced_model.save(args.save + '-traced-train.pt')
    else:
        # load "normal" pytorch trained model
        print('==> Building model...')
        model = VGGNet('D-DSM', num_classes=args.classes, input_size=args.input)
        print('==> Loading pre-trained model...')
        model.load_state_dict(try_load(args.model))
        #model = model.to(device)
        model.eval()

        # convert to Torch Script
        print('==> Tracing model...')
        traced_model = trace(model, example_input)

        # save model for eval
        traced_model.save(args.save + '-traced-eval.pt')
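
        # Illustrative addition (not in the original script): sanity-check that
        # the traced module reproduces the eager model's outputs in eval mode.
        check = torch.rand(1, 3, args.input, args.input)
        with torch.no_grad():
            same = torch.allclose(model(check), traced_model(check), atol=1e-5)
        print('==> Traced output matches eager output: {}'.format(same))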
--------------------------------------------------------------------------------
/pytorch/train.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torch.backends.cudnn as cudnn
import numpy as np
import argparse
import time
import io

from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau, StepLR

from torch import jit

from vgg import VGGNet

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#device = torch.device('cpu')

def train(model, train_loader, criterion, optimizer, epoch):
    model.train()
    print_freq = 10 # print every 10 batches
    train_loss = 0
    correct = 0
    total = 0
    print('\nEpoch: %d' % epoch)

    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()

        # compute output
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # compute gradient and do SGD step
        loss.backward()
        optimizer.step()

        # record loss and accuracy
        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        if batch_idx % print_freq == 0:
            print('Batch: %d, Loss: %.3f | Acc: %.3f%% (%d/%d)' % (batch_idx+1, train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def validate(model, val_loader, criterion):
    model.eval()
    print_freq = 10 # print every 10 batches
    val_loss = 0.0

    with torch.no_grad(): # no need to track history
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            inputs, targets = inputs.to(device), targets.to(device)

            # compute output
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            # record loss
            val_loss += loss.item()

            if batch_idx % print_freq == 0:
                print('Validation on Batch: %d, Loss: %f' % (batch_idx+1, val_loss/(batch_idx+1)))
    return val_loss

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='VGGNet Training Tool')
    parser.add_argument('dataset', type=str, choices=['cifar10'], help='Dataset') # only cifar10 supported for now
    parser.add_argument('--upscale', type=int, default=0, help='Upscale to 224x224 for test purposes')
    parser.add_argument('--output', type=str, default='VGG16model.pth', help='Model output name')
    args = parser.parse_args()

    #cifar10 = True if args.dataset == 'cifar10' else False
    num_classes = 10
    input_size = 224 if args.upscale else 32
    # Load CIFAR10 dataset
    print('==> Preparing data...')
    transform_train = transforms.Compose([
        transforms.Resize(input_size), # for testing purposes
        transforms.RandomCrop(input_size, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
    train_loader = DataLoader(trainset, batch_size=32 if args.upscale else 128, shuffle=True, num_workers=4)

    # Model
    print('==> Building model...')
    #model = VGGNet('D', num_classes=10, input_size=32) # VGG16 is configuration D (refer to the paper)
    model = VGGNet('D-DSM', num_classes=num_classes, input_size=input_size) # depthwise separable
    model = model.to(device)

    if device.type == 'cuda':
        cudnn.benchmark = True # let cuDNN autotune conv algorithms for the fixed input size
        model = torch.nn.DataParallel(model)

    # Training
    num_epochs = 200
    lr = 0.1
    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr, momentum=0.9, weight_decay=5e-4)

    print('==> Training...')
    train_time = 0
    #scheduler = ReduceLROnPlateau(optimizer, 'min')
    scheduler = StepLR(optimizer, step_size=100, gamma=0.1) # adjust lr by a factor of 10 every 100 epochs
    for epoch in range(num_epochs):
        t0 = time.time()
        # train one epoch
        train(model, train_loader, criterion, optimizer, epoch)
        t1 = time.time() - t0
        print('{} seconds'.format(t1))
        train_time += t1

        # validate
        #val_loss = validate(model, val_loader, criterion)
        # adjust learning rate with scheduler
        #scheduler.step(val_loss)
        scheduler.step()

    print('==> Finished Training: {} seconds'.format(train_time))
    # Save trained model
    torch.save(model.state_dict(), args.output)

--------------------------------------------------------------------------------
/pytorch/utils.py:
--------------------------------------------------------------------------------
import torch

def try_load(load_path):
    # load weights onto CPU storage regardless of where they were saved
    state_dict = torch.load(load_path, map_location=lambda storage, loc: storage)
    if next(iter(state_dict.keys()))[0:6] == 'module':
        # checkpoint was saved from a DataParallel-wrapped model (see train.py)
        from collections import OrderedDict
        new_state_dict = OrderedDict()
        for k, v in state_dict.items():
            name = k[7:] # remove `module.` prefix
            new_state_dict[name] = v
        return new_state_dict
    else:
        return state_dict
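
# Illustrative sketch (not part of the original file): nn.DataParallel registers
# the wrapped network under a .module attribute, so checkpoints saved from a
# DataParallel model (as in train.py) carry keys like 'module.features.0.weight'.
# try_load() above strips that prefix so the plain model can load them.
#
# Example, assuming the VGGNet definition from vgg.py:
#
#   import torch
#   from vgg import VGGNet
#   wrapped = torch.nn.DataParallel(VGGNet('D-DSM', num_classes=10, input_size=32))
#   print(next(iter(wrapped.state_dict())))  # -> 'module.features.0.weight'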
--------------------------------------------------------------------------------
/pytorch/vgg.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

class VGGNet(nn.Module):
    """
    Base VGG model
    """
    def __init__(self, vgg_cfg, num_classes=1000, input_size=224):
        super(VGGNet, self).__init__()
        self.features = self._make_layers(vgg_cfg)
        self.classifier = nn.Sequential(
            #nn.Linear(int((input_size/(2**5))*(input_size/(2**5))*512), 4096),
            #nn.ReLU(inplace=True),
            #nn.Dropout(), # Dropout of 0.5 is the default, as in the paper
            #nn.Linear(4096, 4096),
            #nn.ReLU(inplace=True),
            #nn.Dropout(),
            #nn.Linear(4096, num_classes) # For input_size = 224
            nn.Linear(int((input_size/(2**5))*(input_size/(2**5))*512), num_classes) # For input_size = 32 (CIFAR-10)
        )

    def forward(self, x): # computation performed at every call
        x = self.features(x)
        x = x.view(x.size(0), -1) # flatten
        x = self.classifier(x)
        return x

    def _make_layers(self, vgg_cfg):
        """
        For portability, if other configurations of the network
        need to be defined (e.g., A, B, C or E from the VGG paper)

        D: standard D config as per the VGG paper
        D-DSM: compact D config with depthwise separable (DS) convolutions with maxpool layers (as per the classic VGGNet)
        D-DS: compact D config with depthwise separable (DS) convolutions, using conv stride=2 instead of maxpool layers to reduce spatial size
        """
        cfg = {
            'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
            'D-DSM': [64, (64, 1), 'M', (128, 1), (128, 1), 'M', (256, 1), (256, 1), (256, 1), 'M', (512, 1), (512, 1), (512, 1), 'M', (512, 1), (512, 1), (512, 1), 'M'],
            'D-DS': [64, (64, 2), (128, 2), (128, 1), (256, 2), (256, 1), (256, 1), (512, 2), (512, 1), (512, 1), (512, 2), (512, 1), (512, 1)]
        }

        in_channels = 3 # RGB images
        layers = []

        for x in cfg[vgg_cfg]:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                if isinstance(x, int):
                    layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1), nn.BatchNorm2d(x), nn.ReLU(inplace=True)]
                    in_channels = x # Next input size is the current output size
                else: # Depthwise separable: depthwise 3x3 conv followed by pointwise 1x1 conv
                    layers += [nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=x[1], padding=1, groups=in_channels, bias=True),
                               nn.BatchNorm2d(in_channels),
                               nn.ReLU(inplace=True),
                               nn.Conv2d(in_channels, x[0], kernel_size=1, padding=0, bias=True),
                               nn.BatchNorm2d(x[0]),
                               nn.ReLU(inplace=True)]
                    in_channels = x[0] # Next input size is the current output size

        return nn.Sequential(*layers)
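
# Illustrative sketch (not part of the original file): the point of the depthwise
# separable blocks above is parameter/compute savings. For a 256 -> 256 channel
# 3x3 layer:
#
#   import torch.nn as nn
#   standard = nn.Conv2d(256, 256, kernel_size=3, padding=1)
#   separable = nn.Sequential(
#       nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=256),  # depthwise
#       nn.Conv2d(256, 256, kernel_size=1),                         # pointwise
#   )
#   count = lambda m: sum(p.numel() for p in m.parameters())
#   print(count(standard), count(separable))  # 590080 vs 68352 parameters
--------------------------------------------------------------------------------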