├── .gitignore
├── LICENSE
├── README.md
├── logo
    ├── pytorch_logo.png
    └── pytorch_logo_2018.svg
└── tutorials
    ├── 01-basics
        ├── feedforward_neural_network
        │   └── main.py
        ├── linear_regression
        │   └── main.py
        ├── logistic_regression
        │   └── main.py
        └── pytorch_basics
        │   └── main.py
    ├── 02-intermediate
        ├── bidirectional_recurrent_neural_network
        │   └── main.py
        ├── convolutional_neural_network
        │   └── main.py
        ├── deep_residual_network
        │   └── main.py
        ├── language_model
        │   ├── data
        │   │   └── train.txt
        │   ├── data_utils.py
        │   └── main.py
        └── recurrent_neural_network
        │   └── main.py
    ├── 03-advanced
        ├── generative_adversarial_network
        │   └── main.py
        ├── image_captioning
        │   ├── README.md
        │   ├── build_vocab.py
        │   ├── data_loader.py
        │   ├── download.sh
        │   ├── model.py
        │   ├── png
        │   │   ├── example.png
        │   │   ├── image_captioning.png
        │   │   └── model.png
        │   ├── requirements.txt
        │   ├── resize.py
        │   ├── sample.py
        │   └── train.py
        ├── neural_style_transfer
        │   ├── README.md
        │   ├── main.py
        │   ├── png
        │   │   ├── content.png
        │   │   ├── neural_style.png
        │   │   ├── neural_style2.png
        │   │   ├── style.png
        │   │   ├── style2.png
        │   │   ├── style3.png
        │   │   └── style4.png
        │   └── requirements.txt
        └── variational_autoencoder
        │   └── main.py
    └── 04-utils
        └── tensorboard
            ├── README.md
            ├── gif
                └── tensorboard.gif
            ├── logger.py
            ├── main.py
            └── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | *.pkl
2 | *.zip
3 | data/
4 | .ipynb_checkpoints
5 | 
6 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | <p align="center"><img width="40%" src="logo/pytorch_logo_2018.svg" /></p>
 2 | 
 3 | --------------------------------------------------------------------------------
 4 | 
 5 | This repository provides tutorial code for deep learning researchers to learn [PyTorch](https://github.com/pytorch/pytorch). In the tutorial, most of the models were implemented with less than 30 lines of code. Before starting this tutorial, it is recommended to finish [Official Pytorch Tutorial](http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html).
 6 | 
 7 | 
 8 | <br/>
 9 | 
10 | ## Table of Contents
11 | 
12 | #### 1. Basics
13 | * [PyTorch Basics](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/01-basics/pytorch_basics/main.py)
14 | * [Linear Regression](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/01-basics/linear_regression/main.py#L22-L23)
15 | * [Logistic Regression](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/01-basics/logistic_regression/main.py#L33-L34)
16 | * [Feedforward Neural Network](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/01-basics/feedforward_neural_network/main.py#L37-L49)
17 | 
18 | #### 2. Intermediate
19 | * [Convolutional Neural Network](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/02-intermediate/convolutional_neural_network/main.py#L35-L56)
20 | * [Deep Residual Network](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/02-intermediate/deep_residual_network/main.py#L76-L113)
21 | * [Recurrent Neural Network](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/02-intermediate/recurrent_neural_network/main.py#L39-L58)
22 | * [Bidirectional Recurrent Neural Network](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/02-intermediate/bidirectional_recurrent_neural_network/main.py#L39-L58)
23 | * [Language Model (RNN-LM)](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/02-intermediate/language_model/main.py#L30-L50)
24 | 
25 | #### 3. Advanced
26 | * [Generative Adversarial Networks](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/generative_adversarial_network/main.py#L41-L57)
27 | * [Variational Auto-Encoder](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/variational_autoencoder/main.py#L38-L65)
28 | * [Neural Style Transfer](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/03-advanced/neural_style_transfer)
29 | * [Image Captioning (CNN-RNN)](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/03-advanced/image_captioning)
30 | 
31 | #### 4. Utilities
32 | * [TensorBoard in PyTorch](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/04-utils/tensorboard)
33 | 
34 | 
35 | <br/>
36 | 
37 | ## Getting Started
38 | ```bash
39 | $ git clone https://github.com/yunjey/pytorch-tutorial.git
40 | $ cd pytorch-tutorial/tutorials/PATH_TO_PROJECT
41 | $ python main.py
42 | ```
43 | 
44 | <br/>
45 | 
46 | ## Dependencies
47 | * [Python 2.7 or 3.5+](https://www.continuum.io/downloads)
48 | * [PyTorch 0.4.0+](http://pytorch.org/)
49 | 
50 | 
51 | 
52 | 
53 | 


--------------------------------------------------------------------------------
/logo/pytorch_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/logo/pytorch_logo.png


--------------------------------------------------------------------------------
/logo/pytorch_logo_2018.svg:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <!-- Generator: Adobe Illustrator 22.1.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)  -->
 3 | <svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
 4 | 	 viewBox="0 0 199.7 40.2" style="enable-background:new 0 0 199.7 40.2;" xml:space="preserve">
 5 | <style type="text/css">
 6 | 	.st0{fill:#EE4C2C;}
 7 | 	.st1{fill:#252525;}
 8 | </style>
 9 | <g>
10 | 	<path class="st0" d="M40.8,9.3l-2.1,2.1c3.5,3.5,3.5,9.2,0,12.7c-3.5,3.5-9.2,3.5-12.7,0c-3.5-3.5-3.5-9.2,0-12.7l0,0l5.6-5.6
11 | 		L32.3,5l0,0V0.8l-8.5,8.5c-4.7,4.7-4.7,12.2,0,16.9s12.2,4.7,16.9,0C45.5,21.5,45.5,13.9,40.8,9.3z"/>
12 | 	<circle class="st0" cx="36.6" cy="7.1" r="1.6"/>
13 | </g>
14 | <g>
15 | 	<g>
16 | 		<path class="st1" d="M62.6,20l-3.6,0v9.3h-2.7V2.9c0,0,6.3,0,6.6,0c7,0,10.3,3.4,10.3,8.3C73.2,17,69.1,19.9,62.6,20z M62.8,5.4
17 | 			c-0.3,0-3.9,0-3.9,0v12.1l3.8-0.1c5-0.1,7.7-2.1,7.7-6.2C70.4,7.5,67.8,5.4,62.8,5.4z"/>
18 | 		<path class="st1" d="M85.4,29.2l-1.6,4.2c-1.8,4.7-3.6,6.1-6.3,6.1c-1.5,0-2.6-0.4-3.8-0.9l0.8-2.4c0.9,0.5,1.9,0.8,3,0.8
19 | 			c1.5,0,2.6-0.8,4-4.5l1.3-3.4L75.3,10h2.8l6.1,16l6-16h2.7L85.4,29.2z"/>
20 | 		<path class="st1" d="M101.9,5.5v23.9h-2.7V5.5h-9.3V2.9h21.3v2.5H101.9z"/>
21 | 		<path class="st1" d="M118.8,29.9c-5.4,0-9.4-4-9.4-10.2c0-6.2,4.1-10.3,9.6-10.3c5.4,0,9.3,4,9.3,10.2
22 | 			C128.3,25.8,124.2,29.9,118.8,29.9z M118.9,11.8c-4.1,0-6.8,3.3-6.8,7.8c0,4.7,2.8,7.9,6.9,7.9s6.8-3.3,6.8-7.8
23 | 			C125.8,15,123,11.8,118.9,11.8z"/>
24 | 		<path class="st1" d="M135,29.4h-2.6V10l2.6-0.5v4.1c1.3-2.5,3.2-4.1,5.7-4.1c1.3,0,2.5,0.4,3.4,0.9l-0.7,2.5
25 | 			c-0.8-0.5-1.9-0.8-3-0.8c-2,0-3.9,1.5-5.5,5V29.4z"/>
26 | 		<path class="st1" d="M154.4,29.9c-5.8,0-9.5-4.2-9.5-10.2c0-6.1,4-10.3,9.5-10.3c2.4,0,4.4,0.6,6.1,1.7l-0.7,2.4
27 | 			c-1.5-1-3.3-1.6-5.4-1.6c-4.2,0-6.8,3.1-6.8,7.7c0,4.7,2.8,7.8,6.9,7.8c1.9,0,3.9-0.6,5.4-1.6l0.5,2.4
28 | 			C158.7,29.3,156.6,29.9,154.4,29.9z"/>
29 | 		<path class="st1" d="M176.7,29.4V16.9c0-3.4-1.4-4.9-4.1-4.9c-2.2,0-4.4,1.1-6,2.8v14.7h-2.6V0.9l2.6-0.5c0,0,0,12.1,0,12.2
30 | 			c2-2,4.6-3.1,6.7-3.1c3.8,0,6.1,2.4,6.1,6.6v13.3H176.7z"/>
31 | 	</g>
32 | </g>
33 | </svg>
34 | 


--------------------------------------------------------------------------------
/tutorials/01-basics/feedforward_neural_network/main.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import torch.nn as nn
 3 | import torchvision
 4 | import torchvision.transforms as transforms
 5 | 
 6 | 
 7 | # Device configuration
 8 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 9 | 
10 | # Hyper-parameters 
11 | input_size = 784
12 | hidden_size = 500
13 | num_classes = 10
14 | num_epochs = 5
15 | batch_size = 100
16 | learning_rate = 0.001
17 | 
18 | # MNIST dataset 
19 | train_dataset = torchvision.datasets.MNIST(root='../../data', 
20 |                                            train=True, 
21 |                                            transform=transforms.ToTensor(),  
22 |                                            download=True)
23 | 
24 | test_dataset = torchvision.datasets.MNIST(root='../../data', 
25 |                                           train=False, 
26 |                                           transform=transforms.ToTensor())
27 | 
28 | # Data loader
29 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
30 |                                            batch_size=batch_size, 
31 |                                            shuffle=True)
32 | 
33 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
34 |                                           batch_size=batch_size, 
35 |                                           shuffle=False)
36 | 
37 | # Fully connected neural network with one hidden layer
38 | class NeuralNet(nn.Module):
39 |     def __init__(self, input_size, hidden_size, num_classes):
40 |         super(NeuralNet, self).__init__()
41 |         self.fc1 = nn.Linear(input_size, hidden_size) 
42 |         self.relu = nn.ReLU()
43 |         self.fc2 = nn.Linear(hidden_size, num_classes)  
44 |     
45 |     def forward(self, x):
46 |         out = self.fc1(x)
47 |         out = self.relu(out)
48 |         out = self.fc2(out)
49 |         return out
50 | 
51 | model = NeuralNet(input_size, hidden_size, num_classes).to(device)
52 | 
53 | # Loss and optimizer
54 | criterion = nn.CrossEntropyLoss()
55 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  
56 | 
57 | # Train the model
58 | total_step = len(train_loader)
59 | for epoch in range(num_epochs):
60 |     for i, (images, labels) in enumerate(train_loader):  
61 |         # Move tensors to the configured device
62 |         images = images.reshape(-1, 28*28).to(device)
63 |         labels = labels.to(device)
64 |         
65 |         # Forward pass
66 |         outputs = model(images)
67 |         loss = criterion(outputs, labels)
68 |         
69 |         # Backward and optimize
70 |         optimizer.zero_grad()
71 |         loss.backward()
72 |         optimizer.step()
73 |         
74 |         if (i+1) % 100 == 0:
75 |             print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
76 |                    .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
77 | 
78 | # Test the model
79 | # In test phase, we don't need to compute gradients (for memory efficiency)
80 | with torch.no_grad():
81 |     correct = 0
82 |     total = 0
83 |     for images, labels in test_loader:
84 |         images = images.reshape(-1, 28*28).to(device)
85 |         labels = labels.to(device)
86 |         outputs = model(images)
87 |         _, predicted = torch.max(outputs.data, 1)
88 |         total += labels.size(0)
89 |         correct += (predicted == labels).sum().item()
90 | 
91 |     print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))
92 | 
93 | # Save the model checkpoint
94 | torch.save(model.state_dict(), 'model.ckpt')


--------------------------------------------------------------------------------
/tutorials/01-basics/linear_regression/main.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import torch.nn as nn
 3 | import numpy as np
 4 | import matplotlib.pyplot as plt
 5 | 
 6 | 
 7 | # Hyper-parameters
 8 | input_size = 1
 9 | output_size = 1
10 | num_epochs = 60
11 | learning_rate = 0.001
12 | 
13 | # Toy dataset
14 | x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168], 
15 |                     [9.779], [6.182], [7.59], [2.167], [7.042], 
16 |                     [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)
17 | 
18 | y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573], 
19 |                     [3.366], [2.596], [2.53], [1.221], [2.827], 
20 |                     [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
21 | 
22 | # Linear regression model
23 | model = nn.Linear(input_size, output_size)
24 | 
25 | # Loss and optimizer
26 | criterion = nn.MSELoss()
27 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  
28 | 
29 | # Train the model
30 | for epoch in range(num_epochs):
31 |     # Convert numpy arrays to torch tensors
32 |     inputs = torch.from_numpy(x_train)
33 |     targets = torch.from_numpy(y_train)
34 | 
35 |     # Forward pass
36 |     outputs = model(inputs)
37 |     loss = criterion(outputs, targets)
38 |     
39 |     # Backward and optimize
40 |     optimizer.zero_grad()
41 |     loss.backward()
42 |     optimizer.step()
43 |     
44 |     if (epoch+1) % 5 == 0:
45 |         print ('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
46 | 
47 | # Plot the graph
48 | predicted = model(torch.from_numpy(x_train)).detach().numpy()
49 | plt.plot(x_train, y_train, 'ro', label='Original data')
50 | plt.plot(x_train, predicted, label='Fitted line')
51 | plt.legend()
52 | plt.show()
53 | 
54 | # Save the model checkpoint
55 | torch.save(model.state_dict(), 'model.ckpt')


--------------------------------------------------------------------------------
/tutorials/01-basics/logistic_regression/main.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import torch.nn as nn
 3 | import torchvision
 4 | import torchvision.transforms as transforms
 5 | 
 6 | 
 7 | # Hyper-parameters 
 8 | input_size = 28 * 28    # 784
 9 | num_classes = 10
10 | num_epochs = 5
11 | batch_size = 100
12 | learning_rate = 0.001
13 | 
14 | # MNIST dataset (images and labels)
15 | train_dataset = torchvision.datasets.MNIST(root='../../data', 
16 |                                            train=True, 
17 |                                            transform=transforms.ToTensor(),
18 |                                            download=True)
19 | 
20 | test_dataset = torchvision.datasets.MNIST(root='../../data', 
21 |                                           train=False, 
22 |                                           transform=transforms.ToTensor())
23 | 
24 | # Data loader (input pipeline)
25 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
26 |                                            batch_size=batch_size, 
27 |                                            shuffle=True)
28 | 
29 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
30 |                                           batch_size=batch_size, 
31 |                                           shuffle=False)
32 | 
33 | # Logistic regression model
34 | model = nn.Linear(input_size, num_classes)
35 | 
36 | # Loss and optimizer
37 | # nn.CrossEntropyLoss() computes softmax internally
38 | criterion = nn.CrossEntropyLoss()  
39 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  
40 | 
41 | # Train the model
42 | total_step = len(train_loader)
43 | for epoch in range(num_epochs):
44 |     for i, (images, labels) in enumerate(train_loader):
45 |         # Reshape images to (batch_size, input_size)
46 |         images = images.reshape(-1, input_size)
47 |         
48 |         # Forward pass
49 |         outputs = model(images)
50 |         loss = criterion(outputs, labels)
51 |         
52 |         # Backward and optimize
53 |         optimizer.zero_grad()
54 |         loss.backward()
55 |         optimizer.step()
56 |         
57 |         if (i+1) % 100 == 0:
58 |             print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
59 |                    .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
60 | 
61 | # Test the model
62 | # In test phase, we don't need to compute gradients (for memory efficiency)
63 | with torch.no_grad():
64 |     correct = 0
65 |     total = 0
66 |     for images, labels in test_loader:
67 |         images = images.reshape(-1, input_size)
68 |         outputs = model(images)
69 |         _, predicted = torch.max(outputs.data, 1)
70 |         total += labels.size(0)
71 |         correct += (predicted == labels).sum()
72 | 
73 |     print('Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
74 | 
75 | # Save the model checkpoint
76 | torch.save(model.state_dict(), 'model.ckpt')
77 | 


--------------------------------------------------------------------------------
/tutorials/01-basics/pytorch_basics/main.py:
--------------------------------------------------------------------------------
  1 | import torch 
  2 | import torchvision
  3 | import torch.nn as nn
  4 | import numpy as np
  5 | import torchvision.transforms as transforms
  6 | 
  7 | 
  8 | # ================================================================== #
  9 | #                         Table of Contents                          #
 10 | # ================================================================== #
 11 | 
 12 | # 1. Basic autograd example 1               (Line 25 to 39)
 13 | # 2. Basic autograd example 2               (Line 46 to 83)
 14 | # 3. Loading data from numpy                (Line 90 to 97)
 15 | # 4. Input pipline                          (Line 104 to 129)
 16 | # 5. Input pipline for custom dataset       (Line 136 to 156)
 17 | # 6. Pretrained model                       (Line 163 to 176)
 18 | # 7. Save and load model                    (Line 183 to 189) 
 19 | 
 20 | 
 21 | # ================================================================== #
 22 | #                     1. Basic autograd example 1                    #
 23 | # ================================================================== #
 24 | 
 25 | # Create tensors.
 26 | x = torch.tensor(1., requires_grad=True)
 27 | w = torch.tensor(2., requires_grad=True)
 28 | b = torch.tensor(3., requires_grad=True)
 29 | 
 30 | # Build a computational graph.
 31 | y = w * x + b    # y = 2 * x + 3
 32 | 
 33 | # Compute gradients.
 34 | y.backward()
 35 | 
 36 | # Print out the gradients.
 37 | print(x.grad)    # x.grad = 2 
 38 | print(w.grad)    # w.grad = 1 
 39 | print(b.grad)    # b.grad = 1 
 40 | 
 41 | 
 42 | # ================================================================== #
 43 | #                    2. Basic autograd example 2                     #
 44 | # ================================================================== #
 45 | 
 46 | # Create tensors of shape (10, 3) and (10, 2).
 47 | x = torch.randn(10, 3)
 48 | y = torch.randn(10, 2)
 49 | 
 50 | # Build a fully connected layer.
 51 | linear = nn.Linear(3, 2)
 52 | print ('w: ', linear.weight)
 53 | print ('b: ', linear.bias)
 54 | 
 55 | # Build loss function and optimizer.
 56 | criterion = nn.MSELoss()
 57 | optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)
 58 | 
 59 | # Forward pass.
 60 | pred = linear(x)
 61 | 
 62 | # Compute loss.
 63 | loss = criterion(pred, y)
 64 | print('loss: ', loss.item())
 65 | 
 66 | # Backward pass.
 67 | loss.backward()
 68 | 
 69 | # Print out the gradients.
 70 | print ('dL/dw: ', linear.weight.grad) 
 71 | print ('dL/db: ', linear.bias.grad)
 72 | 
 73 | # 1-step gradient descent.
 74 | optimizer.step()
 75 | 
 76 | # You can also perform gradient descent at the low level.
 77 | # linear.weight.data.sub_(0.01 * linear.weight.grad.data)
 78 | # linear.bias.data.sub_(0.01 * linear.bias.grad.data)
 79 | 
 80 | # Print out the loss after 1-step gradient descent.
 81 | pred = linear(x)
 82 | loss = criterion(pred, y)
 83 | print('loss after 1 step optimization: ', loss.item())
 84 | 
 85 | 
 86 | # ================================================================== #
 87 | #                     3. Loading data from numpy                     #
 88 | # ================================================================== #
 89 | 
 90 | # Create a numpy array.
 91 | x = np.array([[1, 2], [3, 4]])
 92 | 
 93 | # Convert the numpy array to a torch tensor.
 94 | y = torch.from_numpy(x)
 95 | 
 96 | # Convert the torch tensor to a numpy array.
 97 | z = y.numpy()
 98 | 
 99 | 
100 | # ================================================================== #
101 | #                         4. Input pipeline                           #
102 | # ================================================================== #
103 | 
104 | # Download and construct CIFAR-10 dataset.
105 | train_dataset = torchvision.datasets.CIFAR10(root='../../data/',
106 |                                              train=True, 
107 |                                              transform=transforms.ToTensor(),
108 |                                              download=True)
109 | 
110 | # Fetch one data pair (read data from disk).
111 | image, label = train_dataset[0]
112 | print (image.size())
113 | print (label)
114 | 
115 | # Data loader (this provides queues and threads in a very simple way).
116 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
117 |                                            batch_size=64, 
118 |                                            shuffle=True)
119 | 
120 | # When iteration starts, queue and thread start to load data from files.
121 | data_iter = iter(train_loader)
122 | 
123 | # Mini-batch images and labels.
124 | images, labels = data_iter.next()
125 | 
126 | # Actual usage of the data loader is as below.
127 | for images, labels in train_loader:
128 |     # Training code should be written here.
129 |     pass
130 | 
131 | 
132 | # ================================================================== #
133 | #                5. Input pipeline for custom dataset                 #
134 | # ================================================================== #
135 | 
136 | # You should build your custom dataset as below.
137 | class CustomDataset(torch.utils.data.Dataset):
138 |     def __init__(self):
139 |         # TODO
140 |         # 1. Initialize file paths or a list of file names. 
141 |         pass
142 |     def __getitem__(self, index):
143 |         # TODO
144 |         # 1. Read one data from file (e.g. using numpy.fromfile, PIL.Image.open).
145 |         # 2. Preprocess the data (e.g. torchvision.Transform).
146 |         # 3. Return a data pair (e.g. image and label).
147 |         pass
148 |     def __len__(self):
149 |         # You should change 0 to the total size of your dataset.
150 |         return 0 
151 | 
152 | # You can then use the prebuilt data loader. 
153 | custom_dataset = CustomDataset()
154 | train_loader = torch.utils.data.DataLoader(dataset=custom_dataset,
155 |                                            batch_size=64, 
156 |                                            shuffle=True)
157 | 
158 | 
159 | # ================================================================== #
160 | #                        6. Pretrained model                         #
161 | # ================================================================== #
162 | 
163 | # Download and load the pretrained ResNet-18.
164 | resnet = torchvision.models.resnet18(pretrained=True)
165 | 
166 | # If you want to finetune only the top layer of the model, set as below.
167 | for param in resnet.parameters():
168 |     param.requires_grad = False
169 | 
170 | # Replace the top layer for finetuning.
171 | resnet.fc = nn.Linear(resnet.fc.in_features, 100)  # 100 is an example.
172 | 
173 | # Forward pass.
174 | images = torch.randn(64, 3, 224, 224)
175 | outputs = resnet(images)
176 | print (outputs.size())     # (64, 100)
177 | 
178 | 
179 | # ================================================================== #
180 | #                      7. Save and load the model                    #
181 | # ================================================================== #
182 | 
183 | # Save and load the entire model.
184 | torch.save(resnet, 'model.ckpt')
185 | model = torch.load('model.ckpt')
186 | 
187 | # Save and load only the model parameters (recommended).
188 | torch.save(resnet.state_dict(), 'params.ckpt')
189 | resnet.load_state_dict(torch.load('params.ckpt'))
190 | 


--------------------------------------------------------------------------------
/tutorials/02-intermediate/bidirectional_recurrent_neural_network/main.py:
--------------------------------------------------------------------------------
  1 | import torch 
  2 | import torch.nn as nn
  3 | import torchvision
  4 | import torchvision.transforms as transforms
  5 | 
  6 | 
  7 | # Device configuration
  8 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
  9 | 
 10 | # Hyper-parameters
 11 | sequence_length = 28
 12 | input_size = 28
 13 | hidden_size = 128
 14 | num_layers = 2
 15 | num_classes = 10
 16 | batch_size = 100
 17 | num_epochs = 2
 18 | learning_rate = 0.003
 19 | 
 20 | # MNIST dataset
 21 | train_dataset = torchvision.datasets.MNIST(root='../../data/',
 22 |                                            train=True, 
 23 |                                            transform=transforms.ToTensor(),
 24 |                                            download=True)
 25 | 
 26 | test_dataset = torchvision.datasets.MNIST(root='../../data/',
 27 |                                           train=False, 
 28 |                                           transform=transforms.ToTensor())
 29 | 
 30 | # Data loader
 31 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
 32 |                                            batch_size=batch_size, 
 33 |                                            shuffle=True)
 34 | 
 35 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
 36 |                                           batch_size=batch_size, 
 37 |                                           shuffle=False)
 38 | 
 39 | # Bidirectional recurrent neural network (many-to-one)
 40 | class BiRNN(nn.Module):
 41 |     def __init__(self, input_size, hidden_size, num_layers, num_classes):
 42 |         super(BiRNN, self).__init__()
 43 |         self.hidden_size = hidden_size
 44 |         self.num_layers = num_layers
 45 |         self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
 46 |         self.fc = nn.Linear(hidden_size*2, num_classes)  # 2 for bidirection
 47 |     
 48 |     def forward(self, x):
 49 |         # Set initial states
 50 |         h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device) # 2 for bidirection 
 51 |         c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
 52 |         
 53 |         # Forward propagate LSTM
 54 |         out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size*2)
 55 |         
 56 |         # Decode the hidden state of the last time step
 57 |         out = self.fc(out[:, -1, :])
 58 |         return out
 59 | 
 60 | model = BiRNN(input_size, hidden_size, num_layers, num_classes).to(device)
 61 | 
 62 | 
 63 | # Loss and optimizer
 64 | criterion = nn.CrossEntropyLoss()
 65 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
 66 |     
 67 | # Train the model
 68 | total_step = len(train_loader)
 69 | for epoch in range(num_epochs):
 70 |     for i, (images, labels) in enumerate(train_loader):
 71 |         images = images.reshape(-1, sequence_length, input_size).to(device)
 72 |         labels = labels.to(device)
 73 |         
 74 |         # Forward pass
 75 |         outputs = model(images)
 76 |         loss = criterion(outputs, labels)
 77 |         
 78 |         # Backward and optimize
 79 |         optimizer.zero_grad()
 80 |         loss.backward()
 81 |         optimizer.step()
 82 |         
 83 |         if (i+1) % 100 == 0:
 84 |             print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
 85 |                    .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
 86 | 
 87 | # Test the model
 88 | with torch.no_grad():
 89 |     correct = 0
 90 |     total = 0
 91 |     for images, labels in test_loader:
 92 |         images = images.reshape(-1, sequence_length, input_size).to(device)
 93 |         labels = labels.to(device)
 94 |         outputs = model(images)
 95 |         _, predicted = torch.max(outputs.data, 1)
 96 |         total += labels.size(0)
 97 |         correct += (predicted == labels).sum().item()
 98 | 
 99 |     print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 
100 | 
101 | # Save the model checkpoint
102 | torch.save(model.state_dict(), 'model.ckpt')


--------------------------------------------------------------------------------
/tutorials/02-intermediate/convolutional_neural_network/main.py:
--------------------------------------------------------------------------------
  1 | import torch 
  2 | import torch.nn as nn
  3 | import torchvision
  4 | import torchvision.transforms as transforms
  5 | 
  6 | 
  7 | # Device configuration
  8 | device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
  9 | 
 10 | # Hyper parameters
 11 | num_epochs = 5
 12 | num_classes = 10
 13 | batch_size = 100
 14 | learning_rate = 0.001
 15 | 
 16 | # MNIST dataset
 17 | train_dataset = torchvision.datasets.MNIST(root='../../data/',
 18 |                                            train=True, 
 19 |                                            transform=transforms.ToTensor(),
 20 |                                            download=True)
 21 | 
 22 | test_dataset = torchvision.datasets.MNIST(root='../../data/',
 23 |                                           train=False, 
 24 |                                           transform=transforms.ToTensor())
 25 | 
 26 | # Data loader
 27 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
 28 |                                            batch_size=batch_size, 
 29 |                                            shuffle=True)
 30 | 
 31 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
 32 |                                           batch_size=batch_size, 
 33 |                                           shuffle=False)
 34 | 
 35 | # Convolutional neural network (two convolutional layers)
 36 | class ConvNet(nn.Module):
 37 |     def __init__(self, num_classes=10):
 38 |         super(ConvNet, self).__init__()
 39 |         self.layer1 = nn.Sequential(
 40 |             nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
 41 |             nn.BatchNorm2d(16),
 42 |             nn.ReLU(),
 43 |             nn.MaxPool2d(kernel_size=2, stride=2))
 44 |         self.layer2 = nn.Sequential(
 45 |             nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
 46 |             nn.BatchNorm2d(32),
 47 |             nn.ReLU(),
 48 |             nn.MaxPool2d(kernel_size=2, stride=2))
 49 |         self.fc = nn.Linear(7*7*32, num_classes)
 50 |         
 51 |     def forward(self, x):
 52 |         out = self.layer1(x)
 53 |         out = self.layer2(out)
 54 |         out = out.reshape(out.size(0), -1)
 55 |         out = self.fc(out)
 56 |         return out
 57 | 
 58 | model = ConvNet(num_classes).to(device)
 59 | 
 60 | # Loss and optimizer
 61 | criterion = nn.CrossEntropyLoss()
 62 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
 63 | 
 64 | # Train the model
 65 | total_step = len(train_loader)
 66 | for epoch in range(num_epochs):
 67 |     for i, (images, labels) in enumerate(train_loader):
 68 |         images = images.to(device)
 69 |         labels = labels.to(device)
 70 |         
 71 |         # Forward pass
 72 |         outputs = model(images)
 73 |         loss = criterion(outputs, labels)
 74 |         
 75 |         # Backward and optimize
 76 |         optimizer.zero_grad()
 77 |         loss.backward()
 78 |         optimizer.step()
 79 |         
 80 |         if (i+1) % 100 == 0:
 81 |             print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
 82 |                    .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
 83 | 
 84 | # Test the model
 85 | model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
 86 | with torch.no_grad():
 87 |     correct = 0
 88 |     total = 0
 89 |     for images, labels in test_loader:
 90 |         images = images.to(device)
 91 |         labels = labels.to(device)
 92 |         outputs = model(images)
 93 |         _, predicted = torch.max(outputs.data, 1)
 94 |         total += labels.size(0)
 95 |         correct += (predicted == labels).sum().item()
 96 | 
 97 |     print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
 98 | 
 99 | # Save the model checkpoint
100 | torch.save(model.state_dict(), 'model.ckpt')


--------------------------------------------------------------------------------
/tutorials/02-intermediate/deep_residual_network/main.py:
--------------------------------------------------------------------------------
  1 | # ---------------------------------------------------------------------------- #
  2 | # An implementation of https://arxiv.org/pdf/1512.03385.pdf                    #
  3 | # See section 4.2 for the model architecture on CIFAR-10                       #
  4 | # Some part of the code was referenced from below                              #
  5 | # https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py   #
  6 | # ---------------------------------------------------------------------------- #
  7 | 
  8 | import torch
  9 | import torch.nn as nn
 10 | import torchvision
 11 | import torchvision.transforms as transforms
 12 | 
 13 | 
 14 | # Device configuration
 15 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 16 | 
 17 | # Hyper-parameters
 18 | num_epochs = 80
 19 | batch_size = 100
 20 | learning_rate = 0.001
 21 | 
 22 | # Image preprocessing modules
 23 | transform = transforms.Compose([
 24 |     transforms.Pad(4),
 25 |     transforms.RandomHorizontalFlip(),
 26 |     transforms.RandomCrop(32),
 27 |     transforms.ToTensor()])
 28 | 
 29 | # CIFAR-10 dataset
 30 | train_dataset = torchvision.datasets.CIFAR10(root='../../data/',
 31 |                                              train=True, 
 32 |                                              transform=transform,
 33 |                                              download=True)
 34 | 
 35 | test_dataset = torchvision.datasets.CIFAR10(root='../../data/',
 36 |                                             train=False, 
 37 |                                             transform=transforms.ToTensor())
 38 | 
 39 | # Data loader
 40 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
 41 |                                            batch_size=batch_size,
 42 |                                            shuffle=True)
 43 | 
 44 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
 45 |                                           batch_size=batch_size,
 46 |                                           shuffle=False)
 47 | 
 48 | # 3x3 convolution
 49 | def conv3x3(in_channels, out_channels, stride=1):
 50 |     return nn.Conv2d(in_channels, out_channels, kernel_size=3, 
 51 |                      stride=stride, padding=1, bias=False)
 52 | 
 53 | # Residual block
 54 | class ResidualBlock(nn.Module):
 55 |     def __init__(self, in_channels, out_channels, stride=1, downsample=None):
 56 |         super(ResidualBlock, self).__init__()
 57 |         self.conv1 = conv3x3(in_channels, out_channels, stride)
 58 |         self.bn1 = nn.BatchNorm2d(out_channels)
 59 |         self.relu = nn.ReLU(inplace=True)
 60 |         self.conv2 = conv3x3(out_channels, out_channels)
 61 |         self.bn2 = nn.BatchNorm2d(out_channels)
 62 |         self.downsample = downsample
 63 |         
 64 |     def forward(self, x):
 65 |         residual = x
 66 |         out = self.conv1(x)
 67 |         out = self.bn1(out)
 68 |         out = self.relu(out)
 69 |         out = self.conv2(out)
 70 |         out = self.bn2(out)
 71 |         if self.downsample:
 72 |             residual = self.downsample(x)
 73 |         out += residual
 74 |         out = self.relu(out)
 75 |         return out
 76 | 
 77 | # ResNet
 78 | class ResNet(nn.Module):
 79 |     def __init__(self, block, layers, num_classes=10):
 80 |         super(ResNet, self).__init__()
 81 |         self.in_channels = 16
 82 |         self.conv = conv3x3(3, 16)
 83 |         self.bn = nn.BatchNorm2d(16)
 84 |         self.relu = nn.ReLU(inplace=True)
 85 |         self.layer1 = self.make_layer(block, 16, layers[0])
 86 |         self.layer2 = self.make_layer(block, 32, layers[1], 2)
 87 |         self.layer3 = self.make_layer(block, 64, layers[2], 2)
 88 |         self.avg_pool = nn.AvgPool2d(8)
 89 |         self.fc = nn.Linear(64, num_classes)
 90 |         
 91 |     def make_layer(self, block, out_channels, blocks, stride=1):
 92 |         downsample = None
 93 |         if (stride != 1) or (self.in_channels != out_channels):
 94 |             downsample = nn.Sequential(
 95 |                 conv3x3(self.in_channels, out_channels, stride=stride),
 96 |                 nn.BatchNorm2d(out_channels))
 97 |         layers = []
 98 |         layers.append(block(self.in_channels, out_channels, stride, downsample))
 99 |         self.in_channels = out_channels
100 |         for i in range(1, blocks):
101 |             layers.append(block(out_channels, out_channels))
102 |         return nn.Sequential(*layers)
103 |     
104 |     def forward(self, x):
105 |         out = self.conv(x)
106 |         out = self.bn(out)
107 |         out = self.relu(out)
108 |         out = self.layer1(out)
109 |         out = self.layer2(out)
110 |         out = self.layer3(out)
111 |         out = self.avg_pool(out)
112 |         out = out.view(out.size(0), -1)
113 |         out = self.fc(out)
114 |         return out
115 |     
116 | model = ResNet(ResidualBlock, [2, 2, 2]).to(device)
117 | 
118 | 
119 | # Loss and optimizer
120 | criterion = nn.CrossEntropyLoss()
121 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
122 | 
123 | # For updating learning rate
124 | def update_lr(optimizer, lr):    
125 |     for param_group in optimizer.param_groups:
126 |         param_group['lr'] = lr
127 | 
128 | # Train the model
129 | total_step = len(train_loader)
130 | curr_lr = learning_rate
131 | for epoch in range(num_epochs):
132 |     for i, (images, labels) in enumerate(train_loader):
133 |         images = images.to(device)
134 |         labels = labels.to(device)
135 |         
136 |         # Forward pass
137 |         outputs = model(images)
138 |         loss = criterion(outputs, labels)
139 |         
140 |         # Backward and optimize
141 |         optimizer.zero_grad()
142 |         loss.backward()
143 |         optimizer.step()
144 |         
145 |         if (i+1) % 100 == 0:
146 |             print ("Epoch [{}/{}], Step [{}/{}] Loss: {:.4f}"
147 |                    .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
148 | 
149 |     # Decay learning rate
150 |     if (epoch+1) % 20 == 0:
151 |         curr_lr /= 3
152 |         update_lr(optimizer, curr_lr)
153 | 
154 | # Test the model
155 | model.eval()
156 | with torch.no_grad():
157 |     correct = 0
158 |     total = 0
159 |     for images, labels in test_loader:
160 |         images = images.to(device)
161 |         labels = labels.to(device)
162 |         outputs = model(images)
163 |         _, predicted = torch.max(outputs.data, 1)
164 |         total += labels.size(0)
165 |         correct += (predicted == labels).sum().item()
166 | 
167 |     print('Accuracy of the model on the test images: {} %'.format(100 * correct / total))
168 | 
169 | # Save the model checkpoint
170 | torch.save(model.state_dict(), 'resnet.ckpt')
171 | 


--------------------------------------------------------------------------------
/tutorials/02-intermediate/language_model/data_utils.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import os
 3 | 
 4 | 
 5 | class Dictionary(object):
 6 |     def __init__(self):
 7 |         self.word2idx = {}
 8 |         self.idx2word = {}
 9 |         self.idx = 0
10 |     
11 |     def add_word(self, word):
12 |         if not word in self.word2idx:
13 |             self.word2idx[word] = self.idx
14 |             self.idx2word[self.idx] = word
15 |             self.idx += 1
16 |     
17 |     def __len__(self):
18 |         return len(self.word2idx)
19 | 
20 | 
21 | class Corpus(object):
22 |     def __init__(self):
23 |         self.dictionary = Dictionary()
24 | 
25 |     def get_data(self, path, batch_size=20):
26 |         # Add words to the dictionary
27 |         with open(path, 'r') as f:
28 |             tokens = 0
29 |             for line in f:
30 |                 words = line.split() + ['<eos>']
31 |                 tokens += len(words)
32 |                 for word in words: 
33 |                     self.dictionary.add_word(word)  
34 |         
35 |         # Tokenize the file content
36 |         ids = torch.LongTensor(tokens)
37 |         token = 0
38 |         with open(path, 'r') as f:
39 |             for line in f:
40 |                 words = line.split() + ['<eos>']
41 |                 for word in words:
42 |                     ids[token] = self.dictionary.word2idx[word]
43 |                     token += 1
44 |         num_batches = ids.size(0) // batch_size
45 |         ids = ids[:num_batches*batch_size]
46 |         return ids.view(batch_size, -1)


--------------------------------------------------------------------------------
/tutorials/02-intermediate/language_model/main.py:
--------------------------------------------------------------------------------
  1 | # Some part of the code was referenced from below.
  2 | # https://github.com/pytorch/examples/tree/master/word_language_model 
  3 | import torch
  4 | import torch.nn as nn
  5 | import numpy as np
  6 | from torch.nn.utils import clip_grad_norm_
  7 | from data_utils import Dictionary, Corpus
  8 | 
  9 | 
 10 | # Device configuration
 11 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 12 | 
 13 | # Hyper-parameters
 14 | embed_size = 128
 15 | hidden_size = 1024
 16 | num_layers = 1
 17 | num_epochs = 5
 18 | num_samples = 1000     # number of words to be sampled
 19 | batch_size = 20
 20 | seq_length = 30
 21 | learning_rate = 0.002
 22 | 
 23 | # Load "Penn Treebank" dataset
 24 | corpus = Corpus()
 25 | ids = corpus.get_data('data/train.txt', batch_size)
 26 | vocab_size = len(corpus.dictionary)
 27 | num_batches = ids.size(1) // seq_length
 28 | 
 29 | 
 30 | # RNN based language model
 31 | class RNNLM(nn.Module):
 32 |     def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
 33 |         super(RNNLM, self).__init__()
 34 |         self.embed = nn.Embedding(vocab_size, embed_size)
 35 |         self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
 36 |         self.linear = nn.Linear(hidden_size, vocab_size)
 37 |         
 38 |     def forward(self, x, h):
 39 |         # Embed word ids to vectors
 40 |         x = self.embed(x)
 41 |         
 42 |         # Forward propagate LSTM
 43 |         out, (h, c) = self.lstm(x, h)
 44 |         
 45 |         # Reshape output to (batch_size*sequence_length, hidden_size)
 46 |         out = out.reshape(out.size(0)*out.size(1), out.size(2))
 47 |         
 48 |         # Decode hidden states of all time steps
 49 |         out = self.linear(out)
 50 |         return out, (h, c)
 51 | 
 52 | model = RNNLM(vocab_size, embed_size, hidden_size, num_layers).to(device)
 53 | 
 54 | # Loss and optimizer
 55 | criterion = nn.CrossEntropyLoss()
 56 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
 57 | 
 58 | # Truncated backpropagation
 59 | def detach(states):
 60 |     return [state.detach() for state in states] 
 61 | 
 62 | # Train the model
 63 | for epoch in range(num_epochs):
 64 |     # Set initial hidden and cell states
 65 |     states = (torch.zeros(num_layers, batch_size, hidden_size).to(device),
 66 |               torch.zeros(num_layers, batch_size, hidden_size).to(device))
 67 |     
 68 |     for i in range(0, ids.size(1) - seq_length, seq_length):
 69 |         # Get mini-batch inputs and targets
 70 |         inputs = ids[:, i:i+seq_length].to(device)
 71 |         targets = ids[:, (i+1):(i+1)+seq_length].to(device)
 72 |         
 73 |         # Forward pass
 74 |         states = detach(states)
 75 |         outputs, states = model(inputs, states)
 76 |         loss = criterion(outputs, targets.reshape(-1))
 77 |         
 78 |         # Backward and optimize
 79 |         optimizer.zero_grad()
 80 |         loss.backward()
 81 |         clip_grad_norm_(model.parameters(), 0.5)
 82 |         optimizer.step()
 83 | 
 84 |         step = (i+1) // seq_length
 85 |         if step % 100 == 0:
 86 |             print ('Epoch [{}/{}], Step[{}/{}], Loss: {:.4f}, Perplexity: {:5.2f}'
 87 |                    .format(epoch+1, num_epochs, step, num_batches, loss.item(), np.exp(loss.item())))
 88 | 
 89 | # Test the model
 90 | with torch.no_grad():
 91 |     with open('sample.txt', 'w') as f:
 92 |         # Set intial hidden ane cell states
 93 |         state = (torch.zeros(num_layers, 1, hidden_size).to(device),
 94 |                  torch.zeros(num_layers, 1, hidden_size).to(device))
 95 | 
 96 |         # Select one word id randomly
 97 |         prob = torch.ones(vocab_size)
 98 |         input = torch.multinomial(prob, num_samples=1).unsqueeze(1).to(device)
 99 | 
100 |         for i in range(num_samples):
101 |             # Forward propagate RNN 
102 |             output, state = model(input, state)
103 | 
104 |             # Sample a word id
105 |             prob = output.exp()
106 |             word_id = torch.multinomial(prob, num_samples=1).item()
107 | 
108 |             # Fill input with sampled word id for the next time step
109 |             input.fill_(word_id)
110 | 
111 |             # File write
112 |             word = corpus.dictionary.idx2word[word_id]
113 |             word = '\n' if word == '<eos>' else word + ' '
114 |             f.write(word)
115 | 
116 |             if (i+1) % 100 == 0:
117 |                 print('Sampled [{}/{}] words and save to {}'.format(i+1, num_samples, 'sample.txt'))
118 | 
119 | # Save the model checkpoints
120 | torch.save(model.state_dict(), 'model.ckpt')


--------------------------------------------------------------------------------
/tutorials/02-intermediate/recurrent_neural_network/main.py:
--------------------------------------------------------------------------------
  1 | import torch 
  2 | import torch.nn as nn
  3 | import torchvision
  4 | import torchvision.transforms as transforms
  5 | 
  6 | 
  7 | # Device configuration
  8 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
  9 | 
 10 | # Hyper-parameters
 11 | sequence_length = 28
 12 | input_size = 28
 13 | hidden_size = 128
 14 | num_layers = 2
 15 | num_classes = 10
 16 | batch_size = 100
 17 | num_epochs = 2
 18 | learning_rate = 0.01
 19 | 
 20 | # MNIST dataset
 21 | train_dataset = torchvision.datasets.MNIST(root='../../data/',
 22 |                                            train=True, 
 23 |                                            transform=transforms.ToTensor(),
 24 |                                            download=True)
 25 | 
 26 | test_dataset = torchvision.datasets.MNIST(root='../../data/',
 27 |                                           train=False, 
 28 |                                           transform=transforms.ToTensor())
 29 | 
 30 | # Data loader
 31 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
 32 |                                            batch_size=batch_size, 
 33 |                                            shuffle=True)
 34 | 
 35 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
 36 |                                           batch_size=batch_size, 
 37 |                                           shuffle=False)
 38 | 
 39 | # Recurrent neural network (many-to-one)
 40 | class RNN(nn.Module):
 41 |     def __init__(self, input_size, hidden_size, num_layers, num_classes):
 42 |         super(RNN, self).__init__()
 43 |         self.hidden_size = hidden_size
 44 |         self.num_layers = num_layers
 45 |         self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
 46 |         self.fc = nn.Linear(hidden_size, num_classes)
 47 |     
 48 |     def forward(self, x):
 49 |         # Set initial hidden and cell states 
 50 |         h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device) 
 51 |         c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
 52 |         
 53 |         # Forward propagate LSTM
 54 |         out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)
 55 |         
 56 |         # Decode the hidden state of the last time step
 57 |         out = self.fc(out[:, -1, :])
 58 |         return out
 59 | 
 60 | model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)
 61 | 
 62 | 
 63 | # Loss and optimizer
 64 | criterion = nn.CrossEntropyLoss()
 65 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
 66 | 
 67 | # Train the model
 68 | total_step = len(train_loader)
 69 | for epoch in range(num_epochs):
 70 |     for i, (images, labels) in enumerate(train_loader):
 71 |         images = images.reshape(-1, sequence_length, input_size).to(device)
 72 |         labels = labels.to(device)
 73 |         
 74 |         # Forward pass
 75 |         outputs = model(images)
 76 |         loss = criterion(outputs, labels)
 77 |         
 78 |         # Backward and optimize
 79 |         optimizer.zero_grad()
 80 |         loss.backward()
 81 |         optimizer.step()
 82 |         
 83 |         if (i+1) % 100 == 0:
 84 |             print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
 85 |                    .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
 86 | 
 87 | # Test the model
 88 | model.eval()
 89 | with torch.no_grad():
 90 |     correct = 0
 91 |     total = 0
 92 |     for images, labels in test_loader:
 93 |         images = images.reshape(-1, sequence_length, input_size).to(device)
 94 |         labels = labels.to(device)
 95 |         outputs = model(images)
 96 |         _, predicted = torch.max(outputs.data, 1)
 97 |         total += labels.size(0)
 98 |         correct += (predicted == labels).sum().item()
 99 | 
100 |     print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 
101 | 
102 | # Save the model checkpoint
103 | torch.save(model.state_dict(), 'model.ckpt')


--------------------------------------------------------------------------------
/tutorials/03-advanced/generative_adversarial_network/main.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import torch
  3 | import torchvision
  4 | import torch.nn as nn
  5 | from torchvision import transforms
  6 | from torchvision.utils import save_image
  7 | 
  8 | 
  9 | # Device configuration
 10 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 11 | 
 12 | # Hyper-parameters
 13 | latent_size = 64
 14 | hidden_size = 256
 15 | image_size = 784
 16 | num_epochs = 200
 17 | batch_size = 100
 18 | sample_dir = 'samples'
 19 | 
 20 | # Create a directory if not exists
 21 | if not os.path.exists(sample_dir):
 22 |     os.makedirs(sample_dir)
 23 | 
 24 | # Image processing
 25 | # transform = transforms.Compose([
 26 | #                 transforms.ToTensor(),
 27 | #                 transforms.Normalize(mean=(0.5, 0.5, 0.5),   # 3 for RGB channels
 28 | #                                      std=(0.5, 0.5, 0.5))])
 29 | transform = transforms.Compose([
 30 |                 transforms.ToTensor(),
 31 |                 transforms.Normalize(mean=[0.5],   # 1 for greyscale channels
 32 |                                      std=[0.5])])
 33 | 
 34 | # MNIST dataset
 35 | mnist = torchvision.datasets.MNIST(root='../../data/',
 36 |                                    train=True,
 37 |                                    transform=transform,
 38 |                                    download=True)
 39 | 
 40 | # Data loader
 41 | data_loader = torch.utils.data.DataLoader(dataset=mnist,
 42 |                                           batch_size=batch_size, 
 43 |                                           shuffle=True)
 44 | 
 45 | # Discriminator
 46 | D = nn.Sequential(
 47 |     nn.Linear(image_size, hidden_size),
 48 |     nn.LeakyReLU(0.2),
 49 |     nn.Linear(hidden_size, hidden_size),
 50 |     nn.LeakyReLU(0.2),
 51 |     nn.Linear(hidden_size, 1),
 52 |     nn.Sigmoid())
 53 | 
 54 | # Generator 
 55 | G = nn.Sequential(
 56 |     nn.Linear(latent_size, hidden_size),
 57 |     nn.ReLU(),
 58 |     nn.Linear(hidden_size, hidden_size),
 59 |     nn.ReLU(),
 60 |     nn.Linear(hidden_size, image_size),
 61 |     nn.Tanh())
 62 | 
 63 | # Device setting
 64 | D = D.to(device)
 65 | G = G.to(device)
 66 | 
 67 | # Binary cross entropy loss and optimizer
 68 | criterion = nn.BCELoss()
 69 | d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002)
 70 | g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002)
 71 | 
 72 | def denorm(x):
 73 |     out = (x + 1) / 2
 74 |     return out.clamp(0, 1)
 75 | 
 76 | def reset_grad():
 77 |     d_optimizer.zero_grad()
 78 |     g_optimizer.zero_grad()
 79 | 
 80 | # Start training
 81 | total_step = len(data_loader)
 82 | for epoch in range(num_epochs):
 83 |     for i, (images, _) in enumerate(data_loader):
 84 |         images = images.reshape(batch_size, -1).to(device)
 85 |         
 86 |         # Create the labels which are later used as input for the BCE loss
 87 |         real_labels = torch.ones(batch_size, 1).to(device)
 88 |         fake_labels = torch.zeros(batch_size, 1).to(device)
 89 | 
 90 |         # ================================================================== #
 91 |         #                      Train the discriminator                       #
 92 |         # ================================================================== #
 93 | 
 94 |         # Compute BCE_Loss using real images where BCE_Loss(x, y): - y * log(D(x)) - (1-y) * log(1 - D(x))
 95 |         # Second term of the loss is always zero since real_labels == 1
 96 |         outputs = D(images)
 97 |         d_loss_real = criterion(outputs, real_labels)
 98 |         real_score = outputs
 99 |         
100 |         # Compute BCELoss using fake images
101 |         # First term of the loss is always zero since fake_labels == 0
102 |         z = torch.randn(batch_size, latent_size).to(device)
103 |         fake_images = G(z)
104 |         outputs = D(fake_images)
105 |         d_loss_fake = criterion(outputs, fake_labels)
106 |         fake_score = outputs
107 |         
108 |         # Backprop and optimize
109 |         d_loss = d_loss_real + d_loss_fake
110 |         reset_grad()
111 |         d_loss.backward()
112 |         d_optimizer.step()
113 |         
114 |         # ================================================================== #
115 |         #                        Train the generator                         #
116 |         # ================================================================== #
117 | 
118 |         # Compute loss with fake images
119 |         z = torch.randn(batch_size, latent_size).to(device)
120 |         fake_images = G(z)
121 |         outputs = D(fake_images)
122 |         
123 |         # We train G to maximize log(D(G(z)) instead of minimizing log(1-D(G(z)))
124 |         # For the reason, see the last paragraph of section 3. https://arxiv.org/pdf/1406.2661.pdf
125 |         g_loss = criterion(outputs, real_labels)
126 |         
127 |         # Backprop and optimize
128 |         reset_grad()
129 |         g_loss.backward()
130 |         g_optimizer.step()
131 |         
132 |         if (i+1) % 200 == 0:
133 |             print('Epoch [{}/{}], Step [{}/{}], d_loss: {:.4f}, g_loss: {:.4f}, D(x): {:.2f}, D(G(z)): {:.2f}' 
134 |                   .format(epoch, num_epochs, i+1, total_step, d_loss.item(), g_loss.item(), 
135 |                           real_score.mean().item(), fake_score.mean().item()))
136 |     
137 |     # Save real images
138 |     if (epoch+1) == 1:
139 |         images = images.reshape(images.size(0), 1, 28, 28)
140 |         save_image(denorm(images), os.path.join(sample_dir, 'real_images.png'))
141 |     
142 |     # Save sampled images
143 |     fake_images = fake_images.reshape(fake_images.size(0), 1, 28, 28)
144 |     save_image(denorm(fake_images), os.path.join(sample_dir, 'fake_images-{}.png'.format(epoch+1)))
145 | 
146 | # Save the model checkpoints 
147 | torch.save(G.state_dict(), 'G.ckpt')
148 | torch.save(D.state_dict(), 'D.ckpt')


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/README.md:
--------------------------------------------------------------------------------
 1 | # Image Captioning
 2 | The goal of image captioning is to convert a given input image into a natural language description. The encoder-decoder framework is widely used for this task. The image encoder is a convolutional neural network (CNN). In this tutorial, we used [resnet-152](https://arxiv.org/abs/1512.03385) model pretrained on the [ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/) image classification dataset. The decoder is a long short-term memory (LSTM) network. 
 3 | 
 4 | ![alt text](png/model.png)
 5 | 
 6 | #### Training phase
 7 | For the encoder part, the pretrained CNN extracts the feature vector from a given input image. The feature vector is linearly transformed to have the same dimension as the input dimension of the LSTM network. For the decoder part, source and target texts are predefined. For example, if the image description is **"Giraffes standing next to each other"**, the source sequence is a list containing **['\<start\>', 'Giraffes', 'standing', 'next', 'to', 'each', 'other']** and the target sequence is a list containing **['Giraffes', 'standing', 'next', 'to', 'each', 'other', '\<end\>']**. Using these source and target sequences and the feature vector, the LSTM decoder is trained as a language model conditioned on the feature vector.
 8 | 
 9 | #### Test phase
10 | In the test phase, the encoder part is almost same as the training phase. The only difference is that batchnorm layer uses moving average and variance instead of mini-batch statistics. This can be easily implemented using [encoder.eval()](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/image_captioning/sample.py#L37). For the decoder part, there is a significant difference between the training phase and the test phase. In the test phase, the LSTM decoder can't see the image description. To deal with this problem, the LSTM decoder feeds back the previosly generated word to the next input. This can be implemented using a [for-loop](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/image_captioning/model.py#L48).
11 | 
12 | 
13 | 
14 | ## Usage 
15 | 
16 | 
17 | #### 1. Clone the repositories
18 | ```bash
19 | git clone https://github.com/pdollar/coco.git
20 | cd coco/PythonAPI/
21 | make
22 | python setup.py build
23 | python setup.py install
24 | cd ../../
25 | git clone https://github.com/yunjey/pytorch-tutorial.git
26 | cd pytorch-tutorial/tutorials/03-advanced/image_captioning/
27 | ```
28 | 
29 | #### 2. Download the dataset
30 | 
31 | ```bash
32 | pip install -r requirements.txt
33 | chmod +x download.sh
34 | ./download.sh
35 | ```
36 | 
37 | #### 3. Preprocessing
38 | 
39 | ```bash
40 | python build_vocab.py   
41 | python resize.py
42 | ```
43 | 
44 | #### 4. Train the model
45 | 
46 | ```bash
47 | python train.py    
48 | ```
49 | 
50 | #### 5. Test the model 
51 | 
52 | ```bash
53 | python sample.py --image='png/example.png'
54 | ```
55 | 
56 | <br>
57 | 
58 | ## Pretrained model
59 | If you do not want to train the model from scratch, you can use a pretrained model. You can download the pretrained model [here](https://www.dropbox.com/s/ne0ixz5d58ccbbz/pretrained_model.zip?dl=0) and the vocabulary file [here](https://www.dropbox.com/s/26adb7y9m98uisa/vocap.zip?dl=0). You should extract pretrained_model.zip to `./models/` and vocab.pkl to `./data/` using `unzip` command.
60 | 


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/build_vocab.py:
--------------------------------------------------------------------------------
 1 | import nltk
 2 | import pickle
 3 | import argparse
 4 | from collections import Counter
 5 | from pycocotools.coco import COCO
 6 | 
 7 | 
 8 | class Vocabulary(object):
 9 |     """Simple vocabulary wrapper."""
10 |     def __init__(self):
11 |         self.word2idx = {}
12 |         self.idx2word = {}
13 |         self.idx = 0
14 | 
15 |     def add_word(self, word):
16 |         if not word in self.word2idx:
17 |             self.word2idx[word] = self.idx
18 |             self.idx2word[self.idx] = word
19 |             self.idx += 1
20 | 
21 |     def __call__(self, word):
22 |         if not word in self.word2idx:
23 |             return self.word2idx['<unk>']
24 |         return self.word2idx[word]
25 | 
26 |     def __len__(self):
27 |         return len(self.word2idx)
28 | 
29 | def build_vocab(json, threshold):
30 |     """Build a simple vocabulary wrapper."""
31 |     coco = COCO(json)
32 |     counter = Counter()
33 |     ids = coco.anns.keys()
34 |     for i, id in enumerate(ids):
35 |         caption = str(coco.anns[id]['caption'])
36 |         tokens = nltk.tokenize.word_tokenize(caption.lower())
37 |         counter.update(tokens)
38 | 
39 |         if (i+1) % 1000 == 0:
40 |             print("[{}/{}] Tokenized the captions.".format(i+1, len(ids)))
41 | 
42 |     # If the word frequency is less than 'threshold', then the word is discarded.
43 |     words = [word for word, cnt in counter.items() if cnt >= threshold]
44 | 
45 |     # Create a vocab wrapper and add some special tokens.
46 |     vocab = Vocabulary()
47 |     vocab.add_word('<pad>')
48 |     vocab.add_word('<start>')
49 |     vocab.add_word('<end>')
50 |     vocab.add_word('<unk>')
51 | 
52 |     # Add the words to the vocabulary.
53 |     for i, word in enumerate(words):
54 |         vocab.add_word(word)
55 |     return vocab
56 | 
57 | def main(args):
58 |     vocab = build_vocab(json=args.caption_path, threshold=args.threshold)
59 |     vocab_path = args.vocab_path
60 |     with open(vocab_path, 'wb') as f:
61 |         pickle.dump(vocab, f)
62 |     print("Total vocabulary size: {}".format(len(vocab)))
63 |     print("Saved the vocabulary wrapper to '{}'".format(vocab_path))
64 | 
65 | 
66 | if __name__ == '__main__':
67 |     parser = argparse.ArgumentParser()
68 |     parser.add_argument('--caption_path', type=str, 
69 |                         default='data/annotations/captions_train2014.json', 
70 |                         help='path for train annotation file')
71 |     parser.add_argument('--vocab_path', type=str, default='./data/vocab.pkl', 
72 |                         help='path for saving vocabulary wrapper')
73 |     parser.add_argument('--threshold', type=int, default=4, 
74 |                         help='minimum word count threshold')
75 |     args = parser.parse_args()
76 |     main(args)


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/data_loader.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torchvision.transforms as transforms
  3 | import torch.utils.data as data
  4 | import os
  5 | import pickle
  6 | import numpy as np
  7 | import nltk
  8 | from PIL import Image
  9 | from build_vocab import Vocabulary
 10 | from pycocotools.coco import COCO
 11 | 
 12 | 
 13 | class CocoDataset(data.Dataset):
 14 |     """COCO Custom Dataset compatible with torch.utils.data.DataLoader."""
 15 |     def __init__(self, root, json, vocab, transform=None):
 16 |         """Set the path for images, captions and vocabulary wrapper.
 17 |         
 18 |         Args:
 19 |             root: image directory.
 20 |             json: coco annotation file path.
 21 |             vocab: vocabulary wrapper.
 22 |             transform: image transformer.
 23 |         """
 24 |         self.root = root
 25 |         self.coco = COCO(json)
 26 |         self.ids = list(self.coco.anns.keys())
 27 |         self.vocab = vocab
 28 |         self.transform = transform
 29 | 
 30 |     def __getitem__(self, index):
 31 |         """Returns one data pair (image and caption)."""
 32 |         coco = self.coco
 33 |         vocab = self.vocab
 34 |         ann_id = self.ids[index]
 35 |         caption = coco.anns[ann_id]['caption']
 36 |         img_id = coco.anns[ann_id]['image_id']
 37 |         path = coco.loadImgs(img_id)[0]['file_name']
 38 | 
 39 |         image = Image.open(os.path.join(self.root, path)).convert('RGB')
 40 |         if self.transform is not None:
 41 |             image = self.transform(image)
 42 | 
 43 |         # Convert caption (string) to word ids.
 44 |         tokens = nltk.tokenize.word_tokenize(str(caption).lower())
 45 |         caption = []
 46 |         caption.append(vocab('<start>'))
 47 |         caption.extend([vocab(token) for token in tokens])
 48 |         caption.append(vocab('<end>'))
 49 |         target = torch.Tensor(caption)
 50 |         return image, target
 51 | 
 52 |     def __len__(self):
 53 |         return len(self.ids)
 54 | 
 55 | 
 56 | def collate_fn(data):
 57 |     """Creates mini-batch tensors from the list of tuples (image, caption).
 58 |     
 59 |     We should build custom collate_fn rather than using default collate_fn, 
 60 |     because merging caption (including padding) is not supported in default.
 61 | 
 62 |     Args:
 63 |         data: list of tuple (image, caption). 
 64 |             - image: torch tensor of shape (3, 256, 256).
 65 |             - caption: torch tensor of shape (?); variable length.
 66 | 
 67 |     Returns:
 68 |         images: torch tensor of shape (batch_size, 3, 256, 256).
 69 |         targets: torch tensor of shape (batch_size, padded_length).
 70 |         lengths: list; valid length for each padded caption.
 71 |     """
 72 |     # Sort a data list by caption length (descending order).
 73 |     data.sort(key=lambda x: len(x[1]), reverse=True)
 74 |     images, captions = zip(*data)
 75 | 
 76 |     # Merge images (from tuple of 3D tensor to 4D tensor).
 77 |     images = torch.stack(images, 0)
 78 | 
 79 |     # Merge captions (from tuple of 1D tensor to 2D tensor).
 80 |     lengths = [len(cap) for cap in captions]
 81 |     targets = torch.zeros(len(captions), max(lengths)).long()
 82 |     for i, cap in enumerate(captions):
 83 |         end = lengths[i]
 84 |         targets[i, :end] = cap[:end]        
 85 |     return images, targets, lengths
 86 | 
 87 | def get_loader(root, json, vocab, transform, batch_size, shuffle, num_workers):
 88 |     """Returns torch.utils.data.DataLoader for custom coco dataset."""
 89 |     # COCO caption dataset
 90 |     coco = CocoDataset(root=root,
 91 |                        json=json,
 92 |                        vocab=vocab,
 93 |                        transform=transform)
 94 |     
 95 |     # Data loader for COCO dataset
 96 |     # This will return (images, captions, lengths) for each iteration.
 97 |     # images: a tensor of shape (batch_size, 3, 224, 224).
 98 |     # captions: a tensor of shape (batch_size, padded_length).
 99 |     # lengths: a list indicating valid length for each caption. length is (batch_size).
100 |     data_loader = torch.utils.data.DataLoader(dataset=coco, 
101 |                                               batch_size=batch_size,
102 |                                               shuffle=shuffle,
103 |                                               num_workers=num_workers,
104 |                                               collate_fn=collate_fn)
105 |     return data_loader


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/download.sh:
--------------------------------------------------------------------------------
 1 | mkdir data
 2 | wget http://msvocds.blob.core.windows.net/annotations-1-0-3/captions_train-val2014.zip -P ./data/
 3 | wget http://images.cocodataset.org/zips/train2014.zip -P ./data/
 4 | wget http://images.cocodataset.org/zips/val2014.zip -P ./data/
 5 | 
 6 | unzip ./data/captions_train-val2014.zip -d ./data/
 7 | rm ./data/captions_train-val2014.zip
 8 | unzip ./data/train2014.zip -d ./data/
 9 | rm ./data/train2014.zip 
10 | unzip ./data/val2014.zip -d ./data/ 
11 | rm ./data/val2014.zip 
12 | 


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/model.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import torch.nn as nn
 3 | import torchvision.models as models
 4 | from torch.nn.utils.rnn import pack_padded_sequence
 5 | 
 6 | 
 7 | class EncoderCNN(nn.Module):
 8 |     def __init__(self, embed_size):
 9 |         """Load the pretrained ResNet-152 and replace top fc layer."""
10 |         super(EncoderCNN, self).__init__()
11 |         resnet = models.resnet152(pretrained=True)
12 |         modules = list(resnet.children())[:-1]      # delete the last fc layer.
13 |         self.resnet = nn.Sequential(*modules)
14 |         self.linear = nn.Linear(resnet.fc.in_features, embed_size)
15 |         self.bn = nn.BatchNorm1d(embed_size, momentum=0.01)
16 |         
17 |     def forward(self, images):
18 |         """Extract feature vectors from input images."""
19 |         with torch.no_grad():
20 |             features = self.resnet(images)
21 |         features = features.reshape(features.size(0), -1)
22 |         features = self.bn(self.linear(features))
23 |         return features
24 | 
25 | 
26 | class DecoderRNN(nn.Module):
27 |     def __init__(self, embed_size, hidden_size, vocab_size, num_layers, max_seq_length=20):
28 |         """Set the hyper-parameters and build the layers."""
29 |         super(DecoderRNN, self).__init__()
30 |         self.embed = nn.Embedding(vocab_size, embed_size)
31 |         self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
32 |         self.linear = nn.Linear(hidden_size, vocab_size)
33 |         self.max_seg_length = max_seq_length
34 |         
35 |     def forward(self, features, captions, lengths):
36 |         """Decode image feature vectors and generates captions."""
37 |         embeddings = self.embed(captions)
38 |         embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
39 |         packed = pack_padded_sequence(embeddings, lengths, batch_first=True) 
40 |         hiddens, _ = self.lstm(packed)
41 |         outputs = self.linear(hiddens[0])
42 |         return outputs
43 |     
44 |     def sample(self, features, states=None):
45 |         """Generate captions for given image features using greedy search."""
46 |         sampled_ids = []
47 |         inputs = features.unsqueeze(1)
48 |         for i in range(self.max_seg_length):
49 |             hiddens, states = self.lstm(inputs, states)          # hiddens: (batch_size, 1, hidden_size)
50 |             outputs = self.linear(hiddens.squeeze(1))            # outputs:  (batch_size, vocab_size)
51 |             _, predicted = outputs.max(1)                        # predicted: (batch_size)
52 |             sampled_ids.append(predicted)
53 |             inputs = self.embed(predicted)                       # inputs: (batch_size, embed_size)
54 |             inputs = inputs.unsqueeze(1)                         # inputs: (batch_size, 1, embed_size)
55 |         sampled_ids = torch.stack(sampled_ids, 1)                # sampled_ids: (batch_size, max_seq_length)
56 |         return sampled_ids


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/png/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/image_captioning/png/example.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/png/image_captioning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/image_captioning/png/image_captioning.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/png/model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/image_captioning/png/model.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/requirements.txt:
--------------------------------------------------------------------------------
1 | matplotlib
2 | nltk
3 | numpy
4 | Pillow
5 | argparse


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/resize.py:
--------------------------------------------------------------------------------
 1 | import argparse
 2 | import os
 3 | from PIL import Image
 4 | 
 5 | 
 6 | def resize_image(image, size):
 7 |     """Resize an image to the given size."""
 8 |     return image.resize(size, Image.ANTIALIAS)
 9 | 
10 | def resize_images(image_dir, output_dir, size):
11 |     """Resize the images in 'image_dir' and save into 'output_dir'."""
12 |     if not os.path.exists(output_dir):
13 |         os.makedirs(output_dir)
14 | 
15 |     images = os.listdir(image_dir)
16 |     num_images = len(images)
17 |     for i, image in enumerate(images):
18 |         with open(os.path.join(image_dir, image), 'r+b') as f:
19 |             with Image.open(f) as img:
20 |                 img = resize_image(img, size)
21 |                 img.save(os.path.join(output_dir, image), img.format)
22 |         if (i+1) % 100 == 0:
23 |             print ("[{}/{}] Resized the images and saved into '{}'."
24 |                    .format(i+1, num_images, output_dir))
25 | 
26 | def main(args):
27 |     image_dir = args.image_dir
28 |     output_dir = args.output_dir
29 |     image_size = [args.image_size, args.image_size]
30 |     resize_images(image_dir, output_dir, image_size)
31 | 
32 | 
33 | if __name__ == '__main__':
34 |     parser = argparse.ArgumentParser()
35 |     parser.add_argument('--image_dir', type=str, default='./data/train2014/',
36 |                         help='directory for train images')
37 |     parser.add_argument('--output_dir', type=str, default='./data/resized2014/',
38 |                         help='directory for saving resized images')
39 |     parser.add_argument('--image_size', type=int, default=256,
40 |                         help='size for image after processing')
41 |     args = parser.parse_args()
42 |     main(args)


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/sample.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import matplotlib.pyplot as plt
 3 | import numpy as np 
 4 | import argparse
 5 | import pickle 
 6 | import os
 7 | from torchvision import transforms 
 8 | from build_vocab import Vocabulary
 9 | from model import EncoderCNN, DecoderRNN
10 | from PIL import Image
11 | 
12 | 
13 | # Device configuration
14 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
15 | 
16 | def load_image(image_path, transform=None):
17 |     image = Image.open(image_path).convert('RGB')
18 |     image = image.resize([224, 224], Image.LANCZOS)
19 |     
20 |     if transform is not None:
21 |         image = transform(image).unsqueeze(0)
22 |     
23 |     return image
24 | 
25 | def main(args):
26 |     # Image preprocessing
27 |     transform = transforms.Compose([
28 |         transforms.ToTensor(), 
29 |         transforms.Normalize((0.485, 0.456, 0.406), 
30 |                              (0.229, 0.224, 0.225))])
31 |     
32 |     # Load vocabulary wrapper
33 |     with open(args.vocab_path, 'rb') as f:
34 |         vocab = pickle.load(f)
35 | 
36 |     # Build models
37 |     encoder = EncoderCNN(args.embed_size).eval()  # eval mode (batchnorm uses moving mean/variance)
38 |     decoder = DecoderRNN(args.embed_size, args.hidden_size, len(vocab), args.num_layers)
39 |     encoder = encoder.to(device)
40 |     decoder = decoder.to(device)
41 | 
42 |     # Load the trained model parameters
43 |     encoder.load_state_dict(torch.load(args.encoder_path))
44 |     decoder.load_state_dict(torch.load(args.decoder_path))
45 | 
46 |     # Prepare an image
47 |     image = load_image(args.image, transform)
48 |     image_tensor = image.to(device)
49 |     
50 |     # Generate an caption from the image
51 |     feature = encoder(image_tensor)
52 |     sampled_ids = decoder.sample(feature)
53 |     sampled_ids = sampled_ids[0].cpu().numpy()          # (1, max_seq_length) -> (max_seq_length)
54 |     
55 |     # Convert word_ids to words
56 |     sampled_caption = []
57 |     for word_id in sampled_ids:
58 |         word = vocab.idx2word[word_id]
59 |         sampled_caption.append(word)
60 |         if word == '<end>':
61 |             break
62 |     sentence = ' '.join(sampled_caption)
63 |     
64 |     # Print out the image and the generated caption
65 |     print (sentence)
66 |     image = Image.open(args.image)
67 |     plt.imshow(np.asarray(image))
68 |     
69 | if __name__ == '__main__':
70 |     parser = argparse.ArgumentParser()
71 |     parser.add_argument('--image', type=str, required=True, help='input image for generating caption')
72 |     parser.add_argument('--encoder_path', type=str, default='models/encoder-5-3000.pkl', help='path for trained encoder')
73 |     parser.add_argument('--decoder_path', type=str, default='models/decoder-5-3000.pkl', help='path for trained decoder')
74 |     parser.add_argument('--vocab_path', type=str, default='data/vocab.pkl', help='path for vocabulary wrapper')
75 |     
76 |     # Model parameters (should be same as paramters in train.py)
77 |     parser.add_argument('--embed_size', type=int , default=256, help='dimension of word embedding vectors')
78 |     parser.add_argument('--hidden_size', type=int , default=512, help='dimension of lstm hidden states')
79 |     parser.add_argument('--num_layers', type=int , default=1, help='number of layers in lstm')
80 |     args = parser.parse_args()
81 |     main(args)
82 | 


--------------------------------------------------------------------------------
/tutorials/03-advanced/image_captioning/train.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import torch
  3 | import torch.nn as nn
  4 | import numpy as np
  5 | import os
  6 | import pickle
  7 | from data_loader import get_loader 
  8 | from build_vocab import Vocabulary
  9 | from model import EncoderCNN, DecoderRNN
 10 | from torch.nn.utils.rnn import pack_padded_sequence
 11 | from torchvision import transforms
 12 | 
 13 | 
 14 | # Device configuration
 15 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 16 | 
 17 | def main(args):
 18 |     # Create model directory
 19 |     if not os.path.exists(args.model_path):
 20 |         os.makedirs(args.model_path)
 21 |     
 22 |     # Image preprocessing, normalization for the pretrained resnet
 23 |     transform = transforms.Compose([ 
 24 |         transforms.RandomCrop(args.crop_size),
 25 |         transforms.RandomHorizontalFlip(), 
 26 |         transforms.ToTensor(), 
 27 |         transforms.Normalize((0.485, 0.456, 0.406), 
 28 |                              (0.229, 0.224, 0.225))])
 29 |     
 30 |     # Load vocabulary wrapper
 31 |     with open(args.vocab_path, 'rb') as f:
 32 |         vocab = pickle.load(f)
 33 |     
 34 |     # Build data loader
 35 |     data_loader = get_loader(args.image_dir, args.caption_path, vocab, 
 36 |                              transform, args.batch_size,
 37 |                              shuffle=True, num_workers=args.num_workers) 
 38 | 
 39 |     # Build the models
 40 |     encoder = EncoderCNN(args.embed_size).to(device)
 41 |     decoder = DecoderRNN(args.embed_size, args.hidden_size, len(vocab), args.num_layers).to(device)
 42 |     
 43 |     # Loss and optimizer
 44 |     criterion = nn.CrossEntropyLoss()
 45 |     params = list(decoder.parameters()) + list(encoder.linear.parameters()) + list(encoder.bn.parameters())
 46 |     optimizer = torch.optim.Adam(params, lr=args.learning_rate)
 47 |     
 48 |     # Train the models
 49 |     total_step = len(data_loader)
 50 |     for epoch in range(args.num_epochs):
 51 |         for i, (images, captions, lengths) in enumerate(data_loader):
 52 |             
 53 |             # Set mini-batch dataset
 54 |             images = images.to(device)
 55 |             captions = captions.to(device)
 56 |             targets = pack_padded_sequence(captions, lengths, batch_first=True)[0]
 57 |             
 58 |             # Forward, backward and optimize
 59 |             features = encoder(images)
 60 |             outputs = decoder(features, captions, lengths)
 61 |             loss = criterion(outputs, targets)
 62 |             decoder.zero_grad()
 63 |             encoder.zero_grad()
 64 |             loss.backward()
 65 |             optimizer.step()
 66 | 
 67 |             # Print log info
 68 |             if i % args.log_step == 0:
 69 |                 print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Perplexity: {:5.4f}'
 70 |                       .format(epoch, args.num_epochs, i, total_step, loss.item(), np.exp(loss.item()))) 
 71 |                 
 72 |             # Save the model checkpoints
 73 |             if (i+1) % args.save_step == 0:
 74 |                 torch.save(decoder.state_dict(), os.path.join(
 75 |                     args.model_path, 'decoder-{}-{}.ckpt'.format(epoch+1, i+1)))
 76 |                 torch.save(encoder.state_dict(), os.path.join(
 77 |                     args.model_path, 'encoder-{}-{}.ckpt'.format(epoch+1, i+1)))
 78 | 
 79 | 
 80 | if __name__ == '__main__':
 81 |     parser = argparse.ArgumentParser()
 82 |     parser.add_argument('--model_path', type=str, default='models/' , help='path for saving trained models')
 83 |     parser.add_argument('--crop_size', type=int, default=224 , help='size for randomly cropping images')
 84 |     parser.add_argument('--vocab_path', type=str, default='data/vocab.pkl', help='path for vocabulary wrapper')
 85 |     parser.add_argument('--image_dir', type=str, default='data/resized2014', help='directory for resized images')
 86 |     parser.add_argument('--caption_path', type=str, default='data/annotations/captions_train2014.json', help='path for train annotation json file')
 87 |     parser.add_argument('--log_step', type=int , default=10, help='step size for prining log info')
 88 |     parser.add_argument('--save_step', type=int , default=1000, help='step size for saving trained models')
 89 |     
 90 |     # Model parameters
 91 |     parser.add_argument('--embed_size', type=int , default=256, help='dimension of word embedding vectors')
 92 |     parser.add_argument('--hidden_size', type=int , default=512, help='dimension of lstm hidden states')
 93 |     parser.add_argument('--num_layers', type=int , default=1, help='number of layers in lstm')
 94 |     
 95 |     parser.add_argument('--num_epochs', type=int, default=5)
 96 |     parser.add_argument('--batch_size', type=int, default=128)
 97 |     parser.add_argument('--num_workers', type=int, default=2)
 98 |     parser.add_argument('--learning_rate', type=float, default=0.001)
 99 |     args = parser.parse_args()
100 |     print(args)
101 |     main(args)


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/README.md:
--------------------------------------------------------------------------------
 1 | # Neural Style Transfer
 2 | 
 3 | [Neural style transfer](https://arxiv.org/abs/1508.06576) is an algorithm that combines the content of one image with the style of another image using CNN. Given a content image and a style image, the goal is to generate a target image that minimizes the content difference with the content image and the style difference with the style image. 
 4 | 
 5 | <p align="center"><img width="100%" src="png/neural_style2.png" /></p>
 6 | 
 7 | 
 8 | #### Content loss
 9 | 
10 | To minimize the content difference, we forward propagate the content image and the target image to pretrained [VGGNet](https://arxiv.org/abs/1409.1556) respectively, and extract feature maps from multiple convolutional layers. Then, the target image is updated to minimize the [mean-squared error](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/neural_style_transfer/main.py#L81-L82) between the feature maps of the content image and its feature maps. 
11 | 
12 | #### Style loss
13 | 
14 | As in computing the content loss, we forward propagate the style image and the target image to the VGGNet and extract convolutional feature maps. To generate a texture that matches the style of the style image, we update the target image by minimizing the mean-squared error between the Gram matrix of the style image and the Gram matrix of the target image (feature correlation minimization). See [here](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/neural_style_transfer/main.py#L84-L94) for how to compute the style loss.
15 | 
16 | 
17 | 
18 | 
19 | <br>
20 | 
21 | ## Usage 
22 | 
23 | ```bash
24 | $ pip install -r requirements.txt
25 | $ python main.py --content='png/content.png' --style='png/style.png'
26 | ```
27 | 
28 | <br>
29 | 
30 | ## Results
31 | The following is the result of applying variaous styles of artwork to Anne Hathaway's photograph.
32 | 
33 | ![alt text](png/neural_style.png)
34 | 


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/main.py:
--------------------------------------------------------------------------------
  1 | from __future__ import division
  2 | from torchvision import models
  3 | from torchvision import transforms
  4 | from PIL import Image
  5 | import argparse
  6 | import torch
  7 | import torchvision
  8 | import torch.nn as nn
  9 | import numpy as np
 10 | 
 11 | 
 12 | # Device configuration
 13 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 14 | 
 15 | def load_image(image_path, transform=None, max_size=None, shape=None):
 16 |     """Load an image and convert it to a torch tensor."""
 17 |     image = Image.open(image_path)
 18 |     
 19 |     if max_size:
 20 |         scale = max_size / max(image.size)
 21 |         size = np.array(image.size) * scale
 22 |         image = image.resize(size.astype(int), Image.ANTIALIAS)
 23 |     
 24 |     if shape:
 25 |         image = image.resize(shape, Image.LANCZOS)
 26 |     
 27 |     if transform:
 28 |         image = transform(image).unsqueeze(0)
 29 |     
 30 |     return image.to(device)
 31 | 
 32 | 
 33 | class VGGNet(nn.Module):
 34 |     def __init__(self):
 35 |         """Select conv1_1 ~ conv5_1 activation maps."""
 36 |         super(VGGNet, self).__init__()
 37 |         self.select = ['0', '5', '10', '19', '28'] 
 38 |         self.vgg = models.vgg19(pretrained=True).features
 39 |         
 40 |     def forward(self, x):
 41 |         """Extract multiple convolutional feature maps."""
 42 |         features = []
 43 |         for name, layer in self.vgg._modules.items():
 44 |             x = layer(x)
 45 |             if name in self.select:
 46 |                 features.append(x)
 47 |         return features
 48 | 
 49 | 
 50 | def main(config):
 51 |     
 52 |     # Image preprocessing
 53 |     # VGGNet was trained on ImageNet where images are normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
 54 |     # We use the same normalization statistics here.
 55 |     transform = transforms.Compose([
 56 |         transforms.ToTensor(),
 57 |         transforms.Normalize(mean=(0.485, 0.456, 0.406), 
 58 |                              std=(0.229, 0.224, 0.225))])
 59 |     
 60 |     # Load content and style images
 61 |     # Make the style image same size as the content image
 62 |     content = load_image(config.content, transform, max_size=config.max_size)
 63 |     style = load_image(config.style, transform, shape=[content.size(2), content.size(3)])
 64 |     
 65 |     # Initialize a target image with the content image
 66 |     target = content.clone().requires_grad_(True)
 67 |     
 68 |     optimizer = torch.optim.Adam([target], lr=config.lr, betas=[0.5, 0.999])
 69 |     vgg = VGGNet().to(device).eval()
 70 |     
 71 |     for step in range(config.total_step):
 72 |         
 73 |         # Extract multiple(5) conv feature vectors
 74 |         target_features = vgg(target)
 75 |         content_features = vgg(content)
 76 |         style_features = vgg(style)
 77 | 
 78 |         style_loss = 0
 79 |         content_loss = 0
 80 |         for f1, f2, f3 in zip(target_features, content_features, style_features):
 81 |             # Compute content loss with target and content images
 82 |             content_loss += torch.mean((f1 - f2)**2)
 83 | 
 84 |             # Reshape convolutional feature maps
 85 |             _, c, h, w = f1.size()
 86 |             f1 = f1.view(c, h * w)
 87 |             f3 = f3.view(c, h * w)
 88 | 
 89 |             # Compute gram matrix
 90 |             f1 = torch.mm(f1, f1.t())
 91 |             f3 = torch.mm(f3, f3.t())
 92 | 
 93 |             # Compute style loss with target and style images
 94 |             style_loss += torch.mean((f1 - f3)**2) / (c * h * w) 
 95 |         
 96 |         # Compute total loss, backprop and optimize
 97 |         loss = content_loss + config.style_weight * style_loss 
 98 |         optimizer.zero_grad()
 99 |         loss.backward()
100 |         optimizer.step()
101 | 
102 |         if (step+1) % config.log_step == 0:
103 |             print ('Step [{}/{}], Content Loss: {:.4f}, Style Loss: {:.4f}' 
104 |                    .format(step+1, config.total_step, content_loss.item(), style_loss.item()))
105 | 
106 |         if (step+1) % config.sample_step == 0:
107 |             # Save the generated image
108 |             denorm = transforms.Normalize((-2.12, -2.04, -1.80), (4.37, 4.46, 4.44))
109 |             img = target.clone().squeeze()
110 |             img = denorm(img).clamp_(0, 1)
111 |             torchvision.utils.save_image(img, 'output-{}.png'.format(step+1))
112 | 
113 | 
114 | if __name__ == "__main__":
115 |     parser = argparse.ArgumentParser()
116 |     parser.add_argument('--content', type=str, default='png/content.png')
117 |     parser.add_argument('--style', type=str, default='png/style.png')
118 |     parser.add_argument('--max_size', type=int, default=400)
119 |     parser.add_argument('--total_step', type=int, default=2000)
120 |     parser.add_argument('--log_step', type=int, default=10)
121 |     parser.add_argument('--sample_step', type=int, default=500)
122 |     parser.add_argument('--style_weight', type=float, default=100)
123 |     parser.add_argument('--lr', type=float, default=0.003)
124 |     config = parser.parse_args()
125 |     print(config)
126 |     main(config)


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/png/content.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/neural_style_transfer/png/content.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/png/neural_style.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/neural_style_transfer/png/neural_style.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/png/neural_style2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/neural_style_transfer/png/neural_style2.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/png/style.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/neural_style_transfer/png/style.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/png/style2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/neural_style_transfer/png/style2.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/png/style3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/neural_style_transfer/png/style3.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/png/style4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/03-advanced/neural_style_transfer/png/style4.png


--------------------------------------------------------------------------------
/tutorials/03-advanced/neural_style_transfer/requirements.txt:
--------------------------------------------------------------------------------
1 | argparse
2 | torch
3 | torchvision
4 | Pillow
5 | 


--------------------------------------------------------------------------------
/tutorials/03-advanced/variational_autoencoder/main.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import torch
  3 | import torch.nn as nn
  4 | import torch.nn.functional as F
  5 | import torchvision
  6 | from torchvision import transforms
  7 | from torchvision.utils import save_image
  8 | 
  9 | 
 10 | # Device configuration
 11 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 12 | 
 13 | # Create a directory if not exists
 14 | sample_dir = 'samples'
 15 | if not os.path.exists(sample_dir):
 16 |     os.makedirs(sample_dir)
 17 | 
 18 | # Hyper-parameters
 19 | image_size = 784
 20 | h_dim = 400
 21 | z_dim = 20
 22 | num_epochs = 15
 23 | batch_size = 128
 24 | learning_rate = 1e-3
 25 | 
 26 | # MNIST dataset
 27 | dataset = torchvision.datasets.MNIST(root='../../data',
 28 |                                      train=True,
 29 |                                      transform=transforms.ToTensor(),
 30 |                                      download=True)
 31 | 
 32 | # Data loader
 33 | data_loader = torch.utils.data.DataLoader(dataset=dataset,
 34 |                                           batch_size=batch_size, 
 35 |                                           shuffle=True)
 36 | 
 37 | 
 38 | # VAE model
 39 | class VAE(nn.Module):
 40 |     def __init__(self, image_size=784, h_dim=400, z_dim=20):
 41 |         super(VAE, self).__init__()
 42 |         self.fc1 = nn.Linear(image_size, h_dim)
 43 |         self.fc2 = nn.Linear(h_dim, z_dim)
 44 |         self.fc3 = nn.Linear(h_dim, z_dim)
 45 |         self.fc4 = nn.Linear(z_dim, h_dim)
 46 |         self.fc5 = nn.Linear(h_dim, image_size)
 47 |         
 48 |     def encode(self, x):
 49 |         h = F.relu(self.fc1(x))
 50 |         return self.fc2(h), self.fc3(h)
 51 |     
 52 |     def reparameterize(self, mu, log_var):
 53 |         std = torch.exp(log_var/2)
 54 |         eps = torch.randn_like(std)
 55 |         return mu + eps * std
 56 | 
 57 |     def decode(self, z):
 58 |         h = F.relu(self.fc4(z))
 59 |         return F.sigmoid(self.fc5(h))
 60 |     
 61 |     def forward(self, x):
 62 |         mu, log_var = self.encode(x)
 63 |         z = self.reparameterize(mu, log_var)
 64 |         x_reconst = self.decode(z)
 65 |         return x_reconst, mu, log_var
 66 | 
 67 | model = VAE().to(device)
 68 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
 69 | 
 70 | # Start training
 71 | for epoch in range(num_epochs):
 72 |     for i, (x, _) in enumerate(data_loader):
 73 |         # Forward pass
 74 |         x = x.to(device).view(-1, image_size)
 75 |         x_reconst, mu, log_var = model(x)
 76 |         
 77 |         # Compute reconstruction loss and kl divergence
 78 |         # For KL divergence, see Appendix B in VAE paper or http://yunjey47.tistory.com/43
 79 |         reconst_loss = F.binary_cross_entropy(x_reconst, x, size_average=False)
 80 |         kl_div = - 0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
 81 |         
 82 |         # Backprop and optimize
 83 |         loss = reconst_loss + kl_div
 84 |         optimizer.zero_grad()
 85 |         loss.backward()
 86 |         optimizer.step()
 87 |         
 88 |         if (i+1) % 10 == 0:
 89 |             print ("Epoch[{}/{}], Step [{}/{}], Reconst Loss: {:.4f}, KL Div: {:.4f}" 
 90 |                    .format(epoch+1, num_epochs, i+1, len(data_loader), reconst_loss.item(), kl_div.item()))
 91 |     
 92 |     with torch.no_grad():
 93 |         # Save the sampled images
 94 |         z = torch.randn(batch_size, z_dim).to(device)
 95 |         out = model.decode(z).view(-1, 1, 28, 28)
 96 |         save_image(out, os.path.join(sample_dir, 'sampled-{}.png'.format(epoch+1)))
 97 | 
 98 |         # Save the reconstructed images
 99 |         out, _, _ = model(x)
100 |         x_concat = torch.cat([x.view(-1, 1, 28, 28), out.view(-1, 1, 28, 28)], dim=3)
101 |         save_image(x_concat, os.path.join(sample_dir, 'reconst-{}.png'.format(epoch+1)))


--------------------------------------------------------------------------------
/tutorials/04-utils/tensorboard/README.md:
--------------------------------------------------------------------------------
 1 | # TensorBoard in PyTorch
 2 | 
 3 | In this tutorial, we implement a MNIST classifier using a simple neural network and visualize the training process using [TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard). In training phase, we plot the loss and accuracy functions through `scalar_summary` and visualize the training images through `image_summary`. In addition, we visualize the weight and gradient values of the parameters of the neural network using `histogram_summary`. PyTorch code for handling these summary functions can be found [here](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/04-utils/tensorboard/main.py#L81-L97).
 4 | 
 5 | ![alt text](gif/tensorboard.gif)
 6 | 
 7 | <br>
 8 | 
 9 | ## Usage 
10 | 
11 | #### 1. Install the dependencies
12 | ```bash
13 | $ pip install -r requirements.txt
14 | ```
15 | 
16 | #### 2. Train the model
17 | ```bash
18 | $ python main.py
19 | ```
20 | 
21 | #### 3. Open the TensorBoard
22 | To run the TensorBoard, open a new terminal and run the command below. Then, open http://localhost:6006/ on your web browser.
23 | ```bash
24 | $ tensorboard --logdir='./logs' --port=6006
25 | ```
26 | 


--------------------------------------------------------------------------------
/tutorials/04-utils/tensorboard/gif/tensorboard.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yunjey/pytorch-tutorial/0500d3df5a2a8080ccfccbc00aca0eacc21818db/tutorials/04-utils/tensorboard/gif/tensorboard.gif


--------------------------------------------------------------------------------
/tutorials/04-utils/tensorboard/logger.py:
--------------------------------------------------------------------------------
 1 | # Code referenced from https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514
 2 | import tensorflow as tf
 3 | import numpy as np
 4 | import scipy.misc 
 5 | try:
 6 |     from StringIO import StringIO  # Python 2.7
 7 | except ImportError:
 8 |     from io import BytesIO         # Python 3.x
 9 | 
10 | 
11 | class Logger(object):
12 |     
13 |     def __init__(self, log_dir):
14 |         """Create a summary writer logging to log_dir."""
15 |         self.writer = tf.summary.FileWriter(log_dir)
16 | 
17 |     def scalar_summary(self, tag, value, step):
18 |         """Log a scalar variable."""
19 |         summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
20 |         self.writer.add_summary(summary, step)
21 | 
22 |     def image_summary(self, tag, images, step):
23 |         """Log a list of images."""
24 | 
25 |         img_summaries = []
26 |         for i, img in enumerate(images):
27 |             # Write the image to a string
28 |             try:
29 |                 s = StringIO()
30 |             except:
31 |                 s = BytesIO()
32 |             scipy.misc.toimage(img).save(s, format="png")
33 | 
34 |             # Create an Image object
35 |             img_sum = tf.Summary.Image(encoded_image_string=s.getvalue(),
36 |                                        height=img.shape[0],
37 |                                        width=img.shape[1])
38 |             # Create a Summary value
39 |             img_summaries.append(tf.Summary.Value(tag='%s/%d' % (tag, i), image=img_sum))
40 | 
41 |         # Create and write Summary
42 |         summary = tf.Summary(value=img_summaries)
43 |         self.writer.add_summary(summary, step)
44 |         
45 |     def histo_summary(self, tag, values, step, bins=1000):
46 |         """Log a histogram of the tensor of values."""
47 | 
48 |         # Create a histogram using numpy
49 |         counts, bin_edges = np.histogram(values, bins=bins)
50 | 
51 |         # Fill the fields of the histogram proto
52 |         hist = tf.HistogramProto()
53 |         hist.min = float(np.min(values))
54 |         hist.max = float(np.max(values))
55 |         hist.num = int(np.prod(values.shape))
56 |         hist.sum = float(np.sum(values))
57 |         hist.sum_squares = float(np.sum(values**2))
58 | 
59 |         # Drop the start of the first bin
60 |         bin_edges = bin_edges[1:]
61 | 
62 |         # Add bin edges and counts
63 |         for edge in bin_edges:
64 |             hist.bucket_limit.append(edge)
65 |         for c in counts:
66 |             hist.bucket.append(c)
67 | 
68 |         # Create and write Summary
69 |         summary = tf.Summary(value=[tf.Summary.Value(tag=tag, histo=hist)])
70 |         self.writer.add_summary(summary, step)
71 |         self.writer.flush()


--------------------------------------------------------------------------------
/tutorials/04-utils/tensorboard/main.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import torch.nn as nn
 3 | import torchvision
 4 | from torchvision import transforms
 5 | from logger import Logger
 6 | 
 7 | 
 8 | # Device configuration
 9 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
10 | 
11 | # MNIST dataset 
12 | dataset = torchvision.datasets.MNIST(root='../../data', 
13 |                                      train=True, 
14 |                                      transform=transforms.ToTensor(),  
15 |                                      download=True)
16 | 
17 | # Data loader
18 | data_loader = torch.utils.data.DataLoader(dataset=dataset, 
19 |                                           batch_size=100, 
20 |                                           shuffle=True)
21 | 
22 | 
23 | # Fully connected neural network with one hidden layer
24 | class NeuralNet(nn.Module):
25 |     def __init__(self, input_size=784, hidden_size=500, num_classes=10):
26 |         super(NeuralNet, self).__init__()
27 |         self.fc1 = nn.Linear(input_size, hidden_size) 
28 |         self.relu = nn.ReLU()
29 |         self.fc2 = nn.Linear(hidden_size, num_classes)  
30 |     
31 |     def forward(self, x):
32 |         out = self.fc1(x)
33 |         out = self.relu(out)
34 |         out = self.fc2(out)
35 |         return out
36 | 
37 | model = NeuralNet().to(device)
38 | 
39 | logger = Logger('./logs')
40 | 
41 | # Loss and optimizer
42 | criterion = nn.CrossEntropyLoss()  
43 | optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)  
44 | 
45 | data_iter = iter(data_loader)
46 | iter_per_epoch = len(data_loader)
47 | total_step = 50000
48 | 
49 | # Start training
50 | for step in range(total_step):
51 |     
52 |     # Reset the data_iter
53 |     if (step+1) % iter_per_epoch == 0:
54 |         data_iter = iter(data_loader)
55 | 
56 |     # Fetch images and labels
57 |     images, labels = next(data_iter)
58 |     images, labels = images.view(images.size(0), -1).to(device), labels.to(device)
59 |     
60 |     # Forward pass
61 |     outputs = model(images)
62 |     loss = criterion(outputs, labels)
63 |     
64 |     # Backward and optimize
65 |     optimizer.zero_grad()
66 |     loss.backward()
67 |     optimizer.step()
68 | 
69 |     # Compute accuracy
70 |     _, argmax = torch.max(outputs, 1)
71 |     accuracy = (labels == argmax.squeeze()).float().mean()
72 | 
73 |     if (step+1) % 100 == 0:
74 |         print ('Step [{}/{}], Loss: {:.4f}, Acc: {:.2f}' 
75 |                .format(step+1, total_step, loss.item(), accuracy.item()))
76 | 
77 |         # ================================================================== #
78 |         #                        Tensorboard Logging                         #
79 |         # ================================================================== #
80 | 
81 |         # 1. Log scalar values (scalar summary)
82 |         info = { 'loss': loss.item(), 'accuracy': accuracy.item() }
83 | 
84 |         for tag, value in info.items():
85 |             logger.scalar_summary(tag, value, step+1)
86 | 
87 |         # 2. Log values and gradients of the parameters (histogram summary)
88 |         for tag, value in model.named_parameters():
89 |             tag = tag.replace('.', '/')
90 |             logger.histo_summary(tag, value.data.cpu().numpy(), step+1)
91 |             logger.histo_summary(tag+'/grad', value.grad.data.cpu().numpy(), step+1)
92 | 
93 |         # 3. Log training images (image summary)
94 |         info = { 'images': images.view(-1, 28, 28)[:10].cpu().numpy() }
95 | 
96 |         for tag, images in info.items():
97 |             logger.image_summary(tag, images, step+1)


--------------------------------------------------------------------------------
/tutorials/04-utils/tensorboard/requirements.txt:
--------------------------------------------------------------------------------
1 | tensorflow
2 | torch
3 | torchvision
4 | scipy
5 | numpy
6 | 


--------------------------------------------------------------------------------