├── chapter7 ├── casper.jpg ├── casper2.jpg ├── flame_graphs │ ├── create_flame_graph_bad_random.sh │ ├── create_flame_graph_good_random.sh │ ├── good_random.py │ └── bad_random.py └── tensorboard-example.py ├── chapter8 ├── catfish │ ├── run-flask-server.sh │ ├── run-waitress-server.sh │ ├── get-prediction.sh │ ├── catfish_model.py │ └── catfish_server.py ├── catfish_docker_local │ ├── run-docker.sh │ ├── run-model-service.sh │ ├── get-prediction.sh │ ├── catfish_model.py │ ├── Dockerfile │ └── catfish_server.py ├── catfish_docker_cloud │ ├── run-model-service.sh │ ├── run-docker.sh │ ├── get-prediction.sh │ ├── catfish_model.py │ ├── Dockerfile │ └── catfish_server.py ├── libtorch │ ├── hello │ │ ├── hello.cpp │ │ └── CMakeLists.txt │ └── load-cnn │ │ ├── CMakeLists.txt │ │ ├── load-cnn.cpp │ │ └── cnnnet.py ├── CMakeLists.txt └── Chapter_8_5_Quantizing_Models.ipynb ├── requirements.txt ├── requirements_cuda_available.txt ├── environment.yml ├── LICENSE ├── chapter2 ├── download.py └── Chapter 2.ipynb ├── train.py ├── chapter9 ├── fastai.ipynb ├── Fast_bert_.ipynb └── Chapter9.5.ipynb ├── README.md ├── chapter3 └── Chapter 3.ipynb ├── chapter4 └── Chapter 4.ipynb ├── chapter5 └── Chapter 5.ipynb └── chapter6 └── Chapter 6.ipynb /chapter7/casper.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/falloutdurham/beginners-pytorch-deep-learning/HEAD/chapter7/casper.jpg -------------------------------------------------------------------------------- /chapter7/casper2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/falloutdurham/beginners-pytorch-deep-learning/HEAD/chapter7/casper2.jpg -------------------------------------------------------------------------------- /chapter8/catfish/run-flask-server.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | #run-flask-server.sh 3 | 4 | FLASK_APP=catfish_server.py FLASK_RUN_PORT=8080 flask run -------------------------------------------------------------------------------- /chapter8/catfish/run-waitress-server.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | #run-waitress-server.sh 3 | 4 | waitress-serve --call 'catfish_server:create_app' -------------------------------------------------------------------------------- /chapter7/flame_graphs/create_flame_graph_bad_random.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | py-spy record -r 99 -d 30 -o badrandom.svg -- python bad_random.py -------------------------------------------------------------------------------- /chapter7/flame_graphs/create_flame_graph_good_random.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | py-spy record -r 99 -d 30 -o goodrandom.svg -- python good_random.py -------------------------------------------------------------------------------- /chapter8/catfish_docker_local/run-docker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #run-docker.sh 3 | 4 | docker build -t catfish-service . 
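# The build step above tags the image as catfish-service; the run step below
# starts it detached (-d) and publishes the model service on host port 5000 (-p 5000:5000).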
5 | docker run -d -p 5000:5000 catfish-service:latest 6 | -------------------------------------------------------------------------------- /chapter8/catfish_docker_cloud/run-model-service.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #run-model-service.sh 3 | 4 | cd /app 5 | waitress-serve --port ${CATFISH_PORT} --call 'catfish_server:create_app' 6 | -------------------------------------------------------------------------------- /chapter8/catfish_docker_local/run-model-service.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #run-model-service.sh 3 | 4 | cd /app 5 | waitress-serve --port ${CATFISH_PORT} --call 'catfish_server:create_app' 6 | -------------------------------------------------------------------------------- /chapter8/catfish_docker_cloud/run-docker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #run-docker.sh 3 | 4 | docker build -t catfish-service . 5 | docker run -d -p 5000:5000 --env CATFISH_MODEL_LOCATION=[URL] catfish-service:latest -------------------------------------------------------------------------------- /chapter8/libtorch/hello/hello.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/torch.h> 2 | #include <iostream> 3 | 4 | int main() { 5 | torch::Tensor tensor = torch::ones({2, 2}); 6 | std::cout << tensor << std::endl; 7 | } 8 | -------------------------------------------------------------------------------- /chapter8/catfish/get-prediction.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #get-prediction.sh 3 | 4 | curl http://127.0.0.1:8080/predict\?image_url\=https://upload.wikimedia.org/wikipedia/commons/thumb/3/36/A_domestic_shorthair_tortie-tabby_cat.jpg/412px-A_domestic_shorthair_tortie-tabby_cat.jpg -------------------------------------------------------------------------------- /chapter8/catfish_docker_cloud/get-prediction.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #get-prediction.sh 3 | 4 | curl http://127.0.0.1:5000/predict?image_url=https://upload.wikimedia.org/wikipedia/commons/thumb/3/36/A_domestic_shorthair_tortie-tabby_cat.jpg/412px-A_domestic_shorthair_tortie-tabby_cat.jpg -------------------------------------------------------------------------------- /chapter8/catfish_docker_local/get-prediction.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #get-prediction.sh 3 | 4 | curl http://127.0.0.1:5000/predict?image_url=https://upload.wikimedia.org/wikipedia/commons/thumb/3/36/A_domestic_shorthair_tortie-tabby_cat.jpg/412px-A_domestic_shorthair_tortie-tabby_cat.jpg -------------------------------------------------------------------------------- /chapter8/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.0 FATAL_ERROR) 2 | project(load-cnn) 3 | 4 | find_package(Torch REQUIRED) 5 | 6 | add_executable(load-cnn load-cnn.cpp) 7 | target_link_libraries(load-cnn "${TORCH_LIBRARIES}") 8 | set_property(TARGET load-cnn PROPERTY CXX_STANDARD 14) 9 | -------------------------------------------------------------------------------- /chapter8/catfish/catfish_model.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from torchvision import models 3 |
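# CatfishModel below is a ResNet-50 backbone whose final fully connected layer
# is swapped for a small two-layer head matching the two entries in
# CatfishClasses; the Docker servers load trained parameters into it with
# load_state_dict() at startup.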
4 | CatfishClasses = ["cat","fish"] 5 | 6 | CatfishModel = models.resnet50() 7 | CatfishModel.fc = nn.Sequential(nn.Linear(CatfishModel.fc.in_features,500), 8 | nn.ReLU(), 9 | nn.Dropout(), nn.Linear(500,2)) 10 | -------------------------------------------------------------------------------- /chapter8/libtorch/hello/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.0 FATAL_ERROR) 2 | project(hello) 3 | 4 | find_package(Torch REQUIRED) 5 | set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}") 6 | 7 | add_executable(hello hello.cpp) 8 | target_link_libraries(hello "${TORCH_LIBRARIES}") 9 | set_property(TARGET hello PROPERTY CXX_STANDARD 14) -------------------------------------------------------------------------------- /chapter8/catfish_docker_cloud/catfish_model.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from torchvision import models 3 | 4 | CatfishClasses = ["cat","fish"] 5 | 6 | CatfishModel = models.resnet50() 7 | CatfishModel.fc = nn.Sequential(nn.Linear(CatfishModel.fc.in_features,500), 8 | nn.ReLU(), 9 | nn.Dropout(), nn.Linear(500,2)) 10 | -------------------------------------------------------------------------------- /chapter8/catfish_docker_local/catfish_model.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from torchvision import models 3 | 4 | CatfishClasses = ["cat","fish"] 5 | 6 | CatfishModel = models.resnet50() 7 | CatfishModel.fc = nn.Sequential(nn.Linear(CatfishModel.fc.in_features,500), 8 | nn.ReLU(), 9 | nn.Dropout(), nn.Linear(500,2)) 10 | -------------------------------------------------------------------------------- /chapter8/libtorch/load-cnn/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.0 FATAL_ERROR) 2 | project(load-cnn) 3 | 4 | find_package(Torch REQUIRED) 5 | set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}") 6 | 7 | add_executable(load-cnn load-cnn.cpp) 8 | target_link_libraries(load-cnn "${TORCH_LIBRARIES}") 9 | set_property(TARGET load-cnn PROPERTY CXX_STANDARD 14) -------------------------------------------------------------------------------- /chapter8/catfish_docker_cloud/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM continuumio/miniconda3:latest 2 | 3 | ARG port=5000 4 | 5 | ENV CATFISH_PORT=$port 6 | 7 | RUN conda install -y flask \ 8 | && conda install -c pytorch torchvision \ 9 | && conda install waitress 10 | RUN mkdir -p /app 11 | 12 | COPY ./catfish_model.py /app 13 | COPY ./catfish_server.py /app 14 | COPY ./run-model-service.sh / 15 | 16 | EXPOSE $port 17 | 18 | ENTRYPOINT ["/run-model-service.sh"] -------------------------------------------------------------------------------- /chapter8/libtorch/load-cnn/load-cnn.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/script.h> 2 | #include <iostream> 3 | #include <memory> 4 | 5 | int main(int argc, const char* argv[]) { 6 | 7 | torch::jit::script::Module module = torch::jit::load("cnnnet"); 8 | 9 | std::cout << "model loaded ok\n"; 10 | 11 | std::vector<torch::jit::IValue> inputs; 12 | inputs.push_back(torch::rand({1, 3, 224, 224})); 13 | 14 | at::Tensor output = module.forward(inputs).toTensor(); 15 | 16 | std::cout << output << '\n'; 17 | } 18 |
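Before wiring the traced model into the C++ loader above, it can be worth confirming from Python that the module written by cnnnet.py loads and produces the expected two-logit output. A minimal sketch (run from the same directory where cnnnet.py saved the cnnnet file):

import torch

# Load the traced module saved by cnnnet.py and run a dummy batch through it.
loaded = torch.jit.load("cnnnet")
loaded.eval()

with torch.no_grad():
    output = loaded(torch.rand(1, 3, 224, 224))

print(output.shape)  # expected: torch.Size([1, 2]), one logit per class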
-------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | cmake==3.12.0 2 | 3 | docker==4.3.1 4 | 5 | fastai==1.0.61 6 | 7 | fast-bert==1.9.1 8 | 9 | Flask==1.1.2 10 | 11 | googletrans==3.0.0 12 | 13 | jupyter==1.0.0 14 | 15 | matplotlib==3.2.2 16 | 17 | numpy==1.18.5 18 | 19 | pandas==1.0.5 20 | 21 | Pillow==7.0.0 22 | 23 | py-spy==0.3.3 24 | 25 | python == 3.6.9 26 | 27 | requests==2.23.0 28 | 29 | spacy==2.2.4 30 | 31 | tensorboard==2.3.0 32 | 33 | torch==1.6.0 34 | 35 | torchaudio==0.6.0 36 | 37 | torchtext==0.7.0 38 | 39 | torchvision==0.7.0 40 | 41 | transformers==3.0.2 42 | 43 | urllib3==1.24.3 -------------------------------------------------------------------------------- /chapter8/catfish_docker_local/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM continuumio/miniconda3:latest 2 | 3 | ARG model_parameter_file=catfishweights.pth 4 | ARG port=5000 5 | 6 | ENV CATFISH_PORT=$port 7 | ENV CATFISH_MODEL_LOCATION=/app/$model_parameter_file 8 | 9 | RUN conda install -y flask \ 10 | && conda install -c pytorch torchvision \ 11 | && conda install waitress 12 | RUN mkdir -p /app 13 | 14 | COPY ./catfish_model.py /app 15 | COPY ./catfish_server.py /app 16 | COPY ./$model_parameter_file /app/ 17 | COPY ./run-model-service.sh / 18 | 19 | EXPOSE $port 20 | 21 | ENTRYPOINT ["/run-model-service.sh"] -------------------------------------------------------------------------------- /requirements_cuda_available.txt: -------------------------------------------------------------------------------- 1 | cmake==3.12.0 2 | 3 | docker==4.3.1 4 | 5 | fastai==1.0.61 6 | 7 | fast-bert==1.9.1 8 | 9 | Flask==1.1.2 10 | 11 | googletrans==3.0.0 12 | 13 | jupyter==1.0.0 14 | 15 | matplotlib==3.2.2 16 | 17 | numpy==1.18.5 18 | 19 | pandas==1.0.5 20 | 21 | Pillow==7.0.0 22 | 23 | py-spy==0.3.3 24 | 25 | python == 3.6.9 26 | 27 | requests==2.23.0 28 | 29 | spacy==2.2.4 30 | 31 | tensorboard==2.3.0 32 | 33 | torch==1.6.0+cu101 34 | 35 | torchaudio==0.6.0 36 | 37 | torchtext==0.7.0 38 | 39 | torchvision==0.7.0+cu101 40 | 41 | transformers==3.0.2 42 | 43 | urllib3==1.24.3 -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: myenv 2 | 3 | channels: 4 | - pytorch 5 | - conda-forge 6 | - defaults 7 | 8 | dependencies: 9 | - cmake=3.12.0 10 | - cudatoolkit-dev=10.1 11 | - Flask=1.1.2 12 | - jupyter=1.0.0 13 | - matplotlib=3.2.2 14 | - numpy=1.18.5 15 | - pandas=1.0.5 16 | - Pillow=7.0.0 17 | - pip=20.2 18 | - py-spy=0.3.3 19 | - python>=3.6.9 20 | - pytorch=1.6.0 21 | - requests=2.23.0 22 | - spacy=2.2.4 23 | - tensorboard=2.3.0 24 | - torchaudio=0.6.0 25 | - torchtext=0.7.0 26 | - torchvision=0.7.0 27 | - transformers=3.0.2 28 | - urllib3=1.24.3 29 | - pip: 30 | - fast-bert==1.9.1 31 | - googletrans==3.0.0 32 | - docker==4.3.1 33 | - fastai==1.0.61 34 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Ian Pointer 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation 
the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /chapter8/catfish/catfish_server.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import torch 3 | from flask import Flask, jsonify, request 4 | from io import BytesIO 5 | from PIL import Image 6 | from torchvision import transforms 7 | 8 | from catfish_model import CatfishModel, CatfishClasses 9 | 10 | def load_model(): 11 | m = CatfishModel 12 | m.eval() 13 | return m 14 | 15 | model = load_model() 16 | 17 | img_transforms = transforms.Compose([ 18 | transforms.Resize((224,224)), 19 | transforms.ToTensor(), 20 | transforms.Normalize(mean=[0.485, 0.456, 0.406], 21 | std=[0.229, 0.224, 0.225]) 22 | ]) 23 | 24 | def create_app(): 25 | app = Flask(__name__) 26 | 27 | @app.route("/") 28 | def status(): 29 | return jsonify({"status": "ok"}) 30 | 31 | @app.route("/predict", methods=['GET', 'POST']) 32 | def predict(): 33 | if request.method == 'POST': 34 | img_url = request.form.image_url 35 | else: 36 | img_url = request.args.get('image_url', '') 37 | 38 | response = requests.get(img_url) 39 | img = Image.open(BytesIO(response.content)) 40 | img_tensor = img_transforms(img).unsqueeze(0) 41 | prediction = model(img_tensor) 42 | predicted_class = CatfishClasses[torch.argmax(prediction)] 43 | return jsonify({"image": img_url, "prediction": predicted_class}) 44 | 45 | return app -------------------------------------------------------------------------------- /chapter8/catfish_docker_local/catfish_server.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | import torch 4 | from flask import Flask, jsonify, request 5 | from io import BytesIO 6 | from PIL import Image 7 | from torchvision import transforms 8 | 9 | from catfish_model import CatfishModel, CatfishClasses 10 | 11 | 12 | def load_model(): 13 | location = os.environ["CATFISH_MODEL_LOCATION"] 14 | m = CatfishModel 15 | m.load_state_dict(torch.load(location, map_location="cpu")) 16 | m.eval() 17 | return m 18 | 19 | 20 | model = load_model() 21 | 22 | img_transforms = transforms.Compose([ 23 | transforms.Resize((224,224)), 24 | transforms.ToTensor(), 25 | transforms.Normalize(mean=[0.485, 0.456, 0.406], 26 | std=[0.229, 0.224, 0.225] ) 27 | ]) 28 | 29 | def create_app(): 30 | app = Flask(__name__) 31 | 32 | @app.route("/") 33 | def status(): 34 | return jsonify({"status": "ok"}) 35 | 36 | @app.route("/predict", methods=['GET', 'POST']) 37 | def predict(): 38 | if request.method == 'POST': 39 | img_url = request.form.image_url 40 | else: 41 | img_url = 
request.args.get('image_url', '') 42 | 43 | response = requests.get(img_url) 44 | img = Image.open(BytesIO(response.content)) 45 | img_tensor = img_transforms(img).unsqueeze(0) 46 | prediction = model(img_tensor) 47 | predicted_class = CatfishClasses[torch.argmax(prediction)] 48 | return jsonify({"image": img_url, "prediction": predicted_class}) 49 | 50 | return app -------------------------------------------------------------------------------- /chapter8/libtorch/load-cnn/cnnnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | 4 | 5 | class CNNNet(nn.Module): 6 | def __init__(self, num_classes=2): 7 | super(CNNNet, self).__init__() 8 | self.features = nn.Sequential( 9 | nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), 10 | nn.ReLU(), 11 | nn.MaxPool2d(kernel_size=3, stride=2), 12 | nn.Conv2d(64, 192, kernel_size=5, padding=2), 13 | nn.ReLU(), 14 | nn.MaxPool2d(kernel_size=3, stride=2), 15 | nn.Conv2d(192, 384, kernel_size=3, padding=1), 16 | nn.ReLU(), 17 | nn.Conv2d(384, 256, kernel_size=3, padding=1), 18 | nn.ReLU(), 19 | nn.Conv2d(256, 256, kernel_size=3, padding=1), 20 | nn.ReLU(), 21 | nn.MaxPool2d(kernel_size=3, stride=2), 22 | ) 23 | self.avgpool = nn.AdaptiveAvgPool2d((6, 6)) 24 | self.classifier = nn.Sequential( 25 | nn.Dropout(), 26 | nn.Linear(256 * 6 * 6, 4096), 27 | nn.ReLU(), 28 | nn.Dropout(), 29 | nn.Linear(4096, 4096), 30 | nn.ReLU(), 31 | nn.Linear(4096, num_classes), 32 | ) 33 | 34 | def forward(self, x): 35 | x = self.features(x) 36 | x = self.avgpool(x) 37 | x = torch.flatten(x, 1) 38 | x = self.classifier(x) 39 | return x 40 | 41 | 42 | cnn_model = CNNNet() 43 | cnn_model.eval() 44 | cnn_traced = torch.jit.trace(cnn_model, torch.rand([1, 3, 224, 224])) 45 | torch.jit.save(cnn_traced, "cnnnet") 46 | -------------------------------------------------------------------------------- /chapter8/catfish_docker_cloud/catfish_server.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | import torch 4 | from flask import Flask, jsonify, request 5 | from io import BytesIO 6 | from PIL import Image 7 | from shutil import copyfileobj 8 | from tempfile import NamedTemporaryFile 9 | from torchvision import transforms 10 | from urllib.request import urlopen 11 | 12 | from catfish_model import CatfishModel, CatfishClasses 13 | 14 | 15 | def load_model(): 16 | m = CatfishModel 17 | if "CATFISH_MODEL_LOCATION" in os.environ: 18 | parameter_url = os.environ["CATFISH_MODEL_LOCATION"] 19 | print(f"downloading {parameter_url}") 20 | with urlopen(parameter_url) as fsrc, NamedTemporaryFile() as fdst: 21 | copyfileobj(fsrc, fdst) 22 | m.load_state_dict(torch.load(fdst, map_location="cpu")) 23 | m.eval() 24 | return m 25 | 26 | 27 | model = load_model() 28 | 29 | img_transforms = transforms.Compose([ 30 | transforms.Resize((224,224)), 31 | transforms.ToTensor(), 32 | transforms.Normalize(mean=[0.485, 0.456, 0.406], 33 | std=[0.229, 0.224, 0.225]) 34 | ]) 35 | 36 | def create_app(): 37 | app = Flask(__name__) 38 | 39 | @app.route("/") 40 | def status(): 41 | return jsonify({"status": "ok"}) 42 | 43 | @app.route("/predict", methods=['GET', 'POST']) 44 | def predict(): 45 | if request.method == 'POST': 46 | img_url = request.form.image_url 47 | else: 48 | img_url = request.args.get('image_url', '') 49 | 50 | response = requests.get(img_url) 51 | img = Image.open(BytesIO(response.content)) 52 | img_tensor = img_transforms(img).unsqueeze(0) 53 | 
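# The forward pass below returns raw logits of shape [1, len(CatfishClasses)];
# torch.argmax picks the index of the largest logit, which is then used to look
# up the class name returned in the JSON response.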
prediction = model(img_tensor) 54 | predicted_class = CatfishClasses[torch.argmax(prediction)] 55 | return jsonify({"image": img_url, "prediction": predicted_class}) 56 | 57 | return app 58 | -------------------------------------------------------------------------------- /chapter2/download.py: -------------------------------------------------------------------------------- 1 | # download.py 2 | 3 | import os 4 | import sys 5 | import urllib3 6 | from urllib.parse import urlparse 7 | import pandas as pd 8 | import itertools 9 | import shutil 10 | 11 | from urllib3.util import Retry 12 | 13 | urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) 14 | 15 | classes = ["cat", "fish"] 16 | set_types = ["train", "test", "val"] 17 | 18 | def download_image(url, klass, data_type): 19 | basename = os.path.basename(urlparse(url).path) 20 | filename = "{}/{}/{}".format(data_type, klass, basename) 21 | if not os.path.exists(filename): 22 | try: 23 | http = urllib3.PoolManager(retries=Retry(connect=1, read=1, redirect=2)) 24 | with http.request("GET", url, preload_content=False) as resp, open( 25 | filename, "wb" 26 | ) as out_file: 27 | if resp.status == 200: 28 | shutil.copyfileobj(resp, out_file) 29 | else: 30 | print("Error downloading {}".format(url)) 31 | resp.release_conn() 32 | except: 33 | print("Error downloading {}".format(url)) 34 | 35 | 36 | if __name__ == "__main__": 37 | if not os.path.exists("images.csv"): 38 | print("Error: can't find images.csv!") 39 | sys.exit(0) 40 | 41 | # get args and create output directory 42 | imagesDF = pd.read_csv("images.csv") 43 | 44 | for set_type, klass in list(itertools.product(set_types, classes)): 45 | path = "./{}/{}".format(set_type, klass) 46 | if not os.path.exists(path): 47 | print("Creating directory {}".format(path)) 48 | os.makedirs(path) 49 | 50 | print("Downloading {} images".format(len(imagesDF))) 51 | 52 | result = [ 53 | download_image(url, klass, data_type) 54 | for url, klass, data_type in zip( 55 | imagesDF["url"], imagesDF["class"], imagesDF["type"] 56 | ) 57 | ] 58 | sys.exit(0) 59 | -------------------------------------------------------------------------------- /chapter7/flame_graphs/good_random.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision 3 | from torch import optim 4 | import torch.nn as nn 5 | from torch.utils.tensorboard import SummaryWriter 6 | from torchvision import datasets, transforms,models 7 | import torch.utils.data 8 | 9 | 10 | model = models.resnet18(pretrained=True) 11 | device = "cuda:0" 12 | 13 | def add_gpu_noise(device, tensor): 14 | a = torch.randn_like(tensor).to(device) 15 | return tensor + a 16 | 17 | train_data_path = "." # Add correct path here! 
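# ImageFolder (used below) expects one subdirectory per class beneath
# train_data_path -- e.g. cat/ and fish/ as created by the chapter 2 download
# script -- and infers the labels from those directory names.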
18 | model.to(device) 19 | image_transforms = torchvision.transforms.Compose([transforms.Resize((224,224)), transforms.ToTensor()]) 20 | 21 | train_data = torchvision.datasets.ImageFolder(root=train_data_path,transform=image_transforms) 22 | batch_size=32 23 | train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size) 24 | 25 | optimizer = optim.Adam(model.parameters(), lr=2e-2) 26 | criterion = nn.CrossEntropyLoss() 27 | 28 | def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device='cuda:0'): 29 | model.to(device) 30 | for epoch in range(epochs): 31 | print(f"epoch {epoch}") 32 | model.train() 33 | for batch in train_loader: 34 | optimizer.zero_grad() 35 | ww, target = batch 36 | ww = ww.to(device) 37 | ww = add_gpu_noise(device,ww) 38 | target= target.to(device) 39 | output = model(ww) 40 | loss = loss_fn(output, target) 41 | loss.backward() 42 | optimizer.step() 43 | 44 | model.eval() 45 | num_correct = 0 46 | num_examples = 0 47 | for batch in val_loader: 48 | ww, target = batch 49 | ww = ww.to(device) 50 | target= target.to(device) 51 | output = model(ww) 52 | correct = torch.eq(torch.max(output, dim=1)[1], target).view(-1) 53 | num_correct += torch.sum(correct).item() 54 | num_examples += correct.shape[0] 55 | print("Epoch {}, accuracy = {:.2f}".format(epoch, num_correct / num_examples)) 56 | 57 | train(model,optimizer,criterion,train_data_loader,train_data_loader,epochs=1) 58 | -------------------------------------------------------------------------------- /chapter7/flame_graphs/bad_random.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision 3 | from torch import optim 4 | import torch.nn as nn 5 | from torch.utils.tensorboard import SummaryWriter 6 | from torchvision import datasets, transforms,models 7 | import torch.utils.data 8 | from PIL import Image 9 | import numpy as np 10 | 11 | model = models.resnet18(pretrained=True) 12 | device = "cuda:0" 13 | 14 | class BadRandom(object): 15 | def __call__(self, img): 16 | img_np = np.array(img) 17 | random = np.random.random_sample(img_np.shape) 18 | out_np = img_np + random 19 | out = Image.fromarray(out_np.astype('uint8'), 'RGB') 20 | return out 21 | 22 | def __repr__(self): 23 | str = f"{self.__class__.__name__ }" 24 | return str 25 | 26 | train_data_path = "." # Add correct path here! 
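# BadRandom is applied per image inside the transform pipeline below, so its
# NumPy noise generation runs on the CPU for every sample the DataLoader yields;
# that per-sample cost is what the py-spy script create_flame_graph_bad_random.sh
# is meant to expose in the flame graph.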
27 | model.to(device) 28 | image_transforms = torchvision.transforms.Compose([transforms.Resize((224,224)),BadRandom(), transforms.ToTensor()]) 29 | 30 | train_data = torchvision.datasets.ImageFolder(root=train_data_path,transform=image_transforms) 31 | batch_size=32 32 | train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size) 33 | 34 | optimizer = optim.Adam(model.parameters(), lr=2e-2) 35 | criterion = nn.CrossEntropyLoss() 36 | 37 | def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device='cuda:0'): 38 | model.to(device) 39 | for epoch in range(1, epochs+1): 40 | print(f"epoch {epoch}") 41 | model.train() 42 | for batch in train_loader: 43 | optimizer.zero_grad() 44 | ww, target = batch 45 | ww = ww.to(device) 46 | target= target.to(device) 47 | output = model(ww) 48 | loss = loss_fn(output, target) 49 | loss.backward() 50 | optimizer.step() 51 | 52 | model.eval() 53 | num_correct = 0 54 | num_examples = 0 55 | for batch in val_loader: 56 | ww, target = batch 57 | ww = ww.to(device) 58 | target= target.to(device) 59 | output = model(ww) 60 | correct = torch.eq(torch.max(output, dim=1)[1], target).view(-1) 61 | num_correct += torch.sum(correct).item() 62 | num_examples += correct.shape[0] 63 | print("Epoch {}, accuracy = {:.2f}".format(epoch, num_correct / num_examples)) 64 | 65 | train(model,optimizer,criterion,train_data_loader,train_data_loader,epochs=1) 66 | -------------------------------------------------------------------------------- /chapter7/tensorboard-example.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.utils.data 4 | import torchvision 5 | from functools import partial 6 | from torch import optim 7 | from torch.utils.tensorboard import SummaryWriter 8 | from torchvision import datasets, transforms 9 | 10 | # Writer will output to ./runs/ directory by default 11 | 12 | writer = SummaryWriter() 13 | 14 | transform = transforms.Compose( 15 | [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))] 16 | ) 17 | trainset = datasets.MNIST("mnist_train", train=True, download=True, transform=transform) 18 | train_data_loader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True) 19 | model = torchvision.models.resnet50(False) 20 | 21 | model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False) 22 | images, labels = next(iter(train_data_loader)) 23 | 24 | grid = torchvision.utils.make_grid(images) 25 | writer.add_image("images", grid, 0) 26 | writer.add_graph(model, images) 27 | 28 | 29 | def send_stats(i, module, input, output): 30 | writer.add_scalar(f"layer {i}-mean", output.data.mean()) 31 | writer.add_scalar(f"layer {i}-stddev", output.data.std()) 32 | 33 | 34 | for i, m in enumerate(model.children()): 35 | m.register_forward_hook(partial(send_stats, i)) 36 | 37 | # Now train the model and watch output in Tensorboard 38 | 39 | optimizer = optim.Adam(model.parameters(), lr=2e-2) 40 | criterion = nn.CrossEntropyLoss() 41 | 42 | 43 | def train( 44 | model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device="cuda:0" 45 | ): 46 | model.to(device) 47 | for epoch in range(epochs): 48 | print(f"epoch {epoch+1}") 49 | model.train() 50 | for batch in train_loader: 51 | optimizer.zero_grad() 52 | ww, target = batch 53 | ww = ww.to(device) 54 | target = target.to(device) 55 | output = model(ww) 56 | loss = loss_fn(output, target) 57 | loss.backward() 58 | optimizer.step() 59 | 60 | 
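# Validation phase: the call at the bottom of this script passes
# train_data_loader as both train_loader and val_loader, so the accuracy
# printed each epoch is measured on the training data.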
model.eval() 61 | num_correct = 0 62 | num_examples = 0 63 | for batch in val_loader: 64 | ww, target = batch 65 | ww = ww.to(device) 66 | target = target.to(device) 67 | output = model(ww) 68 | correct = torch.eq(torch.max(output, dim=1)[1], target).view(-1) 69 | num_correct += torch.sum(correct).item() 70 | num_examples += correct.shape[0] 71 | print("Epoch {}, accuracy = {:.2f}".format(epoch+1, num_correct / num_examples)) 72 | 73 | 74 | train(model, optimizer, criterion, train_data_loader, train_data_loader, epochs=5) -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.optim as optim 4 | import torch.utils.data 5 | import torch.nn.functional as F 6 | 7 | def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device="cpu"): 8 | for epoch in range(epochs): 9 | training_loss = 0.0 10 | valid_loss = 0.0 11 | model.train() 12 | for batch in train_loader: 13 | optimizer.zero_grad() 14 | inputs, targets = batch 15 | inputs = inputs.to(device) 16 | targets = targets.to(device) 17 | output = model(inputs) 18 | loss = loss_fn(output, targets) 19 | loss.backward() 20 | optimizer.step() 21 | training_loss += loss.data.item() * inputs.size(0) 22 | training_loss /= len(train_loader.dataset) 23 | 24 | model.eval() 25 | num_correct = 0 26 | num_examples = 0 27 | for batch in val_loader: 28 | inputs, targets = batch 29 | inputs = inputs.to(device) 30 | output = model(inputs) 31 | targets = targets.to(device) 32 | loss = loss_fn(output,targets) 33 | valid_loss += loss.data.item() * inputs.size(0) 34 | correct = torch.eq(torch.max(F.softmax(output), dim=1)[1], targets).view(-1) 35 | num_correct += torch.sum(correct).item() 36 | num_examples += correct.shape[0] 37 | valid_loss /= len(val_loader.dataset) 38 | 39 | print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss, 40 | valid_loss, num_correct / num_examples)) 41 | 42 | def find_lr(model, loss_fn, optimizer, train_loader, init_value=1e-8, final_value=10.0, device="cpu"): 43 | number_in_epoch = len(train_loader) - 1 44 | update_step = (final_value / init_value) ** (1 / number_in_epoch) 45 | lr = init_value 46 | optimizer.param_groups[0]["lr"] = lr 47 | best_loss = 0.0 48 | batch_num = 0 49 | losses = [] 50 | log_lrs = [] 51 | for data in train_loader: 52 | batch_num += 1 53 | inputs, targets = data 54 | inputs = inputs.to(device) 55 | targets = targets.to(device) 56 | optimizer.zero_grad() 57 | outputs = model(inputs) 58 | loss = loss_fn(outputs, targets) 59 | 60 | # Crash out if loss explodes 61 | 62 | if batch_num > 1 and loss > 4 * best_loss: 63 | if(len(log_lrs) > 20): 64 | return log_lrs[10:-5], losses[10:-5] 65 | else: 66 | return log_lrs, losses 67 | 68 | # Record the best loss 69 | 70 | if loss < best_loss or batch_num == 1: 71 | best_loss = loss 72 | 73 | # Store the values 74 | losses.append(loss.item()) 75 | log_lrs.append((lr)) 76 | 77 | # Do the backward pass and optimize 78 | 79 | loss.backward() 80 | optimizer.step() 81 | 82 | # Update the lr for the next step and store 83 | 84 | lr *= update_step 85 | optimizer.param_groups[0]["lr"] = lr 86 | if(len(log_lrs) > 20): 87 | return log_lrs[10:-5], losses[10:-5] 88 | else: 89 | return log_lrs, losses -------------------------------------------------------------------------------- /chapter9/fastai.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# fast.ai ULMFiT" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from fastai.text import *" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "twitter_data_path = \".\"" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "data_lm = (TextList\n", 35 | " .from_csv(\"./twitter-data/\", 'train-processed.csv', cols=5)\n", 36 | " .split_by_rand_pct()\n", 37 | " .label_for_lm()\n", 38 | " .databunch(bs=32))" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "batchsize = 32" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "learn.lr_find()" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "learn.recorder.plot()" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "learn.fit_one_cycle(10, 1e-2)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "learn.save_encoder('fine_tuned_enc')" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "data_class = (TextList\n", 102 | " .from_csv(twitter_data_path, 'train-processed.csv', cols=5, vocab=data_lm.vocab)\n", 103 | " .split_by_rand_pct()\n", 104 | " .label_from_df(cols=0)\n", 105 | " .databunch())" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "twitter_classifer_learner = text_classifier_learner(data_class, AWD_LSTM, drop_mult=0.5)\n", 115 | "twitter_classifer_learner.load_encoder('fine_tuned_enc')" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "twitter_classifer_learner.lr_find()" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "twitter_classifer_learner.recorder.plot()" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "twitter_classifer_learner.fit_one_cycle(5, 1e-3)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "twitter_classifer_learner.freeze_to(-2)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "twitter_classifer_learner.fit_one_cycle(1, 1e-3)" 161 | ] 162 | 
}, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [] 169 | } 170 | ], 171 | "metadata": { 172 | "kernelspec": { 173 | "display_name": "Python 3", 174 | "language": "python", 175 | "name": "python3" 176 | }, 177 | "language_info": { 178 | "codemirror_mode": { 179 | "name": "ipython", 180 | "version": 3 181 | }, 182 | "file_extension": ".py", 183 | "mimetype": "text/x-python", 184 | "name": "python", 185 | "nbconvert_exporter": "python", 186 | "pygments_lexer": "ipython3", 187 | "version": "3.6.8" 188 | } 189 | }, 190 | "nbformat": 4, 191 | "nbformat_minor": 2 192 | } 193 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PyTorch for Deep Learning: Creating and Deploying Deep Learning Applications 2 | 3 | Repository for scripts and notebooks from the book: Programming PyTorch for Deep Learning: Creating and Deploying Deep Learning Applications 4 | 5 | ## Download of dataset for chapter 2 ([download.py](https://github.com/falloutdurham/beginners-pytorch-deep-learning/blob/master/chapter2/download.py)) 6 | 7 | Since some links are broken meanwhile, you can also find a downloadable version of the image dataset here (zip file): https://drive.google.com/file/d/16h8E7dnj5TpxF_ex4vF2do20iMWziM70 8 | 9 | ## Updates 10 | 11 | * 2020/05/25: Chapter 9.75 — Image Self-Supervised Learning 12 | 13 | * 2020/03/01: Chapter 9.5 - Text Generation With GPT-2 And (only) PyTorch, or Semi/Self-Supervision Learning Part 1 (Letters To Charlotte) 14 | 15 | * 2020/05/03: Chapter 7.5 - Quantizing Models 16 | 17 | ________________ 18 | 19 | # Deutschsprachige Ausgabe 20 | ## PyTorch für Deep Learning: Anwendungen für Bild-, Ton- und Textdaten entwickeln und deployen 21 | 22 | --> [https://dpunkt.de/produkt/pytorch-fuer-deep-learning/]() 23 | 24 | ## Hinweis zum Download des Datensatzes in Kapitel 2 ([download.py](https://github.com/falloutdurham/beginners-pytorch-deep-learning/blob/master/chapter2/download.py)) 25 | 26 | Da einige URLs inzwischen leider veraltet sind, stehen Ihnen die Bilddateien zusätzlich als Download (Zip-Datei) bereit: https://drive.google.com/file/d/16h8E7dnj5TpxF_ex4vF2do20iMWziM70 27 | 28 | ## Installationshinweise 29 | 30 | - [Python - Downloads und Dokumentation](https://www.python.org/) 31 | - [Anaconda - Dokumentation mit Installationshinweisen](https://docs.anaconda.com/anaconda/) 32 | - [pip - Installationshinweise](https://pypi.org/project/pip/) 33 | - [PyTorch - Installationshinweise](https://pytorch.org/get-started/locally/) 34 | + falls Installation nicht mit `conda env create --file environment.yml`/`pip3 install -r requirements.txt 35 | /requirements_cuda_available.txt` erfolgt; ansonsten siehe Abschnitt [_Versionskontrolle_](#Versionskontrolle) 36 | - [Jupyter Notebook / JupyterLab - Installation und Dokumentation](https://jupyter.org/) 37 | - [Google Colaboratory - Einführung und weitergehende Hinweise insb. zum Einlesen von Daten](https://colab.research.google.com/notebooks/intro.ipynb) 38 | - [Github - Forken und Klonen eines Repositorys](https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/fork-a-repo) 39 | 40 | ## Versionskontrolle 41 | 42 | Nachdem Sie das Github-Repository lokal geklont (bzw. zuvor geforkt) haben! 43 | 44 | ### Conda 45 | 46 | 1.) 
Wechseln Sie zunächst in den Zielordner (`cd beginners-pytorch-deep-learning`), erstellen Sie dann eine (lokale) virtuelle Umgebung und installieren Sie die benötigten Bibliotheken und Pakete: 47 | 48 | `conda env create --file environment.yml` 49 | 50 | 2.) Anschließend aktivieren Sie die virtuelle Umgebung: 51 | 52 | `conda activate myenv` 53 | 54 | 3.) Zum Deaktivieren nutzen Sie den Befehl: 55 | 56 | `conda deactivate` 57 | 58 | ### pip 59 | 60 | 1.) Wechseln Sie zunächst in den Zielordner (`cd beginners-pytorch-deep-learning`) und erstellen Sie anschließend eine 61 | virtuelle Umgebung: 62 | 63 | `python3 -m venv myenv` 64 | 65 | 2.) Aktivieren Sie die virtuelle Umgebung (https://docs.python.org/3/library/venv.html): 66 | 67 | `source myenv/bin/activate` (Ubuntu/Mac) 68 | `myenv\Scripts\activate.bat` (Windows) 69 | 70 | 3.) Erstellen Sie eine (lokale) virtuelle Umgebung und installieren Sie die benötigten Bibliotheken und Pakete: 71 | 72 | `pip3 install -r requirements.txt` 73 | 74 | 75 | 4.) Zum Deaktivieren nutzen Sie den Befehl: 76 | 77 | `deactivate` 78 | 79 | ### Bei Nutzung von Jupyter Notebook 80 | 81 | 1.) Zunächst müssen Sie Jupyter Notebook installieren: 82 | 83 | `conda install -c conda-forge notebook` oder `pip3 install notebook` 84 | 85 | 2.) Nach Aktivierung Ihrer virtuellen Umgebung (s.o.) geben Sie den folgenden Befehl in Ihre Kommandozeile ein, um die 86 | `ipykernel`-Bibliothek herunterzuladen: 87 | 88 | `conda install ipykernel` oder `pip3 install ipykernel` 89 | 90 | 3.) Installieren Sie einen Kernel mit Ihrer virtuellen Umgebung: 91 | 92 | `ipython kernel install --user --name=myenv` 93 | 94 | 4.) Starten Sie Jupyter Notebook: 95 | 96 | `jupyter notebook` 97 | 98 | 5.) Nach Öffnen des Jupyter-Notebook-Startbildschirms wählen Sie auf der rechten Seite das Feld _New_ (bzw. in der 99 | Notebook-Ansischt den Reiter _Kernel_/_Change Kernel_) und wählen Sie _myenv_ aus. 100 | 101 | ### Google Colaboratory 102 | 103 | Hier stehen Ihnen hier für mehrere Stunden leistungsfähige GPUs zur Verfügung, die das Training der Modelle merklich beschleunigen können. In Google Colab stehen Ihnen standardmäßig einige Pakete bereits vorinstalliert zur Verfügung. Da sich 104 | Neuinstallationen immer nur auf ein Notebook beziehen, können Sie von einer Einrichtung einer virtuellen Umgebung 105 | absehen und direkt die Pakete durch Ausführen der Zellen bzw. Zeilen, in denen ein **!** vorangestellt ist, installieren. 
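For example, a single install cell at the top of a Colab notebook could pin the versions used in this book (package list shortened here; extend it to whatever the chapter needs, following `requirements.txt`):

`!pip3 install torch==1.6.0 torchvision==0.7.0 torchtext==0.7.0 transformers==3.0.2 fast-bert==1.9.1`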
106 | -------------------------------------------------------------------------------- /chapter9/Fast_bert_.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true, 7 | "pycharm": { 8 | "name": "#%% md\n" 9 | } 10 | }, 11 | "source": [ 12 | "## FastBERT" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "outputs": [], 19 | "source": [ 20 | "!pip install fast-bert" 21 | ], 22 | "metadata": { 23 | "collapsed": false, 24 | "pycharm": { 25 | "name": "#%%\n" 26 | } 27 | } 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "outputs": [], 33 | "source": [ 34 | "import logging\n", 35 | "import numpy as np\n", 36 | "import pandas as pd\n", 37 | "import torch\n", 38 | "\n", 39 | "from transformers import BertTokenizer\n", 40 | "from fast_bert.data_cls import BertDataBunch\n", 41 | "from fast_bert.learner_cls import BertLearner\n", 42 | "from fast_bert.metrics import accuracy" 43 | ], 44 | "metadata": { 45 | "collapsed": false, 46 | "pycharm": { 47 | "name": "#%%\n" 48 | } 49 | } 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "source": [ 54 | "Before execution create directories with `mkdir twitterdata labels`\n", 55 | "\n", 56 | "Then set paths:" 57 | ], 58 | "metadata": { 59 | "collapsed": false 60 | } 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "outputs": [], 66 | "source": [ 67 | "PATH_TO_DATA = \"./twitterdata/\"\n", 68 | "PATH_TO_LABELS = \"./labels/\"\n", 69 | "OUTPUT_DIR = \"./\"" 70 | ], 71 | "metadata": { 72 | "collapsed": false, 73 | "pycharm": { 74 | "name": "#%%\n" 75 | } 76 | } 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "source": [ 81 | "Read relevant data from Chapter 5, split data set (60/20/20) and save data sets as csv" 82 | ], 83 | "metadata": { 84 | "collapsed": false 85 | } 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "outputs": [], 91 | "source": [ 92 | "df = pd.read_csv('../chapter5/train-processed.csv', encoding='latin-1')\n", 93 | "df = df.drop(df.columns[[0, 1, 2, 3, 4, 6]], axis=1)\n", 94 | "df.columns = ['text', 'label']\n", 95 | "\n", 96 | "# https://stackoverflow.com/questions/38250710/\n", 97 | "# how-to-split-data-into-3-sets-train-validation-and-test/38251213#38251213\n", 98 | "np.random.seed(0)\n", 99 | "train, valid, test = \\\n", 100 | " np.split(df.sample(frac=1), [int(.6*len(df)), int(.8*len(df))])\n", 101 | "\n", 102 | "train.to_csv('./twitterdata/train.csv', index=False)\n", 103 | "valid.to_csv('./twitterdata/valid.csv', index=False)\n", 104 | "test.to_csv('./twitterdata/test.csv', index=False)" 105 | ], 106 | "metadata": { 107 | "collapsed": false, 108 | "pycharm": { 109 | "name": "#%%\n" 110 | } 111 | } 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "source": [ 116 | "Get labels and save them in separate directory `labels`/`PATH_TO_LABELS` as csv" 117 | ], 118 | "metadata": { 119 | "collapsed": false 120 | } 121 | }, 122 | { 123 | "cell_type": "code", 124 | "source": [ 125 | "labels = pd.DataFrame(df.label.unique())\n", 126 | "labels.to_csv(\"./labels/labels.csv\", header=False, index=False)" 127 | ], 128 | "metadata": { 129 | "collapsed": false, 130 | "pycharm": { 131 | "name": "#%%\n" 132 | } 133 | }, 134 | "execution_count": null, 135 | "outputs": [] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "source": [ 140 | "Define and train model" 141 | ], 142 | "metadata": { 143 | "collapsed": 
false, 144 | "pycharm": { 145 | "name": "#%% md\n" 146 | } 147 | } 148 | }, 149 | { 150 | "cell_type": "code", 151 | "source": [ 152 | "device = torch.device('cuda')\n", 153 | "logger = logging.getLogger()\n", 154 | "metrics = [{'name': 'accuracy', 'function': accuracy}]\n", 155 | "\n", 156 | "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased',\n", 157 | " do_lower_case=True)\n", 158 | "\n", 159 | "databunch = BertDataBunch(PATH_TO_DATA,\n", 160 | " PATH_TO_LABELS,\n", 161 | " tokenizer,\n", 162 | " train_file=\"train.csv\",\n", 163 | " val_file=\"valid.csv\",\n", 164 | " test_data=\"test.csv\",\n", 165 | " text_col=0, label_col=1,\n", 166 | " batch_size_per_gpu=32,\n", 167 | " max_seq_length=140,\n", 168 | " multi_gpu=False,\n", 169 | " multi_label=False,\n", 170 | " model_type=\"bert\")\n", 171 | "\n", 172 | "learner = BertLearner.from_pretrained_model(databunch,\n", 173 | " 'bert-base-uncased',\n", 174 | " metrics=metrics,\n", 175 | " device=device,\n", 176 | " logger=logger,\n", 177 | " output_dir=OUTPUT_DIR,\n", 178 | " is_fp16=False,\n", 179 | " multi_gpu=False,\n", 180 | " multi_label=False)\n", 181 | "\n", 182 | "learner.fit(3, lr=1e-2)\n" 183 | ], 184 | "metadata": { 185 | "collapsed": false, 186 | "pycharm": { 187 | "name": "#%%\n" 188 | } 189 | }, 190 | "execution_count": null, 191 | "outputs": [] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "source": [ 196 | "\n" 197 | ], 198 | "metadata": { 199 | "collapsed": false 200 | } 201 | } 202 | ], 203 | "metadata": { 204 | "kernelspec": { 205 | "display_name": "Python 3", 206 | "language": "python", 207 | "name": "python3" 208 | }, 209 | "language_info": { 210 | "codemirror_mode": { 211 | "name": "ipython", 212 | "version": 2 213 | }, 214 | "file_extension": ".py", 215 | "mimetype": "text/x-python", 216 | "name": "python", 217 | "nbconvert_exporter": "python", 218 | "pygments_lexer": "ipython2", 219 | "version": "2.7.6" 220 | } 221 | }, 222 | "nbformat": 4, 223 | "nbformat_minor": 0 224 | } -------------------------------------------------------------------------------- /chapter3/Chapter 3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 3: Convolutional Neural Networks\n", 8 | "You'll need to either copy the image training files you downloaded in Chapter 2 to this directory, or alter the paths appropriately." 
9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "import torch\n", 18 | "import torch.nn as nn\n", 19 | "import torch.optim as optim\n", 20 | "import torch.utils.data\n", 21 | "import torch.nn.functional as F\n", 22 | "import torchvision\n", 23 | "from torchvision import transforms\n", 24 | "from PIL import Image" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## CNNNet (or AlexNet by another name…)" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "class CNNNet(nn.Module):\n", 41 | "\n", 42 | " def __init__(self, num_classes=2):\n", 43 | " super(CNNNet, self).__init__()\n", 44 | " self.features = nn.Sequential(\n", 45 | " nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),\n", 46 | " nn.ReLU(),\n", 47 | " nn.MaxPool2d(kernel_size=3, stride=2),\n", 48 | " nn.Conv2d(64, 192, kernel_size=5, padding=2),\n", 49 | " nn.ReLU(),\n", 50 | " nn.MaxPool2d(kernel_size=3, stride=2),\n", 51 | " nn.Conv2d(192, 384, kernel_size=3, padding=1),\n", 52 | " nn.ReLU(),\n", 53 | " nn.Conv2d(384, 256, kernel_size=3, padding=1),\n", 54 | " nn.ReLU(),\n", 55 | " nn.Conv2d(256, 256, kernel_size=3, padding=1),\n", 56 | " nn.ReLU(),\n", 57 | " nn.MaxPool2d(kernel_size=3, stride=2),\n", 58 | " )\n", 59 | " self.avgpool = nn.AdaptiveAvgPool2d((6, 6))\n", 60 | " self.classifier = nn.Sequential(\n", 61 | " nn.Dropout(),\n", 62 | " nn.Linear(256 * 6 * 6, 4096),\n", 63 | " nn.ReLU(),\n", 64 | " nn.Dropout(),\n", 65 | " nn.Linear(4096, 4096),\n", 66 | " nn.ReLU(),\n", 67 | " nn.Linear(4096, num_classes)\n", 68 | " )\n", 69 | " \n", 70 | " def forward(self, x):\n", 71 | " x = self.features(x)\n", 72 | " x = self.avgpool(x)\n", 73 | " x = torch.flatten(x, 1)\n", 74 | " x = self.classifier(x)\n", 75 | " return x" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "cnnnet = CNNNet()" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device=\"cpu\"):\n", 94 | " for epoch in range(1, epochs+1):\n", 95 | " training_loss = 0.0\n", 96 | " valid_loss = 0.0\n", 97 | " model.train()\n", 98 | " for batch in train_loader:\n", 99 | " optimizer.zero_grad()\n", 100 | " inputs, targets = batch\n", 101 | " inputs = inputs.to(device)\n", 102 | " targets = targets.to(device)\n", 103 | " output = model(inputs)\n", 104 | " loss = loss_fn(output, targets)\n", 105 | " loss.backward()\n", 106 | " optimizer.step()\n", 107 | " training_loss += loss.data.item() * inputs.size(0)\n", 108 | " training_loss /= len(train_loader.dataset)\n", 109 | " \n", 110 | " model.eval()\n", 111 | " num_correct = 0 \n", 112 | " num_examples = 0\n", 113 | " for batch in val_loader:\n", 114 | " inputs, targets = batch\n", 115 | " inputs = inputs.to(device)\n", 116 | " output = model(inputs)\n", 117 | " targets = targets.to(device)\n", 118 | " loss = loss_fn(output,targets) \n", 119 | " valid_loss += loss.data.item() * inputs.size(0)\n", 120 | " correct = torch.eq(torch.max(F.softmax(output, dim=1), dim=1)[1],\n", 121 | " targets)\n", 122 | " num_correct += torch.sum(correct).item()\n", 123 | " num_examples += correct.shape[0]\n", 124 | " valid_loss /= len(val_loader.dataset)\n", 125 | "\n", 
126 | " print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss,\n", 127 | " valid_loss, num_correct / num_examples))" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "def check_image(path):\n", 137 | " try:\n", 138 | " im = Image.open(path)\n", 139 | " return True\n", 140 | " except:\n", 141 | " return False\n", 142 | "\n", 143 | "img_transforms = transforms.Compose([\n", 144 | " transforms.Resize((64,64)), \n", 145 | " transforms.ToTensor(),\n", 146 | " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", 147 | " std=[0.229, 0.224, 0.225])\n", 148 | " ])\n", 149 | "train_data_path = \"./train/\"\n", 150 | "train_data = torchvision.datasets.ImageFolder(root=train_data_path,transform=img_transforms, is_valid_file=check_image)\n", 151 | "val_data_path = \"./val/\"\n", 152 | "val_data = torchvision.datasets.ImageFolder(root=val_data_path,transform=img_transforms, is_valid_file=check_image)\n", 153 | "batch_size=64\n", 154 | "train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,shuffle=True)\n", 155 | "val_data_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size, shuffle=True)\n", 156 | "\n", 157 | "if torch.cuda.is_available():\n", 158 | " device = torch.device(\"cuda\") \n", 159 | "else:\n", 160 | " device = torch.device(\"cpu\")" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "cnnnet.to(device)\n", 170 | "optimizer = optim.Adam(cnnnet.parameters(), lr=0.001)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "train(cnnnet, optimizer,torch.nn.CrossEntropyLoss(), train_data_loader,val_data_loader, epochs=10, device=device)" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "## Downloading a pretrained network \n", 187 | "\n", 188 | "There are two ways of downloading pre-trained image models with PyTorch. Firstly, you can use the `torchvision.models` library, or you can use PyTorch Hub. The latter is preferred as of 2019, as this is a one-stop shop for all models and the new standard for distributing models with PyTorch." 
189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "import torchvision.models as models\n", 198 | "alexnet = models.alexnet(num_classes=1000, pretrained=True)" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "resnet50 = torch.hub.load('pytorch/vision', 'resnet50')" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "print(alexnet)" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "print(resnet50)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [] 234 | } 235 | ], 236 | "metadata": { 237 | "kernelspec": { 238 | "display_name": "Python 3", 239 | "language": "python", 240 | "name": "python3" 241 | }, 242 | "language_info": { 243 | "codemirror_mode": { 244 | "name": "ipython", 245 | "version": 3 246 | }, 247 | "file_extension": ".py", 248 | "mimetype": "text/x-python", 249 | "name": "python", 250 | "nbconvert_exporter": "python", 251 | "pygments_lexer": "ipython3", 252 | "version": "3.6.8" 253 | } 254 | }, 255 | "nbformat": 4, 256 | "nbformat_minor": 2 257 | } -------------------------------------------------------------------------------- /chapter2/Chapter 2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 2: Our First Model" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import torch\n", 17 | "import torch.nn as nn\n", 18 | "import torch.optim as optim\n", 19 | "import torch.utils.data\n", 20 | "import torch.nn.functional as F\n", 21 | "import torchvision\n", 22 | "from torchvision import transforms\n", 23 | "from PIL import Image, ImageFile\n", 24 | "\n", 25 | "ImageFile.LOAD_TRUNCATED_IMAGES=True" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## Setting up DataLoaders\n", 33 | "\n", 34 | "We'll use the built-in dataset of `torchvision.datasets.ImageFolder` to quickly set up some dataloaders of downloaded cat and fish images. \n", 35 | "\n", 36 | "`check_image` is a quick little function that is passed to the `is_valid_file` parameter in the ImageFolder and will do a sanity check to make sure PIL can actually open the file. 
We're going to use this in lieu of cleaning up the downloaded dataset.\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "def check_image(path):\n", 46 | " try:\n", 47 | " im = Image.open(path)\n", 48 | " return True\n", 49 | " except:\n", 50 | " return False" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "Set up the transforms for every image:\n", 58 | "\n", 59 | "* Resize to 64x64\n", 60 | "* Convert to tensor\n", 61 | "* Normalize using ImageNet mean & std\n" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "img_transforms = transforms.Compose([\n", 71 | " transforms.Resize((64,64)), \n", 72 | " transforms.ToTensor(),\n", 73 | " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", 74 | " std=[0.229, 0.224, 0.225] )\n", 75 | " ])\n", 76 | "\n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "train_data_path = \"./train/\"\n", 86 | "train_data = torchvision.datasets.ImageFolder(root=train_data_path,transform=img_transforms, is_valid_file=check_image)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "val_data_path = \"./val/\"\n", 96 | "val_data = torchvision.datasets.ImageFolder(root=val_data_path,transform=img_transforms, is_valid_file=check_image)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "test_data_path = \"./test/\"\n", 106 | "test_data = torchvision.datasets.ImageFolder(root=test_data_path,transform=img_transforms, is_valid_file=check_image) " 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "batch_size=64" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)\n", 125 | "val_data_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size) \n", 126 | "test_data_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size) " 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "## Our First Model, SimpleNet\n", 134 | "\n", 135 | "SimpleNet is a very simple combination of three Linear layers and ReLu activations between them. Note that as we don't do a `softmax()` in our `forward()`, we will need to make sure we do it in our training function during the validation phase." 
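(One note to make the sizes below less magic, added here rather than taken from the original notebook: the first Linear layer takes 12288 features because that is simply a 64x64 RGB image flattened out.)

    # 3 colour channels * 64 * 64 pixels = 12288 inputs to fc1
    assert 3 * 64 * 64 == 12288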
136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "class SimpleNet(nn.Module):\n", 145 | "\n", 146 | " def __init__(self):\n", 147 | " super(SimpleNet, self).__init__()\n", 148 | " self.fc1 = nn.Linear(12288, 84)\n", 149 | " self.fc2 = nn.Linear(84, 50)\n", 150 | " self.fc3 = nn.Linear(50,2)\n", 151 | " \n", 152 | " def forward(self, x):\n", 153 | " x = x.view(-1, 12288)\n", 154 | " x = F.relu(self.fc1(x))\n", 155 | " x = F.relu(self.fc2(x))\n", 156 | " x = self.fc3(x)\n", 157 | " return x" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "simplenet = SimpleNet()" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "## Create an optimizer\n", 174 | "\n", 175 | "Here, we're just using Adam as our optimizer with a learning rate of 0.001." 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "optimizer = optim.Adam(simplenet.parameters(), lr=0.001)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "## Copy the model to GPU\n", 192 | "\n", 193 | "Copy the model to the GPU if available." 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "if torch.cuda.is_available():\n", 203 | " device = torch.device(\"cuda\") \n", 204 | "else:\n", 205 | " device = torch.device(\"cpu\")\n", 206 | "\n", 207 | "simplenet.to(device)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "## Training \n", 215 | "\n", 216 | "Trains the model, copying batches to the GPU if required, calculating losses, optimizing the network and perform validation for each epoch." 
217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device=\"cpu\"):\n", 226 | " for epoch in range(1, epochs+1):\n", 227 | " training_loss = 0.0\n", 228 | " valid_loss = 0.0\n", 229 | " model.train()\n", 230 | " for batch in train_loader:\n", 231 | " optimizer.zero_grad()\n", 232 | " inputs, targets = batch\n", 233 | " inputs = inputs.to(device)\n", 234 | " targets = targets.to(device)\n", 235 | " output = model(inputs)\n", 236 | " loss = loss_fn(output, targets)\n", 237 | " loss.backward()\n", 238 | " optimizer.step()\n", 239 | " training_loss += loss.data.item() * inputs.size(0)\n", 240 | " training_loss /= len(train_loader.dataset)\n", 241 | " \n", 242 | " model.eval()\n", 243 | " num_correct = 0 \n", 244 | " num_examples = 0\n", 245 | " for batch in val_loader:\n", 246 | " inputs, targets = batch\n", 247 | " inputs = inputs.to(device)\n", 248 | " output = model(inputs)\n", 249 | " targets = targets.to(device)\n", 250 | " loss = loss_fn(output,targets) \n", 251 | " valid_loss += loss.data.item() * inputs.size(0)\n", 252 | " correct = torch.eq(torch.max(F.softmax(output, dim=1), dim=1)[1], targets)\n", 253 | " num_correct += torch.sum(correct).item()\n", 254 | " num_examples += correct.shape[0]\n", 255 | " valid_loss /= len(val_loader.dataset)\n", 256 | "\n", 257 | " print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss,\n", 258 | " valid_loss, num_correct / num_examples))" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": null, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [ 267 | "train(simplenet, optimizer,torch.nn.CrossEntropyLoss(), train_data_loader,val_data_loader, epochs=5, device=device)" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "## Making predictions\n", 275 | "\n", 276 | "Labels are in alphanumeric order, so `cat` will be 0, `fish` will be 1. We'll need to transform the image and also make sure that the resulting tensor is copied to the appropriate device before applying our model to it." 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "labels = ['cat','fish']\n", 286 | "\n", 287 | "img = Image.open(\"./val/fish/100_1422.JPG\") \n", 288 | "img = img_transforms(img).to(device)\n", 289 | "img = torch.unsqueeze(img, 0)\n", 290 | "\n", 291 | "simplenet.eval()\n", 292 | "prediction = F.softmax(simplenet(img), dim=1)\n", 293 | "prediction = prediction.argmax()\n", 294 | "print(labels[prediction]) " 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "## Saving Models\n", 302 | "\n", 303 | "We can either save the entire model using `save` or just the parameters using `state_dict`. Using the latter is normally preferable, as it allows you to reuse parameters even if the model's structure changes (or apply parameters from one model to another)." 
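(Beyond the two approaches shown next, a pattern you'll often want in practice is saving a single checkpoint that also captures the optimizer state, so training can be resumed later. A minimal sketch, assuming the `simplenet` and `optimizer` objects defined above; the path is just a placeholder:)

    # Bundle everything needed to resume training into one dictionary
    torch.save({
        'model_state_dict': simplenet.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, "/tmp/simplenet_checkpoint.pth")

    # ...and restore it later
    checkpoint = torch.load("/tmp/simplenet_checkpoint.pth")
    simplenet.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])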
304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "torch.save(simplenet, \"/tmp/simplenet\") \n", 313 | "simplenet = torch.load(\"/tmp/simplenet\") \n" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": null, 319 | "metadata": {}, 320 | "outputs": [], 321 | "source": [ 322 | "torch.save(simplenet.state_dict(), \"/tmp/simplenet\") \n", 323 | "simplenet = SimpleNet()\n", 324 | "simplenet_state_dict = torch.load(\"/tmp/simplenet\")\n", 325 | "simplenet.load_state_dict(simplenet_state_dict) " 326 | ] 327 | } 328 | ], 329 | "metadata": { 330 | "kernelspec": { 331 | "display_name": "Python 3", 332 | "language": "python", 333 | "name": "python3" 334 | }, 335 | "language_info": { 336 | "codemirror_mode": { 337 | "name": "ipython", 338 | "version": 3 339 | }, 340 | "file_extension": ".py", 341 | "mimetype": "text/x-python", 342 | "name": "python", 343 | "nbconvert_exporter": "python", 344 | "pygments_lexer": "ipython3", 345 | "version": "3.6.8" 346 | } 347 | }, 348 | "nbformat": 4, 349 | "nbformat_minor": 2 350 | } -------------------------------------------------------------------------------- /chapter4/Chapter 4.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 4: Transfer Learning And Other Tricks" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import torch\n", 17 | "import torch.nn as nn\n", 18 | "import torch.optim as optim\n", 19 | "import torch.utils.data\n", 20 | "import torch.nn.functional as F\n", 21 | "import torchvision\n", 22 | "import torchvision.models as models\n", 23 | "from torchvision import transforms\n", 24 | "from PIL import Image\n", 25 | "import matplotlib.pyplot as plt" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "transfer_model = models.resnet50(pretrained=True) " 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## Freezing parameters" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "for name, param in transfer_model.named_parameters():\n", 51 | " if(\"bn\" not in name):\n", 52 | " param.requires_grad = False" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "## Replacing the classifier" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "transfer_model.fc = nn.Sequential(nn.Linear(transfer_model.fc.in_features,500),\n", 69 | "nn.ReLU(), \n", 70 | "nn.Dropout(), nn.Linear(500,2)) " 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## Training Again" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device=\"cpu\"):\n", 87 | " for epoch in range(1, epochs+1):\n", 88 | " training_loss = 0.0\n", 89 | " valid_loss = 0.0\n", 90 | " model.train()\n", 91 | " for batch in train_loader:\n", 92 | " optimizer.zero_grad()\n", 93 | " inputs, targets = 
batch\n", 94 | " inputs = inputs.to(device)\n", 95 | " targets = targets.to(device)\n", 96 | " output = model(inputs)\n", 97 | " loss = loss_fn(output, targets)\n", 98 | " loss.backward()\n", 99 | " optimizer.step()\n", 100 | " training_loss += loss.data.item() * inputs.size(0)\n", 101 | " training_loss /= len(train_loader.dataset)\n", 102 | " \n", 103 | " model.eval()\n", 104 | " num_correct = 0 \n", 105 | " num_examples = 0\n", 106 | " for batch in val_loader:\n", 107 | " inputs, targets = batch\n", 108 | " inputs = inputs.to(device)\n", 109 | " output = model(inputs)\n", 110 | " targets = targets.to(device)\n", 111 | " loss = loss_fn(output,targets) \n", 112 | " valid_loss += loss.data.item() * inputs.size(0)\n", 113 | " correct = torch.eq(torch.max(F.softmax(output), dim=1)[1], targets).view(-1)\n", 114 | " num_correct += torch.sum(correct).item()\n", 115 | " num_examples += correct.shape[0]\n", 116 | " valid_loss /= len(val_loader.dataset)\n", 117 | "\n", 118 | " print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss,\n", 119 | " valid_loss, num_correct / num_examples))" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "def check_image(path):\n", 129 | " try:\n", 130 | " im = Image.open(path)\n", 131 | " return True\n", 132 | " except:\n", 133 | " return False\n", 134 | "\n", 135 | "img_transforms = transforms.Compose([\n", 136 | " transforms.Resize((64,64)), \n", 137 | " transforms.ToTensor(),\n", 138 | " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", 139 | " std=[0.229, 0.224, 0.225] )\n", 140 | " ])\n", 141 | "train_data_path = \"./train/\"\n", 142 | "train_data = torchvision.datasets.ImageFolder(root=train_data_path,transform=img_transforms, is_valid_file=check_image)\n", 143 | "val_data_path = \"./val/\"\n", 144 | "val_data = torchvision.datasets.ImageFolder(root=val_data_path,transform=img_transforms, is_valid_file=check_image)\n", 145 | "batch_size=64\n", 146 | "train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)\n", 147 | "val_data_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size, shuffle=True)\n", 148 | "\n", 149 | "if torch.cuda.is_available():\n", 150 | " device = torch.device(\"cuda\") \n", 151 | "else:\n", 152 | " device = torch.device(\"cpu\")" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "print(len(val_data_loader.dataset))" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": {}, 168 | "outputs": [], 169 | "source": [ 170 | "transfer_model.to(device)\n", 171 | "optimizer = optim.Adam(transfer_model.parameters(), lr=0.001)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "train(transfer_model, optimizer,torch.nn.CrossEntropyLoss(), train_data_loader, val_data_loader, epochs=5,\n", 181 | " device=device)" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "## LR Finder" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "def find_lr(model, loss_fn, optimizer, train_loader, init_value=1e-8, final_value=10.0, device=\"cpu\"):\n", 198 | " 
number_in_epoch = len(train_loader) - 1\n", 199 | " update_step = (final_value / init_value) ** (1 / number_in_epoch)\n", 200 | " lr = init_value\n", 201 | " optimizer.param_groups[0][\"lr\"] = lr\n", 202 | " best_loss = 0.0\n", 203 | " batch_num = 0\n", 204 | " losses = []\n", 205 | " log_lrs = []\n", 206 | " for data in train_loader:\n", 207 | " batch_num += 1\n", 208 | " inputs, targets = data\n", 209 | " inputs = inputs.to(device)\n", 210 | " targets = targets.to(device)\n", 211 | " optimizer.zero_grad()\n", 212 | " outputs = model(inputs)\n", 213 | " loss = loss_fn(outputs, targets)\n", 214 | "\n", 215 | " # Crash out if loss explodes\n", 216 | "\n", 217 | " if batch_num > 1 and loss > 4 * best_loss:\n", 218 | " if(len(log_lrs) > 20):\n", 219 | " return log_lrs[10:-5], losses[10:-5]\n", 220 | " else:\n", 221 | " return log_lrs, losses\n", 222 | "\n", 223 | " # Record the best loss\n", 224 | "\n", 225 | " if loss < best_loss or batch_num == 1:\n", 226 | " best_loss = loss\n", 227 | "\n", 228 | " # Store the values\n", 229 | " losses.append(loss.item())\n", 230 | " log_lrs.append((lr))\n", 231 | "\n", 232 | " # Do the backward pass and optimize\n", 233 | "\n", 234 | " loss.backward()\n", 235 | " optimizer.step()\n", 236 | "\n", 237 | " # Update the lr for the next step and store\n", 238 | "\n", 239 | " lr *= update_step\n", 240 | " optimizer.param_groups[0][\"lr\"] = lr\n", 241 | " if(len(log_lrs) > 20):\n", 242 | " return log_lrs[10:-5], losses[10:-5]\n", 243 | " else:\n", 244 | " return log_lrs, losses\n" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "metadata": {}, 251 | "outputs": [], 252 | "source": [ 253 | "(lrs, losses) = find_lr(transfer_model, torch.nn.CrossEntropyLoss(),optimizer, train_data_loader,device=device)\n", 254 | "plt.plot(lrs, losses)\n", 255 | "\n", 256 | "plt.xscale(\"log\")\n", 257 | "plt.xlabel(\"Learning rate\")\n", 258 | "plt.ylabel(\"Loss\")\n", 259 | "plt.show()" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "## Custom Transforms\n", 267 | "\n", 268 | "Here we'll create a lambda transform and a custom transform class." 
269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "def _random_colour_space(x):\n", 278 | " output = x.convert(\"HSV\")\n", 279 | " return output " 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "colour_transform = transforms.Lambda(lambda x: _random_colour_space(x))" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "random_colour_transform = torchvision.transforms.RandomApply([colour_transform])" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "class Noise():\n", 307 | " \"\"\"Adds gaussian noise to a tensor.\n", 308 | " \n", 309 | " Example:\n", 310 | " >>> transforms.Compose([\n", 311 | " >>> transforms.ToTensor(),\n", 312 | " >>> Noise(0.1, 0.05)),\n", 313 | " >>> ])\n", 314 | " \n", 315 | " \"\"\"\n", 316 | " def __init__(self, mean, stddev):\n", 317 | " self.mean = mean\n", 318 | " self.stddev = stddev\n", 319 | "\n", 320 | " def __call__(self, tensor):\n", 321 | " noise = torch.zeros_like(tensor).normal_(self.mean, self.stddev)\n", 322 | " return tensor.add_(noise)\n", 323 | " \n", 324 | " def __repr__(self):\n", 325 | " repr = f\"{self.__class__.__name__ }(mean={self.mean},sttdev={self.stddev})\"\n", 326 | " return repr" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": {}, 333 | "outputs": [], 334 | "source": [ 335 | "custom_transform_pipeline = transforms.Compose([random_colour_transform, Noise(0.1, 0.05)])" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": null, 341 | "metadata": {}, 342 | "outputs": [], 343 | "source": [] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "## Ensembles\n", 350 | "\n", 351 | "Given a list of models, we can produce predictions for each model and then make an average to make a final prediction." 
352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "models_ensemble = [models.resnet50().to(device), models.resnet50().to(device)]\n", 361 | "predictions = [F.softmax(m(torch.rand(1,3,224,244).to(device))) for m in models_ensemble] \n", 362 | "avg_prediction = torch.stack(predictions).mean(0).argmax()" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": null, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "avg_prediction" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": null, 377 | "metadata": {}, 378 | "outputs": [], 379 | "source": [ 380 | "torch.stack(predictions)" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": null, 386 | "metadata": {}, 387 | "outputs": [], 388 | "source": [] 389 | } 390 | ], 391 | "metadata": { 392 | "kernelspec": { 393 | "display_name": "Python 3", 394 | "language": "python", 395 | "name": "python3" 396 | }, 397 | "language_info": { 398 | "codemirror_mode": { 399 | "name": "ipython", 400 | "version": 3 401 | }, 402 | "file_extension": ".py", 403 | "mimetype": "text/x-python", 404 | "name": "python", 405 | "nbconvert_exporter": "python", 406 | "pygments_lexer": "ipython3", 407 | "version": "3.6.8" 408 | } 409 | }, 410 | "nbformat": 4, 411 | "nbformat_minor": 2 412 | } -------------------------------------------------------------------------------- /chapter5/Chapter 5.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 5: Text Classification" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "outputs": [], 14 | "source": [ 15 | "!pip install torchtext==0.9.1\n", 16 | "!pip install torch==1.8.1" 17 | ], 18 | "metadata": { 19 | "collapsed": false, 20 | "pycharm": { 21 | "name": "#%%\n" 22 | } 23 | } 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 37, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "import spacy\n", 32 | "import torchtext\n", 33 | "import pandas as pd\n", 34 | "import torch.nn as nn\n", 35 | "import torch.optim as optim\n", 36 | "from torchtext.legacy import data" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Loading & Data Cleaning" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 8, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "device = \"cuda\"\n" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": null, 58 | "metadata": {}, 59 | "outputs": [], 60 | "source": [ 61 | "# You'll probably need to use the 'python' engine to load the CSV\n", 62 | "# tweetsDF = pd.read_csv(\"training.1600000.processed.noemoticon.csv\", header=None)\n", 63 | "tweetsDF = pd.read_csv(\"training.1600000.processed.noemoticon.csv\",\n", 64 | " engine=\"python\", header=None)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "tweetsDF[0].value_counts()" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "tweetsDF[\"sentiment_cat\"] = tweetsDF[0].astype('category')\n", 83 | "tweetsDF[\"sentiment\"] = tweetsDF[\"sentiment_cat\"].cat.codes\n", 84 | "tweetsDF.to_csv(\"train-processed.csv\", 
header=None, index=None) \n", 85 | "tweetsDF.sample(10000).to_csv(\"train-processed-sample.csv\", header=None, index=None) \n" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 38, 91 | "metadata": {}, 92 | "outputs": [], 93 | "source": [ 94 | "LABEL = data.LabelField()\n", 95 | "TWEET = data.Field('spacy', tokenizer_language='en_core_web_sm', lower=True)\n", 96 | "\n", 97 | "fields = [('score',None), ('id',None), ('date',None), ('query',None),\n", 98 | " ('name',None), ('tweet', TWEET), ('category',None), ('label',LABEL)]" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "## Create our Dataset and DataLoaders" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 39, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "twitterDataset = data.dataset.TabularDataset(\n", 115 | " path=\"train-processed-sample.csv\", \n", 116 | " format=\"CSV\", \n", 117 | " fields=fields,\n", 118 | " skip_header=False)" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 40, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "data": { 128 | "text/plain": [ 129 | "(6000, 2000, 2000)" 130 | ] 131 | }, 132 | "execution_count": 40, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "(train, test, valid) = twitterDataset.split(split_ratio=[0.6,0.2,0.2],\n", 139 | " stratified=True, strata_field='label')\n", 140 | "\n", 141 | "(len(train),len(test),len(valid))" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 41, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "data": { 151 | "text/plain": [ 152 | "[('i', 3742),\n", 153 | " ('!', 3315),\n", 154 | " ('.', 3084),\n", 155 | " (' ', 2175),\n", 156 | " ('to', 2115),\n", 157 | " ('the', 2022),\n", 158 | " (',', 1823),\n", 159 | " ('a', 1461),\n", 160 | " ('my', 1205),\n", 161 | " ('it', 1197)]" 162 | ] 163 | }, 164 | "execution_count": 41, 165 | "metadata": {}, 166 | "output_type": "execute_result" 167 | } 168 | ], 169 | "source": [ 170 | "vocab_size = 20000\n", 171 | "TWEET.build_vocab(train, max_size = vocab_size)\n", 172 | "LABEL.build_vocab(train)\n", 173 | "TWEET.vocab.freqs.most_common(10)" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 42, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(\n", 183 | " (train, valid, test),\n", 184 | " batch_size = 32,\n", 185 | " device = device,\n", 186 | " sort_key = lambda x: len(x.tweet),\n", 187 | " sort_within_batch = False)" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": {}, 193 | "source": [ 194 | "## Our First LSTM" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 43, 200 | "metadata": {}, 201 | "outputs": [ 202 | { 203 | "data": { 204 | "text/plain": [ 205 | "OurFirstLSTM(\n", 206 | " (embedding): Embedding(20002, 300)\n", 207 | " (encoder): LSTM(300, 100)\n", 208 | " (predictor): Linear(in_features=100, out_features=2, bias=True)\n", 209 | ")" 210 | ] 211 | }, 212 | "execution_count": 43, 213 | "metadata": {}, 214 | "output_type": "execute_result" 215 | } 216 | ], 217 | "source": [ 218 | "class OurFirstLSTM(nn.Module):\n", 219 | " def __init__(self, hidden_size, embedding_dim, vocab_size):\n", 220 | " super(OurFirstLSTM, self).__init__()\n", 221 | " \n", 222 | " self.embedding = nn.Embedding(vocab_size, embedding_dim)\n", 
223 | " self.encoder = nn.LSTM(input_size=embedding_dim, \n", 224 | " hidden_size=hidden_size, num_layers=1)\n", 225 | " self.predictor = nn.Linear(hidden_size, 2)\n", 226 | "\n", 227 | " def forward(self, seq):\n", 228 | " output, (hidden,_) = self.encoder(self.embedding(seq))\n", 229 | " preds = self.predictor(hidden.squeeze(0))\n", 230 | " return preds\n", 231 | "\n", 232 | "model = OurFirstLSTM(100,300, 20002)\n", 233 | "model.to(device)" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "## Training" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 44, 246 | "metadata": {}, 247 | "outputs": [], 248 | "source": [ 249 | "optimizer = optim.Adam(model.parameters(), lr=2e-2)\n", 250 | "criterion = nn.CrossEntropyLoss()\n", 251 | "\n", 252 | "def train(epochs, model, optimizer, criterion, train_iterator, valid_iterator):\n", 253 | " for epoch in range(1, epochs+1):\n", 254 | " \n", 255 | " training_loss = 0.0\n", 256 | " valid_loss = 0.0\n", 257 | " model.train()\n", 258 | " for batch_idx, batch in enumerate(train_iterator):\n", 259 | " optimizer.zero_grad()\n", 260 | " predict = model(batch.tweet)\n", 261 | " loss = criterion(predict,batch.label)\n", 262 | " loss.backward()\n", 263 | " optimizer.step()\n", 264 | " training_loss += loss.data.item() * batch.tweet.size(0)\n", 265 | " training_loss /= len(train_iterator)\n", 266 | " \n", 267 | " \n", 268 | " model.eval()\n", 269 | " for batch_idx,batch in enumerate(valid_iterator):\n", 270 | " predict = model(batch.tweet)\n", 271 | " loss = criterion(predict,batch.label)\n", 272 | " valid_loss += loss.data.item() * batch.tweet.size(0)\n", 273 | " \n", 274 | " valid_loss /= len(valid_iterator)\n", 275 | " print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}'.format(epoch, training_loss, valid_loss))" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 45, 281 | "metadata": {}, 282 | "outputs": [ 283 | { 284 | "name": "stdout", 285 | "output_type": "stream", 286 | "text": [ 287 | "Epoch: 1, Training Loss: 24.47, Validation Loss: 14.04\n", 288 | "Epoch: 2, Training Loss: 23.81, Validation Loss: 14.57\n", 289 | "Epoch: 3, Training Loss: 23.25, Validation Loss: 15.69\n", 290 | "Epoch: 4, Training Loss: 23.12, Validation Loss: 16.16\n", 291 | "Epoch: 5, Training Loss: 21.71, Validation Loss: 18.80\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "train(5, model, optimizer, criterion, train_iterator, valid_iterator) " 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "## Making predictions" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": 46, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "def classify_tweet(tweet):\n", 320 | " categories = {0: \"Negative\", 1:\"Positive\"}\n", 321 | " processed = TWEET.process([TWEET.preprocess(tweet)])\n", 322 | " processed = processed.to(device)\n", 323 | " model.eval()\n", 324 | " return categories[model(processed).argmax().item()]" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "## Data Augmentation" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": null, 337 | "metadata": {}, 338 | "outputs": [], 339 | "source": [ 340 | "def random_deletion(words, p=0.5):\n", 341 | " if len(words) 
== 1:\n", 342 | " return words\n", 343 | " remaining = list(filter(lambda x: random.uniform(0,1) > p,words))\n", 344 | " if len(remaining) == 0:\n", 345 | " return [random.choice(words)]\n", 346 | " else:\n", 347 | " return remaining" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": {}, 354 | "outputs": [], 355 | "source": [ 356 | "def random_swap(sentence, n=5):\n", 357 | " length = range(len(sentence))\n", 358 | " for _ in range(n):\n", 359 | " idx1, idx2 = random.sample(length, 2)\n", 360 | " sentence[idx1], sentence[idx2] = sentence[idx2], sentence[idx1]\n", 361 | " return sentence" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [ 370 | "# Note: you'll have to define remove_stopwords() and get_synonyms() elsewhere\n", 371 | "\n", 372 | "def random_insertion(sentence,n):\n", 373 | " words = remove_stopwords(sentence)\n", 374 | " for _ in range(n):\n", 375 | " new_synonym = get_synonyms(random.choice(words))\n", 376 | " sentence.insert(randrange(len(sentence)+1), new_synonym)\n", 377 | " return sentence" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": {}, 384 | "outputs": [], 385 | "source": [ 386 | "# Install googletrans version 3.1.0a0 (temporary fix for #57)\n", 387 | "!pip install googletrans==3.1.0a0" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": null, 393 | "outputs": [], 394 | "source": [ 395 | "import googletrans\n", 396 | "import random\n", 397 | "\n", 398 | "translator = googletrans.Translator()\n", 399 | "\n", 400 | "sentences = ['The cat sat on the mat']\n", 401 | "\n", 402 | "translations_fr = translator.translate(sentences, dest='fr')\n", 403 | "fr_text = [t.text for t in translations_fr] \n", 404 | "translations_en = translator.translate(fr_text, dest='en')\n", 405 | "en_text = [t.text for t in translations_en]\n", 406 | "print(en_text) \n", 407 | "\n", 408 | "available_langs = list(googletrans.LANGUAGES.keys())\n", 409 | "tr_lang = random.choice(available_langs)\n", 410 | "print(f\"Translating to {googletrans.LANGUAGES[tr_lang]}\")\n", 411 | "\n", 412 | "translations = translator.translate(sentences, dest=tr_lang)\n", 413 | "t_text = [t.text for t in translations]\n", 414 | "print(t_text)\n", 415 | "\n", 416 | "translations_en_random = translator.translate(t_text, src=tr_lang, dest='en')\n", 417 | "en_text = [t.text for t in translations_en_random]\n", 418 | "print(en_text)" 419 | ], 420 | "metadata": { 421 | "collapsed": false, 422 | "pycharm": { 423 | "name": "#%%\n" 424 | } 425 | } 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": null, 430 | "metadata": {}, 431 | "outputs": [], 432 | "source": [] 433 | } 434 | ], 435 | "metadata": { 436 | "kernelspec": { 437 | "display_name": "Python 3", 438 | "language": "python", 439 | "name": "python3" 440 | }, 441 | "language_info": { 442 | "codemirror_mode": { 443 | "name": "ipython", 444 | "version": 3 445 | }, 446 | "file_extension": ".py", 447 | "mimetype": "text/x-python", 448 | "name": "python", 449 | "nbconvert_exporter": "python", 450 | "pygments_lexer": "ipython3", 451 | "version": "3.6.8" 452 | } 453 | }, 454 | "nbformat": 4, 455 | "nbformat_minor": 2 456 | } 457 | -------------------------------------------------------------------------------- /chapter6/Chapter 6.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | 
{ 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 6: A Journey Into Sound\n", 8 | "\n", 9 | "(note: uses PyTorch 1.6 and torchaudio 0.6.0)\n", 10 | "\n", 11 | "Download and extract the ESC-50 files from https://github.com/karolpiczak/ESC-50#download" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import IPython.display as display\n", 21 | "import librosa\n", 22 | "import librosa.display\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "import numpy as np\n", 25 | "import random\n", 26 | "import torch\n", 27 | "import torchaudio\n", 28 | "import torch.optim as optim\n", 29 | "import torch.nn as nn\n", 30 | "import torch.nn.functional as F\n", 31 | "import torchvision\n", 32 | "from pathlib import Path\n", 33 | "from PIL import Image\n", 34 | "from torch.utils.data import Dataset\n", 35 | "from torchvision import models, transforms" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device=\"cpu\"):\n", 45 | " for epoch in range(1, epochs+1):\n", 46 | " training_loss = 0.0\n", 47 | " valid_loss = 0.0\n", 48 | " model.train()\n", 49 | " for batch in train_loader:\n", 50 | " optimizer.zero_grad()\n", 51 | " inputs, targets = batch\n", 52 | " inputs = inputs.to(device)\n", 53 | " targets = targets.to(device)\n", 54 | " output = model(inputs)\n", 55 | " loss = loss_fn(output, targets)\n", 56 | " loss.backward()\n", 57 | " optimizer.step()\n", 58 | " training_loss += loss.data.item() * inputs.size(0)\n", 59 | " training_loss /= len(train_loader.dataset)\n", 60 | " \n", 61 | " model.eval()\n", 62 | " num_correct = 0 \n", 63 | " num_examples = 0\n", 64 | " for batch in val_loader:\n", 65 | " inputs, targets = batch\n", 66 | " inputs = inputs.to(device)\n", 67 | " output = model(inputs)\n", 68 | " targets = targets.to(device)\n", 69 | " loss = loss_fn(output,targets) \n", 70 | " valid_loss += loss.data.item() * inputs.size(0)\n", 71 | " correct = torch.eq(torch.max(F.softmax(output), dim=1)[1], targets).view(-1)\n", 72 | " num_correct += torch.sum(correct).item()\n", 73 | " num_examples += correct.shape[0]\n", 74 | " valid_loss /= len(val_loader.dataset)\n", 75 | "\n", 76 | " print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss,\n", 77 | " valid_loss, num_correct / num_examples))\n", 78 | " \n", 79 | "def find_lr(model, loss_fn, optimizer, train_loader, init_value=1e-8, final_value=10.0, device=\"cpu\"):\n", 80 | " number_in_epoch = len(train_loader) - 1\n", 81 | " update_step = (final_value / init_value) ** (1 / number_in_epoch)\n", 82 | " lr = init_value\n", 83 | " optimizer.param_groups[0][\"lr\"] = lr\n", 84 | " best_loss = 0.0\n", 85 | " batch_num = 0\n", 86 | " losses = []\n", 87 | " log_lrs = []\n", 88 | " for data in train_loader:\n", 89 | " batch_num += 1\n", 90 | " inputs, targets = data\n", 91 | " inputs = inputs.to(device)\n", 92 | " targets = targets.to(device)\n", 93 | " optimizer.zero_grad()\n", 94 | " outputs = model(inputs)\n", 95 | " loss = loss_fn(outputs, targets)\n", 96 | "\n", 97 | " # Crash out if loss explodes\n", 98 | "\n", 99 | " if batch_num > 1 and loss > 4 * best_loss:\n", 100 | " if(len(log_lrs) > 20):\n", 101 | " return log_lrs[10:-5], losses[10:-5]\n", 102 | " else:\n", 103 | " return log_lrs, losses\n", 104 | "\n", 105 
| " # Record the best loss\n", 106 | "\n", 107 | " if loss < best_loss or batch_num == 1:\n", 108 | " best_loss = loss\n", 109 | "\n", 110 | " # Store the values\n", 111 | " losses.append(loss.item())\n", 112 | " log_lrs.append((lr))\n", 113 | "\n", 114 | " # Do the backward pass and optimize\n", 115 | "\n", 116 | " loss.backward()\n", 117 | " optimizer.step()\n", 118 | "\n", 119 | " # Update the lr for the next step and store\n", 120 | "\n", 121 | " lr *= update_step\n", 122 | " optimizer.param_groups[0][\"lr\"] = lr\n", 123 | " if(len(log_lrs) > 20):\n", 124 | " return log_lrs[10:-5], losses[10:-5]\n", 125 | " else:\n", 126 | " return log_lrs, losses " 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "## ESC-50 Dataset & DataLoaders" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "class ESC50(Dataset):\n", 143 | " def __init__(self,path):\n", 144 | " # Get directory listing from path\n", 145 | " files = Path(path).glob('*.wav')\n", 146 | " # Iterate through the listing and create a list of tuples (filename, label)\n", 147 | " self.items = [(str(f),f.name.split(\"-\")[-1].replace(\".wav\",\"\")) for f in files]\n", 148 | " self.length = len(self.items)\n", 149 | " def __getitem__(self, index):\n", 150 | " filename, label = self.items[index]\n", 151 | " audioTensor, rate = torchaudio.load(filename)\n", 152 | " return (audioTensor, int(label)) \n", 153 | " def __len__(self):\n", 154 | " return self.length" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "device=\"cuda\"\n", 164 | "bs=64\n", 165 | "PATH_TO_ESC50 = Path.cwd() / 'esc50'\n", 166 | "\n", 167 | "train_esc50 = ESC50(PATH_TO_ESC50 / \"train\")\n", 168 | "valid_esc50 = ESC50(PATH_TO_ESC50 / \"valid\")\n", 169 | "test_esc50 = ESC50(PATH_TO_ESC50 / \"test\")\n", 170 | "\n", 171 | "train_loader = torch.utils.data.DataLoader(train_esc50, batch_size = bs, shuffle = True)\n", 172 | "valid_loader = torch.utils.data.DataLoader(valid_esc50, batch_size = bs, shuffle = True)\n", 173 | "test_loader = torch.utils.data.DataLoader(test_esc50, batch_size = bs, shuffle = True)" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "## M5-based CNN AudioNet" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": null, 186 | "metadata": {}, 187 | "outputs": [], 188 | "source": [ 189 | "class AudioNet(nn.Module):\n", 190 | " def __init__(self):\n", 191 | " super(AudioNet, self).__init__()\n", 192 | " self.conv1 = nn.Conv1d(100, 128, kernel_size=5, stride=4)\n", 193 | " self.bn1 = nn.BatchNorm1d(128)\n", 194 | " self.pool1 = nn.MaxPool1d(4)\n", 195 | " self.conv2 = nn.Conv1d(128, 128, 3)\n", 196 | " self.bn2 = nn.BatchNorm1d(128)\n", 197 | " self.pool2 = nn.MaxPool1d(4)\n", 198 | " self.conv3 = nn.Conv1d(128, 256, 3)\n", 199 | " self.bn3 = nn.BatchNorm1d(256)\n", 200 | " self.pool3 = nn.MaxPool1d(4)\n", 201 | " self.conv4 = nn.Conv1d(256, 512, 3)\n", 202 | " self.bn4 = nn.BatchNorm1d(512)\n", 203 | " self.pool4 = nn.MaxPool1d(4)\n", 204 | " self.fc1 = nn.Linear(512, 50)\n", 205 | "\n", 206 | " def forward(self, x):\n", 207 | " x = x.unsqueeze(-1).view(-1, 100, 2205)\n", 208 | " x = self.conv1(x)\n", 209 | " x = F.relu(self.bn1(x))\n", 210 | " x = self.pool1(x)\n", 211 | " x = self.conv2(x)\n", 212 | " x = F.relu(self.bn2(x))\n", 
213 | " x = self.pool2(x)\n", 214 | " x = self.conv3(x)\n", 215 | " x = F.relu(self.bn3(x))\n", 216 | " x = self.pool3(x)\n", 217 | " x = self.conv4(x)\n", 218 | " x = F.relu(self.bn4(x))\n", 219 | " x = self.pool4(x)\n", 220 | " x = x.squeeze(-1)\n", 221 | " x = self.fc1(x)\n", 222 | " return x" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "audionet = AudioNet()\n", 232 | "audionet.to(device)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "## Find learning rate & train" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "torch.save(audionet.state_dict(), \"audionet.pth\")\n", 249 | "optimizer = optim.Adam(audionet.parameters(), lr=0.001)\n", 250 | "logs,losses = find_lr(audionet, nn.CrossEntropyLoss(), optimizer, train_loader, device=device)\n", 251 | "\n", 252 | "plt.plot(logs,losses)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [ 261 | "lr = 1e-5\n", 262 | "audionet.load_state_dict(torch.load(\"audionet.pth\"))\n", 263 | "optimizer = optim.Adam(audionet.parameters(), lr=lr)" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": null, 269 | "metadata": {}, 270 | "outputs": [], 271 | "source": [ 272 | "train(audionet, optimizer, torch.nn.CrossEntropyLoss(),train_loader, valid_loader, epochs=20, device=device)" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "## Using Spectrograms" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "sample_data, sr = librosa.load(\"ESC-50/train/1-100032-A-0.wav\", sr=None)\n", 289 | "spectrogram = librosa.feature.melspectrogram(sample_data, sr=sr)\n", 290 | "log_spectrogram = librosa.power_to_db(spectrogram, ref=np.max)" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": {}, 297 | "outputs": [], 298 | "source": [ 299 | "def precompute_spectrograms(path, dpi=50):\n", 300 | " files = Path(path).glob('*.wav')\n", 301 | " for filename in files:\n", 302 | " audio_tensor, sr = librosa.load(filename, sr=None)\n", 303 | " spectrogram = librosa.feature.melspectrogram(audio_tensor, sr=sr)\n", 304 | " log_spectrogram = librosa.power_to_db(spectrogram, ref=np.max)\n", 305 | " librosa.display.specshow(log_spectrogram, sr=sr, x_axis='time', y_axis='mel')\n", 306 | " plt.gcf().savefig(\"{}{}_{}.png\".format(filename.parent,dpi,filename.name), dpi=dpi)\n", 307 | "\n", 308 | "PATH_ESC50_TRAIN = PATH_TO_ESC50 / \"train\"\n", 309 | "PATH_ESC50_VALID = PATH_TO_ESC50 / \"valid\"\n", 310 | "\n", 311 | "precompute_spectrograms(PATH_ESC50_TRAIN)\n", 312 | "precompute_spectrograms(PATH_ESC50_VALID)" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "class PrecomputedESC50(Dataset):\n", 322 | " def __init__(self,path,dpi=50, img_transforms=None):\n", 323 | " files = Path(path).glob('{}{}*.wav.png'.format(path.name, dpi))\n", 324 | " self.items = [(f,int(f.name.split(\"-\")[-1].replace(\".wav.png\",\"\"))) for f in files]\n", 325 | " self.length = len(self.items)\n", 326 | " if img_transforms == None:\n", 327 | 
" self.img_transforms = transforms.Compose([transforms.ToTensor()])\n", 328 | " else:\n", 329 | " self.img_transforms = img_transforms\n", 330 | " \n", 331 | " def __getitem__(self, index):\n", 332 | " filename, label = self.items[index]\n", 333 | " img = Image.open(filename).convert('RGB')\n", 334 | " return (self.img_transforms(img), label)\n", 335 | " \n", 336 | " def __len__(self):\n", 337 | " return self.length" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "## Pretrained ResNet50" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": {}, 351 | "outputs": [], 352 | "source": [ 353 | "spec_resnet = models.resnet50(pretrained=True)\n", 354 | "\n", 355 | "for param in spec_resnet.parameters():\n", 356 | " param.requires_grad = False\n", 357 | "\n", 358 | "spec_resnet.fc = nn.Sequential(nn.Linear(spec_resnet.fc.in_features,500),\n", 359 | " nn.ReLU(),\n", 360 | " nn.Dropout(), nn.Linear(500,50))" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": null, 366 | "metadata": {}, 367 | "outputs": [], 368 | "source": [ 369 | "esc50pre_train = PrecomputedESC50(PATH_ESC50_TRAIN,\n", 370 | " img_transforms=transforms.Compose([\n", 371 | " transforms.ToTensor(),\n", 372 | " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", 373 | " std=[0.229, 0.224, 0.225])])\n", 374 | ")\n", 375 | "\n", 376 | "esc50pre_valid = PrecomputedESC50(PATH_ESC50_VALID,\n", 377 | " img_transforms=transforms.Compose([\n", 378 | " transforms.ToTensor(),\n", 379 | " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", 380 | " std=[0.229, 0.224, 0.225])])\n", 381 | ")\n", 382 | "\n", 383 | "esc50_train_loader = torch.utils.data.DataLoader(esc50pre_train, bs, shuffle=True)\n", 384 | "esc50_val_loader = torch.utils.data.DataLoader(esc50pre_valid, bs, shuffle=True)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "spec_resnet.to(device) \n", 394 | "torch.save(spec_resnet.state_dict(), \"spec_resnet.pth\")\n", 395 | "loss_fn = nn.CrossEntropyLoss()\n", 396 | "optimizer = optim.Adam(spec_resnet.parameters(), lr=lr)\n", 397 | "logs,losses = find_lr(spec_resnet, loss_fn, optimizer, esc50_train_loader, device=device)\n", 398 | "plt.plot(logs, losses)" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": null, 404 | "metadata": {}, 405 | "outputs": [], 406 | "source": [ 407 | "spec_resnet.load_state_dict(torch.load(\"spec_resnet.pth\"))\n", 408 | "optimizer = optim.Adam([\n", 409 | " {'params': spec_resnet.conv1.parameters()},\n", 410 | " {'params': spec_resnet.bn1.parameters()},\n", 411 | " {'params': spec_resnet.relu.parameters()},\n", 412 | " {'params': spec_resnet.maxpool.parameters()},\n", 413 | " {'params': spec_resnet.layer1.parameters(), 'lr': 1e-4},\n", 414 | " {'params': spec_resnet.layer2.parameters(), 'lr': 1e-4},\n", 415 | " {'params': spec_resnet.layer3.parameters(), 'lr': 1e-4},\n", 416 | " {'params': spec_resnet.layer4.parameters(), 'lr': 1e-4},\n", 417 | " {'params': spec_resnet.avgpool.parameters(), 'lr': 1e-4},\n", 418 | " {'params': spec_resnet.fc.parameters(), 'lr': 1e-8}\n", 419 | " ], lr=1e-2)\n", 420 | "\n", 421 | "train(spec_resnet, optimizer, nn.CrossEntropyLoss(), esc50_train_loader, esc50_val_loader, epochs=5, device=device)\n", 422 | "\n", 423 | "for param in spec_resnet.parameters():\n", 424 | " param.requires_grad = True\n", 425 | "\n", 426 | 
"train(spec_resnet, optimizer, nn.CrossEntropyLoss(), esc50_train_loader, esc50_val_loader, epochs=5, device=device)" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": {}, 432 | "source": [ 433 | "## Data Augmentation" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": null, 439 | "metadata": {}, 440 | "outputs": [], 441 | "source": [ 442 | "class ESC50WithPitchChange(Dataset):\n", 443 | "\n", 444 | " def __init__(self,path):\n", 445 | " # Get directory listing from path\n", 446 | " files = Path(path).glob('*.wav')\n", 447 | " # Iterate through the listing and create a list of tuples (filename, label)\n", 448 | " self.items = [(f,f.name.split(\"-\")[-1].replace(\".wav\",\"\")) for f in files]\n", 449 | " self.length = len(self.items)\n", 450 | " self.E = torchaudio.sox_effects.SoxEffectsChain()\n", 451 | " self.E.append_effect_to_chain(\"pitch\", [0.5])\n", 452 | " \n", 453 | " def __getitem__(self, index):\n", 454 | " filename, label = self.items[index]\n", 455 | " self.E.set_input_file(filename)\n", 456 | " audio_tensor, sample_rate = self.E.sox_build_flow_effects()\n", 457 | " return audio_tensor, label\n", 458 | " \n", 459 | " def __len__(self):\n", 460 | " return self.length" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": null, 466 | "metadata": {}, 467 | "outputs": [], 468 | "source": [ 469 | "class FrequencyMask(object):\n", 470 | " \"\"\"\n", 471 | " Example:\n", 472 | " >>> transforms.Compose([\n", 473 | " >>> transforms.ToTensor(),\n", 474 | " >>> FrequencyMask(max_width=10, use_mean=False),\n", 475 | " >>> ])\n", 476 | "\n", 477 | " \"\"\"\n", 478 | "\n", 479 | " def __init__(self, max_width, use_mean=True):\n", 480 | " self.max_width = max_width\n", 481 | " self.use_mean = use_mean\n", 482 | "\n", 483 | " def __call__(self, tensor):\n", 484 | " \"\"\"\n", 485 | " Args:\n", 486 | " tensor (Tensor): Tensor image of \n", 487 | " size (C, H, W) where the frequency \n", 488 | " mask is to be applied.\n", 489 | "\n", 490 | " Returns:\n", 491 | " Tensor: Transformed image with Frequency Mask.\n", 492 | " \"\"\"\n", 493 | " start = random.randrange(0, tensor.shape[2])\n", 494 | " end = start + random.randrange(1, self.max_width)\n", 495 | " if self.use_mean:\n", 496 | " tensor[:, start:end, :] = tensor.mean()\n", 497 | " else:\n", 498 | " tensor[:, start:end, :] = 0\n", 499 | " return tensor\n", 500 | "\n", 501 | " def __repr__(self):\n", 502 | " format_string = self.__class__.__name__ + \"(max_width=\"\n", 503 | " format_string += str(self.max_width) + \")\"\n", 504 | " format_string += 'use_mean=' + (str(self.use_mean) + ')')\n", 505 | "\n", 506 | " return format_string" 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": null, 512 | "metadata": {}, 513 | "outputs": [], 514 | "source": [ 515 | "transforms.Compose([FrequencyMask(max_width=10, use_mean=False),\n", 516 | "transforms.ToPILImage()])(torch.rand(3,250,200))" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": null, 522 | "metadata": {}, 523 | "outputs": [], 524 | "source": [ 525 | "class TimeMask(object):\n", 526 | " \"\"\"\n", 527 | " Example:\n", 528 | " >>> transforms.Compose([\n", 529 | " >>> transforms.ToTensor(),\n", 530 | " >>> TimeMask(max_width=10, use_mean=False),\n", 531 | " >>> ])\n", 532 | "\n", 533 | " \"\"\"\n", 534 | "\n", 535 | " def __init__(self, max_width, use_mean=True):\n", 536 | " self.max_width = max_width\n", 537 | " self.use_mean = use_mean\n", 538 | "\n", 539 | " def 
__call__(self, tensor):\n", 540 | " \"\"\"\n", 541 | " Args:\n", 542 | " tensor (Tensor): Tensor image of \n", 543 | " size (C, H, W) where the time mask \n", 544 | " is to be applied.\n", 545 | "\n", 546 | " Returns:\n", 547 | " Tensor: Transformed image with Time Mask.\n", 548 | " \"\"\"\n", 549 | " start = random.randrange(0, tensor.shape[1])\n", 550 | " end = start + random.randrange(0, self.max_width)\n", 551 | " if self.use_mean:\n", 552 | " tensor[:, :, start:end] = tensor.mean()\n", 553 | " else:\n", 554 | " tensor[:, :, start:end] = 0\n", 555 | " return tensor\n", 556 | "\n", 557 | " def __repr__(self):\n", 558 | " format_string = self.__class__.__name__ + \"(max_width=\"\n", 559 | " format_string += str(self.max_width) + \")\"\n", 560 | " format_string += 'use_mean=' + (str(self.use_mean) + ')')\n", 561 | " return format_string" 562 | ] 563 | }, 564 | { 565 | "cell_type": "code", 566 | "execution_count": null, 567 | "metadata": {}, 568 | "outputs": [], 569 | "source": [ 570 | "transforms.Compose([TimeMask(max_width=10, use_mean=False),\n", 571 | "transforms.ToPILImage()])(torch.rand(3,250,200))" 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": null, 577 | "metadata": {}, 578 | "outputs": [], 579 | "source": [ 580 | "class PrecomputedTransformESC50(Dataset):\n", 581 | " def __init__(self, path, max_freqmask_width, max_timemask_width, use_mean=True, dpi=50):\n", 582 | " files = Path(path).glob('{}*.wav.png'.format(dpi))\n", 583 | " self.items = [(f,f.name.split(\"-\")[-1].replace(\".wav.png\",\"\")) for f in files]\n", 584 | " self.length = len(self.items)\n", 585 | " self.max_freqmask_width = max_freqmask_width\n", 586 | " self.max_timemask_width = max_timemask_width\n", 587 | " self.use_mean = use_mean\n", 588 | " self.img_transforms = transforms.Compose([\n", 589 | " transforms.ToTensor(),\n", 590 | " transforms.RandomApply([FrequencyMask(self.max_freqmask_width, self.use_mean)], p=0.5),\n", 591 | " transforms.RandomApply([TimeMask(self.max_timemask_width, self.use_mean)], p=0.5)\n", 592 | "])\n", 593 | " \n", 594 | " def __getitem__(self, index):\n", 595 | " filename, label = self.items[index]\n", 596 | " img = Image.open(filename)\n", 597 | " return (self.img_transforms(img), label)\n", 598 | " \n", 599 | " def __len__(self):\n", 600 | " return self.length" 601 | ] 602 | } 603 | ], 604 | "metadata": { 605 | "kernelspec": { 606 | "display_name": "Python 3", 607 | "language": "python", 608 | "name": "python3" 609 | }, 610 | "language_info": { 611 | "codemirror_mode": { 612 | "name": "ipython", 613 | "version": 3 614 | }, 615 | "file_extension": ".py", 616 | "mimetype": "text/x-python", 617 | "name": "python", 618 | "nbconvert_exporter": "python", 619 | "pygments_lexer": "ipython3", 620 | "version": "3.6.8" 621 | } 622 | }, 623 | "nbformat": 4, 624 | "nbformat_minor": 2 625 | } -------------------------------------------------------------------------------- /chapter9/Chapter9.5.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chapter 9.5 — Text Generation With GPT-2 And (only) PyTorch\n", 8 | "\n", 9 | "While I'm mostly happy with how the book turned out, bar some silly errors that should not have made it to print and needing about another six months to do it properly (although work would have precluded that, so…anyway), I was a little disappointed with how I handled text generation. 
It worked, that's for sure, but it was little more than 'run this program on this text, then run this script to transform the Tensorflow model into a PyTorch compatible format, and run _this_ script to generate output'. And then, to top it all off, about a week after the book went to print, the repo that housed most of the code underwent a major change from `pytorch-pretrained-BERT` to its eventual name of `transformers`. A bit of a pain.\n", 10 | "\n", 11 | "In a way to make that up to people, welcome to Chapter 9.5 - A Half-Chapter in Two Parts. In this part, we'll take another look at text generation, but this time, we won't leave PyTorch. Promise. In Part Two (or is that Chapter 9.75?), we'll have a bit of a final look back at images. The common theme between both parts will be self-supervision and domain modelling. I don't have an ETA for Part Two yet, but it'll come, promise.\n", 12 | "\n", 13 | "If you're looking for a refresher on the Transformer architecture, then there's some in Chapter 9 of my book, but more usefully, you could go here to read [The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/), and here for [The Illustrated GPT-2](http://jalammar.github.io/illustrated-gpt2/).\n", 14 | "\n", 15 | "## Adding New Generation Tricks To GPT-2\n", 16 | " \n", 17 | "Right, so if you remember in the book, we went on a jolly side-jaunt with P.G. Wodehouse. And that was all very fine and whimsical, but maybe we want something that shows off the capabilities of GPT-2 a little better, even if it's really just doing most of the same thing under the covers.\n", 18 | "\n", 19 | "Instead of Jeeves and Wooster, we're going to generate tweets. And we're going to take things a step further by adding a new \"control code\" to our fine-tuned GPT-2 model, so we can instruct GPT-2 that we specifically want to generate a new tweet. If we don't add the control code, then we should just get a (mostly) standard GPT-2 output. And we can use this technique to add _multiple_ control codes, so if you had different sets of synthetic data that you wish to generate, you can use those codes to determine which type to create.\n", 20 | " \n", 21 | "And first…let's go back to the standard thing we always do. \n", 22 | "\n", 23 | "_\"Gee Brain, what are we going to do tonight?\"_\n", 24 | "_\"The same thing we do every night Pinky. Write a new custom dataset and take over the world!\"_\n" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "# Using PyTorch 1.4\n", 34 | "\n", 35 | "import numpy as np\n", 36 | "import pyarrow.parquet as pq\n", 37 | "import pandas as pd\n", 38 | "import random\n", 39 | "import torch\n", 40 | "import fire\n", 41 | "import logging\n", 42 | "import os\n", 43 | "import csv\n", 44 | "\n", 45 | "from torch.utils.data import Dataset, DataLoader\n", 46 | "from transformers import GPT2Tokenizer, GPT2LMHeadModel, AdamW, get_linear_schedule_with_warmup\n", 47 | "from tqdm import tqdm, trange\n", 48 | "import torch.nn.functional as F\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### Datasets\n", 56 | " \n", 57 | "Don't worry though, we won't be doing anything too crazy with this `Dataset`. \n", 58 | "\n", 59 | "Much. 
\n" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "class ParquetDataset(Dataset):\n", 69 | " def __init__(self, path, cols, truncate=False, gpt2_type=\"gpt2\", max_length=768):\n", 70 | "\n", 71 | " # Grab our pandas dataframe, only reading in the columns we're interested in,\n", 72 | " # append our magic tokens (<#col_name#> for the particular column, and <|endoftext|>\n", 73 | " # used by GPT-2 as a text separator), then concatenate them into one giant column for\n", 74 | " # our dataset\n", 75 | "\n", 76 | " self.tokenizer = GPT2Tokenizer.from_pretrained(gpt2_type)\n", 77 | " \n", 78 | " self.df = pq.read_table(path, columns=cols).to_pandas().dropna()\n", 79 | " for col in cols:\n", 80 | " self.df[col] = self.df[col].apply(lambda x: torch.tensor(self.tokenizer.encode(f\"<#{col}#>{x[:768]}<|endoftext|>\")))\n", 81 | " self.df = pd.concat(map(self.df.get, cols)).reset_index(drop=True)\n", 82 | " if truncate:\n", 83 | " self.df = self.df.truncate(after=150)\n", 84 | "\n", 85 | " def __len__(self):\n", 86 | " return self.df.count()\n", 87 | "\n", 88 | " def __getitem__(self, item):\n", 89 | " return self.df.iloc[item]" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [ 98 | "class CSVTwitter(Dataset):\n", 99 | " \n", 100 | " def __init__(self, control_code, truncate=False, gpt2_type=\"gpt2\", max_length=768):\n", 101 | "\n", 102 | " self.tokenizer = GPT2Tokenizer.from_pretrained(gpt2_type)\n", 103 | " self.tweets = []\n", 104 | "\n", 105 | " # This uses the same CSV of Sentiment140 that we created in Chapter 5\n", 106 | " \n", 107 | " with open('train-processed.csv', newline='') as csvfile:\n", 108 | " tweet_csv = csv.reader(csvfile)\n", 109 | " for row in tweet_csv:\n", 110 | " self.tweets.append(torch.tensor(\n", 111 | " self.tokenizer.encode(f\"<|{control_code}|>{row[5][:max_length]}<|endoftext|>\")\n", 112 | " ))\n", 113 | " \n", 114 | " if truncate:\n", 115 | " self.tweets = self.tweets[:20000]\n", 116 | " self.tweet_count = len(self.tweets)\n", 117 | " \n", 118 | " def __len__(self):\n", 119 | " return self.tweet_count\n", 120 | "\n", 121 | " def __getitem__(self, item):\n", 122 | " return self.tweets[item]" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "Firstly, you might wonder is why we're ensuring that we chop our strings at 768 characters. We're going to be using `gpt2-small` in this chapter, which has that limitation due to its hidden dimensionality of 768 (if you want to use larger pre-trained models, then you can increase this: `gpt2-medium`/1024, `gpt2-large`/1280, `gpt2-xl`/1600). Of course, because this dataset is only tweets, we're never going to bump up against the limit, but I thought I would I'd include it so you know to be aware of the limitation. \n", 130 | " \n", 131 | "You'll also see that we're injecting our `<|tweet|>` control code at the start of each entry, and the `<|endoftext|>` code at the end - this is actually a code that GPT-2 has already learnt during its initial training to signify the end of a piece of text. It'll become useful later on in training when we pack our training tensors.\n", 132 | "\n", 133 | "The last part of the dataset is _encoding_. This is similar to the encoding of text that we did back in Chapter 5, but with a small twist. 
Instead of a simple mapping of all words to a new dictionary, we are using a _byte pair encoding tokenizer_. This works in a different way to what we have seen before as it builds a dictionary by keeping track of common pairs of bytes and replaces them with a byte that is not present in the encoding. \n", 134 | "\n", 135 | "For example, take the nonsense string:\n", 136 | "\t\n", 137 | "\taabaabdeaa\n", 138 | "\t\t\n", 139 | "The first pass of the byte pair encoder would replace our `aa` strings:\n", 140 | "\n", 141 | "\tAbAbdeA\n", 142 | "\tA = aa\n", 143 | "\n", 144 | "But note that we now have new byte pairs and so we can replace again:\n", 145 | "\n", 146 | "\tBBdeA\n", 147 | "\tA = aa\n", 148 | "\tB = Ab\n", 149 | "\n", 150 | "For building up a vocabulary from our data, the byte pair encoding in language models these days tends to work in the opposite direction; it starts out with a set of characters in that language, and through passes on the data, builds up _subwords_ by finding the pairs present in the dataset, and then merging to find larger pairs, and so on. In this way, the tokenizer learns a vocabulary directly from the dataset itself and not from any manual input from an external source (like us).\n", 151 | "\n", 152 | "Happily, we can use the BPE tokenizer that has already been trained on the dataset of GPT-2 and not have to worry about training it ourselves here (though if you're looking to train on a new language, [Huggingface's tutorial on learning Esperanto](https://huggingface.co/blog/how-to-train) will tell you everything you need to get started). We create a pre-trained version using `GPT2Tokenizer.from_pretrained(gpt2_type)`, which will download the appropriate files for the version of GPT-2 we're working with. We then encode the dataset and create tensors, returning a particular tensor within `__getitem__()` as normal.\n", 153 | "\n", 154 | "In addition to the CSV-based `Dataset`, I've also included a different implementation that uses PyArrow to load in named columns from a parquet file. I just had a bunch of parquet-based datasets lying around so it was useful to make a class that could handle them as well.\n", 155 | " \n", 156 | "We'll build a `DataLoader` in our usual way:\n", 157 | "\n", 158 | "    DataLoader(dataset, batch_size=1, shuffle=True) \n", 159 | " \n", 160 | "(the reason for `batch_size` being 1 is something we'll come back to later)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "### Training\n", 168 | " \n", 169 | " Okay, so how do we train this thing? Well, it turns out that it's actually a lot simpler than you'd think. We already have a pre-trained model, so we're just doing some fine-tuning (we won't freeze layers here, but you can certainly experiment with it). But…don't we need labels? \n", 170 | " \n", 171 | "Training GPT-2 involves passing our input text into the transformer model…and training the model to get the text back as output. In this way, the model learns something of how text is structured, and eventually builds up a _language model_ that can be used for generating further text. So our labels are the input text! \n", 172 | "\n", 173 | "To get the model to produce anything resembling English or whatever language you're training it on requires a gargantuan amount of text (OpenAI trained GPT-2 on 8 million webpages). But as we're using a pre-trained model, all that hard work has been done for us, so we can get away with a much smaller dataset. 
We can create a pre-trained GPT-2 transformer with one line of code:\n", 174 | "\n", 175 | "\tmodel = GPT2LMHeadModel.from_pretrained(gpt2_type)\n", 176 | "\n", 177 | "As for our training loop, given that our labels are our input, all we're really doing is:\n", 178 | "\n", 179 | "\toutputs = model(input)\n", 180 | "\tloss = loss_function(output, input)\n", 181 | "\tloss.backward()\n", 182 | "\toptimizer.step()\n", 183 | "\n", 184 | "But there's a slight catch. You remember that GPT-2 is big, right? Very big. It's quite possible that you can't fit all the parameters and all the gradient updates inside your GPU. I know I can't, and I have a 1080Ti. There's various approaches we can use to get around this problem, like distributed training, or maybe gradient checkpointing (covered in Chapter 7).\n", 185 | "\n", 186 | "However, there's a simpler option we can use . What we're going to do is _accumulate_ our gradients for a number of batches and then do the updating every _x_ batches instead of every batch. We'll divide our loss updates by the `accumulated_batch_size` to average out the loss that we're applying.\n", 187 | "\n", 188 | "We're almost at the point of having the training loop sorted. But what's that, Columbo?\n", 189 | "\n", 190 | "You may have looked at the links to the illustrated Transformer articles and discovered that GPT-2 will 'see' all of its input at once. And we're sending in encoded tensors of 140-character strings. That's leaving a lot of our input set to…basically zero. Is that going to be great for training? Probably not, as we're not going to get a lot of information flowing forwards and backwards through our network. Enter…`pack_tensor()`!\n" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "def pack_tensor(new_tensor, packed_tensor, max_seq_len):\n", 200 | " if packed_tensor is None:\n", 201 | " return new_tensor, True, None\n", 202 | " if new_tensor.size()[1] + packed_tensor.size()[1] > max_seq_len:\n", 203 | " return packed_tensor, False, new_tensor\n", 204 | " else:\n", 205 | " packed_tensor = torch.cat([new_tensor, packed_tensor[:, 1:]], dim=1)\n", 206 | " return packed_tensor, True, None" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "This is a very simple method that just tries to fit as many pieces of text into an input tensor as possible. This is why we created the DataLoader with a `batch_size` of 1, as in our training loop, we'll simply loop over and over the data until we've stuffed a tensor, and then push it through our model. Of course, this breaks the relationship between batches that come from the `Dataset` and what we send to the model for the training, so we add `accumulating_batch_count` as a counter to work out when we need to train on our accumulated gradients.\n", 214 | "\n", 215 | "You'll also notice in the train() code below that instead of our normal patten of:\n", 216 | "\toutputs = model(input)\n", 217 | "\tloss = loss_function(output, input)\n", 218 | "\n", 219 | "We're actually doing:\n", 220 | "\n", 221 | "\toutputs = model(input, labels=input)\n", 222 | " loss = outputs[0]\n", 223 | "\n", 224 | "There's nothing too nefarious going on here; the GPT-2 model simply has code inside it that calculates the loss to make things easier. 
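As a tiny, standalone sketch of that `labels=input` trick (assuming only that `torch` and `transformers` are installed, and using the tuple-style outputs of the `transformers` version used in this notebook; the sample string is made up):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

sample = torch.tensor(tokenizer.encode("<|tweet|>Hello world<|endoftext|>")).unsqueeze(0)

# Passing the input as its own labels makes the model hand back the
# language-modelling loss as the first element of its output tuple
outputs = model(sample, labels=sample)
loss, logits = outputs[:2]
print(loss.item(), logits.shape)   # a scalar loss and (batch, sequence, vocab) scores
```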
[It's just a simple CrossEntropyLoss as we've seen in previous chapters]().\t\n", 225 | "\n", 226 | "Our optimizer and learning rate also come from the `transformers` library, and we're using the AdamW ([Adam + Weight Decay](https://www.fast.ai/2018/07/02/adam-weight-decay/)) optimizer with a warmup and linear decay (you can see alternatives at [Huggingface's docs page](https://huggingface.co/transformers/main_classes/optimizer_schedules.html)). Plus we also include the ability to save a set of weights at the end of an epoch." 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "def train(\n", 236 | " dataset,\n", 237 | " model,\n", 238 | " tokenizer,\n", 239 | " batch_size=16,\n", 240 | " epochs=4,\n", 241 | " lr=2e-5,\n", 242 | " max_seq_len=400,\n", 243 | " warmup_steps=5000,\n", 244 | " gpt2_type=\"gpt2\",\n", 245 | " device=\"cuda\",\n", 246 | " output_dir=\".\",\n", 247 | " output_prefix=\"wreckgar\",\n", 248 | " test_mode=False,\n", 249 | " save_model_on_epoch=False,\n", 250 | "):\n", 251 | "\n", 252 | " acc_steps = 100\n", 253 | "\n", 254 | " model = model.to(device)\n", 255 | " model.train()\n", 256 | "\n", 257 | " optimizer = AdamW(model.parameters(), lr=lr)\n", 258 | " scheduler = get_linear_schedule_with_warmup(\n", 259 | " optimizer, num_warmup_steps=warmup_steps, num_training_steps=-1\n", 260 | " )\n", 261 | "\n", 262 | " train_dataloader = DataLoader(dataset, batch_size=1, shuffle=True)\n", 263 | "\n", 264 | " accumulating_batch_count = 0\n", 265 | " input_tensor = None\n", 266 | "\n", 267 | " for epoch in range(epochs):\n", 268 | "\n", 269 | " print(f\"Training epoch {epoch}\")\n", 270 | " for idx, entry in tqdm(enumerate(train_dataloader)):\n", 271 | " (input_tensor, carry_on, remainder) = pack_tensor(entry, input_tensor, 768)\n", 272 | "\n", 273 | " if carry_on and idx != len(train_dataloader) - 1:\n", 274 | " continue\n", 275 | "\n", 276 | " input_tensor = input_tensor.to(device)\n", 277 | " outputs = model(input_tensor, labels=input_tensor)\n", 278 | " loss = outputs[0]\n", 279 | " loss.backward()\n", 280 | "\n", 281 | " if (accumulating_batch_count % batch_size) == 0:\n", 282 | " optimizer.step()\n", 283 | " scheduler.step()\n", 284 | " optimizer.zero_grad()\n", 285 | " model.zero_grad()\n", 286 | "\n", 287 | " accumulating_batch_count += 1\n", 288 | " input_tensor = None\n", 289 | " if save_model_on_epoch:\n", 290 | " torch.save(\n", 291 | " model.state_dict(),\n", 292 | " os.path.join(output_dir, f\"{output_prefix}-{epoch}.pt\"),\n", 293 | " )\n", 294 | " return model" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "dataset = CSVTwitter(\"<|tweet|>\", truncate=True, gpt2_type=\"gpt2\")" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "gpt2_type = \"gpt2\"" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": { 319 | "scrolled": true 320 | }, 321 | "outputs": [], 322 | "source": [ 323 | "model = train(\n", 324 | " dataset,\n", 325 | " GPT2LMHeadModel.from_pretrained(gpt2_type),\n", 326 | " GPT2Tokenizer.from_pretrained(gpt2_type),\n", 327 | " batch_size=16,\n", 328 | " epochs=1,\n", 329 | " lr=3e-5,\n", 330 | " max_seq_len=140,\n", 331 | " warmup_steps=5000,\n", 332 | " gpt2_type=gpt2_type,\n", 333 | " device=\"cuda\",\n", 
334 | " output_dir=\"trained_models\",\n", 335 | " output_prefix=\"twitter\",\n", 336 | " save_model_on_epoch=True\n", 337 | ")" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "### Generating Text\n", 345 | " \n", 346 | "For generating text from our fine-tuned model, there are multiple approaches that we could use, including _beam search_, *top_k filtering*, and the one we're going to use — _nucleus sampling_ (or *top_p filtering*). We take our input, in this case our new control code `<|tweet|>` and then feed that into the model to generate a new sequence. But all we care about it is the next word, and in particular, the probabilities of all the possible words that the model predicts should appear there. \n", 347 | "\n", 348 | "Of course, lots of words that the model may predict will not make sense, and that's where we can bring in _nucleus sampling_ (or *top_k* or any other approach). In this approach, we sum up all the probabilities, sorted in descending order that are present *until* the total sum (the cumulative distribution function) is above an adjustable hyperparameter, `p`, which is normally set between 0.7 and 0.9. There's another parameter, `temperature`, which can be used to scale the probabilities before they're summed up into the CDF. \n", 349 | "\n", 350 | "Once the CDF is formed, we eliminate everything that falls outside of our `p` by setting it to `-Infinity`. We're not messing around here. Note that as we're doing this by summing the highest probability selections first, it's possible that if there's a few high probability choices, they'll be the only ones present. And that makes sense if you think about sentences like:\n", 351 | "\t\n", 352 | "\tThe dog lifted up its ____\n", 353 | "\t\n", 354 | "Possible options here could include `paw, tail, tongue`. You'd expect `paw` or `tail` much more than `tongue`. In this way, our sampling feels more natural, while still providing the possibility for surprise when probabilities are more spread out.\n", 355 | "\n", 356 | "Most of the code here is taken from Huggingface's [`run_generation.py` script](https://github.com/huggingface/transformers/blob/master/examples/run_generation.py). \n", 357 | "\n", 358 | "Once we have our next word, we loop back around to the start, but this time we feed in the sentence with the new word added and choose the following word in the same way. We continue until we either reach `entry_length` or if the model generates a `<|endoftext|>` marker. 
And then it's back to the outer loop to generate our next sentence until we've generated the requested number of sentences.\n" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "def generate(\n", 368 | " model,\n", 369 | " tokenizer,\n", 370 | " prompt,\n", 371 | " entry_count=10,\n", 372 | " entry_length=100,\n", 373 | " top_p=0.8,\n", 374 | " temperature=1.,\n", 375 | "):\n", 376 | "\n", 377 | " model.eval()\n", 378 | "\n", 379 | " generated_num = 0\n", 380 | " generated_list = []\n", 381 | "\n", 382 | " filter_value = -float(\"Inf\")\n", 383 | "\n", 384 | " with torch.no_grad():\n", 385 | "\n", 386 | " for entry_idx in trange(entry_count):\n", 387 | "\n", 388 | " entry_finished = False\n", 389 | "\n", 390 | " generated = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)\n", 391 | "\n", 392 | " # Using top-p (nucleus sampling): https://github.com/huggingface/transformers/blob/master/examples/run_generation.py\n", 393 | "\n", 394 | " for i in range(entry_length):\n", 395 | " outputs = model(generated, labels=generated)\n", 396 | " loss, logits = outputs[:2]\n", 397 | " logits = logits[:, -1, :] / (temperature if temperature > 0 else 1.0)\n", 398 | "\n", 399 | " sorted_logits, sorted_indices = torch.sort(logits, descending=True)\n", 400 | " cumulative_probs = torch.cumsum(\n", 401 | " F.softmax(sorted_logits, dim=-1), dim=-1\n", 402 | " )\n", 403 | "\n", 404 | " sorted_indices_to_remove = cumulative_probs > top_p\n", 405 | " sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[\n", 406 | " ..., :-1\n", 407 | " ].clone()\n", 408 | " sorted_indices_to_remove[..., 0] = 0\n", 409 | "\n", 410 | " indices_to_remove = sorted_indices[sorted_indices_to_remove]\n", 411 | " logits[:, indices_to_remove] = filter_value\n", 412 | "\n", 413 | " next_token = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)\n", 414 | " generated = torch.cat((generated, next_token), dim=1)\n", 415 | "\n", 416 | " if next_token in tokenizer.encode(\"<|endoftext|>\"):\n", 417 | " entry_finished = True\n", 418 | "\n", 419 | " if entry_finished:\n", 420 | "\n", 421 | " generated_num = generated_num + 1\n", 422 | "\n", 423 | " output_list = list(generated.squeeze().numpy())\n", 424 | " output_text = tokenizer.decode(output_list)\n", 425 | "\n", 426 | " generated_list.append(output_text)\n", 427 | " break\n", 428 | " \n", 429 | " if not entry_finished:\n", 430 | " output_list = list(generated.squeeze().numpy())\n", 431 | " output_text = f\"{tokenizer.decode(output_list)}<|endoftext|>\" \n", 432 | " generated_list.append(output_text)\n", 433 | " \n", 434 | " return generated_list" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": null, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "generated_tweets = generate(model.to('cpu'), GPT2Tokenizer.from_pretrained(gpt2_type),\"<|tweet|>\",entry_count=10)\n" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "### Example Output\n", 451 | "\n", 452 | "And here's some output of calling `generate` on our trained model.\n", 453 | "\n", 454 | "\t\"<|tweet|>Casa the fifth Monday afternoons in the summer. 
Stay for one more - you'll be much better at finding a workplace than you would at the \toffice.\\n\\nThe Hours\\n\\n14:00 - 15:00, Hot and Cold\\n\\n18:00 - 19:00, Cafe Oktoberfest\\n\\n19:00 - 21:00, More Information<|endoftext|>\",\n", 455 | " \t'<|tweet|>Tweet what you like.<|endoftext|>',\n", 456 | "\t'<|tweet|>Sigh. Hope to see ya in there.<|endoftext|>',\n", 457 | " \t'<|tweet|> | The Walking Dead ends, '10 hours after everybody gets killed! I'm sick of zombies. pic.twitter.com/tsxhXdGLuGx.<|endoftext|>'\n", 458 | " \n", 459 | " \n", 460 | " ### Further Techniques & Reading\n", 461 | " \n", 462 | "[Huggingface](https://huggingface.co/)\n", 463 | "\n", 464 | "[Better Language Models and Their Implications (GPT-2)](https://openai.com/blog/better-language-models/)\n", 465 | "\n", 466 | "[Applying BERT-based models in Search](https://www.blog.google/products/search/search-language-understanding-bert/)\n", 467 | "\n", 468 | "[How To Sample From Language Models](https://towardsdatascience.com/how-to-sample-from-language-models-682bceb97277)" 469 | ] 470 | } 471 | ], 472 | "metadata": { 473 | "kernelspec": { 474 | "display_name": "Python 3", 475 | "language": "python", 476 | "name": "python3" 477 | }, 478 | "language_info": { 479 | "codemirror_mode": { 480 | "name": "ipython", 481 | "version": 3 482 | }, 483 | "file_extension": ".py", 484 | "mimetype": "text/x-python", 485 | "name": "python", 486 | "nbconvert_exporter": "python", 487 | "pygments_lexer": "ipython3", 488 | "version": "3.6.10" 489 | } 490 | }, 491 | "nbformat": 4, 492 | "nbformat_minor": 2 493 | } 494 | -------------------------------------------------------------------------------- /chapter8/Chapter_8_5_Quantizing_Models.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Chapter 7.5 Quantizing Models.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | } 14 | }, 15 | "cells": [ 16 | { 17 | "cell_type": "code", 18 | "metadata": { 19 | "id": "QcxNUJb99ONb", 20 | "colab_type": "code", 21 | "colab": { 22 | "base_uri": "https://localhost:8080/", 23 | "height": 732 24 | }, 25 | "outputId": "0079f6b4-a609-4efc-8a09-a392a709ac70" 26 | }, 27 | "source": [ 28 | "!pip install torch transformers" 29 | ], 30 | "execution_count": 4, 31 | "outputs": [ 32 | { 33 | "output_type": "stream", 34 | "text": [ 35 | "Requirement already satisfied: torch in /usr/local/lib/python3.6/dist-packages (1.5.0+cu101)\n", 36 | "Collecting transformers\n", 37 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/a3/78/92cedda05552398352ed9784908b834ee32a0bd071a9b32de287327370b7/transformers-2.8.0-py3-none-any.whl (563kB)\n", 38 | "\u001b[K |████████████████████████████████| 573kB 2.9MB/s \n", 39 | "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from torch) (1.18.3)\n", 40 | "Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from torch) (0.16.0)\n", 41 | "Requirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers) (3.0.12)\n", 42 | "Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.23.0)\n", 43 | "Collecting tokenizers==0.5.2\n", 44 | "\u001b[?25l Downloading 
https://files.pythonhosted.org/packages/d1/3f/73c881ea4723e43c1e9acf317cf407fab3a278daab3a69c98dcac511c04f/tokenizers-0.5.2-cp36-cp36m-manylinux1_x86_64.whl (3.7MB)\n", 45 | "\u001b[K |████████████████████████████████| 3.7MB 7.7MB/s \n", 46 | "\u001b[?25hRequirement already satisfied: boto3 in /usr/local/lib/python3.6/dist-packages (from transformers) (1.12.47)\n", 47 | "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (2019.12.20)\n", 48 | "Requirement already satisfied: dataclasses; python_version < \"3.7\" in /usr/local/lib/python3.6/dist-packages (from transformers) (0.7)\n", 49 | "Collecting sentencepiece\n", 50 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/98/2c/8df20f3ac6c22ac224fff307ebc102818206c53fc454ecd37d8ac2060df5/sentencepiece-0.1.86-cp36-cp36m-manylinux1_x86_64.whl (1.0MB)\n", 51 | "\u001b[K |████████████████████████████████| 1.0MB 39.8MB/s \n", 52 | "\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers) (4.38.0)\n", 53 | "Collecting sacremoses\n", 54 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/99/50/93509f906a40bffd7d175f97fd75ea328ad9bd91f48f59c4bd084c94a25e/sacremoses-0.0.41.tar.gz (883kB)\n", 55 | "\u001b[K |████████████████████████████████| 890kB 49.0MB/s \n", 56 | "\u001b[?25hRequirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4)\n", 57 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.9)\n", 58 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2020.4.5.1)\n", 59 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3)\n", 60 | "Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (0.9.5)\n", 61 | "Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (0.3.3)\n", 62 | "Requirement already satisfied: botocore<1.16.0,>=1.15.47 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (1.15.47)\n", 63 | "Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (1.12.0)\n", 64 | "Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.1.2)\n", 65 | "Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (0.14.1)\n", 66 | "Requirement already satisfied: docutils<0.16,>=0.10 in /usr/local/lib/python3.6/dist-packages (from botocore<1.16.0,>=1.15.47->boto3->transformers) (0.15.2)\n", 67 | "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/dist-packages (from botocore<1.16.0,>=1.15.47->boto3->transformers) (2.8.1)\n", 68 | "Building wheels for collected packages: sacremoses\n", 69 | " Building wheel for sacremoses (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", 70 | " Created wheel for sacremoses: filename=sacremoses-0.0.41-cp36-none-any.whl size=893334 sha256=062da831c0e422df7a7afa4a701e74f06f882b89665793b94d9daf343ed538d7\n", 71 | " Stored in directory: /root/.cache/pip/wheels/22/5a/d4/b020a81249de7dc63758a34222feaa668dbe8ebfe9170cc9b1\n", 72 | "Successfully built sacremoses\n", 73 | "Installing collected packages: tokenizers, sentencepiece, sacremoses, transformers\n", 74 | "Successfully installed sacremoses-0.0.41 sentencepiece-0.1.86 tokenizers-0.5.2 transformers-2.8.0\n" 75 | ], 76 | "name": "stdout" 77 | } 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": { 83 | "id": "DjIYt4QS9YWL", 84 | "colab_type": "text" 85 | }, 86 | "source": [ 87 | "" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": { 93 | "id": "L0kxCNnI9RVc", 94 | "colab_type": "text" 95 | }, 96 | "source": [ 97 | "# Big Models Hate This One Weird Trick! (Quantization, T5, & PyTorch 1.4)\n", 98 | "\n", 99 | "As we know, models can be big lumbering beasts, comprised of millions of parameters (both weights and activations) that require lots of matrix multiplications to take an input and arrive at an answer. And for most of our work so far, that's been fine! We have mighty GPUs that can handle these burdens with ease.\n", 100 | "\n", 101 | "But what if we didn’t? We often package a model up for production inference usage so that it only runs on the CPU. And what if we wanted to run our model on a smaller embedded platform? Suddenly, both the size of the model and all those floating-point operations become a little more problematic. Thankfully, there’s a trick we can perform that makes our model smaller _and_ faster, normally with the trade off with some accuracy. Even better, PyTorch allows us to perform this one weird trick with just one line of code, with some other approaches for squeezing even more performance. Let’s have a quick look at _quantization_.\n" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": { 107 | "id": "SyNq8LN_9idf", 108 | "colab_type": "text" 109 | }, 110 | "source": [ 111 | "## Quantization\n", 112 | "\n", 113 | "Every parameter in our model is a 32-bit floating point number, taking up 4 bytes of memory. That’s not a lot, but it can soon add up. Let's have a look at Google’s recent T5 transformer-based model, which has a `t5-small` variant that’s available in the `transformers` library. 
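As a quick aside, you can check the four-bytes-per-parameter figure directly; a minimal sketch using a throwaway layer (any float32 tensor would do):

```python
import torch

# Sketch: a float32 parameter really does occupy 4 bytes
weight = torch.nn.Linear(512, 512).weight
print(weight.dtype, weight.element_size())   # torch.float32 4
```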
\n" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "metadata": { 119 | "id": "yDxF0i309uJy", 120 | "colab_type": "code", 121 | "colab": { 122 | "base_uri": "https://localhost:8080/", 123 | "height": 34 124 | }, 125 | "outputId": "f47acad5-c8d7-4fc7-d0cf-a2cee9a536a2" 126 | }, 127 | "source": [ 128 | "import torch\n", 129 | "from transformers import pipeline, T5ForConditionalGeneration\n", 130 | "\t\t\n", 131 | "def count_parameters(model):\n", 132 | " return sum(p.numel() for p in model.parameters())\n", 133 | "\t \n", 134 | "base_model = T5ForConditionalGeneration.from_pretrained(\"t5-small\")\n", 135 | "\n", 136 | "param_count = count_parameters(base_model)\n", 137 | "\n", 138 | "memory = (param_count * 4) / (1024 *1024)\n", 139 | "memory\n" 140 | ], 141 | "execution_count": 16, 142 | "outputs": [ 143 | { 144 | "output_type": "execute_result", 145 | "data": { 146 | "text/plain": [ 147 | "230.8154296875" 148 | ] 149 | }, 150 | "metadata": { 151 | "tags": [] 152 | }, 153 | "execution_count": 16 154 | } 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": { 160 | "id": "xzzcwPQW-NW_", 161 | "colab_type": "text" 162 | }, 163 | "source": [ 164 | "Even with the smallest pre-trained T5 weights, our model is roughly 60m parameters and weighs in at a whopping 230Mb! " 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": { 170 | "id": "vY079U2m-Wh2", 171 | "colab_type": "text" 172 | }, 173 | "source": [ 174 | "However, what if we decided that we didn’t need the full precision of our floating-point parameters? If our parameters could be restricted to within a certain range of values, then we could use a smaller type of number representation to store the parameters. This _quantization_ is the key to speeding up our inference time and reducing the memory footprint of our models. What we tend to aim for is to quantize down from a 32-bit floating point to an 8-bit integer. The basic idea is:" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": { 180 | "id": "jRM0VuTr-Y0k", 181 | "colab_type": "text" 182 | }, 183 | "source": [ 184 | "\n", 185 | "$x_{int8} = (\\frac{x_{float32}}{x_{scale}} + x_{offset})$\n", 186 | "\n", 187 | "Which is essentially just fitting the potential values of the parameters of a network to a line of $y = mx + c$, although due to the reduced resolution of the 8-bit integer, there's only so many values a parameter now may take instead of the huge amount that a `float32` value could be. PyTorch does its quantizing in a slightly more complicated affair that ensures that zero is always zero, but the basic idea is the same - we have a range of values that our parameters can take, and then find an appropriate pair $x_{scale}$ and $x_{offset}$ to provide 256 graduations to represent that range - or 255 if you think about PyTorch always keeping zero around." 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": { 193 | "id": "yUJNQivcAVVZ", 194 | "colab_type": "text" 195 | }, 196 | "source": [ 197 | "At the moment (PyTorch 1.5), quantized layers are best supported with `CNN` and `Linear` layers. 
Thankfully, if we have a look at the model structure of T5, we can see a happy coincidence:" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "metadata": { 203 | "id": "heo9StPOAZEL", 204 | "colab_type": "code", 205 | "colab": { 206 | "base_uri": "https://localhost:8080/", 207 | "height": 1000 208 | }, 209 | "outputId": "4855ce94-7df0-473f-f1ec-b352925f3580" 210 | }, 211 | "source": [ 212 | "base_model" 213 | ], 214 | "execution_count": 6, 215 | "outputs": [ 216 | { 217 | "output_type": "execute_result", 218 | "data": { 219 | "text/plain": [ 220 | "T5Model(\n", 221 | " (shared): Embedding(32128, 512)\n", 222 | " (encoder): T5Stack(\n", 223 | " (embed_tokens): Embedding(32128, 512)\n", 224 | " (block): ModuleList(\n", 225 | " (0): T5Block(\n", 226 | " (layer): ModuleList(\n", 227 | " (0): T5LayerSelfAttention(\n", 228 | " (SelfAttention): T5Attention(\n", 229 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 230 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 231 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 232 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 233 | " (relative_attention_bias): Embedding(32, 8)\n", 234 | " )\n", 235 | " (layer_norm): T5LayerNorm()\n", 236 | " (dropout): Dropout(p=0.1, inplace=False)\n", 237 | " )\n", 238 | " (1): T5LayerFF(\n", 239 | " (DenseReluDense): T5DenseReluDense(\n", 240 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 241 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 242 | " (dropout): Dropout(p=0.1, inplace=False)\n", 243 | " )\n", 244 | " (layer_norm): T5LayerNorm()\n", 245 | " (dropout): Dropout(p=0.1, inplace=False)\n", 246 | " )\n", 247 | " )\n", 248 | " )\n", 249 | " (1): T5Block(\n", 250 | " (layer): ModuleList(\n", 251 | " (0): T5LayerSelfAttention(\n", 252 | " (SelfAttention): T5Attention(\n", 253 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 254 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 255 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 256 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 257 | " )\n", 258 | " (layer_norm): T5LayerNorm()\n", 259 | " (dropout): Dropout(p=0.1, inplace=False)\n", 260 | " )\n", 261 | " (1): T5LayerFF(\n", 262 | " (DenseReluDense): T5DenseReluDense(\n", 263 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 264 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 265 | " (dropout): Dropout(p=0.1, inplace=False)\n", 266 | " )\n", 267 | " (layer_norm): T5LayerNorm()\n", 268 | " (dropout): Dropout(p=0.1, inplace=False)\n", 269 | " )\n", 270 | " )\n", 271 | " )\n", 272 | " (2): T5Block(\n", 273 | " (layer): ModuleList(\n", 274 | " (0): T5LayerSelfAttention(\n", 275 | " (SelfAttention): T5Attention(\n", 276 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 277 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 278 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 279 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 280 | " )\n", 281 | " (layer_norm): T5LayerNorm()\n", 282 | " (dropout): Dropout(p=0.1, inplace=False)\n", 283 | " )\n", 284 | " (1): T5LayerFF(\n", 285 | " (DenseReluDense): T5DenseReluDense(\n", 286 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 287 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 288 | " (dropout): Dropout(p=0.1, inplace=False)\n", 289 
| " )\n", 290 | " (layer_norm): T5LayerNorm()\n", 291 | " (dropout): Dropout(p=0.1, inplace=False)\n", 292 | " )\n", 293 | " )\n", 294 | " )\n", 295 | " (3): T5Block(\n", 296 | " (layer): ModuleList(\n", 297 | " (0): T5LayerSelfAttention(\n", 298 | " (SelfAttention): T5Attention(\n", 299 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 300 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 301 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 302 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 303 | " )\n", 304 | " (layer_norm): T5LayerNorm()\n", 305 | " (dropout): Dropout(p=0.1, inplace=False)\n", 306 | " )\n", 307 | " (1): T5LayerFF(\n", 308 | " (DenseReluDense): T5DenseReluDense(\n", 309 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 310 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 311 | " (dropout): Dropout(p=0.1, inplace=False)\n", 312 | " )\n", 313 | " (layer_norm): T5LayerNorm()\n", 314 | " (dropout): Dropout(p=0.1, inplace=False)\n", 315 | " )\n", 316 | " )\n", 317 | " )\n", 318 | " (4): T5Block(\n", 319 | " (layer): ModuleList(\n", 320 | " (0): T5LayerSelfAttention(\n", 321 | " (SelfAttention): T5Attention(\n", 322 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 323 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 324 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 325 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 326 | " )\n", 327 | " (layer_norm): T5LayerNorm()\n", 328 | " (dropout): Dropout(p=0.1, inplace=False)\n", 329 | " )\n", 330 | " (1): T5LayerFF(\n", 331 | " (DenseReluDense): T5DenseReluDense(\n", 332 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 333 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 334 | " (dropout): Dropout(p=0.1, inplace=False)\n", 335 | " )\n", 336 | " (layer_norm): T5LayerNorm()\n", 337 | " (dropout): Dropout(p=0.1, inplace=False)\n", 338 | " )\n", 339 | " )\n", 340 | " )\n", 341 | " (5): T5Block(\n", 342 | " (layer): ModuleList(\n", 343 | " (0): T5LayerSelfAttention(\n", 344 | " (SelfAttention): T5Attention(\n", 345 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 346 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 347 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 348 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 349 | " )\n", 350 | " (layer_norm): T5LayerNorm()\n", 351 | " (dropout): Dropout(p=0.1, inplace=False)\n", 352 | " )\n", 353 | " (1): T5LayerFF(\n", 354 | " (DenseReluDense): T5DenseReluDense(\n", 355 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 356 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 357 | " (dropout): Dropout(p=0.1, inplace=False)\n", 358 | " )\n", 359 | " (layer_norm): T5LayerNorm()\n", 360 | " (dropout): Dropout(p=0.1, inplace=False)\n", 361 | " )\n", 362 | " )\n", 363 | " )\n", 364 | " )\n", 365 | " (final_layer_norm): T5LayerNorm()\n", 366 | " (dropout): Dropout(p=0.1, inplace=False)\n", 367 | " )\n", 368 | " (decoder): T5Stack(\n", 369 | " (embed_tokens): Embedding(32128, 512)\n", 370 | " (block): ModuleList(\n", 371 | " (0): T5Block(\n", 372 | " (layer): ModuleList(\n", 373 | " (0): T5LayerSelfAttention(\n", 374 | " (SelfAttention): T5Attention(\n", 375 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 376 | " (k): Linear(in_features=512, 
out_features=512, bias=False)\n", 377 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 378 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 379 | " (relative_attention_bias): Embedding(32, 8)\n", 380 | " )\n", 381 | " (layer_norm): T5LayerNorm()\n", 382 | " (dropout): Dropout(p=0.1, inplace=False)\n", 383 | " )\n", 384 | " (1): T5LayerCrossAttention(\n", 385 | " (EncDecAttention): T5Attention(\n", 386 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 387 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 388 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 389 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 390 | " (relative_attention_bias): Embedding(32, 8)\n", 391 | " )\n", 392 | " (layer_norm): T5LayerNorm()\n", 393 | " (dropout): Dropout(p=0.1, inplace=False)\n", 394 | " )\n", 395 | " (2): T5LayerFF(\n", 396 | " (DenseReluDense): T5DenseReluDense(\n", 397 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 398 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 399 | " (dropout): Dropout(p=0.1, inplace=False)\n", 400 | " )\n", 401 | " (layer_norm): T5LayerNorm()\n", 402 | " (dropout): Dropout(p=0.1, inplace=False)\n", 403 | " )\n", 404 | " )\n", 405 | " )\n", 406 | " (1): T5Block(\n", 407 | " (layer): ModuleList(\n", 408 | " (0): T5LayerSelfAttention(\n", 409 | " (SelfAttention): T5Attention(\n", 410 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 411 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 412 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 413 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 414 | " )\n", 415 | " (layer_norm): T5LayerNorm()\n", 416 | " (dropout): Dropout(p=0.1, inplace=False)\n", 417 | " )\n", 418 | " (1): T5LayerCrossAttention(\n", 419 | " (EncDecAttention): T5Attention(\n", 420 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 421 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 422 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 423 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 424 | " )\n", 425 | " (layer_norm): T5LayerNorm()\n", 426 | " (dropout): Dropout(p=0.1, inplace=False)\n", 427 | " )\n", 428 | " (2): T5LayerFF(\n", 429 | " (DenseReluDense): T5DenseReluDense(\n", 430 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 431 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 432 | " (dropout): Dropout(p=0.1, inplace=False)\n", 433 | " )\n", 434 | " (layer_norm): T5LayerNorm()\n", 435 | " (dropout): Dropout(p=0.1, inplace=False)\n", 436 | " )\n", 437 | " )\n", 438 | " )\n", 439 | " (2): T5Block(\n", 440 | " (layer): ModuleList(\n", 441 | " (0): T5LayerSelfAttention(\n", 442 | " (SelfAttention): T5Attention(\n", 443 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 444 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 445 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 446 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 447 | " )\n", 448 | " (layer_norm): T5LayerNorm()\n", 449 | " (dropout): Dropout(p=0.1, inplace=False)\n", 450 | " )\n", 451 | " (1): T5LayerCrossAttention(\n", 452 | " (EncDecAttention): T5Attention(\n", 453 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 454 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 455 
| " (v): Linear(in_features=512, out_features=512, bias=False)\n", 456 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 457 | " )\n", 458 | " (layer_norm): T5LayerNorm()\n", 459 | " (dropout): Dropout(p=0.1, inplace=False)\n", 460 | " )\n", 461 | " (2): T5LayerFF(\n", 462 | " (DenseReluDense): T5DenseReluDense(\n", 463 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 464 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 465 | " (dropout): Dropout(p=0.1, inplace=False)\n", 466 | " )\n", 467 | " (layer_norm): T5LayerNorm()\n", 468 | " (dropout): Dropout(p=0.1, inplace=False)\n", 469 | " )\n", 470 | " )\n", 471 | " )\n", 472 | " (3): T5Block(\n", 473 | " (layer): ModuleList(\n", 474 | " (0): T5LayerSelfAttention(\n", 475 | " (SelfAttention): T5Attention(\n", 476 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 477 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 478 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 479 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 480 | " )\n", 481 | " (layer_norm): T5LayerNorm()\n", 482 | " (dropout): Dropout(p=0.1, inplace=False)\n", 483 | " )\n", 484 | " (1): T5LayerCrossAttention(\n", 485 | " (EncDecAttention): T5Attention(\n", 486 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 487 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 488 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 489 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 490 | " )\n", 491 | " (layer_norm): T5LayerNorm()\n", 492 | " (dropout): Dropout(p=0.1, inplace=False)\n", 493 | " )\n", 494 | " (2): T5LayerFF(\n", 495 | " (DenseReluDense): T5DenseReluDense(\n", 496 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 497 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 498 | " (dropout): Dropout(p=0.1, inplace=False)\n", 499 | " )\n", 500 | " (layer_norm): T5LayerNorm()\n", 501 | " (dropout): Dropout(p=0.1, inplace=False)\n", 502 | " )\n", 503 | " )\n", 504 | " )\n", 505 | " (4): T5Block(\n", 506 | " (layer): ModuleList(\n", 507 | " (0): T5LayerSelfAttention(\n", 508 | " (SelfAttention): T5Attention(\n", 509 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 510 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 511 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 512 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 513 | " )\n", 514 | " (layer_norm): T5LayerNorm()\n", 515 | " (dropout): Dropout(p=0.1, inplace=False)\n", 516 | " )\n", 517 | " (1): T5LayerCrossAttention(\n", 518 | " (EncDecAttention): T5Attention(\n", 519 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 520 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 521 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 522 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 523 | " )\n", 524 | " (layer_norm): T5LayerNorm()\n", 525 | " (dropout): Dropout(p=0.1, inplace=False)\n", 526 | " )\n", 527 | " (2): T5LayerFF(\n", 528 | " (DenseReluDense): T5DenseReluDense(\n", 529 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 530 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 531 | " (dropout): Dropout(p=0.1, inplace=False)\n", 532 | " )\n", 533 | " (layer_norm): T5LayerNorm()\n", 534 | " (dropout): Dropout(p=0.1, 
inplace=False)\n", 535 | " )\n", 536 | " )\n", 537 | " )\n", 538 | " (5): T5Block(\n", 539 | " (layer): ModuleList(\n", 540 | " (0): T5LayerSelfAttention(\n", 541 | " (SelfAttention): T5Attention(\n", 542 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 543 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 544 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 545 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 546 | " )\n", 547 | " (layer_norm): T5LayerNorm()\n", 548 | " (dropout): Dropout(p=0.1, inplace=False)\n", 549 | " )\n", 550 | " (1): T5LayerCrossAttention(\n", 551 | " (EncDecAttention): T5Attention(\n", 552 | " (q): Linear(in_features=512, out_features=512, bias=False)\n", 553 | " (k): Linear(in_features=512, out_features=512, bias=False)\n", 554 | " (v): Linear(in_features=512, out_features=512, bias=False)\n", 555 | " (o): Linear(in_features=512, out_features=512, bias=False)\n", 556 | " )\n", 557 | " (layer_norm): T5LayerNorm()\n", 558 | " (dropout): Dropout(p=0.1, inplace=False)\n", 559 | " )\n", 560 | " (2): T5LayerFF(\n", 561 | " (DenseReluDense): T5DenseReluDense(\n", 562 | " (wi): Linear(in_features=512, out_features=2048, bias=False)\n", 563 | " (wo): Linear(in_features=2048, out_features=512, bias=False)\n", 564 | " (dropout): Dropout(p=0.1, inplace=False)\n", 565 | " )\n", 566 | " (layer_norm): T5LayerNorm()\n", 567 | " (dropout): Dropout(p=0.1, inplace=False)\n", 568 | " )\n", 569 | " )\n", 570 | " )\n", 571 | " )\n", 572 | " (final_layer_norm): T5LayerNorm()\n", 573 | " (dropout): Dropout(p=0.1, inplace=False)\n", 574 | " )\n", 575 | ")" 576 | ] 577 | }, 578 | "metadata": { 579 | "tags": [] 580 | }, 581 | "execution_count": 6 582 | } 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": { 588 | "id": "HGSvW-kkAfyS", 589 | "colab_type": "text" 590 | }, 591 | "source": [ 592 | "Yes, that’s right, look at all those `Linear` layers! We should be able to get some benefit out of quantizing this model. " 593 | ] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": { 598 | "id": "qYznffbOAuSl", 599 | "colab_type": "text" 600 | }, 601 | "source": [ 602 | "## One Weird Trick — Dynamic Quantization\n", 603 | "\n" 604 | ] 605 | }, 606 | { 607 | "cell_type": "code", 608 | "metadata": { 609 | "id": "U8YLMB4KAvKL", 610 | "colab_type": "code", 611 | "colab": {} 612 | }, 613 | "source": [ 614 | "import torch.quantization\n", 615 | "\t\n", 616 | "quantized_model = torch.quantization.quantize_dynamic(base_model, {torch.nn.Linear}, dtype=torch.qint8)" 617 | ], 618 | "execution_count": 0, 619 | "outputs": [] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": { 624 | "id": "I0DEsvrmA5kp", 625 | "colab_type": "text" 626 | }, 627 | "source": [ 628 | "No, really, that’s it. Chapter done. Bye!\n", 629 | "\n", 630 | "Oh, okay, if you really insist. But honestly, there’s not much more to it. Okay, firstly, a caveat in that `quantize_dynamic` will only quantize the weights, not the activations in our parameters. But all we need to do is pass in the `model` we wish to quantize and a dict of layers that we wish to replace with our quantized versions, in this case `Linear`. The function returns a new model, though you could run with the optional parameter `inplace=True` to mutate the original model rather than make a copy. 
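If you want to confirm that the swap actually happened, a quick sketch that counts layer types in each model (the exact module paths vary between `transformers` versions, so counting is the lazy-but-safe option):

```python
import torch
import torch.nn.quantized.dynamic as nnqd

def count_modules(model, layer_type):
    # How many submodules of the model are instances of the given type?
    return sum(1 for _, m in model.named_modules() if isinstance(m, layer_type))

print(count_modules(base_model, torch.nn.Linear))     # float Linear layers remaining
print(count_modules(quantized_model, nnqd.Linear))    # dynamically quantized replacements
```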
\n", 631 | "\n", 632 | "Let’s save the model and take a look at the quantized size:\n" 633 | ] 634 | }, 635 | { 636 | "cell_type": "code", 637 | "metadata": { 638 | "id": "-8t9Hq62A9Sm", 639 | "colab_type": "code", 640 | "colab": { 641 | "base_uri": "https://localhost:8080/", 642 | "height": 52 643 | }, 644 | "outputId": "e2abd9b4-2dce-437a-e530-f53dc6e83938" 645 | }, 646 | "source": [ 647 | "!mkdir t5\n", 648 | "quantized_model.save_pretrained(\"t5\")\n", 649 | "!du -m t5" 650 | ], 651 | "execution_count": 18, 652 | "outputs": [ 653 | { 654 | "output_type": "stream", 655 | "text": [ 656 | "mkdir: cannot create directory ‘t5’: File exists\n", 657 | "121\tt5\n" 658 | ], 659 | "name": "stdout" 660 | } 661 | ] 662 | }, 663 | { 664 | "cell_type": "markdown", 665 | "metadata": { 666 | "id": "l4AwvS1CBXzN", 667 | "colab_type": "text" 668 | }, 669 | "source": [ 670 | "Almost a 50% reduction in size! We can’t get down to 4 times smaller due to not being able to store the activations as 8-bit integers, but we’ve done pretty well for one line of code. Let's do a very simple microbenchmark using both models in the `transformers` library summarization pipeline. " 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "metadata": { 676 | "id": "-EydnHY3BbwT", 677 | "colab_type": "code", 678 | "colab": {} 679 | }, 680 | "source": [ 681 | "base_summarizer = pipeline(\"summarization\", model=base_model, tokenizer=\"t5-small\")\n", 682 | "quantized_summarizer = pipeline(\"summarization\", model=quantized_model, tokenizer=\"t5-small\")" 683 | ], 684 | "execution_count": 0, 685 | "outputs": [] 686 | }, 687 | { 688 | "cell_type": "code", 689 | "metadata": { 690 | "id": "BHhiTSOlBg-f", 691 | "colab_type": "code", 692 | "colab": { 693 | "base_uri": "https://localhost:8080/", 694 | "height": 34 695 | }, 696 | "outputId": "ee37076a-777e-4af2-ef18-da7048ece96b" 697 | }, 698 | "source": [ 699 | "%timeit base_summarizer(\"From the very beginning, Regan was seen as having series potential. After the television film scored highly in the ratings, work began on the development of the series proper. Ian Kennedy Martin's idea was for the series to be mainly studio-based, with more dialogue and less action, but producer Ted Childs disagreed, and in consequence Ian Kennedy Martin parted company with the project. Childs produced it on 16mm film, a format that allowed for a much smaller film unit than videotape at that time. This made it possible to shoot almost entirely on location which helped give the series a startling degree of realism and to use film editing techniques which enabled him to give the show a heavy bias toward action sequences. The television play and the subsequent series were commissioned by Thames Television and produced by its film division Euston Films. It was originally broadcast on ITV between 2 January 1975 and 28 December 1978 at 21:00–22:00 on weekdays (usually Mondays), with repeated screenings at the same time until the early 1980s. The writers were given strict guidelines to follow: \\\"Each show will have an overall screen time (minus titles) of 48 minutes 40 seconds. Each film will open with a teaser of up to 3 minutes, which will be followed by the opening titles. The story will be played across three acts, each being no more than 19 minutes and no less than 8 minutes in length. Regan will appear in every episode, Carter in approximately 10 out of 13 episodes. 
In addition to these main characters, scripts should be based around three major speaking parts, with up to ten minor speaking parts\")" 700 | ], 701 | "execution_count": 29, 702 | "outputs": [ 703 | { 704 | "output_type": "stream", 705 | "text": [ 706 | "1 loop, best of 3: 29.4 s per loop\n" 707 | ], 708 | "name": "stdout" 709 | } 710 | ] 711 | }, 712 | { 713 | "cell_type": "code", 714 | "metadata": { 715 | "id": "-am4cPghCdtc", 716 | "colab_type": "code", 717 | "colab": { 718 | "base_uri": "https://localhost:8080/", 719 | "height": 34 720 | }, 721 | "outputId": "6e0b1484-ffe2-4523-85da-e4d00884a74e" 722 | }, 723 | "source": [ 724 | "%timeit quantized_summarizer(\"From the very beginning, Regan was seen as having series potential. After the television film scored highly in the ratings, work began on the development of the series proper. Ian Kennedy Martin's idea was for the series to be mainly studio-based, with more dialogue and less action, but producer Ted Childs disagreed, and in consequence Ian Kennedy Martin parted company with the project. Childs produced it on 16mm film, a format that allowed for a much smaller film unit than videotape at that time. This made it possible to shoot almost entirely on location which helped give the series a startling degree of realism and to use film editing techniques which enabled him to give the show a heavy bias toward action sequences. The television play and the subsequent series were commissioned by Thames Television and produced by its film division Euston Films. It was originally broadcast on ITV between 2 January 1975 and 28 December 1978 at 21:00–22:00 on weekdays (usually Mondays), with repeated screenings at the same time until the early 1980s. The writers were given strict guidelines to follow: \\\"Each show will have an overall screen time (minus titles) of 48 minutes 40 seconds. Each film will open with a teaser of up to 3 minutes, which will be followed by the opening titles. The story will be played across three acts, each being no more than 19 minutes and no less than 8 minutes in length. Regan will appear in every episode, Carter in approximately 10 out of 13 episodes. In addition to these main characters, scripts should be based around three major speaking parts, with up to ten minor speaking parts\")" 725 | ], 726 | "execution_count": 30, 727 | "outputs": [ 728 | { 729 | "output_type": "stream", 730 | "text": [ 731 | "1 loop, best of 3: 16.6 s per loop\n" 732 | ], 733 | "name": "stdout" 734 | } 735 | ] 736 | }, 737 | { 738 | "cell_type": "markdown", 739 | "metadata": { 740 | "id": "oANUgOHTEDb0", 741 | "colab_type": "text" 742 | }, 743 | "source": [ 744 | "In addition to almost being half the size, the quantized model is almost twice as fast! So…why don’t we do this _all_ the time? Are there no downsides? Well…it depends. We **are** losing information in our inference in a quantized model as our values cannot map to all the possible floating-point values that we find in the original model. So the chain of multiplications will be less accurate in our quantized model than in the original. You’ll need to check the new model against a reference dataset to determine the accuracy loss and whether that loss is an acceptable trade-off compared to the reduced storage demands and faster execution." 
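For this summarization example, that check can be as simple as running both pipelines over held-out text and comparing what comes back (or scoring the outputs with a metric such as ROUGE if you have reference summaries); a rough sketch, with a stand-in evaluation set you'd want to replace with your own:

```python
# Stand-in evaluation set; swap in your own held-out texts (and ideally
# reference summaries) for a real accuracy check
eval_texts = [
    "From the very beginning, Regan was seen as having series potential...",  # truncated here for space
]

for text in eval_texts:
    print(base_summarizer(text))        # the float32 model's summary
    print(quantized_summarizer(text))   # the quantized model's summary
```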
745 | ] 746 | }, 747 | { 748 | "cell_type": "markdown", 749 | "metadata": { 750 | "id": "w4IgTYAAEiI2", 751 | "colab_type": "text" 752 | }, 753 | "source": [ 754 | "## Other Quantizing Options Are Available" 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "metadata": { 760 | "id": "StgyLod9EtEl", 761 | "colab_type": "text" 762 | }, 763 | "source": [ 764 | "In addition to dynamic quantizing, PyTorch also offers _static quantizing_, where a trained model is modified to include _observer_ modules and a selection of data is fed into the model. During inference on this data, the observers can generate a quantized distribution that best fits the observed data and the resulting activations. This can produce even further space and time savings, especially with vision models like ResNet.\n", 765 | "\n", 766 | "However, for best-in-class accuracy in your smaller model, you’ll want to investigate quantization-aware training (QAT). In this approach, the model fakes quantization during both the forward and backward passes of the training loop; while all the computations take place with standard floats, everything is rounded to integer values, so you end up with a quantized model after training is finished, but one with a higher accuracy than you can achieve with the dynamic or static approaches. " 767 | ] 768 | }, 769 | { 770 | "cell_type": "markdown", 771 | "metadata": { 772 | "id": "0ufUnEbPHoNs", 773 | "colab_type": "text" 774 | }, 775 | "source": [ 776 | "## Is It Worth It?\n", 777 | "\n", 778 | "You might be wondering if you’re just better off training a smaller model rather than going to all this effort to compress larger models. In the recent paper [Train Large, Then Compress](https://arxiv.org/abs/2002.11794), there’s a good deal of evidence presented that transformer-based models really do benefit from this approach. Because larger models converge faster than smaller ones, you will likely get more accurate results by training a large model and compressing it than if you spent the same compute time on a smaller model. So go forth and compress!\n", 779 | "\n", 780 | "(and we’ll see you back here in the future for _pruning_ models)" 781 | ] 782 | }, 783 | { 784 | "cell_type": "markdown", 785 | "metadata": { 786 | "id": "6NbH-K1IHkEC", 787 | "colab_type": "text" 788 | }, 789 | "source": [ 790 | "## Further Reading\n", 791 | "\n", 792 | "https://pytorch.org/docs/stable/quantization.html\n", 793 | "\n", 794 | "https://arxiv.org/abs/2002.11794" 795 | ] 796 | } 797 | ] 798 | } --------------------------------------------------------------------------------
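As a closing footnote to the static quantization discussion in the notebook above, the usual eager-mode recipe is prepare, calibrate, convert; a minimal sketch with a hypothetical `float_model` and `calibration_loader` (not T5-specific, and QAT swaps `prepare` for `prepare_qat` plus an ordinary training loop):

```python
import torch
import torch.quantization as tq

# Hypothetical inputs: a float model whose layers support static quantization
# (assumed to already wrap its inputs/outputs in QuantStub/DeQuantStub),
# and a DataLoader holding a small, representative sample of data.
float_model.eval()
float_model.qconfig = tq.get_default_qconfig("fbgemm")   # x86 server backend

prepared = tq.prepare(float_model)        # inserts observer modules

with torch.no_grad():
    for batch, _ in calibration_loader:   # calibration pass: observers record activation ranges
        prepared(batch)

static_model = tq.convert(prepared)       # swaps in the quantized modules
```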