├── README.md ├── alignment_exercise.ipynb ├── alignment_solution.ipynb ├── exercises.pdf └── utils.py /README.md: -------------------------------------------------------------------------------- 1 | ## Notebooks and helpers for the "Adversarial examples & human-ML alignment" tutorial. 2 | 3 | [Exercise notebook](https://colab.research.google.com/github/MadryLab/AdvEx_Tutorial/blob/master/alignment_exercise.ipynb) 4 | 5 | [Solution notebook](https://colab.research.google.com/github/MadryLab/AdvEx_Tutorial/blob/master/alignment_solution.ipynb) 6 | 7 | ### Based on the following works 8 | [IST+19] Ilyas A., Santurkar S., Tsipras D., Engstrom L., Tran B., Madry A. (2019). Adversarial Examples Are Not Bugs, They Are Features. arXiv, arXiv:1905.02175 9 | 10 | [EIS+19] Engstrom L., Ilyas A., Santurkar S., Tsipras D., Tran B., Madry A. (2019). Learning Perceptually-Aligned Representations via Adversarial Robustness. arXiv, arXiv:1906.00945 11 | 12 | [STE+19] Santurkar S., Tsipras D., Tran B., Ilyas A., Engstrom L., Madry A. (2019). Image Synthesis with a Single (Robust) Classifier. arXiv, arXiv:1906.09453 13 | 14 | [EIS+19] Robustness (Python Library) (2019); https://github.com/MadryLab/robustness. 15 | 16 | ### Citation 17 | If you use this material or code, please cite it as follows: 18 | 19 | ``` 20 | @misc{santurkar2020notes, 21 | title={Adversarial examples and human-ML alignment (MIT BCS tutorial)}, 22 | author={Shibani Santurkar and Dimitris Tsipras}, 23 | year={2020}, 24 | url={https://github.com/MadryLab/BCS_Tutoria} 25 | } 26 | ``` 27 | 28 | ### Contributors 29 | * [Shibani Santurkar](http://people.csail.mit.edu/shibani/) 30 | * [Dimitris Tsipras](http://people.csail.mit.edu/tsipras/) 31 | -------------------------------------------------------------------------------- /alignment_exercise.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "Pzh9ln5I_dPd" 8 | }, 9 | "source": [ 10 | "Before you run the code, make sure the runtime type is GPU (Runtime -> Change runtime type -> GPU)." 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": { 17 | "colab": {}, 18 | "colab_type": "code", 19 | "id": "lE5lMjUdwB3D" 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "%%capture \n", 24 | "!pip install robustness\n", 25 | "!git clone https://github.com/MadryLab/AdvEx_Tutorial.git code\n", 26 | "!mv code/*.py .\n", 27 | "!wget http://people.csail.mit.edu/shibani/tutorial.zip\n", 28 | "!unzip tutorial\n", 29 | "!mv tutorial/* ."
30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": { 36 | "colab": { 37 | "base_uri": "https://localhost:8080/", 38 | "height": 34 39 | }, 40 | "colab_type": "code", 41 | "id": "exqGyhuo-hZx", 42 | "outputId": "91207db8-0c78-4337-9e5f-71145013912a" 43 | }, 44 | "outputs": [], 45 | "source": [ 46 | "!ls " 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": { 53 | "colab": {}, 54 | "colab_type": "code", 55 | "id": "rF1TJ1fIwHI0" 56 | }, 57 | "outputs": [], 58 | "source": [ 59 | "try: # set up path\n", 60 | " import google.colab, sys, torch\n", 61 | " if not torch.cuda.is_available():\n", 62 | " print(\"Please change runtime type to include a GPU.\") \n", 63 | "except:\n", 64 | " pass" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": { 71 | "colab": { 72 | "base_uri": "https://localhost:8080/", 73 | "height": 71 74 | }, 75 | "colab_type": "code", 76 | "id": "5kuhxDcTv-Gg", 77 | "outputId": "cd164293-8e21-4a9d-b76f-1d976c6a1176" 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "# Import basic libraries needed for the exercise (numpy, matplotlib, and torch)\n", 82 | "import numpy as np\n", 83 | "import matplotlib.pyplot as plt\n", 84 | "import seaborn as sns\n", 85 | "from tqdm import tqdm\n", 86 | "import torch as ch\n", 87 | "import torchvision.transforms as transforms\n", 88 | "\n", 89 | "# We also use the robustness library (https://robustness.readthedocs.io/en/latest/) for some \n", 90 | "# convenient functionality.\n", 91 | "from robustness.tools.vis_tools import show_image_row\n", 92 | "\n", 93 | "import utils \n", 94 | "\n", 95 | "sns.set_style('darkgrid')\n", 96 | "%matplotlib inline" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": { 102 | "colab_type": "text", 103 | "id": "CbxLppqAv-Gk" 104 | }, 105 | "source": [ 106 | "# Setup" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": { 112 | "colab_type": "text", 113 | "id": "Sehuu91Vv-Gk" 114 | }, 115 | "source": [ 116 | "### Load datasets" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": { 122 | "colab_type": "text", 123 | "id": "quwRUwTZv-Gl" 124 | }, 125 | "source": [ 126 | "For our experiments (except Ex. II), we will use the ImageNet dataset from the ILSVRC challenge. This is a 1000 class dataset, that has played an important role in developing and evaluating deep learning models." 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": { 133 | "colab": { 134 | "base_uri": "https://localhost:8080/", 135 | "height": 34 136 | }, 137 | "colab_type": "code", 138 | "id": "_TXDcdRtv-Gl", 139 | "outputId": "69301f98-a81c-4227-f7b9-c0bb6f4ddbd6" 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "# Creater a dataset, and a loader to access it. In addition to the loader, we also need to obtain a\n", 144 | "# normalization function. This is because standard deep networks are typically trained \n", 145 | "# on normalized images, so we need to apply the same normalization during testing. 
Finally,\n", 146 | "# we also get a label map, that tells us what class a corresponding numeric value corresponds\n", 147 | "# to.\n", 148 | "in_dataset, in_loaders, normalization_function, label_map_IN = utils.load_dataset('imagenet',\n", 149 | " batch_size=5,\n", 150 | " num_workers=1)\n", 151 | "in_loader = in_loaders[1]" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": { 157 | "colab_type": "text", 158 | "id": "RGoBlcACv-Go" 159 | }, 160 | "source": [ 161 | "We can visualize some ImageNet samples, along with their labels, as follows" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": { 168 | "colab": { 169 | "base_uri": "https://localhost:8080/", 170 | "height": 167 171 | }, 172 | "colab_type": "code", 173 | "id": "yzFCpxGxv-Go", 174 | "outputId": "a468533b-2a86-404d-9186-c87d6b12944f" 175 | }, 176 | "outputs": [], 177 | "source": [ 178 | "_, (img, targ) = next(enumerate(in_loader))\n", 179 | "\n", 180 | "show_image_row([img],\n", 181 | " [\"ImageNet Images\"],\n", 182 | " tlist=[[label_map_IN[int(t)].split(',')[0] for t in targ]])" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": { 188 | "colab_type": "text", 189 | "id": "vJwUgnnfv-Gq" 190 | }, 191 | "source": [ 192 | "## Load model" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": { 198 | "colab_type": "text", 199 | "id": "5t5iyi-rv-Gr" 200 | }, 201 | "source": [ 202 | "Next, we need a model to play with! PyTorch provides access to a large range of pre-trained deep networks (for a full list, see ). For example, we can load a ResNet18 using the following code." 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": { 209 | "colab": { 210 | "base_uri": "https://localhost:8080/", 211 | "height": 103, 212 | "referenced_widgets": [ 213 | "cf3d338a3e0d4d6c9e8d47d23abf7c1a", 214 | "fb445702be25404fa29c10e7461bb0e2", 215 | "82d97a20caab4c7c83d41b9149be24a5", 216 | "7106747f5eac4286b49d9a49d33d64e0", 217 | "5443d05074074a7b8cdf6aaf22ae8bde", 218 | "a6e18480d09d45f0adfda751f5673406", 219 | "64c8c660ae09476a8b0c2839a8e323cc", 220 | "0d5cba86a8ba444395af9bb5016ce6c9" 221 | ] 222 | }, 223 | "colab_type": "code", 224 | "id": "aP-A-NzCv-Gr", 225 | "outputId": "2900d15b-745e-48a9-b62f-132f0dba3a7c" 226 | }, 227 | "outputs": [], 228 | "source": [ 229 | "std_model = utils.load_model('resnet18')" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": { 235 | "colab_type": "text", 236 | "id": "fXKgaRWev-Gu" 237 | }, 238 | "source": [ 239 | "# Excercise I: Adversarial examples" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": { 245 | "colab_type": "text", 246 | "id": "WUCdz6hxv-Gu" 247 | }, 248 | "source": [ 249 | "Since their discovery, adversarial examples have been one of the most extensively studied phenomena in deep learning. Adversarial perturbations are *imperceptible* (non-random) perturbations that can be added to any input image so as to cause a standard (highly accurate) classifier to misclassify the modified input (or classify it as an adversarially chosen class). " 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": { 255 | "colab_type": "text", 256 | "id": "i6qrbV9Zv-Gu" 257 | }, 258 | "source": [ 259 | "*Finding adversarial examples:* The idea is pretty simple: given a target class (t), we want to find a perturbation ($\\delta'$) that when added to the input (x) maximizes the likelihood of the target class. 
At the same time, we want the perturbation to be small or lie within some pre-defined perturbation set: for example in a tiny L2 ball around the image. Basically, we want to find a $\delta'$ such that\n", 260 | "\n", 261 | "$\delta' = argmin_{||\delta||_2 \leq \epsilon} L(x + \delta, t; \theta)$\n", 262 | "\n", 263 | "\n", 264 | "To find a perturbation that minimizes the objective (maximizes likelihood) while remaining in a bounded set, we use projected gradient descent (PGD) (see for more). " 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": { 270 | "colab_type": "text", 271 | "id": "VZTrYEJJv-Gv" 272 | }, 273 | "source": [ 274 | "### Try it yourself! \n", 275 | "\n", 276 | "First choose a target class for every input. (Note that you have a batch of inputs, so you could try different targets for different inputs.)" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "colab": { 284 | "base_uri": "https://localhost:8080/", 285 | "height": 34 286 | }, 287 | "colab_type": "code", 288 | "id": "hJ1nT-b2v-Gv", 289 | "outputId": "5d303326-9a1b-4cdf-c846-42c69b2676cc" 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "TARGET = 3\n", 294 | "\n", 295 | "print(f\"Target class: {label_map_IN[TARGET]}\")\n", 296 | "\n", 297 | "target_class = TARGET * ch.ones_like(targ)" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": { 303 | "colab_type": "text", 304 | "id": "Z7y0ujIYv-Gy" 305 | }, 306 | "source": [ 307 | "Next, there are a couple of parameters that you need to choose: \n", 308 | "1. eps: maximum size of the perturbation in terms of L2 norm. For example, eps=2 implies that $||\delta||_2 \leq 2$\n", 309 | "2. Nsteps: number of (projected) gradient descent steps to perform\n", 310 | "3. step_size: size of each step of (projected) gradient descent\n", 311 | "\n", 312 | "Try varying these parameters and see what happens.\n", 313 | "\n", 314 | "(You could also try implementing the PGD function yourself!)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": { 321 | "colab": { 322 | "base_uri": "https://localhost:8080/", 323 | "height": 34 324 | }, 325 | "colab_type": "code", 326 | "id": "hyo_OIjNv-Gy", 327 | "outputId": "b7615035-d827-4910-f928-a57aa6446b4d" 328 | }, 329 | "outputs": [], 330 | "source": [ 331 | "# Create adversarial examples\n", 332 | "adv_ex = utils.L2PGD(std_model, img, target_class, normalization_function,\n", 333 | " step_size=0.5, Nsteps=20, \n", 334 | " eps=1.25, targeted=True)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": { 340 | "colab_type": "text", 341 | "id": "_ZZig2A7v-G0" 342 | }, 343 | "source": [ 344 | "### Evaluate model predictions at perturbed inputs\n", 345 | "\n", 346 | "To see if our attack was successful, we will now evaluate model predictions at the perturbed inputs. We would expect the predicted label to match the `target_class` used above, if we succeeded."
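If you are curious what a helper like `utils.L2PGD` does under the hood, the sketch below implements a targeted L2 PGD attack from scratch. This is an illustrative sketch only (the actual implementation in `utils.py` may differ, e.g., in random initialization or device handling); it reuses the notebook's `utils.forward_pass(model, x, normalization)` helper, which returns logits as in the cells above.

```python
import torch as ch
import torch.nn.functional as F

def l2_pgd_targeted_sketch(model, x, target, normalization, step_size, Nsteps, eps):
    # Illustrative sketch of targeted L2 PGD; not the actual utils.L2PGD implementation.
    x_orig = x.clone().detach().cuda()
    target = target.cuda()
    delta = ch.zeros_like(x_orig, requires_grad=True)
    for _ in range(Nsteps):
        logits = utils.forward_pass(model, x_orig + delta, normalization)
        loss = F.cross_entropy(logits, target)
        grad, = ch.autograd.grad(loss, [delta])
        with ch.no_grad():
            # Targeted attack: step *against* the gradient to decrease the loss
            # of the target class (normalize the gradient per example first).
            g_norm = grad.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
            delta -= step_size * grad / g_norm
            # Project back onto the L2 ball of radius eps around the original image.
            d_norm = delta.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
            delta *= (eps / d_norm).clamp(max=1.0)
            # Keep pixel values in the valid [0, 1] range.
            delta.data = (x_orig + delta).clamp(0, 1) - x_orig
    return (x_orig + delta).detach()
```

In the cells below we simply call `utils.L2PGD`, which plays the same role.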
347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": { 353 | "colab": {}, 354 | "colab_type": "code", 355 | "id": "c_Su8zCDv-G0" 356 | }, 357 | "outputs": [], 358 | "source": [ 359 | "with ch.no_grad():\n", 360 | " logits = utils.forward_pass(std_model, \n", 361 | " adv_ex, \n", 362 | " normalization_function)\n", 363 | " pred_label = logits.argmax(dim=1)" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": { 369 | "colab_type": "text", 370 | "id": "YtQVh9cVv-G2" 371 | }, 372 | "source": [ 373 | "### Visualize adversarial examples\n", 374 | "\n", 375 | "We now inspect the original (unperturbed) inputs (*top*), along with the corresponding adversarial examples (*bottom*). We also look at what the model predicts for each row of images.\n", 376 | "\n", 377 | "Does the attack succeed? Do the adversarial examples look different from the original inputs?" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": { 384 | "colab": { 385 | "base_uri": "https://localhost:8080/", 386 | "height": 315 387 | }, 388 | "colab_type": "code", 389 | "id": "BkloMYl8v-G3", 390 | "outputId": "78dd2cc9-fae4-4d42-8c5c-34e1efd405e8" 391 | }, 392 | "outputs": [], 393 | "source": [ 394 | "show_image_row([img, adv_ex], \n", 395 | " ['Original image', 'Adv. Example'],\n", 396 | " tlist=[[label_map_IN[int(t)].split(',')[0] for t in label] \\\n", 397 | " for label in [targ, pred_label]])" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": { 403 | "colab_type": "text", 404 | "id": "dBlZGgc2v-G5" 405 | }, 406 | "source": [ 407 | "# Exercise II: Are adversarial perturbations meaningless?" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": { 413 | "colab_type": "text", 414 | "id": "w2Zb8bLNv-G6" 415 | }, 416 | "source": [ 417 | "In this experiment, we will revisit the cause underlying the brittleness of models to adversarial perturbations. In particular, we will examine if adversarial perturbations indeed correspond to meaningless sensitivities (or bugs) in the model.\n", 418 | "\n", 419 | "For computational efficiency, we will perform this experiment using linear classifiers trained on a binary subset of the CIFAR-10 dataset. \n" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": { 425 | "colab_type": "text", 426 | "id": "rjEt2zDtv-G6" 427 | }, 428 | "source": [ 429 | "Let's start by loading the dataset and looking at some of its samples."
430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "metadata": { 436 | "colab": { 437 | "base_uri": "https://localhost:8080/", 438 | "height": 306, 439 | "referenced_widgets": [ 440 | "025c326236cd48e48cfb5fc83403b8a7", 441 | "8e5da4ae9373457f823a7c7ec28dec8f", 442 | "d47a823649334878a19883d16c94b071", 443 | "6cbb663fb4cd48c5aaf1d2a0544d4e95", 444 | "60838b71f99e4445b92d53aa722d5687", 445 | "2c4c64e0da04428b8c74a7f55d0b3ca0", 446 | "cb153a59ff72473bac8b2f6dcaa29f57", 447 | "777111e1581c40e69acd4dc002f400fe" 448 | ] 449 | }, 450 | "colab_type": "code", 451 | "id": "UH_menT8v-G7", 452 | "outputId": "555cbf0b-93ce-4e64-eca5-48eba58a7034" 453 | }, 454 | "outputs": [], 455 | "source": [ 456 | "binary_data = utils.load_binary_dataset(batch_size=100, num_workers=1, classes=[0, 3])\n", 457 | "\n", 458 | "im, targ = binary_data['train']\n", 459 | "show_image_row([im[:5]],\n", 460 | " tlist=[[f\"Label={int(t)}\" for t in targ[:5]]],\n", 461 | " fontsize=20)" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": { 467 | "colab_type": "text", 468 | "id": "d7sM-inKv-G9" 469 | }, 470 | "source": [ 471 | "### Training a linear classifier on this classification task" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": { 477 | "colab_type": "text", 478 | "id": "0sn4nv5Bv-G9" 479 | }, 480 | "source": [ 481 | "As you can see, the dataset contains two classes: cats and airplanes. We will now train a very basic linear classifier on the data. " 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "colab": { 489 | "base_uri": "https://localhost:8080/", 490 | "height": 51 491 | }, 492 | "colab_type": "code", 493 | "id": "oEvlU7kMv-G-", 494 | "outputId": "66b28074-d49c-4bbc-df12-efb35f7e4beb" 495 | }, 496 | "outputs": [], 497 | "source": [ 498 | "train_log, linear_net = utils.train_linear(binary_data, step_size=0.1, iterations=2000)" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": { 504 | "colab_type": "text", 505 | "id": "a65c92_nBNj1" 506 | }, 507 | "source": [ 508 | "If you instead prefer you could also load a pretrained model by setting `load_pretrained=True` in the code below." 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": null, 514 | "metadata": { 515 | "colab": {}, 516 | "colab_type": "code", 517 | "id": "Ouz68dFCBFwU" 518 | }, 519 | "outputs": [], 520 | "source": [ 521 | "load_pretrained = False\n", 522 | "\n", 523 | "if load_pretrained:\n", 524 | " Nfeatures = int(np.prod(binary_data['train'][0].shape[1:]))\n", 525 | "\n", 526 | " linear_net = utils.Linear(Nfeatures=Nfeatures, Nclasses=2)\n", 527 | " linear_net.load_state_dict(ch.load(\"./models/LinearCifarBinary.pt\"))\n", 528 | " linear_net.eval()\n", 529 | " linear_net = ch.nn.DataParallel(linear_net.cuda())" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": { 535 | "colab_type": "text", 536 | "id": "MkvRMptZv-HA" 537 | }, 538 | "source": [ 539 | "Let's take a look at some samples from the dataset, along with their labels and model predictions." 
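For reference, the `utils.train_linear` helper used above roughly amounts to fitting a logistic-regression-style classifier on flattened images by gradient descent. The following is a minimal sketch, assuming (consistent with the cells above) that `binary_data['train']` is an `(images, labels)` tuple; the actual helper in `utils.py` may differ in its optimizer, batching, and logging.

```python
import torch as ch
import torch.nn as nn

def train_linear_sketch(data, step_size=0.1, iterations=2000):
    # Rough sketch of what a helper like utils.train_linear might do.
    ims, labels = data['train']
    X = ims.reshape(ims.shape[0], -1).cuda()   # flatten each image into a vector
    y = labels.long().cuda()
    model = nn.Linear(X.shape[1], 2).cuda()    # one weight vector per class
    opt = ch.optim.SGD(model.parameters(), lr=step_size)
    loss_fn = nn.CrossEntropyLoss()
    for it in range(iterations):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        if it % 500 == 0:
            print(f"iter {it}: train loss {loss.item():.3f}")
    return model
```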
540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": null, 545 | "metadata": { 546 | "colab": { 547 | "base_uri": "https://localhost:8080/", 548 | "height": 169 549 | }, 550 | "colab_type": "code", 551 | "id": "HwNBPYOKv-HA", 552 | "outputId": "e8bda3ea-3b4e-4c9e-c097-ea16adf00db2" 553 | }, 554 | "outputs": [], 555 | "source": [ 556 | "preds = utils.get_predictions(im, linear_net)\n", 557 | "\n", 558 | "show_image_row([im[:5]],\n", 559 | " tlist=[[f\"Label={int(t)} Pred={int(p)}\" \n", 560 | " for t, p in zip(targ[:5], preds[:10])]],\n", 561 | " fontsize=16)" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": { 567 | "colab_type": "text", 568 | "id": "PhVA9m3pv-HC" 569 | }, 570 | "source": [ 571 | "### Using adversarial examples to train models\n", 572 | "\n", 573 | "Note that adversarial examples correspond to adding a non-random perturbation to a given input data point. Thus adversarial perturbations modify input features, albeit in an imperceptible way.\n", 574 | "\n", 575 | "So what features do these perturbations modify? Do they just exploit meaningless sensitivities (or bugs) of the models? What happens if we train a new model solely on adversarial examples?" 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "metadata": { 581 | "colab_type": "text", 582 | "id": "lQmK4ijzv-HC" 583 | }, 584 | "source": [ 585 | "We will now construct a *training* dataset made of adversarial examples. Specifically, we will:\n", 586 | "\n", 587 | "Step 1. Add adversarial perturbations to all the training set images to fool the linear classifier into flipping its prediction (i.e., classify cats as `0` and airplanes as `1`).\n", 588 | "\n", 589 | "Step 2. We will now take all the successful adversarial examples (i.e., data points that were originally classified correctly by the model, but now after they have been adversarially perturbed) and use these to make a new dataset. The image labels in this dataset will be the labels *predicted* by the model." 
590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": null, 595 | "metadata": { 596 | "colab": { 597 | "base_uri": "https://localhost:8080/", 598 | "height": 34 599 | }, 600 | "colab_type": "code", 601 | "id": "SM-tLZVvv-HD", 602 | "outputId": "e5b5d983-c639-4b25-cff7-0b8ee39c950d" 603 | }, 604 | "outputs": [], 605 | "source": [ 606 | "im_adv = utils.L2PGD(linear_net, im, targ, None,\n", 607 | " step_size=0.1, Nsteps=20, \n", 608 | " eps=1, targeted=False)" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "metadata": { 614 | "colab_type": "text", 615 | "id": "fgPWwBqBBdTq" 616 | }, 617 | "source": [ 618 | "Let's look at how many examples we managed to fool the network on and examine some of these samples" 619 | ] 620 | }, 621 | { 622 | "cell_type": "code", 623 | "execution_count": null, 624 | "metadata": { 625 | "colab": { 626 | "base_uri": "https://localhost:8080/", 627 | "height": 351 628 | }, 629 | "colab_type": "code", 630 | "id": "JE-fEEY8v-HF", 631 | "outputId": "72b3564c-0a15-49a6-b66f-9fd7d71a01a9" 632 | }, 633 | "outputs": [], 634 | "source": [ 635 | "preds_adv = utils.get_predictions(im_adv, linear_net)\n", 636 | "print(\"% examples on which model is fooled:\",\n", 637 | " f\"{100 * ch.mean(preds_adv.cpu().eq(targ).float()).item():.2f} \\n\")\n", 638 | "\n", 639 | "idx = np.where(np.logical_and(preds.cpu() == targ, preds_adv.cpu() != targ))[0]\n", 640 | "np.random.shuffle(idx)\n", 641 | "\n", 642 | "show_image_row([im[idx[:5]], im_adv[idx[:5]]],\n", 643 | " ylist=[\"Original\", \"Adv. example\"],\n", 644 | " tlist=[[f\"Label={int(t)}, Pred={int(p)}\" \n", 645 | " for t, p in zip(targ[idx[:5]], preds[idx[:5]])],\n", 646 | " [f\"Label={int(t)}, Pred={int(p)}\" \n", 647 | " for t, p in zip(targ[idx[:5]], preds_adv[idx[:5]])]],\n", 648 | " fontsize=16)" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": { 654 | "colab_type": "text", 655 | "id": "Rtb5qcRov-HH" 656 | }, 657 | "source": [ 658 | "Note that in the original data, airplanes were labeled as class `0` and cats were labeled as class `1`. However for the adversarial examples, the model predicts the opposite/incorrect label.\n", 659 | "\n", 660 | "We will now train a *new* model on these adversarial examples. Note crucially, that the data points are labeled based on the *predicted label* and hence the labels are flipped w.r.t. the original train/test set." 661 | ] 662 | }, 663 | { 664 | "cell_type": "code", 665 | "execution_count": null, 666 | "metadata": { 667 | "colab": { 668 | "base_uri": "https://localhost:8080/", 669 | "height": 34 670 | }, 671 | "colab_type": "code", 672 | "id": "qxGz4FCnv-HI", 673 | "outputId": "795e2a72-c19f-4d1a-87aa-b95093501ac8" 674 | }, 675 | "outputs": [], 676 | "source": [ 677 | "binary_data_adv = {}\n", 678 | "# Training data not consists solely of adv. 
examples which fooled the model \n", 679 | "# and their \"incorrect\" labels (airplanes -> 1 and cats -> 0)\n", 680 | "binary_data_adv['train'] = (im_adv[idx].cpu(), preds_adv[idx].cpu())\n", 681 | "# Test data remains the same (i.e., airplanes -> 0 and cats -> 1) \n", 682 | "binary_data_adv['test'] = binary_data['test']\n", 683 | "\n", 684 | "# The train set size is smaller because we only choose the training samples we are able to fool the model on\n", 685 | "print(binary_data_adv['train'][0].shape)" 686 | ] 687 | }, 688 | { 689 | "cell_type": "markdown", 690 | "metadata": { 691 | "colab_type": "text", 692 | "id": "NiG2If0Uv-HJ" 693 | }, 694 | "source": [ 695 | "Let's look again at the training and test data\n", 696 | "* If you, as a human, were trained on the samples in the second row in the figure above what mapping would you learn between [cats, airplanes] and labels [0, 1]?\n", 697 | "* What would your accuracy on the original (unmodified) test set be?" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": null, 703 | "metadata": { 704 | "colab": { 705 | "base_uri": "https://localhost:8080/", 706 | "height": 317 707 | }, 708 | "colab_type": "code", 709 | "id": "VW-rTYdAv-HK", 710 | "outputId": "b1facaa9-8497-423b-b163-3db3bf2d91fb" 711 | }, 712 | "outputs": [], 713 | "source": [ 714 | "show_image_row([binary_data_adv['train'][0][:5], binary_data_adv['test'][0][:5]],\n", 715 | " ylist=[\"Train\", \"Test\"],\n", 716 | " tlist=[[f\"Label={int(t)}\" for t in binary_data_adv['train'][1][:5]],\n", 717 | " [f\"Label={int(t)}\" for t in binary_data_adv['test'][1][:5]]],\n", 718 | " fontsize=16)" 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": { 724 | "colab_type": "text", 725 | "id": "MuCfp1Q-Bv_k" 726 | }, 727 | "source": [ 728 | "We will now train a linear classifier from scratch on this mislabelled dataset." 729 | ] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": null, 734 | "metadata": { 735 | "colab": { 736 | "base_uri": "https://localhost:8080/", 737 | "height": 51 738 | }, 739 | "colab_type": "code", 740 | "id": "I31mrfTJv-HM", 741 | "outputId": "c61be53c-08a0-4cea-8268-8dd2cdc17f6c" 742 | }, 743 | "outputs": [], 744 | "source": [ 745 | "train_log_adv, adv_net = utils.train_linear(binary_data_adv, step_size=0.1, iterations=3000)" 746 | ] 747 | }, 748 | { 749 | "cell_type": "markdown", 750 | "metadata": { 751 | "colab_type": "text", 752 | "id": "TIgfSZ9Gv-HN" 753 | }, 754 | "source": [ 755 | "The model still gets > 70% accuracy on the original, unmodified test set, despite being trained on an entirely mislabeled training set!\n", 756 | "\n", 757 | "Let's look at the predictions of this model to be sure." 
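Before visualizing, we can verify this claim numerically. A quick sketch (it assumes `utils.get_predictions` returns hard class labels, as in its uses above):

```python
# Accuracy of the adversarially-trained classifier on the original, correctly-labeled test set.
test_im, test_targ = binary_data['test']
test_preds = utils.get_predictions(test_im, adv_net)
acc = ch.mean(test_preds.cpu().eq(test_targ).float()).item()
print(f"Accuracy of adv-trained model on the original test set: {100 * acc:.2f}%")
```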
758 | ] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "execution_count": null, 763 | "metadata": { 764 | "colab": { 765 | "base_uri": "https://localhost:8080/", 766 | "height": 317 767 | }, 768 | "colab_type": "code", 769 | "id": "ND5ySR1tv-HO", 770 | "outputId": "ddbba98d-edec-44e3-efc7-1daf51e15bf8" 771 | }, 772 | "outputs": [], 773 | "source": [ 774 | "preds = utils.get_predictions(binary_data_adv['test'][0], adv_net)\n", 775 | "\n", 776 | "\n", 777 | "show_image_row([binary_data_adv['train'][0][:5], binary_data_adv['test'][0][:5]],\n", 778 | " ylist=[\"Train\", \"Test\"],\n", 779 | " tlist=[[f\"Label={int(t)}\" for t in binary_data_adv['train'][1][:5]],\n", 780 | " [f\"Label={int(t)}, Pred={int(p)}\" \n", 781 | " for t, p in zip(binary_data_adv['test'][1][:5], preds[:5])]],\n", 782 | " fontsize=16)" 783 | ] 784 | }, 785 | { 786 | "cell_type": "markdown", 787 | "metadata": { 788 | "colab_type": "text", 789 | "id": "a09lhoMrv-HP" 790 | }, 791 | "source": [ 792 | "So how did this happen? \n", 793 | "\n", 794 | "Note that all the human meaningful features, which we refer to as robust features, in these images point to the incorrect label (e.g., wings -> 1 and ears -> 0). Thus, a human trained on the dataset above would get 0% accuracy on the test set.\n", 795 | "\n", 796 | "Since it is not possible to get non-trivial accuracy on the test set based on robust features, this must be due to the (imperceptible) features we introduced via adversarial perturbations. For instance, when we added adversarial perturbations to a cat image to make the first linear classifier think it was a plane, we must have added features that actually generalize to planes on the test set. \n", 797 | "\n", 798 | "Thus, adversarial examples do not correspond just to meaningless sensitivities but to well-generalizing features. We see this phenomenon occur on state-of-the-art deep nets and on multi-class datasets such as CIFAR or ImageNet. You could try to reproduce this effect there as a follow-up exercise!" 799 | ] 800 | }, 801 | { 802 | "cell_type": "markdown", 803 | "metadata": { 804 | "colab_type": "text", 805 | "id": "cW3wybZwv-HQ" 806 | }, 807 | "source": [ 808 | "# Exercise III: Gradients as model interpretations" 809 | ] 810 | }, 811 | { 812 | "cell_type": "markdown", 813 | "metadata": { 814 | "colab_type": "text", 815 | "id": "-VCRtzglv-HQ" 816 | }, 817 | "source": [ 818 | "So far, we saw that standard models rely on non-robust features for part of their performance. We will now explore how this dependence affects other properties of standard models, specifically model interpretability.\n", 819 | "\n", 820 | "For this, we will begin by looking at one of the most natural interpretations: gradient-based saliency maps. These maps highlight which input features (pixels) the model prediction is sensitive to.\n", 821 | "\n", 822 | "(In these experiments, we will go back to the ImageNet-trained ResNet model from Ex. 
I.)" 823 | ] 824 | }, 825 | { 826 | "cell_type": "markdown", 827 | "metadata": { 828 | "colab_type": "text", 829 | "id": "lAiexsZdv-HQ" 830 | }, 831 | "source": [ 832 | "### Compute and visualize gradient" 833 | ] 834 | }, 835 | { 836 | "cell_type": "code", 837 | "execution_count": null, 838 | "metadata": { 839 | "colab": {}, 840 | "colab_type": "code", 841 | "id": "N4Oj1pKzB-JO" 842 | }, 843 | "outputs": [], 844 | "source": [ 845 | "_, (img, targ) = next(enumerate(in_loader))" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": null, 851 | "metadata": { 852 | "colab": { 853 | "base_uri": "https://localhost:8080/", 854 | "height": 315 855 | }, 856 | "colab_type": "code", 857 | "id": "23joydOev-HR", 858 | "outputId": "e68a9066-912a-4fb0-fbe3-bf59312d5a2c" 859 | }, 860 | "outputs": [], 861 | "source": [ 862 | "# We compute the gradient of the loss, with respect to the input. For every image pixel,\n", 863 | "# the gradient tells us how the loss changes if we vary that pixel slightly.\n", 864 | "\n", 865 | "grad, _ = utils.get_gradient(std_model, img, targ, normalization_function)\n", 866 | "\n", 867 | "# We can then visualize the original image, along with the gradient. Note that the gradient may\n", 868 | "# not lie within the valid pixel range ([0, 1]), so we need to rescale it using the \n", 869 | "# `visualize_gradient` function.\n", 870 | "\n", 871 | "show_image_row([img, utils.visualize_gradient(grad)],\n", 872 | " [\"Original Image\", \"Gradient\"],\n", 873 | " tlist=[[label_map_IN[int(t)].split(',')[0] for t in targ],\n", 874 | " [\"\" for _ in targ]])" 875 | ] 876 | }, 877 | { 878 | "cell_type": "markdown", 879 | "metadata": { 880 | "colab_type": "text", 881 | "id": "qzQVIFBzv-HS" 882 | }, 883 | "source": [ 884 | "The gradients of standard models look quite noisy and seem rather hard to interpret. Why might this be the case? Could it have something to do with non-robust features?" 
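For completeness, the saliency map above boils down to a single backward pass with respect to the input pixels. A minimal sketch of the computation that a helper like `utils.get_gradient` performs (the actual helper may differ, e.g., in what its second return value is):

```python
def input_gradient_sketch(model, im, targ, normalization):
    # Gradient of the classification loss with respect to the input pixels.
    im = im.clone().detach().cuda().requires_grad_(True)
    logits = utils.forward_pass(model, im, normalization)
    loss = ch.nn.functional.cross_entropy(logits, targ.cuda())
    grad, = ch.autograd.grad(loss, [im])
    return grad.detach().cpu()
```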
885 | ] 886 | }, 887 | { 888 | "cell_type": "markdown", 889 | "metadata": { 890 | "colab_type": "text", 891 | "id": "I2J3deT0v-HT" 892 | }, 893 | "source": [ 894 | "# Exercise IV: Try SmoothGrad and visualize the interpretations" 895 | ] 896 | }, 897 | { 898 | "cell_type": "markdown", 899 | "metadata": { 900 | "colab_type": "text", 901 | "id": "jGw4-Wnbv-HT" 902 | }, 903 | "source": [ 904 | "Fill in the following skeleton to implement SmoothGrad.\n", 905 | "\n", 906 | "```\n", 907 | "def smooth_grad(mod, im, targ, normalization, Nsamples, stdev):\n", 908 | " it = tqdm(enumerate(range(Nsamples)), total=Nsamples)\n", 909 | " total_grad = 0\n", 910 | " for _, n in it:\n", 911 | " ...\n", 912 | " grad, _ = utils.get_gradient(mod, noised_im, targ, normalization)\n", 913 | " total_grad += grad\n", 914 | " return total_grad / Nsamples\n", 915 | "```\n", 916 | " \n", 917 | "Then, try using SmoothGrad to interpret a standard model" 918 | ] 919 | }, 920 | { 921 | "cell_type": "code", 922 | "execution_count": null, 923 | "metadata": { 924 | "colab": {}, 925 | "colab_type": "code", 926 | "id": "OqzO9wmcv-HU" 927 | }, 928 | "outputs": [], 929 | "source": [ 930 | "def smooth_grad(mod, im, targ, normalization,\n", 931 | " Nsamples, stdev):\n", 932 | " # Instead of taking the gradient of a single image, we will take gradients\n", 933 | " # at a bunch of neighborhood points and average their gradients.\n", 934 | " \n", 935 | " it = tqdm(range(Nsamples), total=Nsamples)\n", 936 | "\n", 937 | " total_grad = 0\n", 938 | " for _ in it:\n", 939 | " pass # FILL THIS IN\n", 940 | " \n", 941 | " # Return average gradient\n", 942 | " return total_grad / Nsamples" 943 | ] 944 | }, 945 | { 946 | "cell_type": "code", 947 | "execution_count": null, 948 | "metadata": { 949 | "colab": { 950 | "base_uri": "https://localhost:8080/", 951 | "height": 34 952 | }, 953 | "colab_type": "code", 954 | "id": "8ciBezL-v-HW", 955 | "outputId": "a8a5a68c-75ca-46af-c474-0b22473e1740" 956 | }, 957 | "outputs": [], 958 | "source": [ 959 | "sgrad = smooth_grad(std_model, img, targ, normalization_function,\n", 960 | " 100, 0.3)" 961 | ] 962 | }, 963 | { 964 | "cell_type": "code", 965 | "execution_count": null, 966 | "metadata": { 967 | "colab": { 968 | "base_uri": "https://localhost:8080/", 969 | "height": 315 970 | }, 971 | "colab_type": "code", 972 | "id": "Nu0XvNq-v-HX", 973 | "outputId": "769fd756-e587-481a-cf0c-b2f2bb55f776" 974 | }, 975 | "outputs": [], 976 | "source": [ 977 | "# We once again use the `visualize_gradient` helper to make the SmoothGrad suitable for \n", 978 | "# visualization.\n", 979 | "\n", 980 | "show_image_row([img, utils.visualize_gradient(sgrad)],\n", 981 | " [\"Original Image\", \"SmoothGrad\"],\n", 982 | " tlist=[[label_map_IN[int(t)].split(',')[0] for t in targ],\n", 983 | " [\"\" for _ in targ]])" 984 | ] 985 | }, 986 | { 987 | "cell_type": "markdown", 988 | "metadata": { 989 | "colab_type": "text", 990 | "id": "f3u_2sptv-HZ" 991 | }, 992 | "source": [ 993 | "Explanations based on SmoothGrad align much better with features that we humans might use to make predictions. But what did we actually fix in smoothing the gradients? Were vanilla gradients just overly sensitive and noisy? Or did we maybe mask some features that the models actually rely on to get cleaner interpretations?" 
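For reference, here is one way the `smooth_grad` skeleton above could be completed. This is a sketch rather than the official solution (the solution notebook may differ in details such as clamping the noised image back to [0, 1]):

```python
def smooth_grad_reference(mod, im, targ, normalization, Nsamples, stdev):
    total_grad = 0
    for _ in tqdm(range(Nsamples), total=Nsamples):
        # Perturb the input with Gaussian noise of the chosen standard deviation ...
        noised_im = im + stdev * ch.randn_like(im)
        # ... and accumulate the input-gradient at the noised point.
        grad, _ = utils.get_gradient(mod, noised_im, targ, normalization)
        total_grad += grad
    # Average the gradients over all noise samples.
    return total_grad / Nsamples
```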
994 | ] 995 | }, 996 | { 997 | "cell_type": "markdown", 998 | "metadata": { 999 | "colab_type": "text", 1000 | "id": "0nzUoG-Lv-HZ" 1001 | }, 1002 | "source": [ 1003 | "# Exercise V: Playing with robust models\n", 1004 | "\n", 1005 | "The existence of adversarial examples has also prompted a large body of research to build models that are robust to these perturbations, i.e., so-called *robust models*. One approach to get a robust model is to train against the PGD adversary: instead of minimizing the loss over training examples, we minimize the loss against adversarially perturbed training samples (obtained using PGD). We will now take a closer look at robust models." 1006 | ] 1007 | }, 1008 | { 1009 | "cell_type": "markdown", 1010 | "metadata": { 1011 | "colab_type": "text", 1012 | "id": "mLma7ET2v-Ha" 1013 | }, 1014 | "source": [ 1015 | "### Loading a robust model\n", 1016 | "\n", 1017 | "For our study today, we will use a pre-trained robust model. We trained this model (ResNet50) on a 9-class subset of the ImageNet dataset. (Developing good robust models for the more complex 1000-class version is still an active area of research.)" 1018 | ] 1019 | }, 1020 | { 1021 | "cell_type": "code", 1022 | "execution_count": null, 1023 | "metadata": { 1024 | "colab": { 1025 | "base_uri": "https://localhost:8080/", 1026 | "height": 68 1027 | }, 1028 | "colab_type": "code", 1029 | "id": "wg7rUUDCv-Ha", 1030 | "outputId": "9e4dffd4-e9e1-4f47-b503-5a18e05f1940" 1031 | }, 1032 | "outputs": [], 1033 | "source": [ 1034 | "# Load the \"Restricted\" ImageNet dataset\n", 1035 | "restricted_imagenet_ds, rin_loaders, normalization_function, label_map_RIN = \\\n", 1036 | " utils.load_dataset('restricted_imagenet', batch_size=5, num_workers=1)\n", 1037 | "\n", 1038 | "rin_loader = rin_loaders[1] \n", 1039 | "# Load a pre-trained robust model\n", 1040 | "#robust_model = utils.load_model('robust', restricted_imagenet_ds)\n", 1041 | "robust_model = utils.load_model('robust', restricted_imagenet_ds)" 1042 | ] 1043 | }, 1044 | { 1045 | "cell_type": "markdown", 1046 | "metadata": { 1047 | "colab_type": "text", 1048 | "id": "ANuBO_aAv-Hc" 1049 | }, 1050 | "source": [ 1051 | "### Can adversarial examples fool a robust model?\n", 1052 | "\n", 1053 | "We can now try to fool the robust model using the same procedure as before. Does it succeed? Try varying the attack parameters and see what happens."
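As a rough illustration of the training scheme just described, a single PGD adversarial-training step could look like the sketch below. This is only a sketch: the robust model used in this notebook was actually trained with the MadryLab `robustness` library, whose training loop differs in many details, and the `optimizer` argument here is a hypothetical stand-in for whatever optimizer is in use.

```python
def adversarial_training_step_sketch(model, optimizer, x, y, normalization,
                                     eps=1.25, step_size=0.5, Nsteps=7):
    # 1) Find a worst-case (untargeted) perturbation of the current batch.
    x_adv = utils.L2PGD(model, x, y, normalization,
                        step_size=step_size, Nsteps=Nsteps, eps=eps, targeted=False)
    # 2) Minimize the loss on the perturbed batch instead of the clean batch.
    optimizer.zero_grad()
    logits = utils.forward_pass(model, x_adv, normalization)
    loss = ch.nn.functional.cross_entropy(logits, y.cuda())
    loss.backward()
    optimizer.step()
    return loss.item()
```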
1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "code", 1058 | "execution_count": null, 1059 | "metadata": { 1060 | "colab": { 1061 | "base_uri": "https://localhost:8080/", 1062 | "height": 349 1063 | }, 1064 | "colab_type": "code", 1065 | "id": "p5KJBoDCv-Hc", 1066 | "outputId": "d048b67e-68bb-4c0d-f0ae-6ecafbbf18b9" 1067 | }, 1068 | "outputs": [], 1069 | "source": [ 1070 | "# Load images from the Restricted ImageNet dataset\n", 1071 | "_, (img, targ) = next(enumerate(rin_loader))\n", 1072 | "\n", 1073 | "# Then we choose a target label for the attack.\n", 1074 | "TARGET = 3\n", 1075 | "\n", 1076 | "print(f\"Target class: {label_map_RIN[TARGET]}\")\n", 1077 | "\n", 1078 | "target_class = TARGET * ch.ones_like(targ)\n", 1079 | "\n", 1080 | "# Create adversarial examples\n", 1081 | "adv_ex = utils.L2PGD(robust_model, img, target_class, normalization_function,\n", 1082 | " step_size=0.5, Nsteps=20, eps=1.25, targeted=True)\n", 1083 | "\n", 1084 | "# Evaluate model predictions\n", 1085 | "with ch.no_grad():\n", 1086 | " logit = utils.forward_pass(robust_model, adv_ex, normalization_function)\n", 1087 | " pred_label = logit.argmax(dim=1)\n", 1088 | "\n", 1089 | "# Visualize adversarial examples\n", 1090 | "\n", 1091 | "show_image_row([img, adv_ex], \n", 1092 | " ['Original image', 'Adv. Example'],\n", 1093 | " tlist=[[label_map_RIN[int(t)].split(',')[0] for t in label] \\\n", 1094 | " for label in [targ, pred_label]])" 1095 | ] 1096 | }, 1097 | { 1098 | "cell_type": "markdown", 1099 | "metadata": { 1100 | "colab_type": "text", 1101 | "id": "HrUpVbgMv-He" 1102 | }, 1103 | "source": [ 1104 | "### Changing the prediction of a robust model\n", 1105 | "\n", 1106 | "We know that robust models are not easily fooled by adversarial examples. This tells us that one cannot change the prediction of a robust model using imperceptible L2 perturbations to the input (in contrast to standard models). How can we then modify the input to make the robust model predict a different class?\n", 1107 | "\n", 1108 | "Try creating adversarial examples as before, but with a larger eps. Our hope is that by increasing the size of the perturbation set, we can find a perturbation that actually causes the model to change its prediction? What do the perturbed inputs, i.e., \"*large epsilon adversarial examples*\" look like?" 
1109 | ] 1110 | }, 1111 | { 1112 | "cell_type": "code", 1113 | "execution_count": null, 1114 | "metadata": { 1115 | "colab": { 1116 | "base_uri": "https://localhost:8080/", 1117 | "height": 51 1118 | }, 1119 | "colab_type": "code", 1120 | "id": "E_kw0gsov-He", 1121 | "outputId": "456eba7f-c602-4a93-8fb1-3006e5a1aa69" 1122 | }, 1123 | "outputs": [], 1124 | "source": [ 1125 | "TARGET = 5\n", 1126 | "\n", 1127 | "print(f\"Target class: {label_map_RIN[TARGET]}\")\n", 1128 | "\n", 1129 | "target_class = TARGET * ch.ones_like(targ)\n", 1130 | "\n", 1131 | "im_targ = utils.L2PGD(robust_model, img, target_class, normalization_function,\n", 1132 | " step_size=5, Nsteps=20, eps=100, targeted=True)\n", 1133 | "\n", 1134 | "# Evaluate model predictions\n", 1135 | "with ch.no_grad():\n", 1136 | " logit = utils.forward_pass(robust_model, im_targ, normalization_function)\n", 1137 | " pred_label = logit.argmax(dim=1)" 1138 | ] 1139 | }, 1140 | { 1141 | "cell_type": "code", 1142 | "execution_count": null, 1143 | "metadata": { 1144 | "colab": { 1145 | "base_uri": "https://localhost:8080/", 1146 | "height": 315 1147 | }, 1148 | "colab_type": "code", 1149 | "id": "tNcfI5Wmv-Hg", 1150 | "outputId": "dc6782a6-a282-462a-f44a-1d6fc86b1c62" 1151 | }, 1152 | "outputs": [], 1153 | "source": [ 1154 | "show_image_row([img, im_targ],\n", 1155 | " ['Original image', 'Large eps \\n adv. example'],\n", 1156 | " tlist=[[label_map_RIN[int(t)].split(',')[0] for t in label] \\\n", 1157 | " for label in [targ, pred_label]])" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "markdown", 1162 | "metadata": { 1163 | "colab_type": "text", 1164 | "id": "z3RyX8ngv-Hh" 1165 | }, 1166 | "source": [ 1167 | "# Excercise VI: Interpretations for robust models\n", 1168 | "\n", 1169 | "Based on the previous experiment, we know that, for robust models, (a) imperceptible input changes do not change the prediction and (b) to change the prediction, we actually need to change \"salient image features\".\n", 1170 | "\n", 1171 | "Does this mean that the features that robust models rely on are more human-aligned in a sense?" 1172 | ] 1173 | }, 1174 | { 1175 | "cell_type": "markdown", 1176 | "metadata": { 1177 | "colab_type": "text", 1178 | "id": "1CjltHksv-Hi" 1179 | }, 1180 | "source": [ 1181 | "### VI.I Let's start by looking at their gradients.\n", 1182 | "\n", 1183 | "What do the gradients of robust models look like? How do they compare to the gradients of a standard model and the output of SmoothGrad?" 
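Once you have computed the robust model's gradients in the next cell, you can put them side by side with the standard model's gradients on the same batch using a snippet along these lines. This is a sketch: it assumes the Restricted-ImageNet normalization is compatible with the standard ImageNet model, and it uses the standard model's own predicted labels since the two models have different label spaces.

```python
# Saliency of the standard (non-robust) model on the same images, for comparison.
with ch.no_grad():
    std_preds = utils.forward_pass(std_model, img, normalization_function).argmax(dim=1).cpu()
grad_std, _ = utils.get_gradient(std_model, img, std_preds, normalization_function)
show_image_row([img, utils.visualize_gradient(grad_std)],
               ["Original Image", "Standard model gradient"])
```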
1184 | ] 1185 | }, 1186 | { 1187 | "cell_type": "code", 1188 | "execution_count": null, 1189 | "metadata": { 1190 | "colab": { 1191 | "base_uri": "https://localhost:8080/", 1192 | "height": 315 1193 | }, 1194 | "colab_type": "code", 1195 | "id": "ptvCWsNxv-Hi", 1196 | "outputId": "0e0c8e2c-38f3-4b0e-fe5c-75f16a4a6797" 1197 | }, 1198 | "outputs": [], 1199 | "source": [ 1200 | "# Get gradient of the loss with respect to the input\n", 1201 | "grad_rob, _ = utils.get_gradient(robust_model, img, targ, normalization_function)\n", 1202 | "\n", 1203 | "# Visualize gradient\n", 1204 | "show_image_row([img, utils.visualize_gradient(grad_rob)],\n", 1205 | " [\"Original Image\", \"Gradient\"],\n", 1206 | " tlist=[[label_map_RIN[int(t)].split(',')[0] for t in targ],\n", 1207 | " [\"\" for _ in targ]])" 1208 | ] 1209 | }, 1210 | { 1211 | "cell_type": "markdown", 1212 | "metadata": { 1213 | "colab_type": "text", 1214 | "id": "6Gy1pQuWv-Hk" 1215 | }, 1216 | "source": [ 1217 | "### VI.2 Feature Visualization\n", 1218 | "\n", 1219 | "Another popular interpretability technique is known as feature visualization. Here, the goal is to find an input that maximizes a feature (a particular neuron in the deep network), instead of just trying to maximize the loss (as we did before with gradients).\n", 1220 | "\n", 1221 | "You could now try to implement feature visualization yourself. For instance, the following function gives you, for specific inputs, the model's feature vector (the layer before the final linear classifier). " 1222 | ] 1223 | }, 1224 | { 1225 | "cell_type": "code", 1226 | "execution_count": null, 1227 | "metadata": { 1228 | "colab": { 1229 | "base_uri": "https://localhost:8080/", 1230 | "height": 34 1231 | }, 1232 | "colab_type": "code", 1233 | "id": "dciqUDFrv-Hk", 1234 | "outputId": "1bfcb595-ee73-48d7-bbea-8b3d476b9329" 1235 | }, 1236 | "outputs": [], 1237 | "source": [ 1238 | "# Getting the feature representation from the model\n", 1239 | "with ch.no_grad():\n", 1240 | " feats = utils.get_features(robust_model, img, normalization_function)\n", 1241 | " print(f\"Dimensions of the feature vector: {feats.shape[1]}\")" 1242 | ] 1243 | }, 1244 | { 1245 | "cell_type": "markdown", 1246 | "metadata": { 1247 | "colab_type": "text", 1248 | "id": "CzWeK3aDv-Hl" 1249 | }, 1250 | "source": [ 1251 | "#### Implement a loss function to perform feature visualization\n", 1252 | "\n", 1253 | "Fill in the skeleton below to create a feature visualization loss function. 
Our goal is to maximize the `feature_number` coordinate of the feature vector.\n", 1254 | "\n", 1255 | "```\n", 1256 | "def feature_maximization_loss(mod, im, feature_number, normalization_function):\n", 1257 | " feature_vector = utils.get_features(mod, im, normalization) # Get features for input\n", 1258 | " relevant_coordinate = ch.gather(feature_vector, 1, feature_number[:, None]) \n", 1259 | " loss = ?\n", 1260 | " ...\n", 1261 | " return loss\n", 1262 | "```" 1263 | ] 1264 | }, 1265 | { 1266 | "cell_type": "code", 1267 | "execution_count": null, 1268 | "metadata": { 1269 | "colab": {}, 1270 | "colab_type": "code", 1271 | "id": "-DC5InP7v-Hm" 1272 | }, 1273 | "outputs": [], 1274 | "source": [ 1275 | "# Feature visualization loss: Try to find an input that maximizes a specific feature\n", 1276 | "\n", 1277 | "def feature_maximization_loss(mod, im, feature_number, normalization):\n", 1278 | " # Get feature vector for inputs\n", 1279 | " fr = utils.get_features(mod, im, normalization)\n", 1280 | " # We will maximize the `targ` coordinate of the feature vector for every input\n", 1281 | " loss = 'FILL THIS IN'\n", 1282 | " return loss" 1283 | ] 1284 | }, 1285 | { 1286 | "cell_type": "markdown", 1287 | "metadata": { 1288 | "colab_type": "text", 1289 | "id": "6uLITJmov-Ho" 1290 | }, 1291 | "source": [ 1292 | "#### Visualize the results of feature visualization" 1293 | ] 1294 | }, 1295 | { 1296 | "cell_type": "markdown", 1297 | "metadata": { 1298 | "colab_type": "text", 1299 | "id": "Q-nvE0bNv-Hp" 1300 | }, 1301 | "source": [ 1302 | "You can then supply the `feature_maximization_loss` to the `custom_loss` argument input in `helpers.L2PGD`, and maximize it using the following snippet." 1303 | ] 1304 | }, 1305 | { 1306 | "cell_type": "code", 1307 | "execution_count": null, 1308 | "metadata": { 1309 | "colab": {}, 1310 | "colab_type": "code", 1311 | "id": "insyk95_v-Hp" 1312 | }, 1313 | "outputs": [], 1314 | "source": [ 1315 | "# Chose a feature to visualize\n", 1316 | "FEATURE = 200 # should be less than the dimension from before\n", 1317 | "\n", 1318 | "target_feature = FEATURE * ch.ones_like(targ)" 1319 | ] 1320 | }, 1321 | { 1322 | "cell_type": "code", 1323 | "execution_count": null, 1324 | "metadata": { 1325 | "colab": { 1326 | "base_uri": "https://localhost:8080/", 1327 | "height": 34 1328 | }, 1329 | "colab_type": "code", 1330 | "id": "L8sbOqSwv-Hr", 1331 | "outputId": "c234a7c1-4407-43bf-aca3-ad2934698424" 1332 | }, 1333 | "outputs": [], 1334 | "source": [ 1335 | "# Maximize feature \n", 1336 | "im_f = utils.L2PGD(robust_model, img, target_feature, normalization_function,\n", 1337 | " step_size=5, Nsteps=20, eps=1000, \n", 1338 | " custom_loss=feature_maximization_loss, \n", 1339 | " targeted=False)" 1340 | ] 1341 | }, 1342 | { 1343 | "cell_type": "code", 1344 | "execution_count": null, 1345 | "metadata": { 1346 | "colab": { 1347 | "base_uri": "https://localhost:8080/", 1348 | "height": 311 1349 | }, 1350 | "colab_type": "code", 1351 | "id": "u6q-2yrLv-Ht", 1352 | "outputId": "f50816dd-922d-4311-c36a-b88d6d768da7" 1353 | }, 1354 | "outputs": [], 1355 | "source": [ 1356 | "# Visualize results\n", 1357 | "show_image_row([img, im_f],\n", 1358 | " [\"Original Image\", f\"Maximize Feature #{FEATURE}\"])" 1359 | ] 1360 | }, 1361 | { 1362 | "cell_type": "markdown", 1363 | "metadata": { 1364 | "colab_type": "text", 1365 | "id": "qjawi1KTv-Hu" 1366 | }, 1367 | "source": [ 1368 | "#### Try the same for a standard model" 1369 | ] 1370 | }, 1371 | { 1372 | "cell_type": "code", 1373 | 
"execution_count": null, 1374 | "metadata": { 1375 | "colab": { 1376 | "base_uri": "https://localhost:8080/", 1377 | "height": 34 1378 | }, 1379 | "colab_type": "code", 1380 | "id": "ydPt6Ldnv-Hv", 1381 | "outputId": "f385e437-1e29-4f61-b307-e5f8da127d86" 1382 | }, 1383 | "outputs": [], 1384 | "source": [ 1385 | "# Load image-label pair from ImageNet\n", 1386 | "_, (img, targ) = next(enumerate(in_loader))\n", 1387 | "\n", 1388 | "TARGET = 100 \n", 1389 | "target_feature = TARGET * ch.ones_like(targ)\n", 1390 | "im_f = utils.L2PGD(std_model, img, target_feature, normalization_function,\n", 1391 | " step_size=5, Nsteps=20, eps=1000, \n", 1392 | " custom_loss=feature_maximization_loss, \n", 1393 | " targeted=False)" 1394 | ] 1395 | }, 1396 | { 1397 | "cell_type": "code", 1398 | "execution_count": null, 1399 | "metadata": { 1400 | "colab": { 1401 | "base_uri": "https://localhost:8080/", 1402 | "height": 311 1403 | }, 1404 | "colab_type": "code", 1405 | "id": "q_2T3HVjv-Hx", 1406 | "outputId": "254a6333-f5e6-49d0-b339-12ad5dca76c3" 1407 | }, 1408 | "outputs": [], 1409 | "source": [ 1410 | "show_image_row([img, im_f],\n", 1411 | " [\"Original Image\", f\"Maximize Feature #{FEATURE}\"])" 1412 | ] 1413 | }, 1414 | { 1415 | "cell_type": "markdown", 1416 | "metadata": { 1417 | "colab_type": "text", 1418 | "id": "OSrj6BcGv-Hz" 1419 | }, 1420 | "source": [ 1421 | "# Bonus Excercise I: Try feature visualization for robust models starting from noise rather than images\n", 1422 | "\n", 1423 | "What if we try feature visualization starting from noise?" 1424 | ] 1425 | }, 1426 | { 1427 | "cell_type": "code", 1428 | "execution_count": null, 1429 | "metadata": { 1430 | "colab": { 1431 | "base_uri": "https://localhost:8080/", 1432 | "height": 34 1433 | }, 1434 | "colab_type": "code", 1435 | "id": "rmPP9C_Yv-Hz", 1436 | "outputId": "e500c099-f790-4f9a-a4ac-1c5d1ed480a0" 1437 | }, 1438 | "outputs": [], 1439 | "source": [ 1440 | "# Create a \"noise\" image\n", 1441 | "noise_img = ch.clamp(ch.randn_like(img) + 0.5, 0, 1)\n", 1442 | "\n", 1443 | "FEATURE = 201\n", 1444 | "target_feature = FEATURE * ch.ones_like(targ)\n", 1445 | "im_f = utils.L2PGD(robust_model, noise_img, target_feature, normalization_function,\n", 1446 | " step_size=5, Nsteps=200, eps=1000, \n", 1447 | " custom_loss=feature_maximization_loss, \n", 1448 | " targeted=False)" 1449 | ] 1450 | }, 1451 | { 1452 | "cell_type": "code", 1453 | "execution_count": null, 1454 | "metadata": { 1455 | "colab": { 1456 | "base_uri": "https://localhost:8080/", 1457 | "height": 311 1458 | }, 1459 | "colab_type": "code", 1460 | "id": "uY4IiY8Hv-H2", 1461 | "outputId": "591fb720-94ad-4439-b0ec-62903c6b55c2" 1462 | }, 1463 | "outputs": [], 1464 | "source": [ 1465 | "show_image_row([noise_img, im_f],\n", 1466 | " [\"Original Image\", f\"Maximize Feature #{FEATURE}\"])" 1467 | ] 1468 | }, 1469 | { 1470 | "cell_type": "code", 1471 | "execution_count": null, 1472 | "metadata": { 1473 | "colab": {}, 1474 | "colab_type": "code", 1475 | "id": "XCxBh4Utv-H4" 1476 | }, 1477 | "outputs": [], 1478 | "source": [] 1479 | } 1480 | ], 1481 | "metadata": { 1482 | "accelerator": "GPU", 1483 | "colab": { 1484 | "name": "alignment_solution.ipynb", 1485 | "provenance": [] 1486 | }, 1487 | "kernelspec": { 1488 | "display_name": "Python 3", 1489 | "language": "python", 1490 | "name": "python3" 1491 | }, 1492 | "language_info": { 1493 | "codemirror_mode": { 1494 | "name": "ipython", 1495 | "version": 3 1496 | }, 1497 | "file_extension": ".py", 1498 | "mimetype": "text/x-python", 1499 | 
"name": "python", 1500 | "nbconvert_exporter": "python", 1501 | "pygments_lexer": "ipython3", 1502 | "version": "3.6.6" 1503 | }, 1504 | "widgets": { 1505 | "application/vnd.jupyter.widget-state+json": { 1506 | "025c326236cd48e48cfb5fc83403b8a7": { 1507 | "model_module": "@jupyter-widgets/controls", 1508 | "model_name": "HBoxModel", 1509 | "state": { 1510 | "_dom_classes": [], 1511 | "_model_module": "@jupyter-widgets/controls", 1512 | "_model_module_version": "1.5.0", 1513 | "_model_name": "HBoxModel", 1514 | "_view_count": null, 1515 | "_view_module": "@jupyter-widgets/controls", 1516 | "_view_module_version": "1.5.0", 1517 | "_view_name": "HBoxView", 1518 | "box_style": "", 1519 | "children": [ 1520 | "IPY_MODEL_d47a823649334878a19883d16c94b071", 1521 | "IPY_MODEL_6cbb663fb4cd48c5aaf1d2a0544d4e95" 1522 | ], 1523 | "layout": "IPY_MODEL_8e5da4ae9373457f823a7c7ec28dec8f" 1524 | } 1525 | }, 1526 | "0d5cba86a8ba444395af9bb5016ce6c9": { 1527 | "model_module": "@jupyter-widgets/base", 1528 | "model_name": "LayoutModel", 1529 | "state": { 1530 | "_model_module": "@jupyter-widgets/base", 1531 | "_model_module_version": "1.2.0", 1532 | "_model_name": "LayoutModel", 1533 | "_view_count": null, 1534 | "_view_module": "@jupyter-widgets/base", 1535 | "_view_module_version": "1.2.0", 1536 | "_view_name": "LayoutView", 1537 | "align_content": null, 1538 | "align_items": null, 1539 | "align_self": null, 1540 | "border": null, 1541 | "bottom": null, 1542 | "display": null, 1543 | "flex": null, 1544 | "flex_flow": null, 1545 | "grid_area": null, 1546 | "grid_auto_columns": null, 1547 | "grid_auto_flow": null, 1548 | "grid_auto_rows": null, 1549 | "grid_column": null, 1550 | "grid_gap": null, 1551 | "grid_row": null, 1552 | "grid_template_areas": null, 1553 | "grid_template_columns": null, 1554 | "grid_template_rows": null, 1555 | "height": null, 1556 | "justify_content": null, 1557 | "justify_items": null, 1558 | "left": null, 1559 | "margin": null, 1560 | "max_height": null, 1561 | "max_width": null, 1562 | "min_height": null, 1563 | "min_width": null, 1564 | "object_fit": null, 1565 | "object_position": null, 1566 | "order": null, 1567 | "overflow": null, 1568 | "overflow_x": null, 1569 | "overflow_y": null, 1570 | "padding": null, 1571 | "right": null, 1572 | "top": null, 1573 | "visibility": null, 1574 | "width": null 1575 | } 1576 | }, 1577 | "2c4c64e0da04428b8c74a7f55d0b3ca0": { 1578 | "model_module": "@jupyter-widgets/base", 1579 | "model_name": "LayoutModel", 1580 | "state": { 1581 | "_model_module": "@jupyter-widgets/base", 1582 | "_model_module_version": "1.2.0", 1583 | "_model_name": "LayoutModel", 1584 | "_view_count": null, 1585 | "_view_module": "@jupyter-widgets/base", 1586 | "_view_module_version": "1.2.0", 1587 | "_view_name": "LayoutView", 1588 | "align_content": null, 1589 | "align_items": null, 1590 | "align_self": null, 1591 | "border": null, 1592 | "bottom": null, 1593 | "display": null, 1594 | "flex": null, 1595 | "flex_flow": null, 1596 | "grid_area": null, 1597 | "grid_auto_columns": null, 1598 | "grid_auto_flow": null, 1599 | "grid_auto_rows": null, 1600 | "grid_column": null, 1601 | "grid_gap": null, 1602 | "grid_row": null, 1603 | "grid_template_areas": null, 1604 | "grid_template_columns": null, 1605 | "grid_template_rows": null, 1606 | "height": null, 1607 | "justify_content": null, 1608 | "justify_items": null, 1609 | "left": null, 1610 | "margin": null, 1611 | "max_height": null, 1612 | "max_width": null, 1613 | "min_height": null, 1614 | "min_width": null, 1615 | 
"object_fit": null, 1616 | "object_position": null, 1617 | "order": null, 1618 | "overflow": null, 1619 | "overflow_x": null, 1620 | "overflow_y": null, 1621 | "padding": null, 1622 | "right": null, 1623 | "top": null, 1624 | "visibility": null, 1625 | "width": null 1626 | } 1627 | }, 1628 | "5443d05074074a7b8cdf6aaf22ae8bde": { 1629 | "model_module": "@jupyter-widgets/controls", 1630 | "model_name": "ProgressStyleModel", 1631 | "state": { 1632 | "_model_module": "@jupyter-widgets/controls", 1633 | "_model_module_version": "1.5.0", 1634 | "_model_name": "ProgressStyleModel", 1635 | "_view_count": null, 1636 | "_view_module": "@jupyter-widgets/base", 1637 | "_view_module_version": "1.2.0", 1638 | "_view_name": "StyleView", 1639 | "bar_color": null, 1640 | "description_width": "initial" 1641 | } 1642 | }, 1643 | "60838b71f99e4445b92d53aa722d5687": { 1644 | "model_module": "@jupyter-widgets/controls", 1645 | "model_name": "ProgressStyleModel", 1646 | "state": { 1647 | "_model_module": "@jupyter-widgets/controls", 1648 | "_model_module_version": "1.5.0", 1649 | "_model_name": "ProgressStyleModel", 1650 | "_view_count": null, 1651 | "_view_module": "@jupyter-widgets/base", 1652 | "_view_module_version": "1.2.0", 1653 | "_view_name": "StyleView", 1654 | "bar_color": null, 1655 | "description_width": "initial" 1656 | } 1657 | }, 1658 | "64c8c660ae09476a8b0c2839a8e323cc": { 1659 | "model_module": "@jupyter-widgets/controls", 1660 | "model_name": "DescriptionStyleModel", 1661 | "state": { 1662 | "_model_module": "@jupyter-widgets/controls", 1663 | "_model_module_version": "1.5.0", 1664 | "_model_name": "DescriptionStyleModel", 1665 | "_view_count": null, 1666 | "_view_module": "@jupyter-widgets/base", 1667 | "_view_module_version": "1.2.0", 1668 | "_view_name": "StyleView", 1669 | "description_width": "" 1670 | } 1671 | }, 1672 | "6cbb663fb4cd48c5aaf1d2a0544d4e95": { 1673 | "model_module": "@jupyter-widgets/controls", 1674 | "model_name": "HTMLModel", 1675 | "state": { 1676 | "_dom_classes": [], 1677 | "_model_module": "@jupyter-widgets/controls", 1678 | "_model_module_version": "1.5.0", 1679 | "_model_name": "HTMLModel", 1680 | "_view_count": null, 1681 | "_view_module": "@jupyter-widgets/controls", 1682 | "_view_module_version": "1.5.0", 1683 | "_view_name": "HTMLView", 1684 | "description": "", 1685 | "description_tooltip": null, 1686 | "layout": "IPY_MODEL_777111e1581c40e69acd4dc002f400fe", 1687 | "placeholder": "​", 1688 | "style": "IPY_MODEL_cb153a59ff72473bac8b2f6dcaa29f57", 1689 | "value": " 170500096/? 
[00:09<00:00, 17086638.92it/s]" 1690 | } 1691 | }, 1692 | "7106747f5eac4286b49d9a49d33d64e0": { 1693 | "model_module": "@jupyter-widgets/controls", 1694 | "model_name": "HTMLModel", 1695 | "state": { 1696 | "_dom_classes": [], 1697 | "_model_module": "@jupyter-widgets/controls", 1698 | "_model_module_version": "1.5.0", 1699 | "_model_name": "HTMLModel", 1700 | "_view_count": null, 1701 | "_view_module": "@jupyter-widgets/controls", 1702 | "_view_module_version": "1.5.0", 1703 | "_view_name": "HTMLView", 1704 | "description": "", 1705 | "description_tooltip": null, 1706 | "layout": "IPY_MODEL_0d5cba86a8ba444395af9bb5016ce6c9", 1707 | "placeholder": "​", 1708 | "style": "IPY_MODEL_64c8c660ae09476a8b0c2839a8e323cc", 1709 | "value": " 44.7M/44.7M [00:14<00:00, 3.20MB/s]" 1710 | } 1711 | }, 1712 | "777111e1581c40e69acd4dc002f400fe": { 1713 | "model_module": "@jupyter-widgets/base", 1714 | "model_name": "LayoutModel", 1715 | "state": { 1716 | "_model_module": "@jupyter-widgets/base", 1717 | "_model_module_version": "1.2.0", 1718 | "_model_name": "LayoutModel", 1719 | "_view_count": null, 1720 | "_view_module": "@jupyter-widgets/base", 1721 | "_view_module_version": "1.2.0", 1722 | "_view_name": "LayoutView", 1723 | "align_content": null, 1724 | "align_items": null, 1725 | "align_self": null, 1726 | "border": null, 1727 | "bottom": null, 1728 | "display": null, 1729 | "flex": null, 1730 | "flex_flow": null, 1731 | "grid_area": null, 1732 | "grid_auto_columns": null, 1733 | "grid_auto_flow": null, 1734 | "grid_auto_rows": null, 1735 | "grid_column": null, 1736 | "grid_gap": null, 1737 | "grid_row": null, 1738 | "grid_template_areas": null, 1739 | "grid_template_columns": null, 1740 | "grid_template_rows": null, 1741 | "height": null, 1742 | "justify_content": null, 1743 | "justify_items": null, 1744 | "left": null, 1745 | "margin": null, 1746 | "max_height": null, 1747 | "max_width": null, 1748 | "min_height": null, 1749 | "min_width": null, 1750 | "object_fit": null, 1751 | "object_position": null, 1752 | "order": null, 1753 | "overflow": null, 1754 | "overflow_x": null, 1755 | "overflow_y": null, 1756 | "padding": null, 1757 | "right": null, 1758 | "top": null, 1759 | "visibility": null, 1760 | "width": null 1761 | } 1762 | }, 1763 | "82d97a20caab4c7c83d41b9149be24a5": { 1764 | "model_module": "@jupyter-widgets/controls", 1765 | "model_name": "FloatProgressModel", 1766 | "state": { 1767 | "_dom_classes": [], 1768 | "_model_module": "@jupyter-widgets/controls", 1769 | "_model_module_version": "1.5.0", 1770 | "_model_name": "FloatProgressModel", 1771 | "_view_count": null, 1772 | "_view_module": "@jupyter-widgets/controls", 1773 | "_view_module_version": "1.5.0", 1774 | "_view_name": "ProgressView", 1775 | "bar_style": "success", 1776 | "description": "100%", 1777 | "description_tooltip": null, 1778 | "layout": "IPY_MODEL_a6e18480d09d45f0adfda751f5673406", 1779 | "max": 46827520, 1780 | "min": 0, 1781 | "orientation": "horizontal", 1782 | "style": "IPY_MODEL_5443d05074074a7b8cdf6aaf22ae8bde", 1783 | "value": 46827520 1784 | } 1785 | }, 1786 | "8e5da4ae9373457f823a7c7ec28dec8f": { 1787 | "model_module": "@jupyter-widgets/base", 1788 | "model_name": "LayoutModel", 1789 | "state": { 1790 | "_model_module": "@jupyter-widgets/base", 1791 | "_model_module_version": "1.2.0", 1792 | "_model_name": "LayoutModel", 1793 | "_view_count": null, 1794 | "_view_module": "@jupyter-widgets/base", 1795 | "_view_module_version": "1.2.0", 1796 | "_view_name": "LayoutView", 1797 | "align_content": null, 1798 | 
"align_items": null, 1799 | "align_self": null, 1800 | "border": null, 1801 | "bottom": null, 1802 | "display": null, 1803 | "flex": null, 1804 | "flex_flow": null, 1805 | "grid_area": null, 1806 | "grid_auto_columns": null, 1807 | "grid_auto_flow": null, 1808 | "grid_auto_rows": null, 1809 | "grid_column": null, 1810 | "grid_gap": null, 1811 | "grid_row": null, 1812 | "grid_template_areas": null, 1813 | "grid_template_columns": null, 1814 | "grid_template_rows": null, 1815 | "height": null, 1816 | "justify_content": null, 1817 | "justify_items": null, 1818 | "left": null, 1819 | "margin": null, 1820 | "max_height": null, 1821 | "max_width": null, 1822 | "min_height": null, 1823 | "min_width": null, 1824 | "object_fit": null, 1825 | "object_position": null, 1826 | "order": null, 1827 | "overflow": null, 1828 | "overflow_x": null, 1829 | "overflow_y": null, 1830 | "padding": null, 1831 | "right": null, 1832 | "top": null, 1833 | "visibility": null, 1834 | "width": null 1835 | } 1836 | }, 1837 | "a6e18480d09d45f0adfda751f5673406": { 1838 | "model_module": "@jupyter-widgets/base", 1839 | "model_name": "LayoutModel", 1840 | "state": { 1841 | "_model_module": "@jupyter-widgets/base", 1842 | "_model_module_version": "1.2.0", 1843 | "_model_name": "LayoutModel", 1844 | "_view_count": null, 1845 | "_view_module": "@jupyter-widgets/base", 1846 | "_view_module_version": "1.2.0", 1847 | "_view_name": "LayoutView", 1848 | "align_content": null, 1849 | "align_items": null, 1850 | "align_self": null, 1851 | "border": null, 1852 | "bottom": null, 1853 | "display": null, 1854 | "flex": null, 1855 | "flex_flow": null, 1856 | "grid_area": null, 1857 | "grid_auto_columns": null, 1858 | "grid_auto_flow": null, 1859 | "grid_auto_rows": null, 1860 | "grid_column": null, 1861 | "grid_gap": null, 1862 | "grid_row": null, 1863 | "grid_template_areas": null, 1864 | "grid_template_columns": null, 1865 | "grid_template_rows": null, 1866 | "height": null, 1867 | "justify_content": null, 1868 | "justify_items": null, 1869 | "left": null, 1870 | "margin": null, 1871 | "max_height": null, 1872 | "max_width": null, 1873 | "min_height": null, 1874 | "min_width": null, 1875 | "object_fit": null, 1876 | "object_position": null, 1877 | "order": null, 1878 | "overflow": null, 1879 | "overflow_x": null, 1880 | "overflow_y": null, 1881 | "padding": null, 1882 | "right": null, 1883 | "top": null, 1884 | "visibility": null, 1885 | "width": null 1886 | } 1887 | }, 1888 | "cb153a59ff72473bac8b2f6dcaa29f57": { 1889 | "model_module": "@jupyter-widgets/controls", 1890 | "model_name": "DescriptionStyleModel", 1891 | "state": { 1892 | "_model_module": "@jupyter-widgets/controls", 1893 | "_model_module_version": "1.5.0", 1894 | "_model_name": "DescriptionStyleModel", 1895 | "_view_count": null, 1896 | "_view_module": "@jupyter-widgets/base", 1897 | "_view_module_version": "1.2.0", 1898 | "_view_name": "StyleView", 1899 | "description_width": "" 1900 | } 1901 | }, 1902 | "cf3d338a3e0d4d6c9e8d47d23abf7c1a": { 1903 | "model_module": "@jupyter-widgets/controls", 1904 | "model_name": "HBoxModel", 1905 | "state": { 1906 | "_dom_classes": [], 1907 | "_model_module": "@jupyter-widgets/controls", 1908 | "_model_module_version": "1.5.0", 1909 | "_model_name": "HBoxModel", 1910 | "_view_count": null, 1911 | "_view_module": "@jupyter-widgets/controls", 1912 | "_view_module_version": "1.5.0", 1913 | "_view_name": "HBoxView", 1914 | "box_style": "", 1915 | "children": [ 1916 | "IPY_MODEL_82d97a20caab4c7c83d41b9149be24a5", 1917 | 
"IPY_MODEL_7106747f5eac4286b49d9a49d33d64e0" 1918 | ], 1919 | "layout": "IPY_MODEL_fb445702be25404fa29c10e7461bb0e2" 1920 | } 1921 | }, 1922 | "d47a823649334878a19883d16c94b071": { 1923 | "model_module": "@jupyter-widgets/controls", 1924 | "model_name": "FloatProgressModel", 1925 | "state": { 1926 | "_dom_classes": [], 1927 | "_model_module": "@jupyter-widgets/controls", 1928 | "_model_module_version": "1.5.0", 1929 | "_model_name": "FloatProgressModel", 1930 | "_view_count": null, 1931 | "_view_module": "@jupyter-widgets/controls", 1932 | "_view_module_version": "1.5.0", 1933 | "_view_name": "ProgressView", 1934 | "bar_style": "success", 1935 | "description": "", 1936 | "description_tooltip": null, 1937 | "layout": "IPY_MODEL_2c4c64e0da04428b8c74a7f55d0b3ca0", 1938 | "max": 1, 1939 | "min": 0, 1940 | "orientation": "horizontal", 1941 | "style": "IPY_MODEL_60838b71f99e4445b92d53aa722d5687", 1942 | "value": 1 1943 | } 1944 | }, 1945 | "fb445702be25404fa29c10e7461bb0e2": { 1946 | "model_module": "@jupyter-widgets/base", 1947 | "model_name": "LayoutModel", 1948 | "state": { 1949 | "_model_module": "@jupyter-widgets/base", 1950 | "_model_module_version": "1.2.0", 1951 | "_model_name": "LayoutModel", 1952 | "_view_count": null, 1953 | "_view_module": "@jupyter-widgets/base", 1954 | "_view_module_version": "1.2.0", 1955 | "_view_name": "LayoutView", 1956 | "align_content": null, 1957 | "align_items": null, 1958 | "align_self": null, 1959 | "border": null, 1960 | "bottom": null, 1961 | "display": null, 1962 | "flex": null, 1963 | "flex_flow": null, 1964 | "grid_area": null, 1965 | "grid_auto_columns": null, 1966 | "grid_auto_flow": null, 1967 | "grid_auto_rows": null, 1968 | "grid_column": null, 1969 | "grid_gap": null, 1970 | "grid_row": null, 1971 | "grid_template_areas": null, 1972 | "grid_template_columns": null, 1973 | "grid_template_rows": null, 1974 | "height": null, 1975 | "justify_content": null, 1976 | "justify_items": null, 1977 | "left": null, 1978 | "margin": null, 1979 | "max_height": null, 1980 | "max_width": null, 1981 | "min_height": null, 1982 | "min_width": null, 1983 | "object_fit": null, 1984 | "object_position": null, 1985 | "order": null, 1986 | "overflow": null, 1987 | "overflow_x": null, 1988 | "overflow_y": null, 1989 | "padding": null, 1990 | "right": null, 1991 | "top": null, 1992 | "visibility": null, 1993 | "width": null 1994 | } 1995 | } 1996 | } 1997 | } 1998 | }, 1999 | "nbformat": 4, 2000 | "nbformat_minor": 2 2001 | } 2002 | -------------------------------------------------------------------------------- /exercises.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MadryLab/AdvEx_Tutorial/3d9ef8ff4c33e95ee996c8339577f5f874e14b08/exercises.pdf -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch as ch 3 | from torchvision.models import * 4 | from robustness.tools import helpers 5 | from robustness.datasets import DATASETS 6 | from robustness.tools.label_maps import CLASS_DICT 7 | from robustness import model_utils, datasets 8 | from tqdm import tqdm 9 | import torch.nn as nn 10 | import torch.optim as optim 11 | from tqdm import trange 12 | 13 | def load_model(arch, dataset=None): 14 | ''' 15 | Load pretrained model with specified architecture. 
16 |     Args:
17 |         arch (str): name of one of the pytorch pretrained models or
18 |                     "robust" for robust model
19 |         dataset (dataset object): not None only for robust model
20 |     Returns:
21 |         model: loaded model
22 |     '''
23 | 
24 |     if arch != 'robust':
25 |         model = eval(arch)(pretrained=True).cuda()
26 |         model.eval()
27 |         pass
28 |     else:
29 |         model_kwargs = {
30 |             'arch': 'resnet50',
31 |             'dataset': dataset,
32 |             'resume_path': './models/RestrictedImageNet.pt'
33 |         }
34 | 
35 |         model, _ = model_utils.make_and_restore_model(**model_kwargs)
36 |         model.eval()
37 |         try:
38 |             model = model.module.model
39 |         except:
40 |             model = model.model
41 |     return model
42 | 
43 | def load_dataset(dataset, batch_size, num_workers=1, data_path='./data'):
44 |     '''
45 |     Load the specified dataset, along with a data loader, normalizer, and label map.
46 |     Args:
47 |         dataset (str): name of the dataset
48 |                        ('imagenet', 'restricted_imagenet' or 'cifar')
49 |         batch_size (int): batch size
50 |         num_workers (int): number of workers
51 |         data_path (str): path to data
52 |     Returns:
53 |         ds: dataset object
54 |         loaders: tuple of (train_loader, val_loader) for the dataset
55 |         normalization: normalization function for dataset
56 |         label_map: label map (class numbers to names) for dataset
57 |     '''
58 | 
59 |     ds = DATASETS[dataset](data_path)
60 |     loaders = ds.make_loaders(num_workers, batch_size, data_aug=False)
61 |     normalization = helpers.InputNormalize(ds.mean, ds.std)
62 |     label_map = CLASS_DICT['ImageNet'] if dataset == 'imagenet' else CLASS_DICT['RestrictedImageNet']
63 |     return ds, loaders, normalization, label_map
64 | 
65 | 
66 | def load_binary_dataset(batch_size, num_workers=1, classes=[0, 1], data_path='./data'):
67 |     dataset, loaders, normalization, label_map = load_dataset('cifar',
68 |                                                               batch_size=batch_size,
69 |                                                               num_workers=num_workers)
70 | 
71 |     train_loader, val_loader = loaders
72 |     # Keep only the requested classes, relabeled as 0 and 1
73 |     def get_subset(loader, classes=[0, 1]):
74 |         ims, targs = [], []
75 |         for _, (im, targ) in enumerate(loader):
76 |             for ci, c in enumerate(classes):
77 |                 idx = np.where(targ.numpy() == c)[0]
78 |                 if len(idx) == 0: continue
79 |                 ims.extend(im[idx])
80 |                 if ci == 0:
81 |                     targs.extend(ch.zeros_like(targ[idx]))
82 |                 else:
83 |                     targs.extend(ch.ones_like(targ[idx]))
84 |         ims, targs = ch.stack(ims), ch.stack(targs, 0)
85 |         idx = np.arange(len(ims))
86 |         np.random.shuffle(idx)
87 |         return ims[idx], targs[idx]
88 | 
89 |     data = {}
90 |     data['train'] = get_subset(train_loader, classes=classes)
91 |     data['test'] = get_subset(val_loader, classes=classes)
92 |     return data
93 | 
94 | 
95 | def forward_pass(mod, im, normalization=None):
96 |     '''
97 |     Compute model output (logits) for a batch of inputs.
98 |     Args:
99 |         mod: model
100 |         im (tensor): batch of images
101 |         normalization (function): normalization function to be applied on inputs
102 | 
103 |     Returns:
104 |         op: logits of model for given inputs
105 |     '''
106 |     if normalization is not None:
107 |         im_norm = normalization(im)
108 |     else:
109 |         im_norm = im
110 |     op = mod(im_norm.cuda())
111 |     return op
112 | 
113 | def get_gradient(mod, im, targ, normalization, custom_loss=None):
114 |     '''
115 |     Compute model gradients w.r.t. inputs.
116 |     Args:
117 |         mod: model
118 |         im (tensor): batch of images
119 |         targ (tensor): batch of labels
120 |         normalization (function): normalization function to be applied on inputs
121 |         custom_loss (function): custom loss function to employ (optional)
122 |     Returns:
123 |         grad: model gradients w.r.t. inputs
124 |         loss: model loss evaluated at inputs
125 |     '''
126 |     def compute_loss(inp, target, normalization):
127 |         if custom_loss is None:
128 |             output = forward_pass(mod, inp, normalization)
129 |             return ch.nn.CrossEntropyLoss()(output, target.cuda())
130 |         else:
131 |             return custom_loss(mod, inp, target.cuda(), normalization)
132 | 
133 |     x = im.clone().detach().requires_grad_(True)
134 |     loss = compute_loss(x, targ, normalization)
135 |     grad, = ch.autograd.grad(loss, [x])
136 |     return grad.clone(), loss.detach().item()
137 | 
138 | def visualize_gradient(t):
139 |     '''
140 |     Visualize gradients of model. To transform gradient to image range [0, 1], we
141 |     subtract the mean, divide by 3 standard deviations, and then clip.
142 | 
143 |     Args:
144 |         t (tensor): input tensor (usually gradients)
145 |     '''
146 |     mt = ch.mean(t, dim=[2, 3], keepdim=True).expand_as(t)
147 |     st = ch.std(t, dim=[2, 3], keepdim=True).expand_as(t)
148 |     return ch.clamp((t - mt) / (3 * st) + 0.5, 0, 1)
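# --- Illustrative usage sketch (editorial addition, not part of the original
# tutorial code): compute the loss gradient w.r.t. a batch of images and show it
# next to the originals, the way the notebooks use these helpers. Assumes a
# model and normalization from load_model/load_dataset, a loader from
# load_dataset, and a CUDA runtime.
def _demo_input_gradients(model, loader, normalization):
    from robustness.tools.vis_tools import show_image_row
    im, targ = next(iter(loader))                # one batch of images and labels
    grad, loss = get_gradient(model, im, targ, normalization)
    print(f'Cross-entropy loss on this batch: {loss:.3f}')
    # Rescale the gradient into [0, 1] so it can be displayed as an image
    show_image_row([im.cpu(), visualize_gradient(grad.cpu())])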
149 | 
150 | def L2PGD(mod, im, targ, normalization, step_size, Nsteps,
151 |           eps=None, targeted=True, custom_loss=None):
152 |     '''
153 |     Compute L2 adversarial examples for given model.
154 |     Args:
155 |         mod: model
156 |         im (tensor): batch of images
157 |         targ (tensor): batch of labels (targets of the attack if targeted=True)
158 |         normalization (function): normalization function to be applied on inputs
159 |         step_size (float): optimization step size
160 |         Nsteps (int): number of optimization steps
161 |         eps (float): radius of L2 ball
162 |         targeted (bool): if True, minimize the loss w.r.t. targ (targeted attack); if False, maximize it
163 |         custom_loss (function): custom loss function to employ (optional)
164 | 
165 |     Returns:
166 |         x: batch of adversarial examples for input images
167 |     '''
168 |     if custom_loss is None:
169 |         loss_fn = ch.nn.CrossEntropyLoss()
170 |     else:
171 |         loss_fn = custom_loss
172 | 
173 |     sign = -1 if targeted else 1
174 | 
175 |     it = tqdm(enumerate(range(Nsteps)), total=Nsteps)
176 | 
177 |     x = im.detach()
178 |     l = len(x.shape) - 1
179 | 
180 |     for _, i in it:
181 |         x = x.clone().detach().requires_grad_(True)
182 |         g, loss = get_gradient(mod, x, targ, normalization,
183 |                                custom_loss=custom_loss)
184 | 
185 |         it.set_description(f'Loss: {loss}')
186 | 
187 |         with ch.no_grad():
188 | 
189 |             # Compute gradient step
190 |             g_norm = ch.norm(g.view(g.shape[0], -1), dim=1).view(-1, *([1]*l))
191 |             scaled_g = g / (g_norm + 1e-10)
192 |             x += sign * scaled_g * step_size
193 | 
194 |             # Project back to L2 eps ball
195 |             if eps is not None:
196 |                 diff = x - im
197 |                 diff = diff.renorm(p=2, dim=0, maxnorm=eps)
198 |                 x = im + diff
199 |             x = ch.clamp(x, 0, 1)
200 |     return x
201 | 
202 | def get_features(mod, im, normalization):
203 |     '''
204 |     Get feature representation of model (output of layer before final linear
205 |     classifier) for given inputs.
206 | 
207 |     Args:
208 |         mod: model
209 |         im (tensor): batch of images
210 |         normalization (function): normalization function to be applied on inputs
211 | 
212 |     Returns:
213 |         features: batch of features for input images
214 |     '''
215 |     feature_rep = ch.nn.Sequential(*list(mod.children())[:-1])
216 |     im_norm = normalization(im.cpu()).cuda()
217 |     features = feature_rep(im_norm)[:, :, 0, 0]
218 |     return features
219 | 
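# --- Illustrative usage sketch (editorial addition, not part of the original
# tutorial code): craft targeted L2 adversarial examples with L2PGD and check
# what the model predicts for them. The step size, number of steps, and eps
# below are placeholder values, not the tutorial's exact settings.
def _demo_targeted_attack(model, loader, normalization, label_map):
    from robustness.tools.vis_tools import show_image_row
    im, label = next(iter(loader))
    targ = ch.zeros_like(label)                  # push every image towards class 0
    adv = L2PGD(model, im, targ, normalization,
                step_size=0.5, Nsteps=20, eps=2.0, targeted=True)
    preds = forward_pass(model, adv, normalization).argmax(dim=1).cpu()
    print('Predictions on adversarial images:', [label_map[int(p)] for p in preds])
    show_image_row([im.cpu(), adv.cpu()])        # clean images (top) vs. adversarial (bottom)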
220 | 
221 | ## Helpers for training/evaluating linear classifiers
222 | def accuracy(net, im, targ):
223 |     '''
224 |     Evaluate the accuracy of a given linear classifier.
225 |     Args:
226 |         net: linear classifier
227 |         im (tensor): batch of images
228 |         targ (tensor): batch of labels
229 | 
230 |     Returns:
231 |         acc (float): accuracy (in %) of the classifier on the given batch
232 |     '''
233 |     op = net.forward(im).argmax(dim=1)
234 |     acc = (op == targ).sum().item() / len(im) * 100
235 |     return acc
236 | 
237 | class Linear(nn.Module):
238 |     '''
239 |     Class for linear classifiers.
240 |     '''
241 |     def __init__(self, Nfeatures, Nclasses):
242 |         '''
243 |         Initializes the linear classifier.
244 |         Args:
245 |             Nfeatures (int): Input dimension
246 |             Nclasses (int): Number of classes
247 |         '''
248 |         super(Linear, self).__init__()
249 |         self.fc = nn.Linear(Nfeatures, Nclasses)
250 |     def forward(self, im):
251 |         '''
252 |         Perform a forward pass through the linear classifier.
253 |         Args:
254 |             im (tensor): batch of images
255 | 
256 |         Returns:
257 |             pred (tensor): batch of logits
258 |         '''
259 |         imr = im.view(im.shape[0], -1)
260 |         pred = self.fc(imr)
261 |         return pred
262 | 
263 | def get_predictions(im, mod):
264 |     '''
265 |     Determine predictions of linear classifier.
266 |     Args:
267 |         im (tensor): batch of images
268 |         mod: model
269 | 
270 |     Returns:
271 |         op (tensor): batch of predicted labels
272 |     '''
273 |     with ch.no_grad():
274 |         op = mod(im.cuda())
275 |         op = op.argmax(dim=1)
276 |     return op
277 | 
278 | def train_linear(data,
279 |                  Nclasses=2,
280 |                  step_size=0.1,
281 |                  iterations=1000,
282 |                  log_iterations=500):
283 |     '''
284 |     Train a linear classifier on the input data.
285 |     Args:
286 |         data (dict): A dictionary containing train and test data
287 |         Nclasses (int): Number of classes in the data
288 |         step_size (float): Step size to use for gradient descent
289 |         iterations (int): Number of steps to train the model for
290 |         log_iterations (int): Frequency of printing/logging of accuracies
291 | 
292 |     Returns:
293 |         store (dict): Train and eval logs
294 |         final_net: trained linear classifier (best test accuracy seen during training)
295 |     '''
296 | 
297 |     store = {'step': [], 'train': [], 'test': []}
298 | 
299 |     Nfeatures = int(np.prod(data['train'][0].shape[1:]))
300 |     net = ch.nn.DataParallel(Linear(Nfeatures, Nclasses).cuda())
301 | 
302 |     criterion = nn.CrossEntropyLoss()
303 |     optimizer = optim.SGD(net.parameters(), lr=step_size)
304 | 
305 |     it = trange(iterations + 1)
306 |     for k in it:
307 |         if k % log_iterations == 0:
308 |             store['step'].append(k)
309 |             acc_log = []
310 |             for name, (xs, ys) in data.items():
311 |                 xs, ys = xs.cuda(), ys.cuda()
312 |                 store[name].append(accuracy(net, xs, ys))
313 |                 acc_log.append(store[name][-1])
314 |                 if name == 'test' and (len(store['test']) == 1 or
315 |                         store['test'][-1] > max(store['test'][:-1])):
316 |                     params = [p.clone() for p in net.module.parameters()]
317 | 
318 |             it.set_description(f"Train accuracy={acc_log[0]:.2f}, Test accuracy={acc_log[1]:.2f}")
319 |         optimizer.zero_grad()
320 |         xs, ys = data['train']
321 |         xs, ys = xs.cuda(), ys.cuda()
322 |         logits = net(xs)
323 |         loss = criterion(logits, ys)
324 |         loss.backward()
325 |         optimizer.step()
326 | 
327 |     # Rebuild the classifier from the parameters with the best test accuracy
328 |     final_net = Linear(Nfeatures, Nclasses).cuda()
329 |     final_net.fc.weight.data = params[0]
330 |     final_net.fc.bias.data = params[1]
331 |     final_net = ch.nn.DataParallel(final_net)
332 | 
333 |     xs, ys = data['test']
334 |     xs, ys = xs.cuda(), ys.cuda()
335 |     print(f"Final test accuracy: {accuracy(final_net, xs, ys):.2f}")
336 |     return store, final_net
--------------------------------------------------------------------------------
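For reference, here is a sketch of how the linear-classifier helpers above (`load_binary_dataset`, `train_linear`, `get_predictions`) fit together for the binary CIFAR experiment (Ex. II). This is an illustrative example rather than code from the notebooks: the hyperparameters are placeholders, and it assumes a GPU plus the CIFAR data under `./data`.

```python
import utils

# Binary CIFAR subset (classes 0 and 1, i.e. airplane vs. automobile);
# data['train'] and data['test'] are (images, labels) tensor pairs.
data = utils.load_binary_dataset(batch_size=100, classes=[0, 1])

# Train a linear classifier directly on (flattened) pixels, logging train/test accuracy.
store, linear_net = utils.train_linear(data, Nclasses=2, step_size=0.1, iterations=1000)

# Inspect predictions for a few held-out images.
xs, ys = data['test']
print(utils.get_predictions(xs[:5], linear_net), ys[:5])
```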