├── 1장.ipynb
├── 2장.ipynb
├── 3장.ipynb
├── 4장.ipynb
├── 5.png
├── 5장.ipynb
├── 5장.ipynb
├── 6장.ipynb
├── README.md
├── common
    ├── __init__.py
    ├── functions.py
    ├── gradient.py
    ├── layers.py
    ├── multi_layer_net.py
    ├── multi_layer_net_extend.py
    ├── optimizer.py
    ├── trainer.py
    └── util.py
├── dataset
    ├── __init__.py
    └── mnist.py
├── decision.png
├── gates.jpg
├── layers.png
├── lena.png
├── neurons.png
├── perceptron.png
├── sample_weight.pkl
├── xor.png
└── 목차.ipynb


/5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/5.png


--------------------------------------------------------------------------------
/5장.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "# 밑바닥부터 시작하는 딥러닝\n",
   8 |     "\n",
   9 |     "# Deep Learning from Scratch\n",
  10 |     "\n",
  11 |     "## Github \n",
  12 |     "\n",
  13 |     "https://github.com/WegraLee/deep-learning-from-scratch\n",
  14 |     "\n",
  15 |     "## Table of Contents\n",
  16 |     "\n",
  17 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/%EB%AA%A9%EC%B0%A8.ipynb"
  18 |    ]
  19 |   },
  20 |   {
  21 |    "cell_type": "markdown",
  22 |    "metadata": {},
  23 |    "source": [
  24 |     "# 5 Backpropagation\n",
  25 |     "\n",
  26 |     "Backpropagation: an efficient way to compute the gradients of the weight parameters\n",
  27 |     "\n",
  28 |     "The name is short for 'backward propagation of errors': the error is propagated in the reverse direction\n",
  29 |     "\n",
  30 |     "Andrej Karpathy's blog\n",
  31 |     "\n",
  32 |     "* Reference: http://karpathy.github.io/neuralnets\n",
  33 |     "\n",
  34 |     "* Explains backpropagation using computational graphs\n",
  35 |     "\n",
  36 |     "See also CS231n, the Stanford University deep learning course taught by Professor Fei-Fei Li\n",
  37 |     "\n",
  38 |     "* Reference: http://cs231n.github.io"
  39 |    ]
  40 |   },
  41 |   {
  42 |    "cell_type": "markdown",
  43 |    "metadata": {},
  44 |    "source": [
  45 |     "## 5.1 Computational Graphs\n",
  46 |     "\n",
  47 |     "Computational graph: a graph that represents the steps of a calculation\n",
  48 |     "\n",
  49 |     "It is made up of multiple nodes and edges.\n",
  50 |     "\n",
  51 |     "Edge: a line connecting two nodes"
  52 |    ]
  53 |   },
  54 |   {
  55 |    "cell_type": "markdown",
  56 |    "metadata": {},
  57 |    "source": [
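  58 |     "### 5.1.1 Solving with Computational Graphs\n",
  59 |     "\n",
  60 |     "Solving a problem with a computational graph proceeds as follows\n",
  61 |     "\n",
  62 |     "* Build the computational graph.\n",
  63 |     "* Carry out the computation on the graph from left to right.\n",
  64 |     "\n",
  65 |     "Forward propagation: carrying out the computation from left to right, i.e. propagating from the start of the graph to its end."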
  66 |    ]
  67 |   },
  68 |   {
  69 |    "cell_type": "markdown",
  70 |    "metadata": {},
  71 |    "source": [
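  72 |     "### 5.1.2 Local Computation\n",
  73 |     "\n",
  74 |     "Local: a small region directly related to the node itself\n",
  75 |     "\n",
  76 |     "Local computation: a node can produce its output using only the information directly related to it\n",
  77 |     "\n",
  78 |     "Each node needs to care about nothing beyond its own computation\n",
  79 |     "\n",
  80 |     "A complex calculation is split into 'simple, local computations', and each result is passed on to the next node\n",
  81 |     "\n",
  82 |     "Even a complex calculation, once decomposed, is made up of simple ones"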
  83 |    ]
  84 |   },
  85 |   {
  86 |    "cell_type": "markdown",
  87 |    "metadata": {},
  88 |    "source": [
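  89 |     "### 5.1.3 Why Solve with Computational Graphs?\n",
  90 |     "\n",
  91 |     "Backpropagation makes it possible to compute derivatives efficiently\n",
  92 |     "\n",
  93 |     "Derivatives computed along the way can be shared, so many derivatives can be obtained efficiently"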
  94 |    ]
  95 |   },
  96 |   {
  97 |    "cell_type": "markdown",
  98 |    "metadata": {},
  99 |    "source": [
 100 |     "## 5.2 The Chain Rule\n",
 101 |     "\n",
 102 |     "Passing 'local derivatives' backward works because of the chain rule"
 103 |    ]
 104 |   },
 105 |   {
 106 |    "cell_type": "markdown",
 107 |    "metadata": {},
 108 |    "source": [
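 109 |     "### 5.2.1 Backpropagation in a Computational Graph\n",
 110 |     "\n",
 111 |     "Backpropagation in a computational graph: multiply by the local derivatives in the direction opposite to the forward pass.\n",
 112 |     "\n",
 113 |     "At each node, the incoming signal E is multiplied by the node's local derivative and the result is passed to the next node\n",
 114 |     "\n",
 115 |     "Following this order of computation yields the target derivative efficiently"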
 116 |    ]
 117 |   },
 118 |   {
 119 |    "cell_type": "markdown",
 120 |    "metadata": {},
 121 |    "source": [
 122 |     "### 5.2.2 What Is the Chain Rule?\n",
 123 |     "\n",
 124 |     "Composite function: a function composed of several functions\n",
 125 |     "\n",
 126 |     "#### Equation 5.1\n",
 127 |     "\n",
 128 |     "\\begin{equation*}\n",
 129 |     "z = t^{2}\n",
 130 |     "\\end{equation*}\n",
 131 |     "\n",
 132 |     "\\begin{equation*}\n",
 133 |     "t = x + y\n",
 134 |     "\\end{equation*}\n",
 135 |     "\n",
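 136 |     "The chain rule is a property of the derivative of a composite function\n",
 137 |     "\n",
 138 |     "The derivative of a composite function can be written as the product of the derivatives of the functions that compose it.\n",
 139 |     "\n",
 140 |     "#### Equation 5.2\n",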
 141 |     "\n",
 142 |     "\\begin{equation*}\n",
 143 |     "\\frac{\\partial z}{\\partial x} = \\frac{\\partial z}{\\partial t} \\frac{\\partial t}{\\partial x}\n",
 144 |     "\\end{equation*}\n",
 145 |     "\n",
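 146 |     "The derivative of z with respect to x is the product of the derivative of z with respect to t and the derivative of t with respect to x\n",
 147 |     "\n",
 148 |     "The two ∂t terms cancel each other, leaving only ∂z on top and ∂x on the bottom.\n",
 149 |     "\n",
 150 |     "\\begin{equation*}\n",
 151 |     "\\frac{\\partial z}{\\partial x} = \\frac{\\partial z}{} \\frac{}{\\partial x}\n",
 152 |     "\\end{equation*}\n",
 153 |     "\n",
 154 |     "#### Equation 5.3\n",
 155 |     "\n",
 156 |     "Compute the local derivatives (partial derivatives) of Equation 5.1\n",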
 157 |     "\n",
 158 |     "\\begin{equation*}\n",
 159 |     "\\frac{\\partial z}{\\partial t} = 2t\n",
 160 |     "\\end{equation*}\n",
 161 |     "\n",
 162 |     "\\begin{equation*}\n",
 163 |     "\\frac{\\partial t}{\\partial x} = 1\n",
 164 |     "\\end{equation*}\n",
 165 |     "\n",
 166 |     "The derivative of z with respect to x that we ultimately want is the product of the two derivatives above\n",
 167 |     "\n",
 168 |     "#### Equation 5.4\n",
 169 |     "\n",
 170 |     "\\begin{equation*}\n",
 171 |     "\\frac{\\partial z}{\\partial x} = \\frac{\\partial z}{\\partial t} \\frac{\\partial t}{\\partial x} = 2t \\cdot 1 = 2(x+y)\n",
 172 |     "\\end{equation*}"
 173 |    ]
 174 |   },
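The chain-rule result in Equation 5.4 can be sanity-checked numerically. The following is a minimal sketch, not part of the original notebook: the helper names (`forward`, `dz_dx_chain`, `dz_dx_numeric`) and the sample point are hypothetical choices made for illustration. It computes the derivative of z with respect to x as the product of the two local derivatives and compares it with a central-difference estimate.

```python
def forward(x, y):
    """Composite function from Equation 5.1: t = x + y, then z = t**2."""
    t = x + y
    z = t ** 2
    return t, z


def dz_dx_chain(x, y):
    """Chain rule (Equation 5.4): dz/dx = dz/dt * dt/dx = 2t * 1."""
    t, _ = forward(x, y)
    dz_dt = 2 * t   # local derivative of z = t**2
    dt_dx = 1.0     # local derivative of t = x + y with respect to x
    return dz_dt * dt_dx


def dz_dx_numeric(x, y, h=1e-4):
    """Central-difference approximation, used only as a sanity check."""
    _, z_plus = forward(x + h, y)
    _, z_minus = forward(x - h, y)
    return (z_plus - z_minus) / (2 * h)


x, y = 1.5, 2.0  # arbitrary sample point chosen for illustration
analytic = dz_dx_chain(x, y)
numeric = dz_dx_numeric(x, y)
print(analytic, numeric)
```

The two printed values should agree up to floating-point error, since 2(x + y) = 7 at this sample point.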
 175 |   {
 176 |    "cell_type": "markdown",
 177 |    "metadata": {
 178 |     "collapsed": true
 179 |    },
 180 |    "source": [
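 181 |     "### 5.2.3 The Chain Rule and Computational Graphs\n",
 182 |     "\n",
 183 |     "Backpropagation in a computational graph propagates signals from right to left\n",
 184 |     "\n",
 185 |     "Each node multiplies its incoming signal by its local derivative (partial derivative) and passes the result to the next node\n",
 186 |     "\n",
 187 |     "What backpropagation does is exactly what the chain rule describes."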
 188 |    ]
 189 |   },
 190 |   {
 191 |    "cell_type": "markdown",
 192 |    "metadata": {},
 193 |    "source": [
 194 |     "## 5.3 Backpropagation"
 195 |    ]
 196 |   },
 197 |   {
 198 |    "cell_type": "markdown",
 199 |    "metadata": {},
 200 |    "source": [
 201 |     "### 5.3.1 Backpropagation Through an Addition Node\n",
 202 |     "\n",
 203 |     "The derivatives of z = x + y can be computed analytically as follows\n",
 204 |     "\n",
 205 |     "#### Equation 5.5\n",
 206 |     "\n",
 207 |     "\\begin{equation*}\n",
 208 |     "\\frac{\\partial z}{\\partial x} = 1\n",
 209 |     "\\end{equation*}\n",
 210 |     "\n",
 211 |     "\\begin{equation*}\n",
 212 |     "\\frac{\\partial z}{\\partial y} = 1\n",
 213 |     "\\end{equation*}\n",
 214 |     "\n",
 215 |     "Backpropagation through an addition node only multiplies by 1, so the incoming value is passed to the next node unchanged."
 216 |    ]
 217 |   },
 218 |   {
 219 |    "cell_type": "markdown",
 220 |    "metadata": {},
 221 |    "source": [
 222 |     "### 5.3.2 Backpropagation Through a Multiplication Node\n",
 223 |     "\n",
 224 |     "The derivatives of z = xy\n",
 225 |     "\n",
 226 |     "#### Equation 5.6\n",
 227 |     "\n",
 228 |     "\\begin{equation*}\n",
 229 |     "\\frac{\\partial z}{\\partial x} = y\n",
 230 |     "\\end{equation*}\n",
 231 |     "\n",
 232 |     "\\begin{equation*}\n",
 233 |     "\\frac{\\partial z}{\\partial y} = x\n",
 234 |     "\\end{equation*}\n",
 235 |     "\n",
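 236 |     "A multiplication node multiplies the upstream value by the forward-pass inputs 'swapped' and sends the result downstream\n",
 237 |     "\n",
 238 |     "The branch whose forward input was x is multiplied by y, and the branch whose forward input was y is multiplied by x"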
 239 |    ]
 240 |   },
 241 |   {
 242 |    "cell_type": "markdown",
 243 |    "metadata": {},
 244 |    "source": [
 245 |     "### 5.3.3 Apple Shopping Example"
 246 |    ]
 247 |   },
 248 |   {
 249 |    "cell_type": "markdown",
 250 |    "metadata": {},
 251 |    "source": [
 252 |     "## 5.4 Implementing Simple Layers\n",
 253 |     "\n",
 254 |     "The multiplication node of the computational graph is implemented as 'MulLayer' and the addition node as 'AddLayer'"
 255 |    ]
 256 |   },
 257 |   {
 258 |    "cell_type": "markdown",
 259 |    "metadata": {},
 260 |    "source": [
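 261 |     "### 5.4.1 Multiplication Layer\n",
 262 |     "\n",
 263 |     "Every layer is implemented with a common interface: forward() for forward propagation and backward() for backpropagation\n",
 264 |     "\n",
 265 |     "The multiplication layer is implemented as the MulLayer class below"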
 266 |    ]
 267 |   },
 268 |   {
 269 |    "cell_type": "code",
 270 |    "execution_count": 1,
 271 |    "metadata": {
 272 |     "collapsed": true
 273 |    },
 274 |    "outputs": [],
 275 |    "source": [
 276 |     "# Based on https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/layer_naive.py\n",
 277 |     "class MulLayer:\n",
 278 |     "    def __init__(self):\n",
 279 |     "        self.x = None\n",
 280 |     "        self.y = None\n",
 281 |     "    \n",
 282 |     "    def forward(self, x, y):\n",
 283 |     "        self.x = x\n",
 284 |     "        self.y = y\n",
 285 |     "        out = x * y\n",
 286 |     "        \n",
 287 |     "        return out\n",
 288 |     "    \n",
 289 |     "    def backward(self, dout):\n",
 290 |     "        dx = dout * self.y # swap x and y\n",
 291 |     "        dy = dout * self.x \n",
 292 |     "        \n",
 293 |     "        return dx, dy"
 294 |    ]
 295 |   },
 296 |   {
 297 |    "cell_type": "markdown",
 298 |    "metadata": {},
 299 |    "source": [
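 300 |     "\\__init\\__() : initializes the instance variables x and y, which hold the forward-pass inputs.\n",
 301 |     "\n",
 302 |     "forward() : takes x and y as arguments and returns their product\n",
 303 |     "\n",
 304 |     "backward() : multiplies the upstream derivative (dout) by the forward-pass values 'swapped' and passes the result downstream.\n",
 305 |     "\n",
 306 |     "Forward propagation implemented with MulLayer"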
 307 |    ]
 308 |   },
 309 |   {
 310 |    "cell_type": "code",
 311 |    "execution_count": 2,
 312 |    "metadata": {
 313 |     "collapsed": false
 314 |    },
 315 |    "outputs": [
 316 |     {
 317 |      "name": "stdout",
 318 |      "output_type": "stream",
 319 |      "text": [
 320 |       "220.00000000000003\n"
 321 |      ]
 322 |     }
 323 |    ],
 324 |    "source": [
 325 |     "# Based on https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/buy_apple.py\n",
 326 |     "apple = 100\n",
 327 |     "apple_num = 2\n",
 328 |     "tax = 1.1\n",
 329 |     "\n",
 330 |     "# Layers\n",
 331 |     "mul_apple_layer = MulLayer()\n",
 332 |     "mul_tax_layer = MulLayer()\n",
 333 |     "\n",
 334 |     "# Forward pass\n",
 335 |     "apple_price = mul_apple_layer.forward(apple, apple_num)\n",
 336 |     "price = mul_tax_layer.forward(apple_price, tax)\n",
 337 |     "\n",
 338 |     "print(price) # 220"
 339 |    ]
 340 |   },
 341 |   {
 342 |    "cell_type": "markdown",
 343 |    "metadata": {},
 344 |    "source": [
 345 |     "The derivative with respect to each variable can be obtained with backward()"
 346 |    ]
 347 |   },
 348 |   {
 349 |    "cell_type": "code",
 350 |    "execution_count": 3,
 351 |    "metadata": {
 352 |     "collapsed": false
 353 |    },
 354 |    "outputs": [
 355 |     {
 356 |      "name": "stdout",
 357 |      "output_type": "stream",
 358 |      "text": [
 359 |       "2.2 110.00000000000001 200\n"
 360 |      ]
 361 |     }
 362 |    ],
 363 |    "source": [
 364 |     "dprice = 1\n",
 365 |     "dapple_price, dtax = mul_tax_layer.backward(dprice)\n",
 366 |     "dapple, dapple_num = mul_apple_layer.backward(dapple_price)\n",
 367 |     "\n",
 368 |     "print(dapple, dapple_num, dtax) # 2.2 110 200"
 369 |    ]
 370 |   },
 371 |   {
 372 |    "cell_type": "markdown",
 373 |    "metadata": {},
 374 |    "source": [
 375 |     "backward() is called in the reverse order of forward()\n",
 376 |     "\n",
 377 |     "The argument passed to backward() is the derivative with respect to that layer's forward-pass output"
 378 |    ]
 379 |   },
 380 |   {
 381 |    "cell_type": "markdown",
 382 |    "metadata": {},
 383 |    "source": [
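 384 |     "### 5.4.2 Addition Layer\n",
 385 |     "\n",
 386 |     "Every layer is implemented with a common interface: forward() for forward propagation and backward() for backpropagation\n",
 387 |     "\n",
 388 |     "The addition layer is implemented as the AddLayer class below"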
 389 |    ]
 390 |   },
 391 |   {
 392 |    "cell_type": "code",
 393 |    "execution_count": 4,
 394 |    "metadata": {
 395 |     "collapsed": true
 396 |    },
 397 |    "outputs": [],
 398 |    "source": [
 399 |     "class AddLayer:\n",
 400 |     "    def __init__(self):\n",
 401 |     "        pass\n",
 402 |     "    \n",
 403 |     "    def forward(self, x, y):\n",
 404 |     "        out = x + y\n",
 405 |     "        return out\n",
 406 |     "\n",
 407 |     "    def backward(self, dout):\n",
 408 |     "        dx = dout * 1\n",
 409 |     "        dy = dout * 1\n",
 410 |     "        return dx, dy"
 411 |    ]
 412 |   },
 413 |   {
 414 |    "cell_type": "markdown",
 415 |    "metadata": {},
 416 |    "source": [
 417 |     "\\__init\\__() : does nothing (just pass)\n",
 418 |     "\n",
 419 |     "forward() : takes x and y as arguments and returns their sum\n",
 420 |     "\n",
 421 |     "backward() : passes the upstream derivative (dout) downstream unchanged\n",
 422 |     "\n",
 423 |     "Python implementation of the computational graph in Figure 5-17"
 424 |    ]
 425 |   },
 426 |   {
 427 |    "cell_type": "code",
 428 |    "execution_count": 5,
 429 |    "metadata": {
 430 |     "collapsed": false
 431 |    },
 432 |    "outputs": [
 433 |     {
 434 |      "name": "stdout",
 435 |      "output_type": "stream",
 436 |      "text": [
 437 |       "price: 715\n",
 438 |       "dApple: 2.2\n",
 439 |       "dApple_num: 110\n",
 440 |       "dOrange: 3.3000000000000003\n",
 441 |       "dOrange_num: 165\n",
 442 |       "dTax: 650\n"
 443 |      ]
 444 |     }
 445 |    ],
 446 |    "source": [
 447 |     "# Based on https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/buy_apple_orange.py\n",
 448 |     "apple = 100\n",
 449 |     "apple_num = 2\n",
 450 |     "orange = 150\n",
 451 |     "orange_num = 3\n",
 452 |     "tax = 1.1\n",
 453 |     "\n",
 454 |     "# Layers\n",
 455 |     "mul_apple_layer = MulLayer()\n",
 456 |     "mul_orange_layer = MulLayer()\n",
 457 |     "add_apple_orange_layer = AddLayer()\n",
 458 |     "mul_tax_layer = MulLayer()\n",
 459 |     "\n",
 460 |     "# Forward pass\n",
 461 |     "apple_price = mul_apple_layer.forward(apple, apple_num)  # (1)\n",
 462 |     "orange_price = mul_orange_layer.forward(orange, orange_num)  # (2)\n",
 463 |     "all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # (3)\n",
 464 |     "price = mul_tax_layer.forward(all_price, tax)  # (4)\n",
 465 |     "\n",
 466 |     "# Backward pass\n",
 467 |     "dprice = 1\n",
 468 |     "dall_price, dtax = mul_tax_layer.backward(dprice)  # (4)\n",
 469 |     "dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)  # (3)\n",
 470 |     "dorange, dorange_num = mul_orange_layer.backward(dorange_price)  # (2)\n",
 471 |     "dapple, dapple_num = mul_apple_layer.backward(dapple_price)  # (1)\n",
 472 |     "\n",
 473 |     "print(\"price:\", int(price)) # 715\n",
 474 |     "print(\"dApple:\", dapple) # 2.2\n",
 475 |     "print(\"dApple_num:\", int(dapple_num)) # 110\n",
 476 |     "print(\"dOrange:\", dorange) # 3.3\n",
 477 |     "print(\"dOrange_num:\", int(dorange_num)) # 165\n",
 478 |     "print(\"dTax:\", dtax) # 650"
 479 |    ]
 480 |   },
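One way to read the gradients printed above is as "how much the total price changes per unit change of each input". The sketch below is an illustrative check that does not use the layer classes: `total_price` and `numeric_grad` are hypothetical helpers, and the step size `h` is an arbitrary choice. It estimates each derivative by central differences so the results can be compared with the backward-pass values (2.2, 110, 3.3, 165, 650).

```python
def total_price(apple, apple_num, orange, orange_num, tax):
    """Forward pass of the apple-and-orange purchase graph as one expression."""
    return (apple * apple_num + orange * orange_num) * tax


def numeric_grad(f, args, index, h=1e-4):
    """Central-difference derivative of f with respect to args[index]."""
    plus = list(args)
    minus = list(args)
    plus[index] += h
    minus[index] -= h
    return (f(*plus) - f(*minus)) / (2 * h)


names = ["apple", "apple_num", "orange", "orange_num", "tax"]
args = [100, 2, 150, 3, 1.1]  # same input values as the cell above

# Estimate the derivative of the total price with respect to each input.
grads = {name: numeric_grad(total_price, args, i) for i, name in enumerate(names)}

for name, value in grads.items():
    print(name, round(value, 4))
```

Because the price is linear in each input taken on its own, the central difference matches the analytic gradient up to floating-point rounding.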
 481 |   {
 482 |    "cell_type": "markdown",
 483 |    "metadata": {
 484 |     "collapsed": true
 485 |    },
 486 |    "source": [
 487 |     "## 5.5 Implementing Activation Function Layers\n",
 488 |     "\n",
 489 |     "Implement layers for the ReLU and Sigmoid activation functions"
 490 |    ]
 491 |   },
 492 |   {
 493 |    "cell_type": "markdown",
 494 |    "metadata": {},
 495 |    "source": [
 496 |     "### 5.5.1 ReLU Layer\n",
 497 |     "\n",
 498 |     "#### Equation 5.7 ReLU\n",
 499 |     "\n",
 500 |     "\\begin{equation*}\n",
 501 |     "y = x \\quad (x > 0)\n",
 502 |     "\\end{equation*}\n",
 503 |     "\n",
 504 |     "\\begin{equation*}\n",
 505 |     "y = 0 \\quad (x \\leq 0)\n",
 506 |     "\\end{equation*}\n",
 507 |     "\n",
 508 |     "#### Equation 5.8 Derivative of y with respect to x for ReLU\n",
 509 |     "\n",
 510 |     "\\begin{equation*}\n",
 511 |     "\\frac{\\partial y}{\\partial x} = 1 \\quad (x > 0)\n",
 512 |     "\\end{equation*}\n",
 513 |     "\n",
 514 |     "\\begin{equation*}\n",
 515 |     "\\frac{\\partial y}{\\partial x} = 0 \\quad (x \\leq 0)\n",
 516 |     "\\end{equation*}\n",
 517 |     "\n",
 518 |     "If the forward-pass input x is greater than 0, backpropagation passes the upstream value downstream unchanged\n",
 519 |     "\n",
 520 |     "If x was 0 or less in the forward pass, backpropagation sends no signal downstream (it sends 0)\n",
 521 |     "\n",
 522 |     "Code implementing the ReLU layer"
 523 |    ]
 524 |   },
 525 |   {
 526 |    "cell_type": "code",
 527 |    "execution_count": 6,
 528 |    "metadata": {
 529 |     "collapsed": true
 530 |    },
 531 |    "outputs": [],
 532 |    "source": [
 533 |     "# Based on https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/layers.py\n",
 534 |     "class Relu:\n",
 535 |     "    def __init__(self):\n",
 536 |     "        self.mask = None\n",
 537 |     "    \n",
 538 |     "    def forward(self, x):\n",
 539 |     "        self.mask = (x <= 0)\n",
 540 |     "        out = x.copy()\n",
 541 |     "        out[self.mask] = 0\n",
 542 |     "        \n",
 543 |     "        return out\n",
 544 |     "        \n",
 545 |     "    def backward(self, dout):\n",
 546 |     "        dout[self.mask] = 0\n",
 547 |     "        dx = dout\n",
 548 |     "        \n",
 549 |     "        return dx"
 550 |    ]
 551 |   },
 552 |   {
 553 |    "cell_type": "markdown",
 554 |    "metadata": {},
 555 |    "source": [
 556 |     "The Relu class has an instance variable mask\n",
 557 |     "\n",
 558 |     "mask is a boolean array that is True where the forward-pass input x is 0 or less and False elsewhere (where x is greater than 0)"
 559 |    ]
 560 |   },
 561 |   {
 562 |    "cell_type": "code",
 563 |    "execution_count": 7,
 564 |    "metadata": {
 565 |     "collapsed": false
 566 |    },
 567 |    "outputs": [
 568 |     {
 569 |      "name": "stdout",
 570 |      "output_type": "stream",
 571 |      "text": [
 572 |       "[[ 1.   0.5]\n",
 573 |       " [-2.   3. ]]\n"
 574 |      ]
 575 |     }
 576 |    ],
 577 |    "source": [
 578 |     "import numpy as np\n",
 579 |     "x = np.array([[1.0, 0.5], [-2.0, 3.0]])\n",
 580 |     "print(x)"
 581 |    ]
 582 |   },
 583 |   {
 584 |    "cell_type": "code",
 585 |    "execution_count": 8,
 586 |    "metadata": {
 587 |     "collapsed": false
 588 |    },
 589 |    "outputs": [
 590 |     {
 591 |      "name": "stdout",
 592 |      "output_type": "stream",
 593 |      "text": [
 594 |       "[[False False]\n",
 595 |       " [ True False]]\n"
 596 |      ]
 597 |     }
 598 |    ],
 599 |    "source": [
 600 |     "mask = (x <= 0)\n",
 601 |     "print(mask)"
 602 |    ]
 603 |   },
 604 |   {
 605 |    "cell_type": "code",
 606 |    "execution_count": 9,
 607 |    "metadata": {
 608 |     "collapsed": false
 609 |    },
 610 |    "outputs": [
 611 |     {
 612 |      "data": {
 613 |       "text/plain": [
 614 |        "array([[ 1. ,  0.5],\n",
 615 |        "       [ 0. ,  3. ]])"
 616 |       ]
 617 |      },
 618 |      "execution_count": 9,
 619 |      "metadata": {},
 620 |      "output_type": "execute_result"
 621 |     }
 622 |    ],
 623 |    "source": [
 624 |     "out = x.copy()\n",
 625 |     "out[mask] = 0\n",
 626 |     "out"
 627 |    ]
 628 |   },
 629 |   {
 630 |    "cell_type": "markdown",
 631 |    "metadata": {},
 632 |    "source": [
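 633 |     "The ReLU layer can be compared to a 'switch' in an electric circuit\n",
 634 |     "\n",
 635 |     "In the forward pass, the switch is ON if current is flowing and OFF if it is not\n",
 636 |     "\n",
 637 |     "In the backward pass, current flows through unchanged if the switch is ON and stops if it is OFF"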
 638 |    ]
 639 |   },
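The switch analogy can be made concrete with a small list-based sketch. It is written without NumPy so that it stands alone; `relu_forward` and `relu_backward` are hypothetical helper functions, not the `Relu` class from this notebook, and the upstream gradient values are made up for illustration.

```python
def relu_forward(xs):
    """Return the ReLU outputs and a mask that is True where x <= 0."""
    mask = [x <= 0 for x in xs]
    out = [0.0 if blocked else x for x, blocked in zip(xs, mask)]
    return out, mask


def relu_backward(douts, mask):
    """Pass the upstream gradient through where the switch is ON, else send 0."""
    return [0.0 if blocked else d for d, blocked in zip(douts, mask)]


x = [1.0, 0.5, -2.0, 3.0]    # same values as the array above, flattened
dout = [0.1, 0.2, 0.3, 0.4]  # made-up upstream gradients

out, mask = relu_forward(x)
dx = relu_backward(dout, mask)

print(out)   # forward output
print(mask)  # True marks the positions where the switch is OFF
print(dx)    # upstream gradient with the blocked positions zeroed
```

Only the third position, where the forward input was negative, has its gradient replaced by 0; the other gradients pass through untouched.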
 640 |   {
 641 |    "cell_type": "markdown",
 642 |    "metadata": {},
 643 |    "source": [
 644 |     "### 5.5.2 Sigmoid Layer\n",
 645 |     "\n",
 646 |     "#### Equation 5.9 Sigmoid function\n",
 647 |     "\n",
 648 |     "\\begin{equation*}\n",
 649 |     "y = \\frac{1}{1 + \\exp(-x)}\n",
 650 |     "\\end{equation*}\n",
 651 |     "\n",
 652 |     "**Step 1** The '/' node computes y = 1 / x, whose derivative is\n",
 653 |     "\n",
 654 |     "#### Equation 5.10\n",
 655 |     "\n",
 656 |     "\\begin{equation*}\n",
 657 |     "\\frac{\\partial y}{\\partial x} = -\\frac{1}{x^{2}}\n",
 658 |     "\\end{equation*}\n",
 659 |     "\n",
 660 |     "\\begin{equation*}\n",
 661 |     "= - y^{2}\n",
 662 |     "\\end{equation*}\n",
 663 |     "\n",
 664 |     "In backpropagation, the upstream value is multiplied by $-y^{2}$ (minus the square of the forward-pass output) and passed downstream\n",
 665 |     "\n",
 666 |     "**Step 2** The '+' node passes the upstream value downstream unchanged\n",
 667 |     "\n",
 668 |     "**Step 3** The 'exp' node computes y = exp(x)\n",
 669 |     "\n",
 670 |     "#### Equation 5.11\n",
 671 |     "\n",
 672 |     "\\begin{equation*}\n",
 673 |     "\\frac{\\partial y}{\\partial x} = \\exp(x)\n",
 674 |     "\\end{equation*}\n",
 675 |     "\n",
 676 |     "In the computational graph, the upstream value is multiplied by the forward-pass output (exp(-x) in this example) and passed downstream\n",
 677 |     "\n",
 678 |     "**Step 4** The '×' node multiplies by the forward-pass values 'swapped'\n",
 679 |     "\n",
 680 |     "In this example, that means multiplying by -1\n",
 681 |     "\n",
 682 |     "Simplified version of the Sigmoid layer\n",
 683 |     "\n",
 684 |     "Grouping the nodes hides the internal details of the Sigmoid layer so that we can focus only on its input and output\n",
 685 |     "\n",
 686 |     "\\begin{equation*}\n",
 687 |     "\\frac{\\partial L}{\\partial y} y^{2} \\exp(-x) = \\frac{\\partial L}{\\partial y} \\frac{1}{(1+\\exp(-x))^{2}} \\exp(-x)\n",
 688 |     "\\end{equation*}\n",
 689 |     "\n",
 690 |     "\\begin{equation*}\n",
 691 |     "= \\frac{\\partial L}{\\partial y} \\frac{1}{1+\\exp(-x)} \\frac{\\exp(-x)}{1+\\exp(-x)}\n",
 692 |     "\\end{equation*}\n",
 693 |     "\n",
 694 |     "\\begin{equation*}\n",
 695 |     "= \\frac{\\partial L}{\\partial y} y (1-y)\n",
 696 |     "\\end{equation*}\n",
 697 |     "\n",
 698 |     "Computational graph of the Sigmoid layer: the backward pass can be computed from the forward-pass output y alone\n",
 699 |     "\n",
 700 |     "Python implementation of the Sigmoid layer"
 701 |    ]
 702 |   },
 703 |   {
 704 |    "cell_type": "code",
 705 |    "execution_count": 10,
 706 |    "metadata": {
 707 |     "collapsed": true
 708 |    },
 709 |    "outputs": [],
 710 |    "source": [
 711 |     "class Sigmoid:\n",
 712 |     "    def __init__(self):\n",
 713 |     "        self.out = None\n",
 714 |     "    \n",
 715 |     "    def forward(self, x):\n",
 716 |     "        out = 1 / (1 + np.exp(-x))\n",
 717 |     "        self.out = out\n",
 718 |     "        \n",
 719 |     "        return out\n",
 720 |     "        \n",
 721 |     "    def backward(self, dout):\n",
 722 |     "        dx = dout * (1.0 - self.out) * self.out\n",
 723 |     "        \n",
 724 |     "        return dx"
 725 |    ]
 726 |   },
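As a rough check of the simplified backward formula y(1 - y), the following sketch compares it with a central-difference derivative at a few sample inputs. It uses scalar `math` functions instead of NumPy; the helper names and the sample points are assumptions made for this example.

```python
import math


def sigmoid(x):
    """Scalar sigmoid, y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_backward(dout, y):
    """Backward pass using only the forward output y: dout * y * (1 - y)."""
    return dout * y * (1.0 - y)


def numeric_derivative(f, x, h=1e-5):
    """Central-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


checks = []
for x in [-2.0, 0.0, 0.5, 3.0]:  # arbitrary sample inputs
    y = sigmoid(x)
    analytic = sigmoid_backward(1.0, y)  # upstream gradient fixed to 1
    numeric = numeric_derivative(sigmoid, x)
    checks.append((x, analytic, numeric))
    print(x, analytic, numeric)
```

The analytic and numeric columns should agree closely at every sample point, which is what makes it safe for the layer to cache only `out` during the forward pass.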
 727 |   {
 728 |    "cell_type": "markdown",
 729 |    "metadata": {
 730 |     "collapsed": true
 731 |    },
 732 |    "source": [
 733 |     "## 5.6 Implementing the Affine/Softmax Layers\n",
 734 |     "\n",
 735 |     "### 5.6.1 Affine Layer\n",
 736 |     "\n",
 737 |     "In a neural network's forward pass, the matrix product (np.dot()) is used to compute the weighted sum of the input signals"
 738 |    ]
 739 |   },
 740 |   {
 741 |    "cell_type": "code",
 742 |    "execution_count": 11,
 743 |    "metadata": {
 744 |     "collapsed": false
 745 |    },
 746 |    "outputs": [
 747 |     {
 748 |      "name": "stdout",
 749 |      "output_type": "stream",
 750 |      "text": [
 751 |       "(2,)\n",
 752 |       "(2, 3)\n",
 753 |       "(3,)\n"
 754 |      ]
 755 |     }
 756 |    ],
 757 |    "source": [
 758 |     "X = np.random.rand(2)   # input\n",
 759 |     "W = np.random.rand(2,3) # weights\n",
 760 |     "B = np.random.rand(3)   # bias\n",
 761 |     "\n",
 762 |     "print(X.shape) # (2,)\n",
 763 |     "print(W.shape) # (2, 3)\n",
 764 |     "print(B.shape) # (3,)\n",
 765 |     "\n",
 766 |     "Y = np.dot(X, W) + B"
 767 |    ]
 768 |   },
 769 |   {
 770 |    "cell_type": "markdown",
 771 |    "metadata": {},
 772 |    "source": [
 773 |     "For the product of X and W, the sizes of the corresponding dimensions must match\n",
 774 |     "\n",
 775 |     "Affine transformation: the name, taken from geometry, for the matrix product performed in a neural network's forward pass\n",
 776 |     "\n",
 777 |     "Matrices, not scalars, flow through this computational graph\n",
 778 |     "\n",
 779 |     "#### Equation 5.13 Backpropagation expressed with matrices\n",
 780 |     "\n",
 781 |     "\\begin{equation*}\n",
 782 |     "\\frac{\\partial L}{\\partial X} = \\frac{\\partial L}{\\partial Y} W^{T}\n",
 783 |     "\\end{equation*}\n",
 784 |     "\n",
 785 |     "\\begin{equation*}\n",
 786 |     "\\frac{\\partial L}{\\partial W} = X^{T} \\frac{\\partial L}{\\partial Y}\n",
 787 |     "\\end{equation*}\n",
 788 |     "\n",
 789 |     "Transpose: the element at position (i,j) of W moves to position (j,i)\n",
 790 |     "\n",
 791 |     "#### Equation 5.14 Transpose of a matrix\n",
 792 |     "\n",
 793 |     "\\begin{equation*}\n",
 794 |     "W = \\begin{pmatrix}\n",
 795 |     "w_{11} & w_{21} & w_{31} \\\\\n",
 796 |     "w_{12} & w_{22} & w_{32}\n",
 797 |     "\\end{pmatrix}\n",
 798 |     "\\end{equation*}\n",
 799 |     "\n",
 800 |     "\\begin{equation*}\n",
 801 |     "W^{T} = \\begin{pmatrix}\n",
 802 |     "w_{11} & w_{12} \\\\\n",
 803 |     "w_{21} & w_{22} \\\\\n",
 804 |     "w_{31} & w_{32}\n",
 805 |     "\\end{pmatrix}\n",
 806 |     "\\end{equation*}\n",
 807 |     "\n",
 808 |     "If W has shape (2,3), then W.T has shape (3,2)\n",
 809 |     "\n",
 810 |     "#### Figure 5-25 Backpropagation through the Affine layer: the shape of each variable is written next to its name\n",
 811 |     "\n",
 812 |     "\\begin{equation*}\n",
 813 |     "\\frac{\\partial L}{\\partial X}(2,) = \\frac{\\partial L}{\\partial Y}(3,) W^{T} (3,2)\n",
 814 |     "\\end{equation*}\n",
 815 |     "\n",
 816 |     "\\begin{equation*}\n",
 817 |     "X(2,) \\text{ and } \\frac{\\partial L}{\\partial X}(2,) \\text{ have the same shape}\n",
 818 |     "\\end{equation*}\n",
 819 |     "\n",
 820 |     "\\begin{equation*}\n",
 821 |     "\\frac{\\partial L}{\\partial W}(2,3) = X^{T}(2,1) \\frac{\\partial L}{\\partial Y} (1,3)\n",
 822 |     "\\end{equation*}\n",
 823 |     "\n",
 824 |     "\\begin{equation*}\n",
 825 |     "W(2,3) \\text{ and } \\frac{\\partial L}{\\partial W}(2,3) \\text{ have the same shape}\n",
 826 |     "\\end{equation*}"
 827 |    ]
 828 |   },
 829 |   {
 830 |    "cell_type": "markdown",
 831 |    "metadata": {},
 832 |    "source": [
 833 |     "### 5.6.2 Affine Layer for Batches\n",
 834 |     "\n",
 835 |     "#### Figure 5-27 Computational graph of the batch Affine layer\n",
 836 |     "\n",
 837 |     "\\begin{equation*}\n",
 838 |     "\\frac{\\partial L}{\\partial X}(N,2) = \\frac{\\partial L}{\\partial Y}(N,3) W^{T} (3,2)\n",
 839 |     "\\end{equation*}\n",
 840 |     "\n",
 841 |     "\\begin{equation*}\n",
 842 |     "\\frac{\\partial L}{\\partial W}(2,3) = X^{T}(2,N) \\frac{\\partial L}{\\partial Y} (N,3)\n",
 843 |     "\\end{equation*}\n",
 844 |     "\n",
 845 |     "\\begin{equation*}\n",
 846 |     "\\frac{\\partial L}{\\partial B}(3,) = \\text{sum of } \\frac{\\partial L}{\\partial Y}(N,3) \\text{ over the first axis (axis 0, down each column)}\n",
 847 |     "\\end{equation*}\n",
 848 |     "\n",
 849 |     "The only difference from before is that the input X now has shape (N,2)\n",
 850 |     "\n",
 851 |     "For example, with N=2 (two data samples), the bias is added to each of the two samples."
 852 |    ]
 853 |   },
 854 |   {
 855 |    "cell_type": "code",
 856 |    "execution_count": 12,
 857 |    "metadata": {
 858 |     "collapsed": false
 859 |    },
 860 |    "outputs": [
 861 |     {
 862 |      "data": {
 863 |       "text/plain": [
 864 |        "array([[ 0,  0,  0],\n",
 865 |        "       [10, 10, 10]])"
 866 |       ]
 867 |      },
 868 |      "execution_count": 12,
 869 |      "metadata": {},
 870 |      "output_type": "execute_result"
 871 |     }
 872 |    ],
 873 |    "source": [
 874 |     "X_dot_W = np.array([[0, 0, 0], [10, 10, 10]])\n",
 875 |     "B = np.array([1, 2, 3])\n",
 876 |     "\n",
 877 |     "X_dot_W"
 878 |    ]
 879 |   },
 880 |   {
 881 |    "cell_type": "code",
 882 |    "execution_count": 13,
 883 |    "metadata": {
 884 |     "collapsed": false
 885 |    },
 886 |    "outputs": [
 887 |     {
 888 |      "data": {
 889 |       "text/plain": [
 890 |        "array([[ 1,  2,  3],\n",
 891 |        "       [11, 12, 13]])"
 892 |       ]
 893 |      },
 894 |      "execution_count": 13,
 895 |      "metadata": {},
 896 |      "output_type": "execute_result"
 897 |     }
 898 |    ],
 899 |    "source": [
 900 |     "X_dot_W + B"
 901 |    ]
 902 |   },
 903 |   {
 904 |    "cell_type": "markdown",
 905 |    "metadata": {},
 906 |    "source": [
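 907 |     "In the forward pass, the bias is added to every sample (the 1st sample, the 2nd sample, and so on)\n",
 908 |     "\n",
 909 |     "So in the backward pass, the gradients from all samples must be summed into each element of the bias"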
 910 |    ]
 911 |   },
 912 |   {
 913 |    "cell_type": "code",
 914 |    "execution_count": 14,
 915 |    "metadata": {
 916 |     "collapsed": false
 917 |    },
 918 |    "outputs": [
 919 |     {
 920 |      "data": {
 921 |       "text/plain": [
 922 |        "array([[1, 2, 3],\n",
 923 |        "       [4, 5, 6]])"
 924 |       ]
 925 |      },
 926 |      "execution_count": 14,
 927 |      "metadata": {},
 928 |      "output_type": "execute_result"
 929 |     }
 930 |    ],
 931 |    "source": [
 932 |     "dY = np.array([[1, 2, 3], [4, 5, 6]])\n",
 933 |     "dY"
 934 |    ]
 935 |   },
 936 |   {
 937 |    "cell_type": "code",
 938 |    "execution_count": 15,
 939 |    "metadata": {
 940 |     "collapsed": false
 941 |    },
 942 |    "outputs": [
 943 |     {
 944 |      "data": {
 945 |       "text/plain": [
 946 |        "array([5, 7, 9])"
 947 |       ]
 948 |      },
 949 |      "execution_count": 15,
 950 |      "metadata": {},
 951 |      "output_type": "execute_result"
 952 |     }
 953 |    ],
 954 |    "source": [
 955 |     "dB = np.sum(dY, axis=0)\n",
 956 |     "dB"
 957 |    ]
 958 |   },
 959 |   {
 960 |    "cell_type": "markdown",
 961 |    "metadata": {},
 962 |    "source": [
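 963 |     "np.sum() with axis=0 sums over axis 0, the axis that indexes the data samples\n",
 964 |     "\n",
 965 |     "Affine implementation\n",
 966 |     "\n",
 967 |     "The Affine implementation in common/layers.py also handles tensor input (4-dimensional data), so it differs slightly from the implementation below."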
 968 |    ]
 969 |   },
 970 |   {
 971 |    "cell_type": "code",
 972 |    "execution_count": 16,
 973 |    "metadata": {
 974 |     "collapsed": true
 975 |    },
 976 |    "outputs": [],
 977 |    "source": [
 978 |     "class Affine:\n",
 979 |     "    def __init__(self, W, b):\n",
 980 |     "        self.W = W\n",
 981 |     "        self.b = b\n",
 982 |     "        self.x = None\n",
 983 |     "        self.dW = None\n",
 984 |     "        self.db = None\n",
 985 |     "    \n",
 986 |     "    def forward(self, x):\n",
 987 |     "        self.x = x\n",
 988 |     "        out = np.dot(x, self.W) + self.b\n",
 989 |     "        \n",
 990 |     "        return out\n",
 991 |     "    \n",
 992 |     "    def backward(self, dout):\n",
 993 |     "        dx = np.dot(dout, self.W.T)\n",
 994 |     "        self.dW = np.dot(self.x.T, dout)\n",
 995 |     "        self.db = np.sum(dout, axis=0)\n",
 996 |     "        \n",
 997 |     "        return dx"
 998 |    ]
 999 |   },
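To make the shapes in the batch Affine backward pass easier to follow, here is a minimal pure-Python sketch using nested lists in place of NumPy arrays. `matmul`, `transpose`, `affine_forward`, and `affine_backward` are hypothetical helpers written for this illustration, and the values of `X`, `W`, and `b` are arbitrary; `dY` reuses the example values from the cells above.

```python
def matmul(A, B):
    """Matrix product of two nested lists (rows of A times columns of B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Swap rows and columns of a nested list."""
    return [list(col) for col in zip(*A)]


def affine_forward(X, W, b):
    """Y = X W + b, with b added to every row (every sample)."""
    return [[v + bj for v, bj in zip(row, b)] for row in matmul(X, W)]


def affine_backward(X, W, dY):
    """Gradients from Equation 5.13 plus the bias gradient summed over axis 0."""
    dX = matmul(dY, transpose(W))        # (N,3) x (3,2) -> (N,2)
    dW = matmul(transpose(X), dY)        # (2,N) x (N,3) -> (2,3)
    db = [sum(col) for col in zip(*dY)]  # column sums -> (3,)
    return dX, dW, db


X = [[1, 2], [3, 4]]          # N=2 samples, 2 features (arbitrary values)
W = [[1, 0, 2], [0, 1, 3]]    # shape (2,3) (arbitrary values)
b = [1, 2, 3]
dY = [[1, 2, 3], [4, 5, 6]]   # upstream gradient, shape (N,3)

Y = affine_forward(X, W, b)
dX, dW, db = affine_backward(X, W, dY)
print(Y)
print(dX, dW, db)
```

Note that `dX` has the same shape as `X`, `dW` the same shape as `W`, and `db` collapses the batch axis, mirroring `np.sum(dout, axis=0)` in the class above.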
1000 |   {
1001 |    "cell_type": "markdown",
1002 |    "metadata": {},
1003 |    "source": [
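1004 |     "### 5.6.3 Softmax-with-Loss Layer\n",
1005 |     "\n",
1006 |     "The softmax function normalizes its inputs so that the outputs sum to 1.\n",
1007 |     "\n",
1008 |     "At inference time the Softmax layer is usually omitted, because only the largest score matters and softmax does not change which score is largest.\n",
1009 |     "\n",
1010 |     "Score: the output of the Affine layer just before Softmax\n",
1011 |     "\n",
1012 |     "When training the network, the Softmax layer is required.\n",
1013 |     "\n",
1014 |     "Implementation: the softmax layer is implemented together with the cross-entropy error loss, under the name 'Softmax-with-Loss layer'.\n",
1015 |     "\n",
1016 |     "Softmax layer: normalizes the input (a1, a2, a3) and outputs (y1, y2, y3)\n",
1017 |     "\n",
1018 |     "Cross Entropy layer: takes the Softmax output (y1, y2, y3) and the label (t1, t2, t3), and outputs the loss L\n",
1019 |     "\n",
1020 |     "The backward pass of the combined layer yields the clean result (y1-t1, y2-t2, y3-t3),\n",
1021 |     "\n",
1022 |     "that is, the difference between the Softmax output and the label.\n",
1023 |     "\n",
1024 |     "In backpropagation, this difference (the error) is what gets passed to the preceding layer.\n",
1025 |     "\n",
1026 |     "<u>Using cross-entropy error as the loss for the softmax function makes the backward pass come out cleanly as (y1-t1, y2-t2, y3-t3).</u>\n",
1027 |     "\n",
1028 |     "=> <u>This is because the cross-entropy error is designed to pair with softmax in this way.</u>\n",
1029 |     "\n",
1030 |     "Likewise, using 'mean squared error' as the loss for the identity function gives an equally clean backward result.\n",
1031 |     "\n",
1032 |     "Concrete example\n",
1033 |     "\n",
1034 |     "Label (0, 1, 0), Softmax layer outputs (0.3, 0.2, 0.5)\n",
1035 |     "\n",
1036 |     "=> The backward pass propagates a large error, (0.3, -0.8, 0.5).\n",
1037 |     "\n",
1038 |     "Label (0, 1, 0), Softmax layer outputs (0.01, 0.99, 0)\n",
1039 |     "\n",
1040 |     "=> The backward pass sends the small error (0.01, -0.01, 0), so the amount of learning is small.\n",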
1041 |     "\n",
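     |     "The y - t result can be checked numerically. The following is a minimal sketch, not part of the original notebook: the scores in `a` and the helper names `softmax` and `loss` are illustrative assumptions, and the check handles a single sample without batch averaging.\n",
     |     "\n",
     |     "```python\n",
     |     "import numpy as np\n",
     |     "\n",
     |     "def softmax(a):\n",
     |     "    e = np.exp(a - np.max(a))  # subtract the max for numerical stability\n",
     |     "    return e / np.sum(e)\n",
     |     "\n",
     |     "def loss(a, t):\n",
     |     "    # cross-entropy of softmax(a) against a one-hot label t\n",
     |     "    return -np.sum(t * np.log(softmax(a)))\n",
     |     "\n",
     |     "a = np.array([0.3, 2.9, 4.0])  # arbitrary example scores (assumption)\n",
     |     "t = np.array([0.0, 1.0, 0.0])  # one-hot label\n",
     |     "\n",
     |     "analytic = softmax(a) - t      # the (y - t) result\n",
     |     "\n",
     |     "# central-difference estimate of dL/da, one component at a time\n",
     |     "h = 1e-5\n",
     |     "numeric = np.zeros_like(a)\n",
     |     "for i in range(a.size):\n",
     |     "    d = np.zeros_like(a)\n",
     |     "    d[i] = h\n",
     |     "    numeric[i] = (loss(a + d, t) - loss(a - d, t)) / (2 * h)\n",
     |     "\n",
     |     "print(np.max(np.abs(analytic - numeric)))  # expected to be very small\n",
     |     "```\n",
     |     "\n",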
1042 |     "Code implementing the Softmax-with-Loss layer"
1043 |    ]
1044 |   },
1045 |   {
1046 |    "cell_type": "code",
1047 |    "execution_count": 17,
1048 |    "metadata": {
1049 |     "collapsed": false
1050 |    },
1051 |    "outputs": [],
1052 |    "source": [
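1053 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/functions.py\n",
1054 |     "# Note: softmax() (see 3.5.2) is defined in the TwoLayerNet cell under 5.7.2; run that cell before calling SoftmaxWithLoss.forward()\n",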
1055 |     "def sigmoid(x):\n",
1056 |     "    return 1 / (1 + np.exp(-x))\n",
1057 |     "\n",
1058 |     "# See 4.2.2 Cross-Entropy Error\n",
1059 |     "def cross_entropy_error(y, t):\n",
1060 |     "    if y.ndim == 1:\n",
1061 |     "        t = t.reshape(1, t.size)\n",
1062 |     "        y = y.reshape(1, y.size)\n",
1063 |     "        \n",
1064 |     "    # if the labels are one-hot vectors, convert them to class indices\n",
1065 |     "    if t.size == y.size:\n",
1066 |     "        t = t.argmax(axis=1)\n",
1067 |     "             \n",
1068 |     "    batch_size = y.shape[0]\n",
1069 |     "    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size  # 1e-7 avoids log(0)\n",
1070 |     "\n",
1071 |     "class SoftmaxWithLoss:\n",
1072 |     "    def __init__(self):\n",
1073 |     "        self.loss = None # loss\n",
1074 |     "        self.y = None    # output of softmax\n",
1075 |     "        self.t = None    # labels (one-hot vectors)\n",
1076 |     "    \n",
1077 |     "    def forward(self, x, t):\n",
1078 |     "        self.t = t\n",
1079 |     "        self.y = softmax(x)\n",
1080 |     "        self.loss = cross_entropy_error(self.y, self.t)\n",
1081 |     "        return self.loss\n",
1082 |     "    \n",
1083 |     "    def backward(self, dout=1):\n",
1084 |     "        batch_size = self.t.shape[0]\n",
1085 |     "        dx = (self.y - self.t) / batch_size\n",
1086 |     "        \n",
1087 |     "        return dx"
1088 |    ]
1089 |   },
1090 |   {
1091 |    "cell_type": "markdown",
1092 |    "metadata": {},
1093 |    "source": [
1094 |     "Note: in the backward pass, the propagated value is divided by the batch size (batch_size), so the error per sample is passed to the preceding layer."
1095 |    ]
1096 |   },
1097 |   {
1098 |    "cell_type": "markdown",
1099 |    "metadata": {},
1100 |    "source": [
1101 |     "## 5.7 Implementing Backpropagation"
1102 |    ]
1103 |   },
1104 |   {
1105 |    "cell_type": "markdown",
1106 |    "metadata": {},
1107 |    "source": [
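1108 |     "### 5.7.1 The Big Picture of Neural Network Training\n",
1109 |     "\n",
1110 |     "**Premise**\n",
1111 |     "\n",
1112 |     "Training: the process of adjusting the weights and biases to fit the training data\n",
1113 |     "\n",
1114 |     "**Step 1 - Mini-batch**\n",
1115 |     "\n",
1116 |     "Mini-batch: a randomly selected subset of the training data\n",
1117 |     "\n",
1118 |     "Goal: reduce the value of the loss function on the mini-batch\n",
1119 |     "\n",
1120 |     "**Step 2 - Compute the gradient**\n",
1121 |     "\n",
1122 |     "Compute the gradient of the loss with respect to each weight parameter. The negative gradient points in the direction that decreases the loss most steeply.\n",
1123 |     "\n",
1124 |     "**Step 3 - Update the parameters**\n",
1125 |     "\n",
1126 |     "Move the weight parameters a very small step in the direction opposite to the gradient.\n",
1127 |     "\n",
1128 |     "**Step 4 - Repeat**\n",
1129 |     "\n",
1130 |     "Repeat steps 1 to 3.\n",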
1131 |     "\n",
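     |     "The four steps can be sketched on a toy problem. This is an illustrative assumption, not the notebook's MNIST setup: a two-parameter linear model with hand-derived mean-squared-error gradients stands in for the network, and every name below is hypothetical.\n",
     |     "\n",
     |     "```python\n",
     |     "import numpy as np\n",
     |     "\n",
     |     "rng = np.random.default_rng(0)\n",
     |     "x_train = rng.normal(size=1000)\n",
     |     "t_train = 3.0 * x_train + 0.5          # toy targets: t = 3x + 0.5\n",
     |     "\n",
     |     "w, b = 0.0, 0.0\n",
     |     "learning_rate = 0.1\n",
     |     "batch_size = 32\n",
     |     "\n",
     |     "for step in range(200):                                # step 4: repeat\n",
     |     "    idx = rng.choice(x_train.shape[0], batch_size)     # step 1: mini-batch\n",
     |     "    x, t = x_train[idx], t_train[idx]\n",
     |     "    y = w * x + b\n",
     |     "    grad_w = np.mean(2 * (y - t) * x)                  # step 2: gradient of the mean squared error\n",
     |     "    grad_b = np.mean(2 * (y - t))\n",
     |     "    w -= learning_rate * grad_w                        # step 3: small step against the gradient\n",
     |     "    b -= learning_rate * grad_b\n",
     |     "\n",
     |     "print(w, b)  # should approach 3.0 and 0.5\n",
     |     "```\n",
     |     "\n",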
1132 |     "<u>Backpropagation appears in step 2, 'Compute the gradient'.</u>\n",
1133 |     "\n",
1134 |     "<u>Unlike slow numerical differentiation, it computes the gradient quickly and efficiently.</u>"
1135 |    ]
1136 |   },
1137 |   {
1138 |    "cell_type": "markdown",
1139 |    "metadata": {},
1140 |    "source": [
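1141 |     "### 5.7.2 Implementing a Neural Network with Backpropagation\n",
1142 |     "\n",
1143 |     "Because the network is built from layers,\n",
1144 |     "\n",
1145 |     "both obtaining the recognition result (predict()) and computing the gradients (gradient()) are done purely by propagating through the layers."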
1146 |    ]
1147 |   },
1148 |   {
1149 |    "cell_type": "code",
1150 |    "execution_count": 18,
1151 |    "metadata": {
1152 |     "collapsed": false
1153 |    },
1154 |    "outputs": [],
1155 |    "source": [
1156 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/two_layer_net.py\n",
1157 |     "# coding: utf-8\n",
1158 |     "#import sys, os\n",
1159 |     "#sys.path.append(os.pardir)  # allow importing files from the parent directory\n",
1160 |     "import numpy as np\n",
1161 |     "#from common.layers import *\n",
1162 |     "#from common.gradient import numerical_gradient\n",
1163 |     "from collections import OrderedDict\n",
1164 |     "\n",
1165 |     "# https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/functions.py\n",
1166 |     "def softmax(x):\n",
1167 |     "    if x.ndim == 2:\n",
1168 |     "        x = x.T\n",
1169 |     "        x = x - np.max(x, axis=0)\n",
1170 |     "        y = np.exp(x) / np.sum(np.exp(x), axis=0)\n",
1171 |     "        return y.T \n",
1172 |     "\n",
1173 |     "    x = x - np.max(x) # guard against overflow\n",
1174 |     "    return np.exp(x) / np.sum(np.exp(x))\n",
1175 |     "\n",
1176 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/gradient.py\n",
1177 |     "def numerical_gradient(f, x):\n",
1178 |     "    h = 1e-4 # 0.0001\n",
1179 |     "    grad = np.zeros_like(x)\n",
1180 |     "    \n",
1181 |     "    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])\n",
1182 |     "    while not it.finished:\n",
1183 |     "        idx = it.multi_index\n",
1184 |     "        tmp_val = x[idx]\n",
1185 |     "        x[idx] = float(tmp_val) + h\n",
1186 |     "        fxh1 = f(x) # f(x+h)\n",
1187 |     "        \n",
1188 |     "        x[idx] = tmp_val - h \n",
1189 |     "        fxh2 = f(x) # f(x-h)\n",
1190 |     "        grad[idx] = (fxh1 - fxh2) / (2*h)\n",
1191 |     "        \n",
1192 |     "        x[idx] = tmp_val # restore the original value\n",
1193 |     "        it.iternext()   \n",
1194 |     "        \n",
1195 |     "    return grad\n",
1196 |     "\n",
1197 |     "\n",
1198 |     "class TwoLayerNet:\n",
1199 |     "\n",
1200 |     "    def __init__(self, input_size, hidden_size, output_size, weight_init_std = 0.01):\n",
1201 |     "        # initialize the weights\n",
1202 |     "        self.params = {}\n",
1203 |     "        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)\n",
1204 |     "        self.params['b1'] = np.zeros(hidden_size)\n",
1205 |     "        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size) \n",
1206 |     "        self.params['b2'] = np.zeros(output_size)\n",
1207 |     "\n",
1208 |     "        # build the layers\n",
1209 |     "        self.layers = OrderedDict()                                           ###\n",
1210 |     "        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1']) ###\n",
1211 |     "        self.layers['Relu1'] = Relu()                                         ###\n",
1212 |     "        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2']) ###\n",
1213 |     "\n",
1214 |     "        self.lastLayer = SoftmaxWithLoss()                                    ###\n",
1215 |     "        \n",
1216 |     "    def predict(self, x):\n",
1217 |     "        for layer in self.layers.values():                                    ###\n",
1218 |     "            x = layer.forward(x)                                              ###\n",
1219 |     "        \n",
1220 |     "        return x\n",
1221 |     "        \n",
1222 |     "    # x : input data, t : labels\n",
1223 |     "    def loss(self, x, t):\n",
1224 |     "        y = self.predict(x)\n",
1225 |     "        return self.lastLayer.forward(y, t)\n",
1226 |     "    \n",
1227 |     "    def accuracy(self, x, t):\n",
1228 |     "        y = self.predict(x)\n",
1229 |     "        y = np.argmax(y, axis=1)\n",
1230 |     "        if t.ndim != 1 : t = np.argmax(t, axis=1)\n",
1231 |     "        \n",
1232 |     "        accuracy = np.sum(y == t) / float(x.shape[0])\n",
1233 |     "        return accuracy\n",
1234 |     "        \n",
1235 |     "    # x : input data, t : labels\n",
1236 |     "    def numerical_gradient(self, x, t):\n",
1237 |     "        loss_W = lambda W: self.loss(x, t)\n",
1238 |     "        \n",
1239 |     "        grads = {}\n",
1240 |     "        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])\n",
1241 |     "        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])\n",
1242 |     "        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])\n",
1243 |     "        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])\n",
1244 |     "        \n",
1245 |     "        return grads\n",
1246 |     "        \n",
1247 |     "    def gradient(self, x, t):\n",
1248 |     "        # forward\n",
1249 |     "        self.loss(x, t)                      ###\n",
1250 |     "\n",
1251 |     "        # backward\n",
1252 |     "        dout = 1                             ###\n",
1253 |     "        dout = self.lastLayer.backward(dout) ###\n",
1254 |     "        \n",
1255 |     "        layers = list(self.layers.values())  ###\n",
1256 |     "        layers.reverse()                     ###\n",
1257 |     "        for layer in layers:                 ###\n",
1258 |     "            dout = layer.backward(dout)      ###\n",
1259 |     "\n",
1260 |     "        # store the results\n",
1261 |     "        grads = {}\n",
1262 |     "        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db\n",
1263 |     "        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db\n",
1264 |     "\n",
1265 |     "        return grads"
1266 |    ]
1267 |   },
1268 |   {
1269 |    "cell_type": "markdown",
1270 |    "metadata": {},
1271 |    "source": [
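1272 |     "Important code is marked with ###; read those lines closely.\n",
1273 |     "\n",
1274 |     "OrderedDict: a dictionary that remembers the order in which items were added\n",
1275 |     "\n",
1276 |     "In the forward pass, each layer's forward() method is called in insertion order.\n",
1277 |     "\n",
1278 |     "In the backward pass, the layers are called in reverse order.\n",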
1279 |     "\n",
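     |     "The ordering behaviour can be seen in isolation with a toy layer. This is a minimal sketch: `Scale` is a hypothetical stand-in for Affine and Relu, not a class from the notebook.\n",
     |     "\n",
     |     "```python\n",
     |     "from collections import OrderedDict\n",
     |     "\n",
     |     "class Scale:\n",
     |     "    # hypothetical toy layer: multiplies by k forward, scales the gradient by k backward\n",
     |     "    def __init__(self, k):\n",
     |     "        self.k = k\n",
     |     "\n",
     |     "    def forward(self, x):\n",
     |     "        return x * self.k\n",
     |     "\n",
     |     "    def backward(self, dout):\n",
     |     "        return dout * self.k\n",
     |     "\n",
     |     "layers = OrderedDict()\n",
     |     "layers['double'] = Scale(2)\n",
     |     "layers['triple'] = Scale(3)\n",
     |     "\n",
     |     "x = 5\n",
     |     "for layer in layers.values():                  # forward: insertion order\n",
     |     "    x = layer.forward(x)\n",
     |     "print(x)     # 30\n",
     |     "\n",
     |     "dout = 1\n",
     |     "for layer in reversed(list(layers.values())):  # backward: reverse order\n",
     |     "    dout = layer.backward(dout)\n",
     |     "print(dout)  # 6\n",
     |     "```\n",
     |     "\n",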
1280 |     "Because the components of the network are implemented as 'layers', networks are easy to build.\n",
1281 |     "\n",
1282 |     "=> Like assembling Lego bricks, you simply add as many layers as you need."
1283 |    ]
1284 |   },
1285 |   {
1286 |    "cell_type": "markdown",
1287 |    "metadata": {},
1288 |    "source": [
1289 |     "### 5.7.3 Verifying the Gradients Computed by Backpropagation\n",
1290 |     "\n",
1291 |     "Numerical differentiation is slow."
1292 |    ]
1293 |   },
1294 |   {
1295 |    "cell_type": "code",
1296 |    "execution_count": 19,
1297 |    "metadata": {
1298 |     "collapsed": true
1299 |    },
1300 |    "outputs": [],
1301 |    "source": [
1302 |     "from dataset.mnist import load_mnist\n",
1303 |     "(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)\n",
1304 |     "network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)\n",
1305 |     "x_batch = x_train[:3]\n",
1306 |     "t_batch = t_train[:3]"
1307 |    ]
1308 |   },
1309 |   {
1310 |    "cell_type": "code",
1311 |    "execution_count": 20,
1312 |    "metadata": {
1313 |     "collapsed": false
1314 |    },
1315 |    "outputs": [
1316 |     {
1317 |      "name": "stdout",
1318 |      "output_type": "stream",
1319 |      "text": [
1320 |       "1 loop, best of 3: 9.95 s per loop\n"
1321 |      ]
1322 |     }
1323 |    ],
1324 |    "source": [
1325 |     "%timeit network.numerical_gradient(x_batch, t_batch)"
1326 |    ]
1327 |   },
1328 |   {
1329 |    "cell_type": "code",
1330 |    "execution_count": 21,
1331 |    "metadata": {
1332 |     "collapsed": false
1333 |    },
1334 |    "outputs": [
1335 |     {
1336 |      "name": "stdout",
1337 |      "output_type": "stream",
1338 |      "text": [
1339 |       "The slowest run took 5.15 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
1340 |       "1000 loops, best of 3: 248 µs per loop\n"
1341 |      ]
1342 |     }
1343 |    ],
1344 |    "source": [
1345 |     "%timeit network.gradient(x_batch, t_batch)"
1346 |    ]
1347 |   },
1348 |   {
1349 |    "cell_type": "markdown",
1350 |    "metadata": {},
1351 |    "source": [
1352 |     "Numerical differentiation (numerical_gradient): 9.95 s\n",
1353 |     "\n",
1354 |     "Backpropagation (gradient): 248 µs, i.e. 0.000248 s\n",
1355 |     "\n",
1356 |     "A speed difference of roughly 40,000x (9.95 / 0.000248 ≈ 40,100)"
1357 |    ]
1358 |   },
1359 |   {
1360 |    "cell_type": "markdown",
1361 |    "metadata": {},
1362 |    "source": [
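1363 |     "Numerical differentiation is still needed to confirm that backpropagation has been implemented correctly.\n",
1364 |     "\n",
1365 |     "Its advantage is that it is easy to implement, so mistakes are less likely.\n",
1366 |     "\n",
1367 |     "Gradient check: compare the result of numerical differentiation with the result of backpropagation to verify the backpropagation implementation."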
1368 |    ]
1369 |   },
1370 |   {
1371 |    "cell_type": "code",
1372 |    "execution_count": 22,
1373 |    "metadata": {
1374 |     "collapsed": false
1375 |    },
1376 |    "outputs": [
1377 |     {
1378 |      "name": "stdout",
1379 |      "output_type": "stream",
1380 |      "text": [
1381 |       "b1:8.3863151124e-13\n",
1382 |       "W2:7.71846137622e-13\n",
1383 |       "W1:2.16518280464e-13\n",
1384 |       "b2:1.20348177257e-10\n"
1385 |      ]
1386 |     }
1387 |    ],
1388 |    "source": [
1389 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/gradient_check.py\n",
1390 |     "# coding: utf-8\n",
1391 |     "#import sys, os\n",
1392 |     "#sys.path.append(os.pardir)  # allow importing files from the parent directory\n",
1393 |     "import numpy as np\n",
1394 |     "from dataset.mnist import load_mnist\n",
1395 |     "\n",
1396 |     "# load the data\n",
1397 |     "(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)\n",
1398 |     "\n",
1399 |     "network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)\n",
1400 |     "\n",
1401 |     "x_batch = x_train[:3]\n",
1402 |     "t_batch = t_train[:3]\n",
1403 |     "\n",
1404 |     "grad_numerical = network.numerical_gradient(x_batch, t_batch)\n",
1405 |     "grad_backprop = network.gradient(x_batch, t_batch)\n",
1406 |     "\n",
1407 |     "# mean absolute difference between the two gradients, per parameter\n",
1408 |     "for key in grad_numerical.keys():\n",
1409 |     "    diff = np.average( np.abs(grad_backprop[key] - grad_numerical[key]) )\n",
1410 |     "    print(key + \":\" + str(diff))"
1411 |    ]
1412 |   },
1413 |   {
1414 |    "cell_type": "markdown",
1415 |    "metadata": {},
1416 |    "source": [
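1417 |     "These results show that the gradients from numerical differentiation and from backpropagation differ only very slightly.\n",
1418 |     "\n",
1419 |     "This increases confidence that backpropagation was implemented without mistakes.\n",
1420 |     "\n",
1421 |     "The difference between the two results is rarely exactly 0, because floating-point arithmetic has finite precision.\n",
1422 |     "\n",
1423 |     "With a correct implementation, it is a small value very close to 0."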
1424 |    ]
1425 |   },
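     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "The check above reports a mean absolute difference. A common variant, shown here as a hedged sketch rather than the book's method, uses a relative error so that the result does not depend on the scale of the gradient. The toy function `f`, its hand-derived gradient, and the helper `relative_error` are illustrative assumptions.\n",
     |     "\n",
     |     "```python\n",
     |     "import numpy as np\n",
     |     "\n",
     |     "def f(w):\n",
     |     "    # toy scalar function: f(w) = sum(w**2) + sum(sin(w))\n",
     |     "    return np.sum(w ** 2) + np.sum(np.sin(w))\n",
     |     "\n",
     |     "def analytic_grad(w):\n",
     |     "    # hand-derived gradient of f\n",
     |     "    return 2 * w + np.cos(w)\n",
     |     "\n",
     |     "def numeric_grad(func, w, h=1e-4):\n",
     |     "    # central differences, the same idea as numerical_gradient in this notebook\n",
     |     "    grad = np.zeros_like(w)\n",
     |     "    for i in range(w.size):\n",
     |     "        old = w[i]\n",
     |     "        w[i] = old + h\n",
     |     "        f_plus = func(w)\n",
     |     "        w[i] = old - h\n",
     |     "        f_minus = func(w)\n",
     |     "        w[i] = old  # restore the original value\n",
     |     "        grad[i] = (f_plus - f_minus) / (2 * h)\n",
     |     "    return grad\n",
     |     "\n",
     |     "def relative_error(a, b, eps=1e-12):\n",
     |     "    return np.max(np.abs(a - b) / (np.abs(a) + np.abs(b) + eps))\n",
     |     "\n",
     |     "w = np.array([0.5, -1.2, 3.0])\n",
     |     "print(relative_error(analytic_grad(w), numeric_grad(f, w)))  # small, but not exactly 0\n",
     |     "```"
     |    ]
     |   },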
1426 |   {
1427 |    "cell_type": "markdown",
1428 |    "metadata": {},
1429 |    "source": [
1430 |     "### 5.7.4 Training with Backpropagation"
1431 |    ]
1432 |   },
1433 |   {
1434 |    "cell_type": "code",
1435 |    "execution_count": 23,
1436 |    "metadata": {
1437 |     "collapsed": false
1438 |    },
1439 |    "outputs": [
1440 |     {
1441 |      "name": "stdout",
1442 |      "output_type": "stream",
1443 |      "text": [
1444 |       "0.163 0.168\n",
1445 |       "0.90245 0.9076\n",
1446 |       "0.920516666667 0.9231\n",
1447 |       "0.93425 0.9368\n",
1448 |       "0.944716666667 0.9424\n",
1449 |       "0.950316666667 0.9467\n",
1450 |       "0.9546 0.9516\n",
1451 |       "0.9601 0.9568\n",
1452 |       "0.963266666667 0.9575\n",
1453 |       "0.964683333333 0.9588\n",
1454 |       "0.968233333333 0.961\n",
1455 |       "0.968616666667 0.9615\n",
1456 |       "0.970616666667 0.9637\n",
1457 |       "0.973433333333 0.9678\n",
1458 |       "0.976033333333 0.9686\n",
1459 |       "0.976383333333 0.968\n",
1460 |       "0.977083333333 0.9709\n"
1461 |      ]
1462 |     }
1463 |    ],
1464 |    "source": [
1465 |     "# coding: utf-8\n",
1466 |     "#import sys, os\n",
1467 |     "#sys.path.append(os.pardir)\n",
1468 |     "import numpy as np\n",
1469 |     "from dataset.mnist import load_mnist\n",
1470 |     "#from two_layer_net import TwoLayerNet\n",
1471 |     "\n",
1472 |     "# load the data\n",
1473 |     "(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)\n",
1474 |     "\n",
1475 |     "network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)\n",
1476 |     "\n",
1477 |     "iters_num = 10000\n",
1478 |     "train_size = x_train.shape[0]\n",
1479 |     "batch_size = 100\n",
1480 |     "learning_rate = 0.1\n",
1481 |     "\n",
1482 |     "train_loss_list = []\n",
1483 |     "train_acc_list = []\n",
1484 |     "test_acc_list = []\n",
1485 |     "\n",
1486 |     "iter_per_epoch = max(train_size // batch_size, 1)  # integer number of iterations per epoch\n",
1487 |     "\n",
1488 |     "for i in range(iters_num):\n",
1489 |     "    batch_mask = np.random.choice(train_size, batch_size)\n",
1490 |     "    x_batch = x_train[batch_mask]\n",
1491 |     "    t_batch = t_train[batch_mask]\n",
1492 |     "    \n",
1493 |     "    # compute the gradients\n",
1494 |     "    #grad = network.numerical_gradient(x_batch, t_batch) # numerical differentiation\n",
1495 |     "    grad = network.gradient(x_batch, t_batch) # backpropagation (much faster)\n",
1496 |     "    \n",
1497 |     "    # update the parameters\n",
1498 |     "    for key in ('W1', 'b1', 'W2', 'b2'):\n",
1499 |     "        network.params[key] -= learning_rate * grad[key]\n",
1500 |     "    \n",
1501 |     "    loss = network.loss(x_batch, t_batch)\n",
1502 |     "    train_loss_list.append(loss)\n",
1503 |     "    \n",
1504 |     "    if i % iter_per_epoch == 0:\n",
1505 |     "        train_acc = network.accuracy(x_train, t_train)\n",
1506 |     "        test_acc = network.accuracy(x_test, t_test)\n",
1507 |     "        train_acc_list.append(train_acc)\n",
1508 |     "        test_acc_list.append(test_acc)\n",
1509 |     "        print(train_acc, test_acc)"
1510 |    ]
1511 |   },
1512 |   {
1513 |    "cell_type": "markdown",
1514 |    "metadata": {},
1515 |    "source": [
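1516 |     "## 5.8 Summary\n",
1517 |     "\n",
1518 |     "This chapter used computational graphs to explain how neural networks operate and how backpropagation works.\n",
1519 |     "\n",
1520 |     "Every layer implements forward and backward methods.\n",
1521 |     "\n",
1522 |     "forward propagates data in the forward direction; backward propagates it in the reverse direction.\n",
1523 |     "\n",
1524 |     "This makes it possible to compute the gradients of the weight parameters efficiently."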
1525 |    ]
1526 |   },
1527 |   {
1528 |    "cell_type": "markdown",
1529 |    "metadata": {},
1530 |    "source": [
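1531 |     "**What we learned in this chapter**\n",
1532 |     "\n",
1533 |     "A computational graph lets you follow a computation visually.\n",
1534 |     "\n",
1535 |     "Each node of a computational graph performs a local computation; the local computations combine into the overall computation.\n",
1536 |     "\n",
1537 |     "The forward pass performs the ordinary computation. The backward pass computes the derivative at each node.\n",
1538 |     "\n",
1539 |     "Backpropagation: implementing the network's components as layers allows gradients to be computed efficiently.\n",
1540 |     "\n",
1541 |     "Gradient check: comparing the results of numerical differentiation and backpropagation reveals whether the backpropagation implementation contains mistakes."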
1542 |    ]
1543 |   }
1544 |  ],
1545 |  "metadata": {
1546 |   "anaconda-cloud": {},
1547 |   "kernelspec": {
1548 |    "display_name": "Python [conda root]",
1549 |    "language": "python",
1550 |    "name": "conda-root-py"
1551 |   },
1552 |   "language_info": {
1553 |    "codemirror_mode": {
1554 |     "name": "ipython",
1555 |     "version": 3
1556 |    },
1557 |    "file_extension": ".py",
1558 |    "mimetype": "text/x-python",
1559 |    "name": "python",
1560 |    "nbconvert_exporter": "python",
1561 |    "pygments_lexer": "ipython3",
1562 |    "version": "3.5.2"
1563 |   }
1564 |  },
1565 |  "nbformat": 4,
1566 |  "nbformat_minor": 0
1567 | }
1568 | 


--------------------------------------------------------------------------------
/5장.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "# 밑바닥부터 시작하는 딥러닝\n",
   8 |     "\n",
   9 |     "# Deep Learning from Scratch\n",
  10 |     "\n",
  11 |     "## Github \n",
  12 |     "\n",
  13 |     "https://github.com/WegraLee/deep-learning-from-scratch\n",
  14 |     "\n",
  15 |     "## Table of Contents\n",
  16 |     "\n",
  17 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/%EB%AA%A9%EC%B0%A8.ipynb"
  18 |    ]
  19 |   },
  20 |   {
  21 |    "cell_type": "markdown",
  22 |    "metadata": {},
  23 |    "source": [
  24 |     "# 5 Backpropagation\n",
  25 |     "\n",
  26 |     "Backpropagation: an efficient way to compute the gradients of the weight parameters\n",
  27 |     "\n",
  28 |     "The name is short for 'backward propagation of errors': errors are propagated in the reverse direction.\n",
  29 |     "\n",
  30 |     "Andrej Karpathy's blog\n",
  31 |     "\n",
  32 |     "* Reference: http://karpathy.github.io/neuralnets\n",
  33 |     "\n",
  34 |     "* Explains backpropagation with computational graphs\n",
  35 |     "\n",
  36 |     "See also CS231n, the Stanford University deep learning course taught by Professor Fei-Fei Li\n",
  37 |     "\n",
  38 |     "* Reference: http://cs231n.github.io"
  39 |    ]
  40 |   },
  41 |   {
  42 |    "cell_type": "markdown",
  43 |    "metadata": {},
  44 |    "source": [
  45 |     "## 5.1 Computational Graphs\n",
  46 |     "\n",
  47 |     "Computational graph: a graph that represents the steps of a computation\n",
  48 |     "\n",
  49 |     "It is made up of multiple nodes and edges.\n",
  50 |     "\n",
  51 |     "Edge: a line connecting two nodes"
  52 |    ]
  53 |   },
  54 |   {
  55 |    "cell_type": "markdown",
  56 |    "metadata": {},
  57 |    "source": [
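  58 |     "### 5.1.1 Solving with a Computational Graph\n",
  59 |     "\n",
  60 |     "Solving a problem with a computational graph proceeds as follows:\n",
  61 |     "\n",
  62 |     "* Construct the computational graph.\n",
  63 |     "* Carry out the computation on the graph from left to right.\n",
  64 |     "\n",
  65 |     "Forward propagation: carrying out the computation from left to right, from the start of the graph to its end."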
  66 |    ]
  67 |   },
  68 |   {
  69 |    "cell_type": "markdown",
  70 |    "metadata": {},
  71 |    "source": [
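  72 |     "### 5.1.2 Local Computation\n",
  73 |     "\n",
  74 |     "Local: the small scope directly related to a node itself\n",
  75 |     "\n",
  76 |     "Local computation: a node can produce its output using only the information directly related to it.\n",
  77 |     "\n",
  78 |     "Each node needs to care about nothing except its own computation.\n",
  79 |     "\n",
  80 |     "A complex computation is split into 'simple, local computations', and each result is passed to the next node.\n",
  81 |     "\n",
  82 |     "Even a complex computation, once decomposed, consists of simple computations."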
  83 |    ]
  84 |   },
  85 |   {
  86 |    "cell_type": "markdown",
  87 |    "metadata": {},
  88 |    "source": [
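  89 |     "### 5.1.3 Why Use Computational Graphs?\n",
  90 |     "\n",
  91 |     "Backpropagation makes it possible to compute derivatives efficiently.\n",
  92 |     "\n",
  93 |     "Intermediate derivative results can be shared, so many derivatives can be computed efficiently."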
  94 |    ]
  95 |   },
  96 |   {
  97 |    "cell_type": "markdown",
  98 |    "metadata": {},
  99 |    "source": [
 100 |     "## 5.2 The Chain Rule\n",
 101 |     "\n",
 102 |     "The principle by which 'local derivatives' are passed along follows from the chain rule."
 103 |    ]
 104 |   },
 105 |   {
 106 |    "cell_type": "markdown",
 107 |    "metadata": {},
 108 |    "source": [
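 109 |     "### 5.2.1 Backpropagation on a Computational Graph\n",
 110 |     "\n",
 111 |     "Backpropagation on a computational graph: multiply by the local derivatives, moving in the direction opposite to the forward pass.\n",
 112 |     "\n",
 113 |     "The procedure is to multiply the incoming signal E by the node's local derivative and pass the result to the next node.\n",
 114 |     "\n",
 115 |     "Following this order of computation yields the desired derivatives efficiently."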
 116 |    ]
 117 |   },
 118 |   {
 119 |    "cell_type": "markdown",
 120 |    "metadata": {},
 121 |    "source": [
 122 |     "### 5.2.2 What Is the Chain Rule?\n",
 123 |     "\n",
 124 |     "Composite function: a function composed of several functions\n",
 125 |     "\n",
 126 |     "#### Equation 5.1\n",
 127 |     "\n",
 128 |     "\\begin{equation*}\n",
 129 |     "z = t^{2}\n",
 130 |     "\\end{equation*}\n",
 131 |     "\n",
 132 |     "\\begin{equation*}\n",
 133 |     "t = x + y\n",
 134 |     "\\end{equation*}\n",
 135 |     "\n",
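 136 |     "The chain rule is a property of the derivative of a composite function:\n",
 137 |     "\n",
 138 |     "the derivative of a composite function can be expressed as the product of the derivatives of the functions that compose it.\n",
 139 |     "\n",
 140 |     "#### Equation 5.2\n",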
 141 |     "\n",
 142 |     "\\begin{equation*}\n",
 143 |     "\\frac{\\partial z}{\\partial x} = \\frac{\\partial z}{\\partial t} \\frac{\\partial t}{\\partial x}\n",
 144 |     "\\end{equation*}\n",
 145 |     "\n",
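 146 |     "The derivative of z with respect to x can be expressed as the product of the derivative of z with respect to t and the derivative of t with respect to x.\n",
 147 |     "\n",
 148 |     "The two ∂t terms cancel each other:\n",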
 149 |     "\n",
 150 |     "\\begin{equation*}\n",
 151 |     "\\frac{\\partial z}{\\partial x} = \\frac{\\partial z}{} \\frac{}{\\partial x}\n",
 152 |     "\\end{equation*}\n",
 153 |     "\n",
 154 |     "#### Equation 5.3\n",
 155 |     "\n",
 156 |     "Compute the local derivatives (partial derivatives) of Equation 5.1:\n",
 157 |     "\n",
 158 |     "\\begin{equation*}\n",
 159 |     "\\frac{\\partial z}{\\partial t} = 2t\n",
 160 |     "\\end{equation*}\n",
 161 |     "\n",
 162 |     "\\begin{equation*}\n",
 163 |     "\\frac{\\partial t}{\\partial x} = 1\n",
 164 |     "\\end{equation*}\n",
 165 |     "\n",
 166 |     "The derivative of z with respect to x, which is what we ultimately want, is the product of these two derivatives:\n",
 167 |     "\n",
 168 |     "#### Equation 5.4\n",
 169 |     "\n",
 170 |     "\\begin{equation*}\n",
 171 |     "\\frac{\\partial z}{\\partial x} = \\frac{\\partial z}{\\partial t} \\frac{\\partial t}{\\partial x} = 2t \\cdot 1 = 2(x+y)\n",
 172 |     "\\end{equation*}"
 173 |    ]
 174 |   },
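     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "Equation 5.4 can be checked numerically. This is a minimal sketch: the input values x = 3 and y = 4 are arbitrary assumptions, and the function name `z` simply mirrors Equation 5.1.\n",
     |     "\n",
     |     "```python\n",
     |     "def z(x, y):\n",
     |     "    t = x + y      # Equation 5.1: t = x + y\n",
     |     "    return t ** 2  # Equation 5.1: z = t^2\n",
     |     "\n",
     |     "x, y = 3.0, 4.0\n",
     |     "\n",
     |     "# chain rule (Equation 5.4): dz/dx = dz/dt * dt/dx = 2t * 1\n",
     |     "t = x + y\n",
     |     "dz_dt = 2 * t\n",
     |     "dt_dx = 1.0\n",
     |     "analytic = dz_dt * dt_dx  # 2 * (3 + 4) = 14\n",
     |     "\n",
     |     "# central-difference estimate for comparison\n",
     |     "h = 1e-4\n",
     |     "numeric = (z(x + h, y) - z(x - h, y)) / (2 * h)\n",
     |     "\n",
     |     "print(analytic, numeric)  # both close to 14\n",
     |     "```"
     |    ]
     |   },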
 175 |   {
 176 |    "cell_type": "markdown",
 177 |    "metadata": {
 178 |     "collapsed": true
 179 |    },
 180 |    "source": [
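 181 |     "### 5.2.3 The Chain Rule and Computational Graphs\n",
 182 |     "\n",
 183 |     "Backpropagation on a computational graph propagates signals from right to left.\n",
 184 |     "\n",
 185 |     "The signal entering a node is multiplied by that node's local derivative (partial derivative) and passed to the next node.\n",
 186 |     "\n",
 187 |     "What backpropagation does is exactly the chain rule."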
 188 |    ]
 189 |   },
 190 |   {
 191 |    "cell_type": "markdown",
 192 |    "metadata": {},
 193 |    "source": [
 194 |     "## 5.3 Backpropagation"
 195 |    ]
 196 |   },
 197 |   {
 198 |    "cell_type": "markdown",
 199 |    "metadata": {},
 200 |    "source": [
 201 |     "### 5.3.1 Backpropagation Through an Addition Node\n",
 202 |     "\n",
 203 |     "The derivatives of z = x + y can be computed analytically:\n",
 204 |     "\n",
 205 |     "#### Equation 5.5\n",
 206 |     "\n",
 207 |     "\\begin{equation*}\n",
 208 |     "\\frac{\\partial z}{\\partial x} = 1\n",
 209 |     "\\end{equation*}\n",
 210 |     "\n",
 211 |     "\\begin{equation*}\n",
 212 |     "\\frac{\\partial z}{\\partial y} = 1\n",
 213 |     "\\end{equation*}\n",
 214 |     "\n",
 215 |     "Backpropagation through an addition node only multiplies by 1, so the incoming value is passed on to the next node unchanged."
 216 |    ]
 217 |   },
 218 |   {
 219 |    "cell_type": "markdown",
 220 |    "metadata": {},
 221 |    "source": [
 222 |     "### 5.3.2 Backpropagation Through a Multiplication Node\n",
 223 |     "\n",
 224 |     "The derivatives of z = xy:\n",
 225 |     "\n",
 226 |     "#### Equation 5.6\n",
 227 |     "\n",
 228 |     "\\begin{equation*}\n",
 229 |     "\\frac{\\partial z}{\\partial x} = y\n",
 230 |     "\\end{equation*}\n",
 231 |     "\n",
 232 |     "\\begin{equation*}\n",
 233 |     "\\frac{\\partial z}{\\partial y} = x\n",
 234 |     "\\end{equation*}\n",
 235 |     "\n",
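 236 |     "Backpropagation through a multiplication node multiplies the upstream value by the 'swapped' forward-pass inputs and sends the result downstream.\n",
 237 |     "\n",
 238 |     "The branch whose forward input was x is multiplied by y, and the branch whose forward input was y is multiplied by x."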
 239 |    ]
 240 |   },
 241 |   {
 242 |    "cell_type": "markdown",
 243 |    "metadata": {},
 244 |    "source": [
 245 |     "### 5.3.3 The Apple Shopping Example"
 246 |    ]
 247 |   },
 248 |   {
 249 |    "cell_type": "markdown",
 250 |    "metadata": {},
 251 |    "source": [
 252 |     "## 5.4 Implementing Simple Layers\n",
 253 |     "\n",
 254 |     "The multiplication node of the computational graph is implemented as 'MulLayer' and the addition node as 'AddLayer'."
 255 |    ]
 256 |   },
 257 |   {
 258 |    "cell_type": "markdown",
 259 |    "metadata": {},
 260 |    "source": [
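 261 |     "### 5.4.1 Multiplication Layer\n",
 262 |     "\n",
 263 |     "Every layer is implemented with a common interface: forward() for the forward pass and backward() for the backward pass.\n",
 264 |     "\n",
 265 |     "The multiplication layer is implemented as the MulLayer class as follows:"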
 266 |    ]
 267 |   },
 268 |   {
 269 |    "cell_type": "code",
 270 |    "execution_count": 1,
 271 |    "metadata": {
 272 |     "collapsed": true
 273 |    },
 274 |    "outputs": [],
 275 |    "source": [
 276 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/layer_naive.py\n",
 277 |     "class MulLayer:\n",
 278 |     "    def __init__(self):\n",
 279 |     "        self.x = None\n",
 280 |     "        self.y = None\n",
 281 |     "    \n",
 282 |     "    def forward(self, x, y):\n",
 283 |     "        self.x = x\n",
 284 |     "        self.y = y\n",
 285 |     "        out = x * y\n",
 286 |     "        \n",
 287 |     "        return out\n",
 288 |     "    \n",
 289 |     "    def backward(self, dout):\n",
 290 |     "        dx = dout * self.y # swap x and y\n",
 291 |     "        dy = dout * self.x \n",
 292 |     "        \n",
 293 |     "        return dx, dy"
 294 |    ]
 295 |   },
 296 |   {
 297 |    "cell_type": "markdown",
 298 |    "metadata": {},
 299 |    "source": [
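 300 |     "\\__init\\__() : initializes the instance variables x and y, which hold the forward-pass inputs.\n",
 301 |     "\n",
 302 |     "forward() : takes x and y as arguments and returns their product\n",
 303 |     "\n",
 304 |     "backward() : multiplies the upstream derivative (dout) by the 'swapped' forward-pass values and passes the result downstream.\n",
 305 |     "\n",
 306 |     "Forward pass using MulLayer"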
 307 |    ]
 308 |   },
 309 |   {
 310 |    "cell_type": "code",
 311 |    "execution_count": 2,
 312 |    "metadata": {
 313 |     "collapsed": false
 314 |    },
 315 |    "outputs": [
 316 |     {
 317 |      "name": "stdout",
 318 |      "output_type": "stream",
 319 |      "text": [
 320 |       "220.00000000000003\n"
 321 |      ]
 322 |     }
 323 |    ],
 324 |    "source": [
 325 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/buy_apple.py\n",
 326 |     "apple = 100\n",
 327 |     "apple_num = 2\n",
 328 |     "tax = 1.1\n",
 329 |     "\n",
 330 |     "# layers\n",
 331 |     "mul_apple_layer = MulLayer()\n",
 332 |     "mul_tax_layer = MulLayer()\n",
 333 |     "\n",
 334 |     "# forward pass\n",
 335 |     "apple_price = mul_apple_layer.forward(apple, apple_num)\n",
 336 |     "price = mul_tax_layer.forward(apple_price, tax)\n",
 337 |     "\n",
 338 |     "print(price) # 220"
 339 |    ]
 340 |   },
 341 |   {
 342 |    "cell_type": "markdown",
 343 |    "metadata": {},
 344 |    "source": [
 345 |     "The derivative with respect to each variable can be obtained with backward()."
 346 |    ]
 347 |   },
 348 |   {
 349 |    "cell_type": "code",
 350 |    "execution_count": 3,
 351 |    "metadata": {
 352 |     "collapsed": false
 353 |    },
 354 |    "outputs": [
 355 |     {
 356 |      "name": "stdout",
 357 |      "output_type": "stream",
 358 |      "text": [
 359 |       "2.2 110.00000000000001 200\n"
 360 |      ]
 361 |     }
 362 |    ],
 363 |    "source": [
 364 |     "dprice = 1\n",
 365 |     "dapple_price, dtax = mul_tax_layer.backward(dprice)\n",
 366 |     "dapple, dapple_num = mul_apple_layer.backward(dapple_price)\n",
 367 |     "\n",
 368 |     "print(dapple, dapple_num, dtax) # 2.2 110 200"
 369 |    ]
 370 |   },
 371 |   {
 372 |    "cell_type": "markdown",
 373 |    "metadata": {},
 374 |    "source": [
 375 |     "backward() is called in the reverse order of forward().\n",
 376 |     "\n",
 377 |     "The argument passed to backward() is the derivative with respect to that layer's forward-pass output."
 378 |    ]
 379 |   },
 380 |   {
 381 |    "cell_type": "markdown",
 382 |    "metadata": {},
 383 |    "source": [
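 384 |     "### 5.4.2 Addition Layer\n",
 385 |     "\n",
 386 |     "Every layer is implemented with a common interface: forward() for the forward pass and backward() for the backward pass.\n",
 387 |     "\n",
 388 |     "The addition layer is implemented as the AddLayer class:"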
 389 |    ]
 390 |   },
 391 |   {
 392 |    "cell_type": "code",
 393 |    "execution_count": 4,
 394 |    "metadata": {
 395 |     "collapsed": true
 396 |    },
 397 |    "outputs": [],
 398 |    "source": [
 399 |     "class AddLayer:\n",
 400 |     "    def __init__(self):\n",
 401 |     "        pass\n",
 402 |     "    \n",
 403 |     "    def forward(self, x, y):\n",
 404 |     "        out = x + y\n",
 405 |     "        return out\n",
 406 |     "\n",
 407 |     "    def backward(self, dout):\n",
 408 |     "        dx = dout * 1\n",
 409 |     "        dy = dout * 1\n",
 410 |     "        return dx, dy"
 411 |    ]
 412 |   },
 413 |   {
 414 |    "cell_type": "markdown",
 415 |    "metadata": {},
 416 |    "source": [
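 417 |     "\\__init\\__() : does nothing (pass), since no state needs to be kept\n",
 418 |     "\n",
 419 |     "forward() : takes x and y as arguments and returns their sum\n",
 420 |     "\n",
 421 |     "backward() : passes the upstream derivative (dout) downstream unchanged\n",
 422 |     "\n",
 423 |     "Python implementation of the computational graph in Figure 5-17"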
 424 |    ]
 425 |   },
 426 |   {
 427 |    "cell_type": "code",
 428 |    "execution_count": 5,
 429 |    "metadata": {
 430 |     "collapsed": false
 431 |    },
 432 |    "outputs": [
 433 |     {
 434 |      "name": "stdout",
 435 |      "output_type": "stream",
 436 |      "text": [
 437 |       "price: 715\n",
 438 |       "dApple: 2.2\n",
 439 |       "dApple_num: 110\n",
 440 |       "dOrange: 3.3000000000000003\n",
 441 |       "dOrange_num: 165\n",
 442 |       "dTax: 650\n"
 443 |      ]
 444 |     }
 445 |    ],
 446 |    "source": [
 447 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/buy_apple_orange.py\n",
 448 |     "apple = 100\n",
 449 |     "apple_num = 2\n",
 450 |     "orange = 150\n",
 451 |     "orange_num = 3\n",
 452 |     "tax = 1.1\n",
 453 |     "\n",
 454 |     "# layers\n",
 455 |     "mul_apple_layer = MulLayer()\n",
 456 |     "mul_orange_layer = MulLayer()\n",
 457 |     "add_apple_orange_layer = AddLayer()\n",
 458 |     "mul_tax_layer = MulLayer()\n",
 459 |     "\n",
 460 |     "# forward pass\n",
 461 |     "apple_price = mul_apple_layer.forward(apple, apple_num)  # (1)\n",
 462 |     "orange_price = mul_orange_layer.forward(orange, orange_num)  # (2)\n",
 463 |     "all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # (3)\n",
 464 |     "price = mul_tax_layer.forward(all_price, tax)  # (4)\n",
 465 |     "\n",
 466 |     "# backward pass\n",
 467 |     "dprice = 1\n",
 468 |     "dall_price, dtax = mul_tax_layer.backward(dprice)  # (4)\n",
 469 |     "dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)  # (3)\n",
 470 |     "dorange, dorange_num = mul_orange_layer.backward(dorange_price)  # (2)\n",
 471 |     "dapple, dapple_num = mul_apple_layer.backward(dapple_price)  # (1)\n",
 472 |     "\n",
 473 |     "print(\"price:\", int(price)) # 715\n",
 474 |     "print(\"dApple:\", dapple) # 2.2\n",
 475 |     "print(\"dApple_num:\", int(dapple_num)) # 110\n",
 476 |     "print(\"dOrange:\", dorange) # 3.3\n",
 477 |     "print(\"dOrange_num:\", int(dorange_num)) # 165\n",
 478 |     "print(\"dTax:\", dtax) # 650"
 479 |    ]
 480 |   },
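     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "As a hand check of the printed values, the backward pass above can be traced node by node. This is a worked sketch using the numbers from the code; the symbols are introduced here for illustration only: P = price, S = all_price, a = apple, n_a = apple_num, o = orange, n_o = orange_num, t = tax.\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\partial P}{\\partial S} = t = 1.1, \\qquad \\frac{\\partial P}{\\partial t} = S = 100 \\times 2 + 150 \\times 3 = 650\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\partial P}{\\partial a} = 1.1 \\times n_a = 2.2, \\qquad \\frac{\\partial P}{\\partial n_a} = 1.1 \\times a = 110\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\partial P}{\\partial o} = 1.1 \\times n_o = 3.3, \\qquad \\frac{\\partial P}{\\partial n_o} = 1.1 \\times o = 165\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "The addition node passes 1.1 unchanged to both multiplication nodes, and each multiplication node multiplies it by the other input from the forward pass. The values agree with the output above, up to floating-point rounding for dOrange."
     |    ]
     |   },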
 481 |   {
 482 |    "cell_type": "markdown",
 483 |    "metadata": {
 484 |     "collapsed": true
 485 |    },
 486 |    "source": [
 487 |     "## 5.5 Implementing activation function layers\n",
 488 |     "\n",
 489 |     "This section implements the ReLU and Sigmoid activation functions as layers."
 490 |    ]
 491 |   },
 492 |   {
 493 |    "cell_type": "markdown",
 494 |    "metadata": {},
 495 |    "source": [
 496 |     "### 5.5.1 ReLU layer\n",
 497 |     "\n",
 498 |     "#### Equation 5.7: ReLU\n",
 499 |     "\n",
 500 |     "\\begin{equation*}\n",
 501 |     "y = x \\quad (x > 0)\n",
 502 |     "\\end{equation*}\n",
 503 |     "\n",
 504 |     "\\begin{equation*}\n",
 505 |     "y = 0 \\quad (x \\leq 0)\n",
 506 |     "\\end{equation*}\n",
 507 |     "\n",
 508 |     "#### Equation 5.8: derivative of the ReLU output y with respect to x\n",
 509 |     "\n",
 510 |     "\\begin{equation*}\n",
 511 |     "\\frac{\\partial y}{\\partial x} = 1 \\quad (x > 0)\n",
 512 |     "\\end{equation*}\n",
 513 |     "\n",
 514 |     "\\begin{equation*}\n",
 515 |     "\\frac{\\partial y}{\\partial x} = 0 \\quad (x \\leq 0)\n",
 516 |     "\\end{equation*}\n",
 517 |     "\n",
 518 |     "If the forward input x is greater than 0, the backward pass sends the upstream value downstream unchanged.\n",
 519 |     "\n",
 520 |     "If x was 0 or less in the forward pass, the backward pass sends no signal downstream (the gradient is 0).\n",
 521 |     "\n",
 522 |     "Code implementing the ReLU layer:"
 523 |    ]
 524 |   },
 525 |   {
 526 |    "cell_type": "code",
 527 |    "execution_count": 6,
 528 |    "metadata": {
 529 |     "collapsed": true
 530 |    },
 531 |    "outputs": [],
 532 |    "source": [
 533 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/layers.py\n",
 534 |     "class Relu:\n",
 535 |     "    def __init__(self):\n",
 536 |     "        self.mask = None\n",
 537 |     "    \n",
 538 |     "    def forward(self, x):\n",
 539 |     "        self.mask = (x <= 0)\n",
 540 |     "        out = x.copy()\n",
 541 |     "        out[self.mask] = 0\n",
 542 |     "        \n",
 543 |     "        return out\n",
 544 |     "        \n",
 545 |     "    def backward(self, dout):\n",
 546 |     "        dout[self.mask] = 0\n",
 547 |     "        dx = dout\n",
 548 |     "        \n",
 549 |     "        return dx"
 550 |    ]
 551 |   },
 552 |   {
 553 |    "cell_type": "markdown",
 554 |    "metadata": {},
 555 |    "source": [
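 556 |     "The Relu class keeps an instance variable named mask.\n",
 557 |     "\n",
 558 |     "mask is a boolean array that is True at the indices where the forward input x is 0 or less, and False elsewhere (where x is greater than 0)."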
 559 |    ]
 560 |   },
 561 |   {
 562 |    "cell_type": "code",
 563 |    "execution_count": 7,
 564 |    "metadata": {
 565 |     "collapsed": false
 566 |    },
 567 |    "outputs": [
 568 |     {
 569 |      "name": "stdout",
 570 |      "output_type": "stream",
 571 |      "text": [
 572 |       "[[ 1.   0.5]\n",
 573 |       " [-2.   3. ]]\n"
 574 |      ]
 575 |     }
 576 |    ],
 577 |    "source": [
 578 |     "import numpy as np\n",
 579 |     "x = np.array([[1.0, 0.5], [-2.0, 3.0]])\n",
 580 |     "print(x)"
 581 |    ]
 582 |   },
 583 |   {
 584 |    "cell_type": "code",
 585 |    "execution_count": 8,
 586 |    "metadata": {
 587 |     "collapsed": false
 588 |    },
 589 |    "outputs": [
 590 |     {
 591 |      "name": "stdout",
 592 |      "output_type": "stream",
 593 |      "text": [
 594 |       "[[False False]\n",
 595 |       " [ True False]]\n"
 596 |      ]
 597 |     }
 598 |    ],
 599 |    "source": [
 600 |     "mask = (x <= 0)\n",
 601 |     "print(mask)"
 602 |    ]
 603 |   },
 604 |   {
 605 |    "cell_type": "code",
 606 |    "execution_count": 9,
 607 |    "metadata": {
 608 |     "collapsed": false
 609 |    },
 610 |    "outputs": [
 611 |     {
 612 |      "data": {
 613 |       "text/plain": [
 614 |        "array([[ 1. ,  0.5],\n",
 615 |        "       [ 0. ,  3. ]])"
 616 |       ]
 617 |      },
 618 |      "execution_count": 9,
 619 |      "metadata": {},
 620 |      "output_type": "execute_result"
 621 |     }
 622 |    ],
 623 |    "source": [
 624 |     "out = x.copy()\n",
 625 |     "out[mask] = 0\n",
 626 |     "out"
 627 |    ]
 628 |   },
 629 |   {
 630 |    "cell_type": "markdown",
 631 |    "metadata": {},
 632 |    "source": [
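 633 |     "The ReLU layer behaves like a switch in an electric circuit.\n",
 634 |     "\n",
 635 |     "In the forward pass, the switch is ON where current flows and OFF where it does not.\n",
 636 |     "\n",
 637 |     "In the backward pass, current passes through unchanged where the switch is ON and stops where it is OFF."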
 638 |    ]
 639 |   },
 640 |   {
 641 |    "cell_type": "markdown",
 642 |    "metadata": {},
 643 |    "source": [
 644 |     "### 5.5.2 Sigmoid layer\n",
 645 |     "\n",
 646 |     "#### Equation 5.9: the sigmoid function\n",
 647 |     "\n",
 648 |     "\\begin{equation*}\n",
 649 |     "y = \\frac{1}{1 + exp(-x)}\n",
 650 |     "\\end{equation*}\n",
 651 |     "\n",
 652 |     "**Step 1** The '/' node computes y = 1 / x, whose derivative is:\n",
 653 |     "\n",
 654 |     "#### Equation 5.10\n",
 655 |     "\n",
 656 |     "\\begin{equation*}\n",
 657 |     "\\frac{\\partial y}{\\partial x} = -\\frac{1}{x^{2}}\n",
 658 |     "\\end{equation*}\n",
 659 |     "\n",
 660 |     "\\begin{equation*}\n",
 661 |     "= - y^{2}\n",
 662 |     "\\end{equation*}\n",
 663 |     "\n",
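 664 |     "In the backward pass, the upstream value is multiplied by -y\\**2 (the negated square of the forward output) and passed downstream.\n",
 665 |     "\n",
 666 |     "**Step 2** The '+' node passes the upstream value downstream unchanged.\n",
 667 |     "\n",
 668 |     "**Step 3** The 'exp' node computes y = exp(x), whose derivative is:\n",
 669 |     "\n",
 670 |     "#### Equation 5.11\n",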
 671 |     "\n",
 672 |     "\\begin{equation*}\n",
 673 |     "\\frac{\\partial y}{\\partial x} = exp(x)\n",
 674 |     "\\end{equation*}\n",
 675 |     "\n",
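 676 |     "In the computational graph, the upstream value is multiplied by this node's forward output (exp(-x) in this example) and passed downstream.\n",
 677 |     "\n",
 678 |     "**Step 4** The '×' node swaps the two forward-pass values and multiplies the upstream value by them.\n",
 679 |     "\n",
 680 |     "In this example the forward pass multiplied x by -1, so the upstream value is multiplied by -1.\n",
 681 |     "\n",
 682 |     "Simplified version of the Sigmoid layer\n",
 683 |     "\n",
 684 |     "Grouping the nodes hides the internal details of the Sigmoid layer, so that only its input and output need attention.\n",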
 685 |     "\n",
 686 |     "\\begin{equation*}\n",
 687 |     "\\frac{\\partial L}{\\partial y} y^{2} exp(-x) = \\frac{\\partial L}{\\partial y} \\frac{1} { (1+exp(-x))^{2}} exp(-x)\n",
 688 |     "\\end{equation*}\n",
 689 |     "\n",
 690 |     "\\begin{equation*}\n",
 691 |     "= \\frac{\\partial L}{\\partial y} \\frac{1} { 1+exp(-x)} \\frac{exp(-x)} {1+exp(-x)}\n",
 692 |     "\\end{equation*}\n",
 693 |     "\n",
 694 |     "\\begin{equation*}\n",
 695 |     "= \\frac{\\partial L}{\\partial y} y (1-y)\n",
 696 |     "\\end{equation*}\n",
 697 |     "\n",
 698 |     "Computational graph of the Sigmoid layer: the backward pass can be computed from the forward output y alone.\n",
 699 |     "\n",
 700 |     "The Sigmoid layer implemented in Python:"
 701 |    ]
 702 |   },
 703 |   {
 704 |    "cell_type": "code",
 705 |    "execution_count": 10,
 706 |    "metadata": {
 707 |     "collapsed": true
 708 |    },
 709 |    "outputs": [],
 710 |    "source": [
 711 |     "class Sigmoid:\n",
 712 |     "    def __init__(self):\n",
 713 |     "        self.out = None\n",
 714 |     "    \n",
 715 |     "    def forward(self, x):\n",
 716 |     "        out = 1 / (1 + np.exp(-x))\n",
 717 |     "        self.out = out\n",
 718 |     "        \n",
 719 |     "        return out\n",
 720 |     "        \n",
 721 |     "    def backward(self, dout):\n",
 722 |     "        dx = dout * (1.0 - self.out) * self.out\n",
 723 |     "        \n",
 724 |     "        return dx"
 725 |    ]
 726 |   },
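     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "The last equality in the simplification of the Sigmoid backward pass relies on one identity that is easy to miss. Spelled out as a small worked step, with y the forward output of the Sigmoid layer:\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\exp(-x)}{1+\\exp(-x)} = \\frac{(1+\\exp(-x)) - 1}{1+\\exp(-x)} = 1 - \\frac{1}{1+\\exp(-x)} = 1 - y\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "For instance, at x = 0 the forward output is y = 0.5, so the local derivative y(1 - y) is 0.25, which is the factor that backward() multiplies dout by."
     |    ]
     |   },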
 727 |   {
 728 |    "cell_type": "markdown",
 729 |    "metadata": {
 730 |     "collapsed": true
 731 |    },
 732 |    "source": [
 733 |     "## 5.6 Implementing the Affine and Softmax layers\n",
 734 |     "\n",
 735 |     "### 5.6.1 Affine layer\n",
 736 |     "\n",
 737 |     "In the forward pass of a neural network, the weighted sum of the input signals is computed with a matrix product (np.dot())."
 738 |    ]
 739 |   },
 740 |   {
 741 |    "cell_type": "code",
 742 |    "execution_count": 11,
 743 |    "metadata": {
 744 |     "collapsed": false
 745 |    },
 746 |    "outputs": [
 747 |     {
 748 |      "name": "stdout",
 749 |      "output_type": "stream",
 750 |      "text": [
 751 |       "(2,)\n",
 752 |       "(2, 3)\n",
 753 |       "(3,)\n"
 754 |      ]
 755 |     }
 756 |    ],
 757 |    "source": [
 758 |     "X = np.random.rand(2)   # input\n",
 759 |     "W = np.random.rand(2,3) # weights\n",
 760 |     "B = np.random.rand(3)   # bias\n",
 761 |     "\n",
 762 |     "print(X.shape) # (2,)\n",
 763 |     "print(W.shape) # (2, 3)\n",
 764 |     "print(B.shape) # (3,)\n",
 765 |     "\n",
 766 |     "Y = np.dot(X, W) + B"
 767 |    ]
 768 |   },
 769 |   {
 770 |    "cell_type": "markdown",
 771 |    "metadata": {},
 772 |    "source": [
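 773 |     "For the product of X and W, the sizes of the corresponding dimensions must match: here (2,) times (2, 3) gives (3,).\n",
 774 |     "\n",
 775 |     "Affine transformation: the name used in geometry for the matrix product performed in the forward pass of a neural network.\n",
 776 |     "\n",
 777 |     "In this computational graph, matrices (rather than scalars) flow between the nodes.\n",
 778 |     "\n",
 779 |     "#### Equation 5.13: backpropagation written with matrices\n",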
 780 |     "\n",
 781 |     "\\begin{equation*}\n",
 782 |     "\\frac{\\partial L}{\\partial X} = \\frac{\\partial L}{\\partial Y} W^{T}\n",
 783 |     "\\end{equation*}\n",
 784 |     "\n",
 785 |     "\\begin{equation*}\n",
 786 |     "\\frac{\\partial L}{\\partial W} = X^{T} \\frac{\\partial L}{\\partial Y}\n",
 787 |     "\\end{equation*}\n",
 788 |     "\n",
 789 |     "Transpose: the element at position (i, j) of W moves to position (j, i).\n",
 790 |     "\n",
 791 |     "#### Equation 5.14: the transpose of a matrix\n",
 792 |     "\n",
 793 |     "\\begin{equation*}\n",
 794 |     "W = \\begin{pmatrix}\n",
 795 |     "w_{11} & w_{21} & w_{31} \\\\\n",
 796 |     "w_{12} & w_{22} & w_{32}\n",
 797 |     "\\end{pmatrix}\n",
 798 |     "\\end{equation*}\n",
 799 |     "\n",
 800 |     "\\begin{equation*}\n",
 801 |     "W^{T} = \\begin{pmatrix}\n",
 802 |     "w_{11} & w_{12} \\\\\n",
 803 |     "w_{21} & w_{22} \\\\\n",
 804 |     "w_{31} & w_{32}\n",
 805 |     "\\end{pmatrix}\n",
 806 |     "\\end{equation*}\n",
 807 |     "\n",
 808 |     "If W has shape (2, 3), then W.T has shape (3, 2).\n",
 809 |     "\n",
 810 |     "#### Figure 5-25: backward pass of the Affine layer, with the shape of each variable written next to it\n",
 811 |     "\n",
 812 |     "\\begin{equation*}\n",
 813 |     "\\frac{\\partial L}{\\partial X}(2,) = \\frac{\\partial L}{\\partial Y}(3,) W^{T} (3,2)\n",
 814 |     "\\end{equation*}\n",
 815 |     "\n",
 816 |     "\\begin{equation*}\n",
 817 |     "X\\,(2,) \\text{ and } \\frac{\\partial L}{\\partial X}\\,(2,) \\text{ have the same shape}\n",
 818 |     "\\end{equation*}\n",
 819 |     "\n",
 820 |     "\\begin{equation*}\n",
 821 |     "\\frac{\\partial L}{\\partial W}(2,3) = X^{T}(2,1) \\frac{\\partial L}{\\partial Y} (1,3)\n",
 822 |     "\\end{equation*}\n",
 823 |     "\n",
 824 |     "\\begin{equation*}\n",
 825 |     "W\\,(2,3) \\text{ and } \\frac{\\partial L}{\\partial W}\\,(2,3) \\text{ have the same shape}\n",
 826 |     "\\end{equation*}"
 827 |    ]
 828 |   },
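     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "Equation 5.13 can be checked element by element. The sketch below assumes the indexing convention that $W_{ij}$ is the entry in row i and column j of W; this convention is chosen here for illustration and is not the subscript layout used in Equation 5.14. The forward pass Y = XW + B then reads:\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "Y_j = \\sum_i X_i W_{ij} + B_j\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "Applying the chain rule to each element gives:\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\partial L}{\\partial X_i} = \\sum_j \\frac{\\partial L}{\\partial Y_j} W_{ij} = \\left( \\frac{\\partial L}{\\partial Y} W^{T} \\right)_i\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\partial L}{\\partial W_{ij}} = X_i \\frac{\\partial L}{\\partial Y_j} = \\left( X^{T} \\frac{\\partial L}{\\partial Y} \\right)_{ij}\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "Both results have the same shapes as X and W, which matches the shape annotations of Figure 5-25."
     |    ]
     |   },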
 829 |   {
 830 |    "cell_type": "markdown",
 831 |    "metadata": {},
 832 |    "source": [
 833 |     "### 5.6.2 Affine layer for batches\n",
 834 |     "\n",
 835 |     "#### Figure 5-27: computational graph of the batch Affine layer\n",
 836 |     "\n",
 837 |     "\\begin{equation*}\n",
 838 |     "\\frac{\\partial L}{\\partial X}(N,2) = \\frac{\\partial L}{\\partial Y}(N,3) W^{T} (3,2)\n",
 839 |     "\\end{equation*}\n",
 840 |     "\n",
 841 |     "\\begin{equation*}\n",
 842 |     "\\frac{\\partial L}{\\partial W}(2,3) = X^{T}(2,N) \\frac{\\partial L}{\\partial Y} (N,3)\n",
 843 |     "\\end{equation*}\n",
 844 |     "\n",
 845 |     "\\begin{equation*}\n",
 846 |     "\\frac{\\partial L}{\\partial B}\\,(3,) = \\text{sum of } \\frac{\\partial L}{\\partial Y}\\,(N,3) \\text{ over axis 0 (down each column)}\n",
 847 |     "\\end{equation*}\n",
 848 |     "\n",
 849 |     "The only difference from before is that the input X now has shape (N, 2).\n",
 850 |     "\n",
 851 |     "For example, with N = 2 (two data points), the bias is added to each of the two data points."
 852 |    ]
 853 |   },
 854 |   {
 855 |    "cell_type": "code",
 856 |    "execution_count": 12,
 857 |    "metadata": {
 858 |     "collapsed": false
 859 |    },
 860 |    "outputs": [
 861 |     {
 862 |      "data": {
 863 |       "text/plain": [
 864 |        "array([[ 0,  0,  0],\n",
 865 |        "       [10, 10, 10]])"
 866 |       ]
 867 |      },
 868 |      "execution_count": 12,
 869 |      "metadata": {},
 870 |      "output_type": "execute_result"
 871 |     }
 872 |    ],
 873 |    "source": [
 874 |     "X_dot_W = np.array([[0, 0, 0], [10, 10, 10]])\n",
 875 |     "B = np.array([1, 2, 3])\n",
 876 |     "\n",
 877 |     "X_dot_W"
 878 |    ]
 879 |   },
 880 |   {
 881 |    "cell_type": "code",
 882 |    "execution_count": 13,
 883 |    "metadata": {
 884 |     "collapsed": false
 885 |    },
 886 |    "outputs": [
 887 |     {
 888 |      "data": {
 889 |       "text/plain": [
 890 |        "array([[ 1,  2,  3],\n",
 891 |        "       [11, 12, 13]])"
 892 |       ]
 893 |      },
 894 |      "execution_count": 13,
 895 |      "metadata": {},
 896 |      "output_type": "execute_result"
 897 |     }
 898 |    ],
 899 |    "source": [
 900 |     "X_dot_W + B"
 901 |    ]
 902 |   },
 903 |   {
 904 |    "cell_type": "markdown",
 905 |    "metadata": {},
 906 |    "source": [
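 907 |     "In the forward pass, the bias is added to every data point (the first data point, the second data point, and so on).\n",
 908 |     "\n",
 909 |     "In the backward pass, therefore, the gradients from all data points must be summed into each element of the bias."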
 910 |    ]
 911 |   },
 912 |   {
 913 |    "cell_type": "code",
 914 |    "execution_count": 14,
 915 |    "metadata": {
 916 |     "collapsed": false
 917 |    },
 918 |    "outputs": [
 919 |     {
 920 |      "data": {
 921 |       "text/plain": [
 922 |        "array([[1, 2, 3],\n",
 923 |        "       [4, 5, 6]])"
 924 |       ]
 925 |      },
 926 |      "execution_count": 14,
 927 |      "metadata": {},
 928 |      "output_type": "execute_result"
 929 |     }
 930 |    ],
 931 |    "source": [
 932 |     "dY = np.array([[1, 2, 3], [4, 5, 6]])\n",
 933 |     "dY"
 934 |    ]
 935 |   },
 936 |   {
 937 |    "cell_type": "code",
 938 |    "execution_count": 15,
 939 |    "metadata": {
 940 |     "collapsed": false
 941 |    },
 942 |    "outputs": [
 943 |     {
 944 |      "data": {
 945 |       "text/plain": [
 946 |        "array([5, 7, 9])"
 947 |       ]
 948 |      },
 949 |      "execution_count": 15,
 950 |      "metadata": {},
 951 |      "output_type": "execute_result"
 952 |     }
 953 |    ],
 954 |    "source": [
 955 |     "dB = np.sum(dY, axis=0)\n",
 956 |     "dB"
 957 |    ]
 958 |   },
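     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "Why the bias gradient is a sum over the data axis follows from the chain rule. As a sketch, write $Y_{nj}$ for output j of data point n (the index names n and j are introduced here for illustration). Because the same $B_j$ is added to every row:\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "Y_{nj} = (XW)_{nj} + B_j \\quad \\Rightarrow \\quad \\frac{\\partial L}{\\partial B_j} = \\sum_n \\frac{\\partial L}{\\partial Y_{nj}}\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "With the dY used above, this gives (1 + 4, 2 + 5, 3 + 6) = (5, 7, 9), which agrees with the output of np.sum(dY, axis=0)."
     |    ]
     |   },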
 959 |   {
 960 |    "cell_type": "markdown",
 961 |    "metadata": {},
 962 |    "source": [
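 963 |     "np.sum() with axis=0 takes the total over axis 0, the axis that indexes the data points.\n",
 964 |     "\n",
 965 |     "Affine implementation\n",
 966 |     "\n",
 967 |     "The Affine implementation in common/layers.py also handles tensor inputs (such as 4-dimensional data), so it differs slightly from the implementation below."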
 968 |    ]
 969 |   },
 970 |   {
 971 |    "cell_type": "code",
 972 |    "execution_count": 16,
 973 |    "metadata": {
 974 |     "collapsed": true
 975 |    },
 976 |    "outputs": [],
 977 |    "source": [
 978 |     "class Affine:\n",
 979 |     "    def __init__(self, W, b):\n",
 980 |     "        self.W = W\n",
 981 |     "        self.b = b\n",
 982 |     "        self.x = None\n",
 983 |     "        self.dW = None\n",
 984 |     "        self.db = None\n",
 985 |     "    \n",
 986 |     "    def forward(self, x):\n",
 987 |     "        self.x = x\n",
 988 |     "        out = np.dot(x, self.W) + self.b\n",
 989 |     "        \n",
 990 |     "        return out\n",
 991 |     "    \n",
 992 |     "    def backward(self, dout):\n",
 993 |     "        dx = np.dot(dout, self.W.T)\n",
 994 |     "        self.dW = np.dot(self.x.T, dout)\n",
 995 |     "        self.db = np.sum(dout, axis=0)\n",
 996 |     "        \n",
 997 |     "        return dx"
 998 |    ]
 999 |   },
1000 |   {
1001 |    "cell_type": "markdown",
1002 |    "metadata": {},
1003 |    "source": [
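1004 |     "### 5.6.3 Softmax-with-Loss layer\n",
1005 |     "\n",
1006 |     "The softmax function normalizes its input values (so that the outputs sum to 1) and outputs them.\n",
1007 |     "\n",
1008 |     "Inference usually does not use the Softmax layer, because only the largest score is needed to pick the answer.\n",
1009 |     "\n",
1010 |     "Score: the output of the Affine layer that precedes Softmax.\n",
1011 |     "\n",
1012 |     "Training the network, however, does require the Softmax layer.\n",
1013 |     "\n",
1014 |     "Softmax layer implementation: it is combined with the loss function, cross-entropy error, and implemented as the 'Softmax-with-Loss layer'.\n",
1015 |     "\n",
1016 |     "Softmax layer: normalizes the input (a1, a2, a3) and outputs (y1, y2, y3).\n",
1017 |     "\n",
1018 |     "Cross Entropy layer: takes the Softmax output (y1, y2, y3) and the ground-truth label (t1, t2, t3), and outputs the loss L.\n",
1019 |     "\n",
1020 |     "The backward pass of the Softmax layer yields the clean result (y1-t1, y2-t2, y3-t3).\n",
1021 |     "\n",
1022 |     "This is the difference between the Softmax output and the ground-truth label.\n",
1023 |     "\n",
1024 |     "In backpropagation, this difference (the error) is what gets passed to the preceding layer.\n",
1025 |     "\n",
1026 |     "<u>Using cross-entropy error as the loss function for softmax makes the backward pass come out cleanly as (y1-t1, y2-t2, y3-t3).</u>\n",
1027 |     "\n",
1028 |     "=> <u>The cross-entropy error function was designed to behave this way.</u>\n",
1029 |     "\n",
1030 |     "Likewise, using mean squared error as the loss function for the identity function makes the backward pass come out just as cleanly.\n",
1031 |     "\n",
1032 |     "Concrete examples\n",
1033 |     "\n",
1034 |     "Ground-truth label (0, 1, 0), Softmax output (0.3, 0.2, 0.5)\n",
1035 |     "\n",
1036 |     "=> The backward pass of the Softmax layer propagates a large error, (0.3, -0.8, 0.5).\n",
1037 |     "\n",
1038 |     "Ground-truth label (0, 1, 0), Softmax output (0.01, 0.99, 0)\n",
1039 |     "\n",
1040 |     "=> The error sent back by the Softmax layer is (0.01, -0.01, 0), which is small, so little is learned from this example.\n",
1041 |     "\n",
1042 |     "Code implementing the Softmax-with-Loss layer:"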
1043 |    ]
1044 |   },
1045 |   {
1046 |    "cell_type": "code",
1047 |    "execution_count": 17,
1048 |    "metadata": {
1049 |     "collapsed": false
1050 |    },
1051 |    "outputs": [],
1052 |    "source": [
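1053 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/functions.py\n",
1054 |     "def softmax(x):  # see section 3.5.2 (caveats when implementing softmax); used by SoftmaxWithLoss below\n",
1055 |     "    x = x - np.max(x, axis=-1, keepdims=True)  # guard against overflow\n",
1056 |     "    return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)\n",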
1057 |     "\n",
1058 |     "# See section 4.2.2 (cross-entropy error)\n",
1059 |     "def cross_entropy_error(y, t):\n",
1060 |     "    if y.ndim == 1:\n",
1061 |     "        t = t.reshape(1, t.size)\n",
1062 |     "        y = y.reshape(1, y.size)\n",
1063 |     "        \n",
1064 |     "    # If the labels are one-hot vectors, convert them to class indices\n",
1065 |     "    if t.size == y.size:\n",
1066 |     "        t = t.argmax(axis=1)\n",
1067 |     "             \n",
1068 |     "    batch_size = y.shape[0]\n",
1069 |     "    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size  # 1e-7 avoids log(0)\n",
1070 |     "\n",
1071 |     "class SoftmaxWithLoss:\n",
1072 |     "    def __init__(self):\n",
1073 |     "        self.loss = None # loss\n",
1074 |     "        self.y = None    # output of softmax\n",
1075 |     "        self.t = None    # ground-truth labels (one-hot vectors)\n",
1076 |     "    \n",
1077 |     "    def forward(self, x, t):\n",
1078 |     "        self.t = t\n",
1079 |     "        self.y = softmax(x)\n",
1080 |     "        self.loss = cross_entropy_error(self.y, self.t)\n",
1081 |     "        return self.loss\n",
1082 |     "    \n",
1083 |     "    def backward(self, dout=1):\n",
1084 |     "        batch_size = self.t.shape[0]\n",
1085 |     "        dx = (self.y - self.t) / batch_size\n",
1086 |     "        \n",
1087 |     "        return dx"
1088 |    ]
1089 |   },
1090 |   {
1091 |    "cell_type": "markdown",
1092 |    "metadata": {},
1093 |    "source": [
1094 |     "Note: in the backward pass, the propagated value is divided by the batch size (batch_size), so that the error per data point is passed to the preceding layer."
1095 |    ]
1096 |   },
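     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "The result (y1-t1, y2-t2, y3-t3) can be derived in a few lines. This is a sketch for a single data point, assuming a label t whose entries sum to 1 (true for one-hot labels); $\\delta_{ik}$ is the Kronecker delta, a symbol introduced here for illustration.\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "y_k = \\frac{\\exp(a_k)}{\\sum_i \\exp(a_i)}, \\qquad L = -\\sum_i t_i \\log y_i, \\qquad \\frac{\\partial \\log y_i}{\\partial a_k} = \\delta_{ik} - y_k\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\partial L}{\\partial a_k} = -\\sum_i t_i (\\delta_{ik} - y_k) = -t_k + y_k \\sum_i t_i = y_k - t_k\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "For a batch of N data points the loss is the average over the batch, which is where the division by batch_size in backward() comes from:\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "\\frac{\\partial L}{\\partial a_{nk}} = \\frac{y_{nk} - t_{nk}}{N}\n",
     |     "\\end{equation*}"
     |    ]
     |   },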
1097 |   {
1098 |    "cell_type": "markdown",
1099 |    "metadata": {},
1100 |    "source": [
1101 |     "## 5.7 Implementing backpropagation"
1102 |    ]
1103 |   },
1104 |   {
1105 |    "cell_type": "markdown",
1106 |    "metadata": {},
1107 |    "source": [
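1108 |     "### 5.7.1 The overall picture of neural network training\n",
1109 |     "\n",
1110 |     "**Premise**\n",
1111 |     "\n",
1112 |     "Training: the process of adjusting the weights and biases so that they fit the training data.\n",
1113 |     "\n",
1114 |     "**Step 1: mini-batch**\n",
1115 |     "\n",
1116 |     "Mini-batch: a randomly selected subset of the training data.\n",
1117 |     "\n",
1118 |     "Goal: reduce the value of the loss function on the mini-batch.\n",
1119 |     "\n",
1120 |     "**Step 2: compute the gradient**\n",
1121 |     "\n",
1122 |     "Compute the gradient of the loss with respect to each weight parameter. The gradient indicates the direction in which the loss changes fastest; moving against it decreases the loss most quickly.\n",
1123 |     "\n",
1124 |     "**Step 3: update the parameters**\n",
1125 |     "\n",
1126 |     "Update the weight parameters by a small amount in the direction that decreases the loss (opposite to the gradient).\n",
1127 |     "\n",
1128 |     "**Step 4: repeat**\n",
1129 |     "\n",
1130 |     "Repeat steps 1 to 3.\n",
1131 |     "\n",
1132 |     "<u>Backpropagation appears in step 2, 'compute the gradient'.</u>\n",
1133 |     "\n",
1134 |     "<u>Unlike numerical differentiation, which is slow, it computes the gradient quickly and efficiently.</u>"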
1135 |    ]
1136 |   },
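     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "Step 3 is commonly written as the update rule below. This is only a sketch: the learning-rate symbol $\\eta$ is an assumption introduced here and does not appear in the notes above.\n",
     |     "\n",
     |     "\\begin{equation*}\n",
     |     "W \\leftarrow W - \\eta \\frac{\\partial L}{\\partial W}\n",
     |     "\\end{equation*}\n",
     |     "\n",
     |     "The minus sign means the update moves against the gradient, which is the direction in which the loss decreases."
     |    ]
     |   },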
1137 |   {
1138 |    "cell_type": "markdown",
1139 |    "metadata": {},
1140 |    "source": [
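1141 |     "### 5.7.2 Implementing a neural network with backpropagation\n",
1142 |     "\n",
1143 |     "The network is built from layers.\n",
1144 |     "\n",
1145 |     "As a result, both obtaining the recognition result (predict()) and computing the gradient (gradient()) are carried out simply by propagating through the layers."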
1146 |    ]
1147 |   },
1148 |   {
1149 |    "cell_type": "code",
1150 |    "execution_count": 18,
1151 |    "metadata": {
1152 |     "collapsed": false
1153 |    },
1154 |    "outputs": [],
1155 |    "source": [
1156 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/two_layer_net.py\n",
1157 |     "# coding: utf-8\n",
1158 |     "#import sys, os\n",
1159 |     "#sys.path.append(os.pardir)  # allow importing files from the parent directory\n",
1160 |     "import numpy as np\n",
1161 |     "#from common.layers import *\n",
1162 |     "#from common.gradient import numerical_gradient\n",
1163 |     "from collections import OrderedDict\n",
1164 |     "\n",
1165 |     "# https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/functions.py\n",
1166 |     "def softmax(x):\n",
1167 |     "    if x.ndim == 2:\n",
1168 |     "        x = x.T\n",
1169 |     "        x = x - np.max(x, axis=0)\n",
1170 |     "        y = np.exp(x) / np.sum(np.exp(x), axis=0)\n",
1171 |     "        return y.T \n",
1172 |     "\n",
1173 |     "    x = x - np.max(x) # guard against overflow\n",
1174 |     "    return np.exp(x) / np.sum(np.exp(x))\n",
1175 |     "\n",
1176 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/gradient.py\n",
1177 |     "def numerical_gradient(f, x):\n",
1178 |     "    h = 1e-4 # 0.0001\n",
1179 |     "    grad = np.zeros_like(x)\n",
1180 |     "    \n",
1181 |     "    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])\n",
1182 |     "    while not it.finished:\n",
1183 |     "        idx = it.multi_index\n",
1184 |     "        tmp_val = x[idx]\n",
1185 |     "        x[idx] = float(tmp_val) + h\n",
1186 |     "        fxh1 = f(x) # f(x+h)\n",
1187 |     "        \n",
1188 |     "        x[idx] = tmp_val - h \n",
1189 |     "        fxh2 = f(x) # f(x-h)\n",
1190 |     "        grad[idx] = (fxh1 - fxh2) / (2*h)\n",
1191 |     "        \n",
1192 |     "        x[idx] = tmp_val # restore the original value\n",
1193 |     "        it.iternext()   \n",
1194 |     "        \n",
1195 |     "    return grad\n",
1196 |     "\n",
1197 |     "\n",
1198 |     "class TwoLayerNet:\n",
1199 |     "\n",
1200 |     "    def __init__(self, input_size, hidden_size, output_size, weight_init_std = 0.01):\n",
1201 |     "        # initialize the weights\n",
1202 |     "        self.params = {}\n",
1203 |     "        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)\n",
1204 |     "        self.params['b1'] = np.zeros(hidden_size)\n",
1205 |     "        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size) \n",
1206 |     "        self.params['b2'] = np.zeros(output_size)\n",
1207 |     "\n",
1208 |     "        # create the layers\n",
1209 |     "        self.layers = OrderedDict()                                           ###\n",
1210 |     "        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1']) ###\n",
1211 |     "        self.layers['Relu1'] = Relu()                                         ###\n",
1212 |     "        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2']) ###\n",
1213 |     "\n",
1214 |     "        self.lastLayer = SoftmaxWithLoss()                                    ###\n",
1215 |     "        \n",
1216 |     "    def predict(self, x):\n",
1217 |     "        for layer in self.layers.values():                                    ###\n",
1218 |     "            x = layer.forward(x)                                              ###\n",
1219 |     "        \n",
1220 |     "        return x\n",
1221 |     "        \n",
1222 |     "    # x: input data, t: ground-truth labels\n",
1223 |     "    def loss(self, x, t):\n",
1224 |     "        y = self.predict(x)\n",
1225 |     "        return self.lastLayer.forward(y, t)\n",
1226 |     "    \n",
1227 |     "    def accuracy(self, x, t):\n",
1228 |     "        y = self.predict(x)\n",
1229 |     "        y = np.argmax(y, axis=1)\n",
1230 |     "        if t.ndim != 1 : t = np.argmax(t, axis=1)\n",
1231 |     "        \n",
1232 |     "        accuracy = np.sum(y == t) / float(x.shape[0])\n",
1233 |     "        return accuracy\n",
1234 |     "        \n",
1235 |     "    # x: input data, t: ground-truth labels\n",
1236 |     "    def numerical_gradient(self, x, t):\n",
1237 |     "        loss_W = lambda W: self.loss(x, t)\n",
1238 |     "        \n",
1239 |     "        grads = {}\n",
1240 |     "        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])\n",
1241 |     "        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])\n",
1242 |     "        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])\n",
1243 |     "        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])\n",
1244 |     "        \n",
1245 |     "        return grads\n",
1246 |     "        \n",
1247 |     "    def gradient(self, x, t):\n",
1248 |     "        # forward\n",
1249 |     "        self.loss(x, t)                      ###\n",
1250 |     "\n",
1251 |     "        # backward\n",
1252 |     "        dout = 1                             ###\n",
1253 |     "        dout = self.lastLayer.backward(dout) ###\n",
1254 |     "        \n",
1255 |     "        layers = list(self.layers.values())  ###\n",
1256 |     "        layers.reverse()                     ###\n",
1257 |     "        for layer in layers:                 ###\n",
1258 |     "            dout = layer.backward(dout)      ###\n",
1259 |     "\n",
1260 |     "        # store the results\n",
1261 |     "        grads = {}\n",
1262 |     "        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db\n",
1263 |     "        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db\n",
1264 |     "\n",
1265 |     "        return grads"
1266 |    ]
1267 |   },
1268 |   {
1269 |    "cell_type": "markdown",
1270 |    "metadata": {},
1271 |    "source": [
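1272 |     "Important code is marked with \\###. Read those lines carefully.\n",
1273 |     "\n",
1274 |     "OrderedDict: a dictionary that remembers the order in which items were added.\n",
1275 |     "\n",
1276 |     "In the forward pass, each layer's forward() method is called in the order the layers were added.\n",
1277 |     "\n",
1278 |     "In the backward pass, the layers are called in reverse order.\n",
1279 |     "\n",
1280 |     "Implementing the components of the network as 'layers' makes the network easy to build.\n",
1281 |     "\n",
1282 |     "=> Like assembling Lego blocks, you can simply add as many layers as you need."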
1283 |    ]
1284 |   },
1285 |   {
1286 |    "cell_type": "markdown",
1287 |    "metadata": {},
1288 |    "source": [
1289 |     "### 5.7.3 Verifying gradients computed with backpropagation\n",
1290 |     "\n",
1291 |     "Numerical differentiation is slow."
1292 |    ]
1293 |   },
1294 |   {
1295 |    "cell_type": "code",
1296 |    "execution_count": 19,
1297 |    "metadata": {
1298 |     "collapsed": true
1299 |    },
1300 |    "outputs": [],
1301 |    "source": [
1302 |     "from dataset.mnist import load_mnist\n",
1303 |     "(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)\n",
1304 |     "network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)\n",
1305 |     "x_batch = x_train[:3]\n",
1306 |     "t_batch = t_train[:3]"
1307 |    ]
1308 |   },
1309 |   {
1310 |    "cell_type": "code",
1311 |    "execution_count": 20,
1312 |    "metadata": {
1313 |     "collapsed": false
1314 |    },
1315 |    "outputs": [
1316 |     {
1317 |      "name": "stdout",
1318 |      "output_type": "stream",
1319 |      "text": [
1320 |       "1 loop, best of 3: 14.1 s per loop\n"
1321 |      ]
1322 |     }
1323 |    ],
1324 |    "source": [
1325 |     "%timeit network.numerical_gradient(x_batch, t_batch)"
1326 |    ]
1327 |   },
1328 |   {
1329 |    "cell_type": "code",
1330 |    "execution_count": 21,
1331 |    "metadata": {
1332 |     "collapsed": false
1333 |    },
1334 |    "outputs": [
1335 |     {
1336 |      "name": "stdout",
1337 |      "output_type": "stream",
1338 |      "text": [
1339 |       "The slowest run took 16.66 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
1340 |       "1000 loops, best of 3: 470 µs per loop\n"
1341 |      ]
1342 |     }
1343 |    ],
1344 |    "source": [
1345 |     "%timeit network.gradient(x_batch, t_batch)"
1346 |    ]
1347 |   },
1348 |   {
1349 |    "cell_type": "markdown",
1350 |    "metadata": {},
1351 |    "source": [
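1352 |     "Numerical differentiation (numerical_gradient): 9.95 s and 14.1 s per call in two runs\n",
1353 |     "\n",
1354 |     "Backpropagation (gradient): 248 µs (0.000248 s) and 470 µs (0.000470 s) per call\n",
1355 |     "\n",
1356 |     "Backpropagation is roughly 30,000 to 40,000 times faster (14.1 s / 470 µs ≈ 30,000; 9.95 s / 248 µs ≈ 40,000)."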
1357 |    ]
1358 |   },
1359 |   {
1360 |    "cell_type": "markdown",
1361 |    "metadata": {},
1362 |    "source": [
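1363 |     "Numerical differentiation is still needed to check that backpropagation has been implemented correctly.\n",
1364 |     "\n",
1365 |     "The advantage of numerical differentiation is that it is easy to implement.\n",
1366 |     "\n",
1367 |     "Gradient check: comparing the result of numerical differentiation with the result of backpropagation to verify that backpropagation is implemented correctly."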
1368 |    ]
1369 |   },
1370 |   {
1371 |    "cell_type": "code",
1372 |    "execution_count": 22,
1373 |    "metadata": {
1374 |     "collapsed": false
1375 |    },
1376 |    "outputs": [
1377 |     {
1378 |      "name": "stdout",
1379 |      "output_type": "stream",
1380 |      "text": [
1381 |       "b2:1.20126118774e-10\n",
1382 |       "W1:2.80100167994e-13\n",
1383 |       "W2:9.12804904606e-13\n",
1384 |       "b1:7.24036213471e-13\n"
1385 |      ]
1386 |     }
1387 |    ],
1388 |    "source": [
1389 |     "# See https://github.com/WegraLee/deep-learning-from-scratch/blob/master/ch05/gradient_check.py\n",
1390 |     "# coding: utf-8\n",
1391 |     "#import sys, os\n",
1392 |     "#sys.path.append(os.pardir)  # allow importing files from the parent directory\n",
1393 |     "import numpy as np\n",
1394 |     "from dataset.mnist import load_mnist\n",
1395 |     "\n",
1396 |     "# load the data\n",
1397 |     "(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)\n",
1398 |     "\n",
1399 |     "network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)\n",
1400 |     "\n",
1401 |     "x_batch = x_train[:3]\n",
1402 |     "t_batch = t_train[:3]\n",
1403 |     "\n",
1404 |     "grad_numerical = network.numerical_gradient(x_batch, t_batch)\n",
1405 |     "grad_backprop = network.gradient(x_batch, t_batch)\n",
1406 |     "\n",
1407 |     "# For each parameter, compute the mean absolute difference between the two gradients.\n",
1408 |     "for key in grad_numerical.keys():\n",
1409 |     "    diff = np.average( np.abs(grad_backprop[key] - grad_numerical[key]) )\n",
1410 |     "    print(key + \":\" + str(diff))"
1411 |    ]
1412 |   },
1413 |   {
1414 |    "cell_type": "markdown",
1415 |    "metadata": {},
1416 |    "source": [
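1417 |     "This result shows that the difference between the gradients from numerical differentiation and from backpropagation is very small.\n",
1418 |     "\n",
1419 |     "This increases confidence that backpropagation was implemented without mistakes.\n",
1420 |     "\n",
1421 |     "The difference between the two results is rarely exactly 0, because floating-point arithmetic has finite precision.\n",
1422 |     "\n",
1423 |     "With a correct implementation, the difference is a small value very close to 0."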
1424 |    ]
1425 |   },
1426 |   {
1427 |    "cell_type": "markdown",
1428 |    "metadata": {},
1429 |    "source": [
1430 |     "### 5.7.4 Implementing Training with Backpropagation"
1431 |    ]
1432 |   },
1433 |   {
1434 |    "cell_type": "code",
1435 |    "execution_count": 23,
1436 |    "metadata": {
1437 |     "collapsed": false
1438 |    },
1439 |    "outputs": [
1440 |     {
1441 |      "name": "stdout",
1442 |      "output_type": "stream",
1443 |      "text": [
1444 |       "0.1359 0.1349\n",
1445 |       "0.898666666667 0.9015\n",
1446 |       "0.921233333333 0.9229\n",
1447 |       "0.935483333333 0.9355\n",
1448 |       "0.946366666667 0.9449\n",
1449 |       "0.95215 0.9502\n",
1450 |       "0.956916666667 0.9527\n",
1451 |       "0.96005 0.9557\n",
1452 |       "0.9626 0.9573\n",
1453 |       "0.966833333333 0.9597\n",
1454 |       "0.968366666667 0.9616\n",
1455 |       "0.9704 0.9622\n",
1456 |       "0.971483333333 0.963\n",
1457 |       "0.974283333333 0.9663\n",
1458 |       "0.976 0.9669\n",
1459 |       "0.977116666667 0.967\n",
1460 |       "0.978 0.9677\n"
1461 |      ]
1462 |     }
1463 |    ],
1464 |    "source": [
1465 |     "# coding: utf-8\n",
1466 |     "#import sys, os\n",
1467 |     "#sys.path.append(os.pardir)\n",
1468 |     "import numpy as np\n",
1469 |     "from dataset.mnist import load_mnist\n",
1470 |     "#from two_layer_net import TwoLayerNet\n",
1471 |     "\n",
1472 |     "# Load the data\n",
1473 |     "(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)\n",
1474 |     "\n",
1475 |     "network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)\n",
1476 |     "\n",
1477 |     "iters_num = 10000\n",
1478 |     "train_size = x_train.shape[0]\n",
1479 |     "batch_size = 100\n",
1480 |     "learning_rate = 0.1\n",
1481 |     "\n",
1482 |     "train_loss_list = []\n",
1483 |     "train_acc_list = []\n",
1484 |     "test_acc_list = []\n",
1485 |     "\n",
1486 |     "iter_per_epoch = max(train_size / batch_size, 1)\n",
1487 |     "\n",
1488 |     "for i in range(iters_num):\n",
1489 |     "    batch_mask = np.random.choice(train_size, batch_size)\n",
1490 |     "    x_batch = x_train[batch_mask]\n",
1491 |     "    t_batch = t_train[batch_mask]\n",
1492 |     "    \n",
1493 |     "    # Compute the gradient\n",
1494 |     "    #grad = network.numerical_gradient(x_batch, t_batch) # numerical differentiation\n",
1495 |     "    grad = network.gradient(x_batch, t_batch) # backpropagation (much faster)\n",
1496 |     "    \n",
1497 |     "    # Update the parameters\n",
1498 |     "    for key in ('W1', 'b1', 'W2', 'b2'):\n",
1499 |     "        network.params[key] -= learning_rate * grad[key]\n",
1500 |     "    \n",
1501 |     "    loss = network.loss(x_batch, t_batch)\n",
1502 |     "    train_loss_list.append(loss)\n",
1503 |     "    \n",
1504 |     "    if i % iter_per_epoch == 0:\n",
1505 |     "        train_acc = network.accuracy(x_train, t_train)\n",
1506 |     "        test_acc = network.accuracy(x_test, t_test)\n",
1507 |     "        train_acc_list.append(train_acc)\n",
1508 |     "        test_acc_list.append(test_acc)\n",
1509 |     "        print(train_acc, test_acc)"
1510 |    ]
1511 |   },
1512 |   {
1513 |    "cell_type": "markdown",
1514 |    "metadata": {},
1515 |    "source": [
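1516 |     "## 5.8 Summary\n",
1517 |     "\n",
1518 |     "Computational graphs were used to explain how a neural network operates and how backpropagation works.\n",
1519 |     "\n",
1520 |     "Every layer implements a forward method and a backward method.\n",
1521 |     "\n",
1522 |     "forward propagates data in the forward direction; backward propagates it in the reverse direction.\n",
1523 |     "\n",
1524 |     "This makes it possible to compute the gradients of the weight parameters efficiently."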
1525 |    ]
1526 |   },
1527 |   {
1528 |    "cell_type": "markdown",
1529 |    "metadata": {},
1530 |    "source": [
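1531 |     "**What we learned in this chapter**\n",
1532 |     "\n",
1533 |     "A computational graph makes the computation process visible.\n",
1534 |     "\n",
1535 |     "Each node of a computational graph performs a local computation. The local computations combine to form the whole computation.\n",
1536 |     "\n",
1537 |     "Forward propagation performs the ordinary computation. Backward propagation computes the derivative at each node.\n",
1538 |     "\n",
1539 |     "Backpropagation: implementing the components of a neural network as layers allows gradients to be computed efficiently.\n",
1540 |     "\n",
1541 |     "Gradient check: comparing the results of numerical differentiation and backpropagation confirms that the backpropagation implementation has no mistakes."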
1542 |    ]
1543 |   }
1544 |  ],
1545 |  "metadata": {
1546 |   "anaconda-cloud": {},
1547 |   "kernelspec": {
1548 |    "display_name": "Python [Root]",
1549 |    "language": "python",
1550 |    "name": "Python [Root]"
1551 |   },
1552 |   "language_info": {
1553 |    "codemirror_mode": {
1554 |     "name": "ipython",
1555 |     "version": 3
1556 |    },
1557 |    "file_extension": ".py",
1558 |    "mimetype": "text/x-python",
1559 |    "name": "python",
1560 |    "nbconvert_exporter": "python",
1561 |    "pygments_lexer": "ipython3",
1562 |    "version": "3.5.2"
1563 |   }
1564 |  },
1565 |  "nbformat": 4,
1566 |  "nbformat_minor": 0
1567 | }
1568 | 

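The gradient check described in section 5.7.3 of the notebook can be illustrated on a much smaller problem. The sketch below is not the notebook's `TwoLayerNet`; it is a hypothetical, pure-Python stand-in that compares the analytic gradient of softmax with cross-entropy (`y - t`, the same form `SoftmaxWithLoss.backward` uses for a batch of one) against a central-difference estimate. The names `analytic_grad`, `numerical_grad`, and the sample values are assumptions made for illustration.

```python
import math

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, t):
    # Cross-entropy between softmax(z) and a one-hot target t.
    y = softmax(z)
    return -sum(ti * math.log(yi) for yi, ti in zip(y, t))

def analytic_grad(z, t):
    # Gradient obtained by backpropagation through softmax + cross-entropy.
    y = softmax(z)
    return [yi - ti for yi, ti in zip(y, t)]

def numerical_grad(f, z, h=1e-4):
    # Central difference, perturbing one coordinate at a time.
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_plus[i] += h
        z_minus = list(z)
        z_minus[i] -= h
        grad.append((f(z_plus) - f(z_minus)) / (2 * h))
    return grad

z = [0.3, -1.2, 2.0]   # hypothetical scores
t = [0.0, 0.0, 1.0]    # hypothetical one-hot label

g_backprop = analytic_grad(z, t)
g_numerical = numerical_grad(lambda v: loss(v, t), z)

# Mean absolute difference, the same statistic the notebook prints per parameter.
mean_abs_diff = sum(abs(a - b) for a, b in zip(g_backprop, g_numerical)) / len(z)
print(mean_abs_diff)
```

As in the notebook, the printed difference should be tiny but generally not exactly zero.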

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # 밑바닥부터 시작하는 딥러닝
 2 |     
 3 | # Deep Learning from Scratch
 4 |     
 5 | Study notes on '밑바닥부터 시작하는 딥러닝' (Deep Learning from Scratch), organized as Jupyter notebooks.
 6 |     
 7 | ## Github
 8 |     
 9 | https://github.com/WegraLee/deep-learning-from-scratch
10 |     
11 | ## Book page
12 | 
13 | http://www.hanbit.co.kr/store/books/look.php?p_code=B8475831198
14 |     
15 | ![title](http://www.hanbit.co.kr/data/books/B8475831198_l.jpg)
16 |     
17 | ## Chapter 1
18 |     
19 | http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/1장.ipynb
20 |     
21 | ## Chapter 2
22 |     
23 | http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/2장.ipynb
24 |     
25 | ## Chapter 3
26 |     
27 | http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/3장.ipynb
28 |     
29 | ## Chapter 4
30 |     
31 | http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/4장.ipynb
32 |     
33 | ## Chapter 5
34 |     
35 | http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/5장.ipynb
36 |     
37 | ## Chapter 6
38 |     
39 | http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/6장.ipynb
40 |    
41 | 


--------------------------------------------------------------------------------
/common/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/common/__init__.py


--------------------------------------------------------------------------------
/common/functions.py:
--------------------------------------------------------------------------------
 1 | # coding: utf-8
 2 | import numpy as np
 3 | 
 4 | 
 5 | def identity_function(x):
 6 |     return x
 7 | 
 8 | 
 9 | def step_function(x):
10 |     return np.array(x > 0, dtype=int)  # np.int was removed in NumPy 1.24
11 | 
12 | 
13 | def sigmoid(x):
14 |     return 1 / (1 + np.exp(-x))    
15 | 
16 | 
17 | def sigmoid_grad(x):
18 |     return (1.0 - sigmoid(x)) * sigmoid(x)
19 |     
20 | 
21 | def relu(x):
22 |     return np.maximum(0, x)
23 | 
24 | 
25 | def relu_grad(x):
26 |     grad = np.zeros_like(x)  # same shape as x; np.zeros(x) would treat x as a shape
27 |     grad[x>=0] = 1
28 |     return grad
29 |     
30 | 
31 | def softmax(x):
32 |     if x.ndim == 2:
33 |         x = x.T
34 |         x = x - np.max(x, axis=0)
35 |         y = np.exp(x) / np.sum(np.exp(x), axis=0)
36 |         return y.T 
37 | 
38 |     x = x - np.max(x) # guard against overflow
39 |     return np.exp(x) / np.sum(np.exp(x))
40 | 
41 | 
42 | def mean_squared_error(y, t):
43 |     return 0.5 * np.sum((y-t)**2)
44 | 
45 | 
46 | def cross_entropy_error(y, t):
47 |     if y.ndim == 1:
48 |         t = t.reshape(1, t.size)
49 |         y = y.reshape(1, y.size)
50 |         
51 |     # If the labels are one-hot vectors, convert them to class indices
52 |     if t.size == y.size:
53 |         t = t.argmax(axis=1)
54 |              
55 |     batch_size = y.shape[0]
56 |     return -np.sum(np.log(y[np.arange(batch_size), t])) / batch_size
57 | 
58 | 
59 | def softmax_loss(X, t):
60 |     y = softmax(X)
61 |     return cross_entropy_error(y, t)

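The overflow guard in `softmax` (subtracting the maximum before exponentiating) can be demonstrated in isolation. The sketch below is a pure-Python illustration, not the module's NumPy code; `softmax_naive`, `softmax_stable`, and the sample scores are hypothetical names and values. Note that `math.exp` raises `OverflowError` on large inputs, whereas NumPy's `np.exp` would return `inf` with a warning, so the failure mode differs from what the NumPy version would show.

```python
import math

def softmax_naive(x):
    # Exponentiates the raw scores; large scores overflow.
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(x):
    # Shifting by the max leaves the result unchanged mathematically,
    # but keeps every exponent at or below zero.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1010.0, 1000.0, 990.0]  # hypothetical large scores

try:
    softmax_naive(scores)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

probs = softmax_stable(scores)
print(naive_overflowed, probs)
```

The shift works because multiplying the numerator and denominator by the same constant `exp(-max)` does not change the ratio.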

--------------------------------------------------------------------------------
/common/gradient.py:
--------------------------------------------------------------------------------
 1 | # coding: utf-8
 2 | import numpy as np
 3 | 
 4 | def _numerical_gradient_1d(f, x):
 5 |     h = 1e-4 # 0.0001
 6 |     grad = np.zeros_like(x)
 7 |     
 8 |     for idx in range(x.size):
 9 |         tmp_val = x[idx]
10 |         x[idx] = float(tmp_val) + h
11 |         fxh1 = f(x) # f(x+h)
12 |         
13 |         x[idx] = tmp_val - h 
14 |         fxh2 = f(x) # f(x-h)
15 |         grad[idx] = (fxh1 - fxh2) / (2*h)
16 |         
17 |         x[idx] = tmp_val # restore the original value
18 |         
19 |     return grad
20 | 
21 | 
22 | def numerical_gradient_2d(f, X):
23 |     if X.ndim == 1:
24 |         return _numerical_gradient_1d(f, X)
25 |     else:
26 |         grad = np.zeros_like(X)
27 |         
28 |         for idx, x in enumerate(X):
29 |             grad[idx] = _numerical_gradient_1d(f, x)
30 |         
31 |         return grad
32 | 
33 | 
34 | def numerical_gradient(f, x):
35 |     h = 1e-4 # 0.0001
36 |     grad = np.zeros_like(x)
37 |     
38 |     it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
39 |     while not it.finished:
40 |         idx = it.multi_index
41 |         tmp_val = x[idx]
42 |         x[idx] = float(tmp_val) + h
43 |         fxh1 = f(x) # f(x+h)
44 |         
45 |         x[idx] = tmp_val - h 
46 |         fxh2 = f(x) # f(x-h)
47 |         grad[idx] = (fxh1 - fxh2) / (2*h)
48 |         
49 |         x[idx] = tmp_val # restore the original value
50 |         it.iternext()   
51 |         
52 |     return grad

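`numerical_gradient` uses the central difference `(f(x+h) - f(x-h)) / (2*h)` rather than the one-sided forward difference. A small scalar sketch of why that choice matters follows; the helper names `forward_diff` and `central_diff` and the test function `x**3` are assumptions made for illustration and are not part of `gradient.py`.

```python
def forward_diff(f, x, h=1e-4):
    # One-sided estimate: truncation error shrinks in proportion to h.
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-4):
    # Two-sided estimate: truncation error shrinks in proportion to h**2.
    return (f(x + h) - f(x - h)) / (2 * h)

def cube(x):
    return x ** 3

exact = 3 * 2.0 ** 2  # derivative of x**3 at x = 2

err_forward = abs(forward_diff(cube, 2.0) - exact)
err_central = abs(central_diff(cube, 2.0) - exact)
print(err_forward, err_central)
```

With the same step size, the central difference is several orders of magnitude closer to the exact derivative, which is why it is the usual reference in a gradient check.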

--------------------------------------------------------------------------------
/common/layers.py:
--------------------------------------------------------------------------------
  1 | # coding: utf-8
  2 | import numpy as np
  3 | from common.functions import *
  4 | from common.util import im2col, col2im
  5 | 
  6 | 
  7 | class Relu:
  8 |     def __init__(self):
  9 |         self.mask = None
 10 | 
 11 |     def forward(self, x):
 12 |         self.mask = (x <= 0)
 13 |         out = x.copy()
 14 |         out[self.mask] = 0
 15 | 
 16 |         return out
 17 | 
 18 |     def backward(self, dout):
 19 |         dout[self.mask] = 0
 20 |         dx = dout
 21 | 
 22 |         return dx
 23 | 
 24 | 
 25 | class Sigmoid:
 26 |     def __init__(self):
 27 |         self.out = None
 28 | 
 29 |     def forward(self, x):
 30 |         out = sigmoid(x)
 31 |         self.out = out
 32 |         return out
 33 | 
 34 |     def backward(self, dout):
 35 |         dx = dout * (1.0 - self.out) * self.out
 36 | 
 37 |         return dx
 38 | 
 39 | 
 40 | class Affine:
 41 |     def __init__(self, W, b):
 42 |         self.W = W
 43 |         self.b = b
 44 |         
 45 |         self.x = None
 46 |         self.original_x_shape = None
 47 |         # Gradients of the weight and bias parameters
 48 |         self.dW = None
 49 |         self.db = None
 50 | 
 51 |     def forward(self, x):
 52 |         # Tensor support: flatten each sample so the input is 2-D
 53 |         self.original_x_shape = x.shape
 54 |         x = x.reshape(x.shape[0], -1)
 55 |         self.x = x
 56 | 
 57 |         out = np.dot(self.x, self.W) + self.b
 58 | 
 59 |         return out
 60 | 
 61 |     def backward(self, dout):
 62 |         dx = np.dot(dout, self.W.T)
 63 |         self.dW = np.dot(self.x.T, dout)
 64 |         self.db = np.sum(dout, axis=0)
 65 |         
 66 |         dx = dx.reshape(*self.original_x_shape)  # restore the original input shape (tensor support)
 67 |         return dx
 68 | 
 69 | 
 70 | class SoftmaxWithLoss:
 71 |     def __init__(self):
 72 |         self.loss = None # loss value
 73 |         self.y = None    # output of softmax
 74 |         self.t = None    # target labels (one-hot vectors or class indices)
 75 |         
 76 |     def forward(self, x, t):
 77 |         self.t = t
 78 |         self.y = softmax(x)
 79 |         self.loss = cross_entropy_error(self.y, self.t)
 80 |         
 81 |         return self.loss
 82 | 
 83 |     def backward(self, dout=1):
 84 |         batch_size = self.t.shape[0]
 85 |         if self.t.size == self.y.size: # when the labels are one-hot encoded
 86 |             dx = (self.y - self.t) / batch_size
 87 |         else:
 88 |             dx = self.y.copy()
 89 |             dx[np.arange(batch_size), self.t] -= 1
 90 |             dx = dx / batch_size
 91 |         
 92 |         return dx
 93 | 
 94 | 
 95 | class Dropout:
 96 |     """
 97 |     http://arxiv.org/abs/1207.0580
 98 |     """
 99 |     def __init__(self, dropout_ratio=0.5):
100 |         self.dropout_ratio = dropout_ratio
101 |         self.mask = None
102 | 
103 |     def forward(self, x, train_flg=True):
104 |         if train_flg:
105 |             self.mask = np.random.rand(*x.shape) > self.dropout_ratio
106 |             return x * self.mask
107 |         else:
108 |             return x * (1.0 - self.dropout_ratio)
109 | 
110 |     def backward(self, dout):
111 |         return dout * self.mask
112 | 
113 | 
114 | class BatchNormalization:
115 |     """
116 |     http://arxiv.org/abs/1502.03167
117 |     """
118 |     def __init__(self, gamma, beta, momentum=0.9, running_mean=None, running_var=None):
119 |         self.gamma = gamma
120 |         self.beta = beta
121 |         self.momentum = momentum
122 |         self.input_shape = None # 4-D for convolution layers, 2-D for fully connected layers
123 | 
124 |         # Running mean and variance used at test time
125 |         self.running_mean = running_mean
126 |         self.running_var = running_var  
127 |         
128 |         # Intermediate data used by backward
129 |         self.batch_size = None
130 |         self.xc = None
131 |         self.std = None
132 |         self.dgamma = None
133 |         self.dbeta = None
134 | 
135 |     def forward(self, x, train_flg=True):
136 |         self.input_shape = x.shape
137 |         if x.ndim != 2:
138 |             N, C, H, W = x.shape
139 |             x = x.reshape(N, -1)
140 | 
141 |         out = self.__forward(x, train_flg)
142 |         
143 |         return out.reshape(*self.input_shape)
144 |             
145 |     def __forward(self, x, train_flg):
146 |         if self.running_mean is None:
147 |             N, D = x.shape
148 |             self.running_mean = np.zeros(D)
149 |             self.running_var = np.zeros(D)
150 |                         
151 |         if train_flg:
152 |             mu = x.mean(axis=0)
153 |             xc = x - mu
154 |             var = np.mean(xc**2, axis=0)
155 |             std = np.sqrt(var + 10e-7)
156 |             xn = xc / std
157 |             
158 |             self.batch_size = x.shape[0]
159 |             self.xc = xc
160 |             self.xn = xn
161 |             self.std = std
162 |             self.running_mean = self.momentum * self.running_mean + (1-self.momentum) * mu
163 |             self.running_var = self.momentum * self.running_var + (1-self.momentum) * var            
164 |         else:
165 |             xc = x - self.running_mean
166 |             xn = xc / ((np.sqrt(self.running_var + 10e-7)))
167 |             
168 |         out = self.gamma * xn + self.beta 
169 |         return out
170 | 
171 |     def backward(self, dout):
172 |         if dout.ndim != 2:
173 |             N, C, H, W = dout.shape
174 |             dout = dout.reshape(N, -1)
175 | 
176 |         dx = self.__backward(dout)
177 | 
178 |         dx = dx.reshape(*self.input_shape)
179 |         return dx
180 | 
181 |     def __backward(self, dout):
182 |         dbeta = dout.sum(axis=0)
183 |         dgamma = np.sum(self.xn * dout, axis=0)
184 |         dxn = self.gamma * dout
185 |         dxc = dxn / self.std
186 |         dstd = -np.sum((dxn * self.xc) / (self.std * self.std), axis=0)
187 |         dvar = 0.5 * dstd / self.std
188 |         dxc += (2.0 / self.batch_size) * self.xc * dvar
189 |         dmu = np.sum(dxc, axis=0)
190 |         dx = dxc - dmu / self.batch_size
191 |         
192 |         self.dgamma = dgamma
193 |         self.dbeta = dbeta
194 |         
195 |         return dx
196 | 
197 | 
198 | class Convolution:
199 |     def __init__(self, W, b, stride=1, pad=0):
200 |         self.W = W
201 |         self.b = b
202 |         self.stride = stride
203 |         self.pad = pad
204 |         
205 |         # Intermediate data (used by backward)
206 |         self.x = None   
207 |         self.col = None
208 |         self.col_W = None
209 |         
210 |         # Gradients of the weight and bias parameters
211 |         self.dW = None
212 |         self.db = None
213 | 
214 |     def forward(self, x):
215 |         FN, C, FH, FW = self.W.shape
216 |         N, C, H, W = x.shape
217 |         out_h = 1 + int((H + 2*self.pad - FH) / self.stride)
218 |         out_w = 1 + int((W + 2*self.pad - FW) / self.stride)
219 | 
220 |         col = im2col(x, FH, FW, self.stride, self.pad)
221 |         col_W = self.W.reshape(FN, -1).T
222 | 
223 |         out = np.dot(col, col_W) + self.b
224 |         out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)
225 | 
226 |         self.x = x
227 |         self.col = col
228 |         self.col_W = col_W
229 | 
230 |         return out
231 | 
232 |     def backward(self, dout):
233 |         FN, C, FH, FW = self.W.shape
234 |         dout = dout.transpose(0,2,3,1).reshape(-1, FN)
235 | 
236 |         self.db = np.sum(dout, axis=0)
237 |         self.dW = np.dot(self.col.T, dout)
238 |         self.dW = self.dW.transpose(1, 0).reshape(FN, C, FH, FW)
239 | 
240 |         dcol = np.dot(dout, self.col_W.T)
241 |         dx = col2im(dcol, self.x.shape, FH, FW, self.stride, self.pad)
242 | 
243 |         return dx
244 | 
245 | 
246 | class Pooling:
247 |     def __init__(self, pool_h, pool_w, stride=1, pad=0):
248 |         self.pool_h = pool_h
249 |         self.pool_w = pool_w
250 |         self.stride = stride
251 |         self.pad = pad
252 |         
253 |         self.x = None
254 |         self.arg_max = None
255 | 
256 |     def forward(self, x):
257 |         N, C, H, W = x.shape
258 |         out_h = int(1 + (H - self.pool_h) / self.stride)
259 |         out_w = int(1 + (W - self.pool_w) / self.stride)
260 | 
261 |         col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
262 |         col = col.reshape(-1, self.pool_h*self.pool_w)
263 | 
264 |         arg_max = np.argmax(col, axis=1)
265 |         out = np.max(col, axis=1)
266 |         out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)
267 | 
268 |         self.x = x
269 |         self.arg_max = arg_max
270 | 
271 |         return out
272 | 
273 |     def backward(self, dout):
274 |         dout = dout.transpose(0, 2, 3, 1)
275 |         
276 |         pool_size = self.pool_h * self.pool_w
277 |         dmax = np.zeros((dout.size, pool_size))
278 |         dmax[np.arange(self.arg_max.size), self.arg_max.flatten()] = dout.flatten()
279 |         dmax = dmax.reshape(dout.shape + (pool_size,)) 
280 |         
281 |         dcol = dmax.reshape(dmax.shape[0] * dmax.shape[1] * dmax.shape[2], -1)
282 |         dx = col2im(dcol, self.x.shape, self.pool_h, self.pool_w, self.stride, self.pad)
283 |         
284 |         return dx


--------------------------------------------------------------------------------
/common/multi_layer_net.py:
--------------------------------------------------------------------------------
  1 | # coding: utf-8
  2 | import sys, os
  3 | sys.path.append(os.pardir)  # make files in the parent directory importable
  4 | import numpy as np
  5 | from collections import OrderedDict
  6 | from common.layers import *
  7 | from common.gradient import numerical_gradient
  8 | 
  9 | 
 10 | class MultiLayerNet:
 11 |     """Fully connected multi-layer neural network
 12 |     Parameters
 13 |     ----------
 14 |     input_size : input size (784 for MNIST)
 15 |     hidden_size_list : list with the number of neurons in each hidden layer (e.g. [100, 100, 100])
 16 |     output_size : output size (10 for MNIST)
 17 |     activation : activation function, 'relu' or 'sigmoid'
 18 |     weight_init_std : standard deviation of the weights (e.g. 0.01)
 19 |         'relu' or 'he' selects He initialization
 20 |         'sigmoid' or 'xavier' selects Xavier initialization
 21 |     weight_decay_lambda : strength of weight decay (L2 norm)
 22 |     """
 23 |     def __init__(self, input_size, hidden_size_list, output_size,
 24 |                  activation='relu', weight_init_std='relu', weight_decay_lambda=0):
 25 |         self.input_size = input_size
 26 |         self.output_size = output_size
 27 |         self.hidden_size_list = hidden_size_list
 28 |         self.hidden_layer_num = len(hidden_size_list)
 29 |         self.weight_decay_lambda = weight_decay_lambda
 30 |         self.params = {}
 31 | 
 32 |         # Initialize the weights
 33 |         self.__init_weight(weight_init_std)
 34 | 
 35 |         # Build the layers
 36 |         activation_layer = {'sigmoid': Sigmoid, 'relu': Relu}
 37 |         self.layers = OrderedDict()
 38 |         for idx in range(1, self.hidden_layer_num+1):
 39 |             self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)],
 40 |                                                       self.params['b' + str(idx)])
 41 |             self.layers['Activation_function' + str(idx)] = activation_layer[activation]()
 42 | 
 43 |         idx = self.hidden_layer_num + 1
 44 |         self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)],
 45 |             self.params['b' + str(idx)])
 46 | 
 47 |         self.last_layer = SoftmaxWithLoss()
 48 | 
 49 |     def __init_weight(self, weight_init_std):
 50 |         """Initialize the weights
 51 |         
 52 |         Parameters
 53 |         ----------
 54 |         weight_init_std : standard deviation of the weights (e.g. 0.01)
 55 |             'relu' or 'he' selects He initialization
 56 |             'sigmoid' or 'xavier' selects Xavier initialization
 57 |         """
 58 |         all_size_list = [self.input_size] + self.hidden_size_list + [self.output_size]
 59 |         for idx in range(1, len(all_size_list)):
 60 |             scale = weight_init_std
 61 |             if str(weight_init_std).lower() in ('relu', 'he'):
 62 |                 scale = np.sqrt(2.0 / all_size_list[idx - 1])  # recommended initial scale when using ReLU
 63 |             elif str(weight_init_std).lower() in ('sigmoid', 'xavier'):
 64 |                 scale = np.sqrt(1.0 / all_size_list[idx - 1])  # recommended initial scale when using sigmoid
 65 |             self.params['W' + str(idx)] = scale * np.random.randn(all_size_list[idx-1], all_size_list[idx])
 66 |             self.params['b' + str(idx)] = np.zeros(all_size_list[idx])
 67 | 
 68 |     def predict(self, x):
 69 |         for layer in self.layers.values():
 70 |             x = layer.forward(x)
 71 | 
 72 |         return x
 73 | 
 74 |     def loss(self, x, t):
 75 |         """Compute the loss.
 76 |         
 77 |         Parameters
 78 |         ----------
 79 |         x : input data
 80 |         t : target labels
 81 |         
 82 |         Returns
 83 |         -------
 84 |         value of the loss function
 85 |         """
 86 |         y = self.predict(x)
 87 | 
 88 |         weight_decay = 0
 89 |         for idx in range(1, self.hidden_layer_num + 2):
 90 |             W = self.params['W' + str(idx)]
 91 |             weight_decay += 0.5 * self.weight_decay_lambda * np.sum(W ** 2)
 92 | 
 93 |         return self.last_layer.forward(y, t) + weight_decay
 94 | 
 95 |     def accuracy(self, x, t):
 96 |         y = self.predict(x)
 97 |         y = np.argmax(y, axis=1)
 98 |         if t.ndim != 1 : t = np.argmax(t, axis=1)
 99 | 
100 |         accuracy = np.sum(y == t) / float(x.shape[0])
101 |         return accuracy
102 | 
103 |     def numerical_gradient(self, x, t):
104 |         """Compute the gradients (numerical differentiation).
105 |         
106 |         Parameters
107 |         ----------
108 |         x : input data
109 |         t : target labels
110 |         
111 |         Returns
112 |         -------
113 |         dictionary holding the gradients of each layer
114 |             grads['W1'], grads['W2'], ... weight gradients of each layer
115 |             grads['b1'], grads['b2'], ... bias gradients of each layer
116 |         """
117 |         loss_W = lambda W: self.loss(x, t)
118 | 
119 |         grads = {}
120 |         for idx in range(1, self.hidden_layer_num+2):
121 |             grads['W' + str(idx)] = numerical_gradient(loss_W, self.params['W' + str(idx)])
122 |             grads['b' + str(idx)] = numerical_gradient(loss_W, self.params['b' + str(idx)])
123 | 
124 |         return grads
125 | 
126 |     def gradient(self, x, t):
127 |         """Compute the gradients (backpropagation).
128 |         Parameters
129 |         ----------
130 |         x : input data
131 |         t : target labels
132 |         
133 |         Returns
134 |         -------
135 |         dictionary holding the gradients of each layer
136 |             grads['W1'], grads['W2'], ... weight gradients of each layer
137 |             grads['b1'], grads['b2'], ... bias gradients of each layer
138 |         """
139 |         # forward
140 |         self.loss(x, t)
141 | 
142 |         # backward
143 |         dout = 1
144 |         dout = self.last_layer.backward(dout)
145 | 
146 |         layers = list(self.layers.values())
147 |         layers.reverse()
148 |         for layer in layers:
149 |             dout = layer.backward(dout)
150 | 
151 |         # Store the results
152 |         grads = {}
153 |         for idx in range(1, self.hidden_layer_num+2):
154 |             grads['W' + str(idx)] = self.layers['Affine' + str(idx)].dW + self.weight_decay_lambda * self.layers['Affine' + str(idx)].W
155 |             grads['b' + str(idx)] = self.layers['Affine' + str(idx)].db
156 | 
157 |         return grads


--------------------------------------------------------------------------------
/common/multi_layer_net_extend.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | from collections import OrderedDict
  3 | from common.layers import *
  4 | from common.gradient import numerical_gradient
  5 | 
  6 | class MultiLayerNetExtend:
  7 |     """Fully connected multi-layer neural network (extended version)
  8 |     Implements weight decay, dropout, and batch normalization
  9 |     Parameters
 10 |     ----------
 11 |     input_size : input size (784 for MNIST)
 12 |     hidden_size_list : list with the number of neurons in each hidden layer (e.g. [100, 100, 100])
 13 |     output_size : output size (10 for MNIST)
 14 |     activation : activation function, 'relu' or 'sigmoid'
 15 |     weight_init_std : standard deviation of the weights (e.g. 0.01)
 16 |         'relu' or 'he' selects He initialization
 17 |         'sigmoid' or 'xavier' selects Xavier initialization
 18 |     weight_decay_lambda : strength of weight decay (L2 norm)
 19 |     use_dropout : whether to use dropout
 20 |     dropout_ration : dropout ratio
 21 |     use_batchnorm : whether to use batch normalization
 22 |     """
 23 |     def __init__(self, input_size, hidden_size_list, output_size,
 24 |                  activation='relu', weight_init_std='relu', weight_decay_lambda=0, 
 25 |                  use_dropout = False, dropout_ration = 0.5, use_batchnorm=False):
 26 |         self.input_size = input_size
 27 |         self.output_size = output_size
 28 |         self.hidden_size_list = hidden_size_list
 29 |         self.hidden_layer_num = len(hidden_size_list)
 30 |         self.use_dropout = use_dropout
 31 |         self.weight_decay_lambda = weight_decay_lambda
 32 |         self.use_batchnorm = use_batchnorm
 33 |         self.params = {}
 34 | 
 35 |         # Initialize the weights
 36 |         self.__init_weight(weight_init_std)
 37 | 
 38 |         # Build the layers
 39 |         activation_layer = {'sigmoid': Sigmoid, 'relu': Relu}
 40 |         self.layers = OrderedDict()
 41 |         for idx in range(1, self.hidden_layer_num+1):
 42 |             self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)],
 43 |                                                       self.params['b' + str(idx)])
 44 |             if self.use_batchnorm:
 45 |                 self.params['gamma' + str(idx)] = np.ones(hidden_size_list[idx-1])
 46 |                 self.params['beta' + str(idx)] = np.zeros(hidden_size_list[idx-1])
 47 |                 self.layers['BatchNorm' + str(idx)] = BatchNormalization(self.params['gamma' + str(idx)], self.params['beta' + str(idx)])
 48 |                 
 49 |             self.layers['Activation_function' + str(idx)] = activation_layer[activation]()
 50 |             
 51 |             if self.use_dropout:
 52 |                 self.layers['Dropout' + str(idx)] = Dropout(dropout_ration)
 53 | 
 54 |         idx = self.hidden_layer_num + 1
 55 |         self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)], self.params['b' + str(idx)])
 56 | 
 57 |         self.last_layer = SoftmaxWithLoss()
 58 | 
 59 |     def __init_weight(self, weight_init_std):
 60 |         """Initialize the weights
 61 |         
 62 |         Parameters
 63 |         ----------
 64 |         weight_init_std : standard deviation of the weights (e.g. 0.01)
 65 |             'relu' or 'he' selects He initialization
 66 |             'sigmoid' or 'xavier' selects Xavier initialization
 67 |         """
 68 |         all_size_list = [self.input_size] + self.hidden_size_list + [self.output_size]
 69 |         for idx in range(1, len(all_size_list)):
 70 |             scale = weight_init_std
 71 |             if str(weight_init_std).lower() in ('relu', 'he'):
 72 |                 scale = np.sqrt(2.0 / all_size_list[idx - 1])  # recommended initial scale when using ReLU
 73 |             elif str(weight_init_std).lower() in ('sigmoid', 'xavier'):
 74 |                 scale = np.sqrt(1.0 / all_size_list[idx - 1])  # recommended initial scale when using sigmoid
 75 |             self.params['W' + str(idx)] = scale * np.random.randn(all_size_list[idx-1], all_size_list[idx])
 76 |             self.params['b' + str(idx)] = np.zeros(all_size_list[idx])
 77 | 
 78 |     def predict(self, x, train_flg=False):
 79 |         for key, layer in self.layers.items():
 80 |             if "Dropout" in key or "BatchNorm" in key:
 81 |                 x = layer.forward(x, train_flg)
 82 |             else:
 83 |                 x = layer.forward(x)
 84 | 
 85 |         return x
 86 | 
 87 |     def loss(self, x, t, train_flg=False):
 88 |         """Compute the loss.
 89 |         
 90 |         Parameters
 91 |         ----------
 92 |         x : input data
 93 |         t : target labels
 94 |         """
 95 |         y = self.predict(x, train_flg)
 96 | 
 97 |         weight_decay = 0
 98 |         for idx in range(1, self.hidden_layer_num + 2):
 99 |             W = self.params['W' + str(idx)]
100 |             weight_decay += 0.5 * self.weight_decay_lambda * np.sum(W**2)
101 | 
102 |         return self.last_layer.forward(y, t) + weight_decay
103 | 
104 |     def accuracy(self, X, T):
105 |         Y = self.predict(X, train_flg=False)
106 |         Y = np.argmax(Y, axis=1)
107 |         if T.ndim != 1 : T = np.argmax(T, axis=1)
108 | 
109 |         accuracy = np.sum(Y == T) / float(X.shape[0])
110 |         return accuracy
111 | 
112 |     def numerical_gradient(self, X, T):
113 |         """Compute the gradients (numerical differentiation).
114 |         
115 |         Parameters
116 |         ----------
117 |         X : input data
118 |         T : target labels
119 |         
120 |         Returns
121 |         -------
122 |         dictionary holding the gradients of each layer
123 |             grads['W1'], grads['W2'], ... weight gradients of each layer
124 |             grads['b1'], grads['b2'], ... bias gradients of each layer
125 |         """
126 |         loss_W = lambda W: self.loss(X, T, train_flg=True)
127 | 
128 |         grads = {}
129 |         for idx in range(1, self.hidden_layer_num+2):
130 |             grads['W' + str(idx)] = numerical_gradient(loss_W, self.params['W' + str(idx)])
131 |             grads['b' + str(idx)] = numerical_gradient(loss_W, self.params['b' + str(idx)])
132 |             
133 |             if self.use_batchnorm and idx != self.hidden_layer_num+1:
134 |                 grads['gamma' + str(idx)] = numerical_gradient(loss_W, self.params['gamma' + str(idx)])
135 |                 grads['beta' + str(idx)] = numerical_gradient(loss_W, self.params['beta' + str(idx)])
136 | 
137 |         return grads
138 |         
139 |     def gradient(self, x, t):
140 |         # forward
141 |         self.loss(x, t, train_flg=True)
142 | 
143 |         # backward
144 |         dout = 1
145 |         dout = self.last_layer.backward(dout)
146 | 
147 |         layers = list(self.layers.values())
148 |         layers.reverse()
149 |         for layer in layers:
150 |             dout = layer.backward(dout)
151 | 
152 |         # store the results
153 |         grads = {}
154 |         for idx in range(1, self.hidden_layer_num+2):
155 |             grads['W' + str(idx)] = self.layers['Affine' + str(idx)].dW + self.weight_decay_lambda * self.params['W' + str(idx)]
156 |             grads['b' + str(idx)] = self.layers['Affine' + str(idx)].db
157 | 
158 |             if self.use_batchnorm and idx != self.hidden_layer_num+1:
159 |                 grads['gamma' + str(idx)] = self.layers['BatchNorm' + str(idx)].dgamma
160 |                 grads['beta' + str(idx)] = self.layers['BatchNorm' + str(idx)].dbeta
161 | 
162 |         return grads
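
The `loss` method above adds `0.5 * weight_decay_lambda * np.sum(W**2)` for every weight matrix, and `gradient` adds `weight_decay_lambda * W` to each `dW`. The following is a minimal sketch, not part of the repository, that checks these two expressions agree by comparing the analytic gradient of the penalty with a central-difference estimate. The names `l2_penalty`, `central_difference`, and `lam` are illustrative assumptions.

```python
import numpy as np

def l2_penalty(W, lam):
    """L2 weight-decay term in the same form the loss method uses."""
    return 0.5 * lam * np.sum(W ** 2)

def central_difference(f, W, h=1e-4):
    """Estimate the gradient of the scalar function f at W, one element at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        original = W[idx]
        W[idx] = original + h
        f_plus = f(W)
        W[idx] = original - h
        f_minus = f(W)
        W[idx] = original  # restore the element before moving on
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
lam = 0.1  # stands in for weight_decay_lambda

analytic = lam * W  # the term gradient() adds to dW
numeric = central_difference(lambda w: l2_penalty(w, lam), W)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

Because the penalty is quadratic, the central difference has no truncation error, so any remaining difference comes from floating-point rounding.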


--------------------------------------------------------------------------------
/common/optimizer.py:
--------------------------------------------------------------------------------
  1 | # coding: utf-8
  2 | import numpy as np
  3 | 
  4 | class SGD:
  5 | 
  6 |     """Stochastic gradient descent (SGD)."""
  7 | 
  8 |     def __init__(self, lr=0.01):
  9 |         self.lr = lr
 10 |         
 11 |     def update(self, params, grads):
 12 |         for key in params.keys():
 13 |             params[key] -= self.lr * grads[key] 
 14 | 
 15 | 
 16 | class Momentum:
 17 | 
 18 |     """Momentum SGD."""
 19 | 
 20 |     def __init__(self, lr=0.01, momentum=0.9):
 21 |         self.lr = lr
 22 |         self.momentum = momentum
 23 |         self.v = None
 24 |         
 25 |     def update(self, params, grads):
 26 |         if self.v is None:
 27 |             self.v = {}
 28 |             for key, val in params.items():                                
 29 |                 self.v[key] = np.zeros_like(val)
 30 |                 
 31 |         for key in params.keys():
 32 |             self.v[key] = self.momentum*self.v[key] - self.lr*grads[key] 
 33 |             params[key] += self.v[key]
 34 | 
 35 | 
 36 | class Nesterov:
 37 | 
 38 |     """Nesterov's Accelerated Gradient (http://arxiv.org/abs/1212.0901)"""
 39 |     # NAG is a refinement of the momentum method. (http://newsight.tistory.com/224)
 40 |     
 41 |     def __init__(self, lr=0.01, momentum=0.9):
 42 |         self.lr = lr
 43 |         self.momentum = momentum
 44 |         self.v = None
 45 |         
 46 |     def update(self, params, grads):
 47 |         if self.v is None:
 48 |             self.v = {}
 49 |             for key, val in params.items():
 50 |                 self.v[key] = np.zeros_like(val)
 51 |             
 52 |         for key in params.keys():
 53 |             self.v[key] *= self.momentum
 54 |             self.v[key] -= self.lr * grads[key]
 55 |             params[key] += self.momentum * self.momentum * self.v[key]
 56 |             params[key] -= (1 + self.momentum) * self.lr * grads[key]
 57 | 
 58 | 
 59 | class AdaGrad:
 60 | 
 61 |     """AdaGrad"""
 62 | 
 63 |     def __init__(self, lr=0.01):
 64 |         self.lr = lr
 65 |         self.h = None
 66 |         
 67 |     def update(self, params, grads):
 68 |         if self.h is None:
 69 |             self.h = {}
 70 |             for key, val in params.items():
 71 |                 self.h[key] = np.zeros_like(val)
 72 |             
 73 |         for key in params.keys():
 74 |             self.h[key] += grads[key] * grads[key]
 75 |             params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
 76 | 
 77 | 
 78 | class RMSprop:
 79 | 
 80 |     """RMSprop"""
 81 | 
 82 |     def __init__(self, lr=0.01, decay_rate = 0.99):
 83 |         self.lr = lr
 84 |         self.decay_rate = decay_rate
 85 |         self.h = None
 86 |         
 87 |     def update(self, params, grads):
 88 |         if self.h is None:
 89 |             self.h = {}
 90 |             for key, val in params.items():
 91 |                 self.h[key] = np.zeros_like(val)
 92 |             
 93 |         for key in params.keys():
 94 |             self.h[key] *= self.decay_rate
 95 |             self.h[key] += (1 - self.decay_rate) * grads[key] * grads[key]
 96 |             params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
 97 | 
 98 | 
 99 | class Adam:
100 | 
101 |     """Adam (http://arxiv.org/abs/1412.6980v8)"""
102 | 
103 |     def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
104 |         self.lr = lr
105 |         self.beta1 = beta1
106 |         self.beta2 = beta2
107 |         self.iter = 0
108 |         self.m = None
109 |         self.v = None
110 |         
111 |     def update(self, params, grads):
112 |         if self.m is None:
113 |             self.m, self.v = {}, {}
114 |             for key, val in params.items():
115 |                 self.m[key] = np.zeros_like(val)
116 |                 self.v[key] = np.zeros_like(val)
117 |         
118 |         self.iter += 1
119 |         lr_t  = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)         
120 |         
121 |         for key in params.keys():
122 |             #self.m[key] = self.beta1*self.m[key] + (1-self.beta1)*grads[key]
123 |             #self.v[key] = self.beta2*self.v[key] + (1-self.beta2)*(grads[key]**2)
124 |             self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
125 |             self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
126 |             
127 |             params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)
128 |             
129 |             #unbias_m += (1 - self.beta1) * (grads[key] - self.m[key]) # correct bias
130 |             #unbisa_b += (1 - self.beta2) * (grads[key]*grads[key] - self.v[key]) # correct bias
131 |             #params[key] += self.lr * unbias_m / (np.sqrt(unbisa_b) + 1e-7)
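
Every optimizer in this file exposes the same `update(params, grads)` method and modifies the `params` dictionary in place. As a rough sketch of how that interface is driven, the example below repeats the `SGD` and `Momentum` update rules so it runs on its own and applies them to the toy function f(x, y) = x²/20 + y². The function, the starting point, the step count, and the helper names `f`, `grad_f`, and `run` are assumptions chosen for illustration.

```python
import numpy as np

class SGD:
    """Same update rule as the SGD class in this file."""
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]

class Momentum:
    """Same update rule as the Momentum class in this file."""
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]

def f(p):
    """Toy objective: f(x, y) = x^2 / 20 + y^2."""
    return p[0] ** 2 / 20.0 + p[1] ** 2

def grad_f(p):
    """Analytic gradient of f."""
    return np.array([p[0] / 10.0, 2.0 * p[1]])

def run(optimizer, steps=200):
    """Apply the optimizer repeatedly from a fixed starting point."""
    params = {'p': np.array([-7.0, 2.0])}
    for _ in range(steps):
        grads = {'p': grad_f(params['p'])}
        optimizer.update(params, grads)
    return params['p']

start_value = f(np.array([-7.0, 2.0]))
sgd_value = f(run(SGD(lr=0.1)))
momentum_value = f(run(Momentum(lr=0.1, momentum=0.9)))
print(start_value, sgd_value, momentum_value)
```

Since all optimizers share this interface, swapping one for another only changes the object passed to `run`.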


--------------------------------------------------------------------------------
/common/trainer.py:
--------------------------------------------------------------------------------
 1 | # coding: utf-8
 2 | import sys, os
 3 | sys.path.append(os.pardir)  # allow importing files from the parent directory
 4 | import numpy as np
 5 | from common.optimizer import *
 6 | 
 7 | class Trainer:
 8 |     """Class that runs neural network training on the caller's behalf.
 9 |     """
10 |     def __init__(self, network, x_train, t_train, x_test, t_test,
11 |                  epochs=20, mini_batch_size=100,
12 |                  optimizer='SGD', optimizer_param={'lr':0.01}, 
13 |                  evaluate_sample_num_per_epoch=None, verbose=True):
14 |         self.network = network
15 |         self.verbose = verbose
16 |         self.x_train = x_train
17 |         self.t_train = t_train
18 |         self.x_test = x_test
19 |         self.t_test = t_test
20 |         self.epochs = epochs
21 |         self.batch_size = mini_batch_size
22 |         self.evaluate_sample_num_per_epoch = evaluate_sample_num_per_epoch
23 | 
24 |         # optimizer
25 |         optimizer_class_dict = {'sgd':SGD, 'momentum':Momentum, 'nesterov':Nesterov,
26 |                                 'adagrad':AdaGrad, 'rmsprop':RMSprop, 'adam':Adam}
27 |         self.optimizer = optimizer_class_dict[optimizer.lower()](**optimizer_param)
28 |         
29 |         self.train_size = x_train.shape[0]
30 |         self.iter_per_epoch = max(self.train_size / mini_batch_size, 1)
31 |         self.max_iter = int(epochs * self.iter_per_epoch)
32 |         self.current_iter = 0
33 |         self.current_epoch = 0
34 |         
35 |         self.train_loss_list = []
36 |         self.train_acc_list = []
37 |         self.test_acc_list = []
38 | 
39 |     def train_step(self):
40 |         batch_mask = np.random.choice(self.train_size, self.batch_size)
41 |         x_batch = self.x_train[batch_mask]
42 |         t_batch = self.t_train[batch_mask]
43 |         
44 |         grads = self.network.gradient(x_batch, t_batch)
45 |         self.optimizer.update(self.network.params, grads)
46 |         
47 |         loss = self.network.loss(x_batch, t_batch)
48 |         self.train_loss_list.append(loss)
49 |         if self.verbose: print("train loss:" + str(loss))
50 |         
51 |         if self.current_iter % self.iter_per_epoch == 0:
52 |             self.current_epoch += 1
53 |             
54 |             x_train_sample, t_train_sample = self.x_train, self.t_train
55 |             x_test_sample, t_test_sample = self.x_test, self.t_test
56 |             if self.evaluate_sample_num_per_epoch is not None:
57 |                 t = self.evaluate_sample_num_per_epoch
58 |                 x_train_sample, t_train_sample = self.x_train[:t], self.t_train[:t]
59 |                 x_test_sample, t_test_sample = self.x_test[:t], self.t_test[:t]
60 |                 
61 |             train_acc = self.network.accuracy(x_train_sample, t_train_sample)
62 |             test_acc = self.network.accuracy(x_test_sample, t_test_sample)
63 |             self.train_acc_list.append(train_acc)
64 |             self.test_acc_list.append(test_acc)
65 | 
66 |             if self.verbose: print("=== epoch:" + str(self.current_epoch) + ", train acc:" + str(train_acc) + ", test acc:" + str(test_acc) + " ===")
67 |         self.current_iter += 1
68 | 
69 |     def train(self):
70 |         for i in range(self.max_iter):
71 |             self.train_step()
72 | 
73 |         test_acc = self.network.accuracy(self.x_test, self.t_test)
74 | 
75 |         if self.verbose:
76 |             print("=============== Final Test Accuracy ===============")
77 |             print("test acc:" + str(test_acc))
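
`Trainer` derives its iteration counts from the dataset size: `iter_per_epoch = max(train_size / mini_batch_size, 1)` and `max_iter = int(epochs * iter_per_epoch)`, and `train_step` evaluates accuracy whenever `current_iter % iter_per_epoch == 0`. The sketch below works through that arithmetic with the constructor defaults and 60,000 training samples (the `train_num` value in `dataset/mnist.py`), then with a size that is not a multiple of the batch size. The helper name `iteration_plan` is hypothetical.

```python
def iteration_plan(train_size, mini_batch_size=100, epochs=20):
    """Reproduce the iteration arithmetic from Trainer.__init__ and train_step."""
    iter_per_epoch = max(train_size / mini_batch_size, 1)
    max_iter = int(epochs * iter_per_epoch)
    # Number of iterations at which train_step would run its accuracy evaluation.
    evaluations = sum(1 for i in range(max_iter) if i % iter_per_epoch == 0)
    return iter_per_epoch, max_iter, evaluations

print(iteration_plan(60000))  # (600.0, 12000, 20)
print(iteration_plan(250))    # (2.5, 50, 10)
```

In the second case `iter_per_epoch` is fractional, so the evaluation fires only on iterations that are exact multiples of 2.5, which is 10 times rather than once per epoch.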


--------------------------------------------------------------------------------
/common/util.py:
--------------------------------------------------------------------------------
 1 | # coding: utf-8
 2 | import numpy as np
 3 | 
 4 | 
 5 | def smooth_curve(x):
 6 |     """Smooth the loss curve for plotting.
 7 |     
 8 |     Reference: http://glowingpython.blogspot.jp/2012/02/convolution-with-numpy.html
 9 |     """
10 |     window_len = 11
11 |     s = np.r_[x[window_len-1:0:-1], x, x[-1:-window_len:-1]]
12 |     w = np.kaiser(window_len, 2)
13 |     y = np.convolve(w/w.sum(), s, mode='valid')
14 |     return y[5:len(y)-5]
15 | 
16 | 
17 | def shuffle_dataset(x, t):
18 |     """Shuffle the dataset.
19 |     Parameters
20 |     ----------
21 |     x : training data
22 |     t : ground-truth labels
23 |     
24 |     Returns
25 |     -------
26 |     x, t : shuffled training data and ground-truth labels
27 |     """
28 |     permutation = np.random.permutation(x.shape[0])
29 |     x = x[permutation,:] if x.ndim == 2 else x[permutation,:,:,:]
30 |     t = t[permutation]
31 | 
32 |     return x, t
33 | 
34 | def conv_output_size(input_size, filter_size, stride=1, pad=0):
35 |     return (input_size + 2*pad - filter_size) // stride + 1
36 | 
37 | 
38 | def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
39 |     """Take a batch of images and convert it into a 2-D array (flattening).
40 |     
41 |     Parameters
42 |     ----------
43 |     input_data : 4-D input array of shape (number of images, channels, height, width)
44 |     filter_h : filter height
45 |     filter_w : filter width
46 |     stride : stride
47 |     pad : padding
48 |     
49 |     Returns
50 |     -------
51 |     col : 2-D array
52 |     """
53 |     N, C, H, W = input_data.shape
54 |     out_h = (H + 2*pad - filter_h)//stride + 1
55 |     out_w = (W + 2*pad - filter_w)//stride + 1
56 | 
57 |     img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
58 |     col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
59 | 
60 |     for y in range(filter_h):
61 |         y_max = y + stride*out_h
62 |         for x in range(filter_w):
63 |             x_max = x + stride*out_w
64 |             col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
65 | 
66 |     col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
67 |     return col
68 | 
69 | 
70 | def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
71 |     """Inverse of im2col: take a 2-D array and convert it back into a batch of images.
72 |     
73 |     Parameters
74 |     ----------
75 |     col : 2-D array (input data)
76 |     input_shape : shape of the original image data (e.g. (10, 1, 28, 28))
77 |     filter_h : filter height
78 |     filter_w : filter width
79 |     stride : stride
80 |     pad : padding
81 |     
82 |     Returns
83 |     -------
84 |     img : the converted images
85 |     """
86 |     N, C, H, W = input_shape
87 |     out_h = (H + 2*pad - filter_h)//stride + 1
88 |     out_w = (W + 2*pad - filter_w)//stride + 1
89 |     col = col.reshape(N, out_h, out_w, C, filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)
90 | 
91 |     img = np.zeros((N, C, H + 2*pad + stride - 1, W + 2*pad + stride - 1))
92 |     for y in range(filter_h):
93 |         y_max = y + stride*out_h
94 |         for x in range(filter_w):
95 |             x_max = x + stride*out_w
96 |             img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]
97 | 
98 |     return img[:, :, pad:H + pad, pad:W + pad]
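
`im2col` produces one row per filter position and one column per value inside a filter patch, so the result has shape `(N*out_h*out_w, C*filter_h*filter_w)`. The sketch below repeats the function so it runs on its own and checks that shape for a 7x7 input with 3 channels and a 5x5 filter. The input sizes and the variable names `x1`, `x2`, `col1`, and `col2` are assumptions picked for illustration.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    """Copy of the im2col function in this file, repeated so the sketch is self-contained."""
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1

    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]

    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

# One image: (7 - 5)//1 + 1 = 3 positions per axis, so 3*3 = 9 rows of 3*5*5 = 75 values.
x1 = np.random.rand(1, 3, 7, 7)
col1 = im2col(x1, 5, 5, stride=1, pad=0)
print(col1.shape)  # (9, 75)

# Ten images: ten times as many rows, same row length.
x2 = np.random.rand(10, 3, 7, 7)
col2 = im2col(x2, 5, 5, stride=1, pad=0)
print(col2.shape)  # (90, 75)

# The first row is the top-left 5x5 patch of the first image, flattened over (C, h, w).
first_patch = x1[0, :, 0:5, 0:5].reshape(-1)
print(np.array_equal(col1[0], first_patch))  # True
```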


--------------------------------------------------------------------------------
/dataset/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/dataset/__init__.py


--------------------------------------------------------------------------------
/dataset/mnist.py:
--------------------------------------------------------------------------------
  1 | # coding: utf-8
  2 | try:
  3 |     import urllib.request
  4 | except ImportError:
  5 |     raise ImportError('You should use Python 3.x')
  6 | import os.path
  7 | import gzip
  8 | import pickle
  9 | import os
 10 | import numpy as np
 11 | 
 12 | 
 13 | url_base = 'http://yann.lecun.com/exdb/mnist/'
 14 | key_file = {
 15 |     'train_img':'train-images-idx3-ubyte.gz',
 16 |     'train_label':'train-labels-idx1-ubyte.gz',
 17 |     'test_img':'t10k-images-idx3-ubyte.gz',
 18 |     'test_label':'t10k-labels-idx1-ubyte.gz'
 19 | }
 20 | 
 21 | dataset_dir = os.path.dirname(os.path.abspath(__file__))
 22 | save_file = dataset_dir + "/mnist.pkl"
 23 | 
 24 | train_num = 60000
 25 | test_num = 10000
 26 | img_dim = (1, 28, 28)
 27 | img_size = 784
 28 | 
 29 | 
 30 | def _download(file_name):
 31 |     file_path = dataset_dir + "/" + file_name
 32 |     
 33 |     if os.path.exists(file_path):
 34 |         return
 35 | 
 36 |     print("Downloading " + file_name + " ... ")
 37 |     urllib.request.urlretrieve(url_base + file_name, file_path)
 38 |     print("Done")
 39 |     
 40 | def download_mnist():
 41 |     for v in key_file.values():
 42 |         _download(v)
 43 |         
 44 | def _load_label(file_name):
 45 |     file_path = dataset_dir + "/" + file_name
 46 |     
 47 |     print("Converting " + file_name + " to NumPy Array ...")
 48 |     with gzip.open(file_path, 'rb') as f:
 49 |         labels = np.frombuffer(f.read(), np.uint8, offset=8)
 50 |     print("Done")
 51 |     
 52 |     return labels
 53 | 
 54 | def _load_img(file_name):
 55 |     file_path = dataset_dir + "/" + file_name
 56 |     
 57 |     print("Converting " + file_name + " to NumPy Array ...")    
 58 |     with gzip.open(file_path, 'rb') as f:
 59 |         data = np.frombuffer(f.read(), np.uint8, offset=16)
 60 |     data = data.reshape(-1, img_size)
 61 |     print("Done")
 62 |     
 63 |     return data
 64 |     
 65 | def _convert_numpy():
 66 |     dataset = {}
 67 |     dataset['train_img'] =  _load_img(key_file['train_img'])
 68 |     dataset['train_label'] = _load_label(key_file['train_label'])    
 69 |     dataset['test_img'] = _load_img(key_file['test_img'])
 70 |     dataset['test_label'] = _load_label(key_file['test_label'])
 71 |     
 72 |     return dataset
 73 | 
 74 | def init_mnist():
 75 |     download_mnist()
 76 |     dataset = _convert_numpy()
 77 |     print("Creating pickle file ...")
 78 |     with open(save_file, 'wb') as f:
 79 |         pickle.dump(dataset, f, -1)
 80 |     print("Done!")
 81 | 
 82 | def _change_ont_hot_label(X):
 83 |     T = np.zeros((X.size, 10))
 84 |     for idx, row in enumerate(T):
 85 |         row[X[idx]] = 1
 86 |         
 87 |     return T
 88 |     
 89 | 
 90 | def load_mnist(normalize=True, flatten=True, one_hot_label=False):
 91 |     """Load the MNIST dataset.
 92 |     
 93 |     Parameters
 94 |     ----------
 95 |     normalize : normalize image pixel values to the range 0.0 to 1.0
 96 |     one_hot_label : 
 97 |         if True, labels are returned as one-hot arrays
 98 |         a one-hot array looks like, for example, [0,0,1,0,0,0,0,0,0,0]
 99 |     flatten : whether to flatten each image into a one-dimensional array
100 |     
101 |     Returns
102 |     -------
103 |     (training images, training labels), (test images, test labels)
104 |     """
105 |     if not os.path.exists(save_file):
106 |         init_mnist()
107 |         
108 |     with open(save_file, 'rb') as f:
109 |         dataset = pickle.load(f)
110 |     
111 |     if normalize:
112 |         for key in ('train_img', 'test_img'):
113 |             dataset[key] = dataset[key].astype(np.float32)
114 |             dataset[key] /= 255.0
115 |             
116 |     if one_hot_label:
117 |         dataset['train_label'] = _change_ont_hot_label(dataset['train_label'])
118 |         dataset['test_label'] = _change_ont_hot_label(dataset['test_label'])    
119 |     
120 |     if not flatten:
121 |         for key in ('train_img', 'test_img'):
122 |             dataset[key] = dataset[key].reshape(-1, 1, 28, 28)
123 | 
124 |     return (dataset['train_img'], dataset['train_label']), (dataset['test_img'], dataset['test_label']) 
125 | 
126 | 
127 | if __name__ == '__main__':
128 |     init_mnist()
129 | 
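
`_change_ont_hot_label` fills the one-hot matrix one row at a time. As a possible alternative, and only a sketch rather than the repository's method, the same matrix can be built with a single NumPy fancy-indexing assignment. The function names `one_hot_loop` and `one_hot_vectorized`, the `num_classes` parameter, and the sample labels are hypothetical.

```python
import numpy as np

def one_hot_loop(X, num_classes=10):
    """Row-by-row conversion, following the logic of _change_ont_hot_label."""
    T = np.zeros((X.size, num_classes))
    for idx, row in enumerate(T):
        row[X[idx]] = 1
    return T

def one_hot_vectorized(X, num_classes=10):
    """Set T[i, X[i]] = 1 for every i in one indexing assignment."""
    T = np.zeros((X.size, num_classes))
    T[np.arange(X.size), X] = 1
    return T

labels = np.array([5, 0, 4, 1, 9], dtype=np.uint8)  # made-up labels for illustration
loop_result = one_hot_loop(labels)
vectorized_result = one_hot_vectorized(labels)
print(np.array_equal(loop_result, vectorized_result))  # True
```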


--------------------------------------------------------------------------------
/decision.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/decision.png


--------------------------------------------------------------------------------
/gates.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/gates.jpg


--------------------------------------------------------------------------------
/layers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/layers.png


--------------------------------------------------------------------------------
/lena.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/lena.png


--------------------------------------------------------------------------------
/neurons.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/neurons.png


--------------------------------------------------------------------------------
/perceptron.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/perceptron.png


--------------------------------------------------------------------------------
/sample_weight.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/sample_weight.pkl


--------------------------------------------------------------------------------
/xor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SDRLurker/deep-learning/f95b0a2c7c4ccec3c7395ad5a648c7664b168866/xor.png


--------------------------------------------------------------------------------
/목차.ipynb:
--------------------------------------------------------------------------------
 1 | {
 2 |  "cells": [
 3 |   {
 4 |    "cell_type": "markdown",
 5 |    "metadata": {},
 6 |    "source": [
 7 |     "# 밑바닥부터 시작하는 딥러닝\n",
 8 |     "\n",
 9 |     "# Deep Learning from Scratch\n",
10 |     "\n",
11 |     "## Github \n",
12 |     "\n",
13 |     "https://github.com/WegraLee/deep-learning-from-scratch\n",
14 |     "\n",
15 |     "## Book page \n",
16 |     "\n",
17 |     "http://www.hanbit.co.kr/store/books/look.php?p_code=B8475831198\n",
18 |     "\n",
19 |     "![title](http://www.hanbit.co.kr/data/books/B8475831198_l.jpg)\n",
20 |     "\n",
21 |     "## Chapter 1\n",
22 |     "\n",
23 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/1장.ipynb\n",
24 |     "\n",
25 |     "## Chapter 2\n",
26 |     "\n",
27 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/2장.ipynb\n",
28 |     "\n",
29 |     "## Chapter 3\n",
30 |     "\n",
31 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/3장.ipynb\n",
32 |     "\n",
33 |     "## Chapter 4\n",
34 |     "\n",
35 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/4장.ipynb\n",
36 |     "\n",
37 |     "## Chapter 5\n",
38 |     "\n",
39 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/5장.ipynb\n",
40 |     "\n",
41 |     "## Chapter 6\n",
42 |     "\n",
43 |     "http://nbviewer.jupyter.org/github/SDRLurker/deep-learning/blob/master/6장.ipynb"
44 |    ]
45 |   },
46 |   {
47 |    "cell_type": "code",
48 |    "execution_count": null,
49 |    "metadata": {
50 |     "collapsed": true
51 |    },
52 |    "outputs": [],
53 |    "source": []
54 |   }
55 |  ],
56 |  "metadata": {
57 |   "anaconda-cloud": {},
58 |   "kernelspec": {
59 |    "display_name": "Python [Root]",
60 |    "language": "python",
61 |    "name": "Python [Root]"
62 |   },
63 |   "language_info": {
64 |    "codemirror_mode": {
65 |     "name": "ipython",
66 |     "version": 3
67 |    },
68 |    "file_extension": ".py",
69 |    "mimetype": "text/x-python",
70 |    "name": "python",
71 |    "nbconvert_exporter": "python",
72 |    "pygments_lexer": "ipython3",
73 |    "version": "3.5.2"
74 |   }
75 |  },
76 |  "nbformat": 4,
77 |  "nbformat_minor": 0
78 | }
79 | 


--------------------------------------------------------------------------------