├── .github └── workflows │ └── pythonpackage.yml ├── .gitignore ├── .vscode └── settings.json ├── Final Presentation.ipynb ├── README.md ├── autodiff ├── .gitignore ├── __init__.py ├── autodiff │ ├── __init__.py │ ├── core.py │ ├── diff.py │ ├── global_vars.py │ ├── graph │ │ ├── __init__.py │ │ ├── manager.py │ │ ├── node.py │ │ └── tracer.py │ └── numpy_grad │ │ ├── .gitignore │ │ ├── __init__.py │ │ ├── vjps.py │ │ └── wrapper.py ├── nn │ ├── criterion.py │ ├── layer.py │ └── optimizer.py ├── tests │ ├── __init__.py │ ├── autodiff_test.py │ └── numpy_test.py └── utils │ ├── model_utils.py │ └── test_utils.py ├── examples ├── __init__.py ├── classification │ ├── Digits.png │ ├── Iris.png │ ├── __init__.py │ ├── digits.py │ └── iris.py ├── regression │ ├── Boston.png │ ├── __init__.py │ ├── boston.py │ ├── regression.png │ └── regression.py └── simple │ ├── __init__.py │ ├── poly.png │ └── poly_test.py ├── img ├── actions.png ├── backward.png └── forward.png ├── proposal.md └── requirements.txt /.github/workflows/pythonpackage.yml: -------------------------------------------------------------------------------- 1 | name: Python package 2 | 3 | on: [push] 4 | 5 | jobs: 6 | build: 7 | 8 | runs-on: ubuntu-latest 9 | strategy: 10 | max-parallel: 4 11 | matrix: 12 | python-version: [3.6, 3.7] 13 | 14 | steps: 15 | - uses: actions/checkout@v1 16 | - name: Set up Python ${{ matrix.python-version }} 17 | uses: actions/setup-python@v1 18 | with: 19 | python-version: ${{ matrix.python-version }} 20 | - name: Install dependencies 21 | run: | 22 | python -m pip install --upgrade pip 23 | pip install -r requirements.txt 24 | - name: Lint with flake8 25 | run: | 26 | pip install flake8 27 | # stop the build if there are Python syntax errors or undefined names 28 | flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics 29 | # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide 30 | flake8 . 
--count --exit-zero --max-complexity=10 --max-line-length=127 --statistics 31 | - name: Test with pytest 32 | run: | 33 | pip install pytest 34 | pytest autodiff/tests 35 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | env/ 2 | __pycache__/ 3 | debug*.txt 4 | .ipynb_checkpoints/ -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "python.pythonPath": "/home/chenyee/anaconda3/bin/python", 3 | "python.formatting.provider": "yapf" 4 | } -------------------------------------------------------------------------------- /Final Presentation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# A basic neural network library with automatic differentiation support" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Outline\n", 15 | "\n", 16 | "- Motivation\n", 17 | "- Introduction\n", 18 | "- Implementation\n", 19 | "- Application\n", 20 | "- Software engineering\n", 21 | "- Misc\n", 22 | "\n", 23 | "---" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## Motivation\n", 31 | "\n", 32 | "Most deep learning courses teach the math behind networks, their architectures and their applications, but few courses cover how to design and implement a deep learning library.\n", 33 | "\n", 34 | "### Goal\n", 35 | "\n", 36 | "- Implement this kind of library\n", 37 | "- Learn how and why prior libraries (TensorFlow, PyTorch, etc.) 
design their work.\n", 38 | "\n", 39 | "---" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "## Introduction" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "### Computational Graph\n", 54 | "\n", 55 | "We can represent a computation with a computational graph.\n", 56 | "\n", 57 | "Nodes represent inputs and operations, and edges represent the arguments of each operation.\n", 58 | "\n", 59 | "For example, for the computation $(3-2) + 1$ (which we will revisit), the corresponding graph is\n", 60 | "\n", 61 | "![forward](img/forward.png)\n", 62 | "\n", 63 | "Note that the order of arguments matters (3 should be drawn above 2, etc.), but unfortunately the plotting toolkit makes that hard to control." 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "### Back propagation\n", 71 | "\n", 72 | "The forward pass builds the graph, and the backward pass calculates the gradient.\n", 73 | "\n", 74 | "Take the previous example, $(3 - 2) + 1$, and write it algebraically as $z = (a - b) + c$.\n", 75 | "\n", 76 | "Suppose we want to compute the gradient of `z` w.r.t. `c`. It is easy to obtain:\n", 77 | "\n", 78 | "$$\\frac{\\partial ((a-b) + c)}{\\partial c} = 1$$\n", 79 | "\n", 80 | "Now we want the gradient w.r.t. 
`b`, again we use the chain rule.\n", 81 | "\n", 82 | "Before that, we split the expression into separate equations, which is the step that motivates automatic differentiation.\n", 83 | "\n", 84 | "$$\n", 85 | "\\begin{aligned}\n", 86 | "z &= (a - b) + c \\\\\n", 87 | "\\Rightarrow z &= d + c, \\quad d = a - b\n", 88 | "\\end{aligned}\n", 89 | "$$\n", 90 | "\n", 91 | "$$\n", 92 | "\\begin{aligned}\n", 93 | "\\frac{\\partial ((a-b) + c)}{\\partial b} &= \\frac{\\partial (d + c)}{\\partial d} \\times \\frac{\\partial d}{\\partial b} \\\\\n", 94 | " &= 1 \\times \\frac{\\partial (a - b)}{\\partial b} \\\\\n", 95 | " &= 1 \\times (-1) \\\\\n", 96 | " &= -1\n", 97 | "\\end{aligned} \n", 98 | "$$" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "### Auto Differentiation\n", 106 | "\n", 107 | "Let's revisit the last example. All we need to know to calculate the gradient w.r.t. `d` is\n", 108 | "\n", 109 | "- Current operation node: add\n", 110 | "- Input arguments: d and c\n", 111 | "- Previous gradient (upstream): 1 (not stated earlier, but the upstream gradient is 1 at the beginning)\n", 112 | "\n", 113 | "$$\\frac{\\partial (d + c)}{\\partial d} = 1$$\n", 114 | "\n", 115 | "We know that `d` is the first argument of the `add` operation, so after taking the derivative w.r.t. `d`, the result is $1 \\times 1 = 1$. The first 1 is the upstream gradient and the second is the local derivative.\n", 116 | "\n", 117 | "Now we have the gradient w.r.t. `d`. Because `d` is equal to `(a - b)`, we pass the computed gradient (downstream) to its parent nodes (`a` and `b`), and then move on (backward, in fact).\n", 118 | "\n", 119 | "Next we move to node `b` to compute the gradient w.r.t. 
`b`. The information we have now is\n", 120 | "\n", 121 | "- Current operation node: sub\n", 122 | "- Input arguments: a and b\n", 123 | "- Previous gradient (upstream): 1\n", 124 | "\n", 125 | "$$ \\frac{\\partial (a - b)}{\\partial b} = -1$$\n", 126 | "\n", 127 | "Because `b` is the second argument of the `sub` operation, the result is $1 \\times (-1) = -1$. Again, the 1 is the upstream gradient and the -1 is the local derivative.\n", 128 | "\n", 129 | "The point I want to emphasize is this: for such a simple expression, the derivative w.r.t. any variable is easy to get without any decomposition, but what if we want to compute the gradient of the following equation w.r.t. `z` directly?\n", 130 | "\n", 131 | "$$y = \\frac{1}{1 + e^{-z}}$$ \n", 132 | "\n", 133 | "The philosophy of automatic differentiation is that we don't take the derivative directly. We decompose the function into primitive functions, take the derivative of each one, and combine the results, because the chain rule has our back.\n", 134 | "\n", 135 | "So how does auto-diff handle the previous equation?\n", 136 | "\n", 137 | "It decomposes it into\n", 138 | "\n", 139 | "- negative\n", 140 | "- exp\n", 141 | "- add\n", 142 | "- reciprocal\n", 143 | "\n", 144 | "and differentiates each of the functions listed above." 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "### VJP\n", 152 | "\n", 153 | "VJP stands for vector-Jacobian product. Here it is the product of the upstream gradient and the derivative of an operation w.r.t. one of its arguments. I will not elaborate here; I highly recommend taking a look at [Automatic Differentiation, Toronto CSC321](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf).\n", 154 | "\n", 155 | "Put more simply, a VJP is the computation we need to perform to get the downstream gradient w.r.t. a given argument of a given operation node." 
156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "## Implementation" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "### Forward Propagation\n", 170 | "\n", 171 | "Q: How do we obtain the computational graph?\n", 172 | "\n", 173 | "A: Trace every operation by wrapping it with a decorator." 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "#### Examples:\n", 181 | "\n", 182 | "Let's say we want to print the name of each numpy function as it is called during the calculation.\n", 183 | "\n", 184 | "We can do this with a decorator." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 1, 190 | "metadata": {}, 191 | "outputs": [ 192 | { 193 | "name": "stdout", 194 | "output_type": "stream", 195 | "text": [ 196 | "Wrapped version of numpy function, and its name is add\n", 197 | "3\n", 198 | "Wrapped version of numpy function, and its name is subtract\n", 199 | "-1\n", 200 | "Wrapped version of numpy function, and its name is __getitem__\n", 201 | "[2]\n", 202 | "Wrapped version of numpy function, and its name is __add__\n", 203 | "[1 1]\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "%matplotlib inline \n", 209 | "\n", 210 | "import numpy as np\n", 211 | "\n", 212 | "\n", 213 | "def get_name_decorator(func):\n", 214 | " def wrapped(*args, **kwargs):\n", 215 | " print(f\"Wrapped version of numpy function, and its name is {func.__name__}\")\n", 216 | " result = func(*args, **kwargs)\n", 217 | " return result\n", 218 | " return wrapped\n", 219 | " \n", 220 | "\n", 221 | "for function in [np.add, np.subtract, np.ndarray.__getitem__, np.ndarray.__add__]:\n", 222 | " globals()[function.__name__] = get_name_decorator(function)\n", 223 | "\n", 224 | "print(add(1, 2))\n", 225 | "print(subtract(1, 2))\n", 226 | "print(__getitem__(np.array([0,1,2]), [2]))\n", 227 | "print(__add__(np.array([0,1]), 
np.array([1,0])))" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "Now things have been little complicated. Due to the same reason, we can put the operator on the computational graph on the fly." 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 2, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "class Node:\n", 244 | " def __init__(self):\n", 245 | " self.gradient = 0\n", 246 | "\n", 247 | "class OperationNode(Node):\n", 248 | " def __init__(self, func, args, kwargs, result):\n", 249 | " super().__init__()\n", 250 | " self.recipe = (func, args, kwargs, result, len(args))\n", 251 | "\n", 252 | "\n", 253 | "class VariableNode(Node):\n", 254 | " def __init__(self, var):\n", 255 | " super().__init__()\n", 256 | " self.var = var\n", 257 | "\n", 258 | "\n", 259 | "class PlaceholderNode(Node):\n", 260 | " def __init__(self, var):\n", 261 | " super().__init__()\n", 262 | " self.var = var\n", 263 | "\n", 264 | "\n", 265 | "class ConstantNode(Node):\n", 266 | " def __init__(self, constant):\n", 267 | " super().__init__()\n", 268 | " self.constant = constant\n", 269 | "\n", 270 | "def plot_graph(backward=False, backward_result={}):\n", 271 | " arrow_style = \"<|-\" if backward else \"-|>\" \n", 272 | " edge_labels = {}\n", 273 | " if backward:\n", 274 | " info = \"output\"\n", 275 | " node_index = add_node(node=ConstantNode(info), info=info)\n", 276 | " default_graph.add_edge(node_index - 1, node_index)\n", 277 | " for node, result in backward_result.items():\n", 278 | " for edge in default_graph.edges:\n", 279 | " head, tail = edge\n", 280 | " if head == node:\n", 281 | " edge_labels[tuple(edge)] = str(result)\n", 282 | " edge_labels[(node_index, node_index-1)] = \"1\"\n", 283 | " plt.figure(3,figsize=(10,10)) \n", 284 | " limits=plt.axis('off')\n", 285 | " labels = {i: default_graph.nodes[i]['info'] for i in default_graph.nodes}\n", 286 | " pos = 
nx.spring_layout(default_graph)\n", 287 | " nx.draw_networkx_nodes(default_graph, pos, node_size = 3000, alpha=0.8)\n", 288 | " nx.draw_networkx_labels(default_graph, pos, labels=labels, font_color=\"w\")\n", 289 | " nx.draw_networkx_edges(default_graph, pos, width=3, arrowstyle=arrow_style, arrowsize=15)\n", 290 | " if backward:\n", 291 | " nx.draw_networkx_edge_labels(default_graph,pos,edge_labels=edge_labels,font_size=30)\n", 292 | " plt.show() " 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 11, 298 | "metadata": {}, 299 | "outputs": [ 300 | { 301 | "name": "stdout", 302 | "output_type": "stream", 303 | "text": [ 304 | "Forward pass\n", 305 | "Result is 2\n", 306 | "Corresponding computational graph is \n" 307 | ] 308 | }, 309 | { 310 | "data": { 311 | "image/png": "\n", 312 | "text/plain": [ 313 | "
" 314 | ] 315 | }, 316 | "metadata": {}, 317 | "output_type": "display_data" 318 | } 319 | ], 320 | "source": [ 321 | "import warnings\n", 322 | "\n", 323 | "import numpy as np\n", 324 | "import networkx as nx\n", 325 | "import matplotlib.pyplot as plt\n", 326 | "import matplotlib.cbook\n", 327 | "\n", 328 | "warnings.filterwarnings(\"ignore\", category=matplotlib.cbook.mplDeprecation)\n", 329 | "warnings.filterwarnings(\"ignore\", category=RuntimeWarning)\n", 330 | "\n", 331 | "default_graph = nx.DiGraph()\n", 332 | "stack = []\n", 333 | "\n", 334 | "def add_node(node, info):\n", 335 | " node_index = len(default_graph.nodes) + 1\n", 336 | " default_graph.add_node(node_index, node=node, info=info)\n", 337 | " return node_index\n", 338 | "\n", 339 | "\n", 340 | "def constant(array):\n", 341 | " def wrapped(*args, **kwargs):\n", 342 | " node = ConstantNode(\"\".join(str(a) for a in args))\n", 343 | " node_index = add_node(node, *args)\n", 344 | " stack.append(node_index)\n", 345 | " value = array(*args, **kwargs)\n", 346 | " return value\n", 347 | " return wrapped\n", 348 | "\n", 349 | " \n", 350 | "def primitive(func):\n", 351 | " def wrapped(*args, **kwargs):\n", 352 | " result = func(*args, **kwargs)\n", 353 | " node = OperationNode(func, args, kwargs, result)\n", 354 | " node_index = add_node(node, func.__name__[:3])\n", 355 | " parents = stack[-len(args):]\n", 356 | " \n", 357 | " for parent in parents:\n", 358 | " default_graph.add_edge(parent, node_index)\n", 359 | " stack.pop()\n", 360 | " \n", 361 | " stack.append(node_index)\n", 362 | " return result\n", 363 | " return wrapped\n", 364 | " \n", 365 | "\n", 366 | "def wrapped_numpy_operator():\n", 367 | " for function in [np.add, np.subtract]:\n", 368 | " globals()[function.__name__] = primitive(function)\n", 369 | "\n", 370 | " globals()[\"const\"] = constant(np.array)\n", 371 | "\n", 372 | "\n", 373 | "wrapped_numpy_operator()\n", 374 | "result = add(const(1), subtract(const(3), const(2)))\n", 375 | 
"print(\"Forward pass\")\n", 376 | "print(f\"Result is {result}\")\n", 377 | "print(\"Corresponding computational graph is \")\n", 378 | "plot_graph()" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "### Register VJP after loading\n", 386 | "\n", 387 | "- We need to look up the corresponding VJP of given operation while backtracking.\n", 388 | "- Register all VJP at loading time in `__init__.py`\n", 389 | "\n", 390 | "We can easily validate our result with numerical calculation (which can be used in test), which is\n", 391 | "$$f'(x) = \\lim_{\\epsilon\\to 0} \\frac{f(x + \\epsilon / 2) - f(x - \\epsilon / 2)}{\\epsilon}$$" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 4, 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "name": "stdout", 401 | "output_type": "stream", 402 | "text": [ 403 | "Upstream is 1\n", 404 | "Operator is negative, x is -6, result is 6\n", 405 | "Downstream is -1\n", 406 | "Numerical result is -0.9999999999976694\n" 407 | ] 408 | } 409 | ], 410 | "source": [ 411 | "from collections import defaultdict\n", 412 | "\n", 413 | "import numpy as np\n", 414 | "\n", 415 | "# Register the VJP in memory\n", 416 | "primitive_vhp = defaultdict(dict)\n", 417 | "\n", 418 | "def register_vjp(func, vhp_list):\n", 419 | " for i, downstream in enumerate(vhp_list):\n", 420 | " primitive_vhp[func.__name__][i] = downstream\n", 421 | "\n", 422 | "register_vjp(\n", 423 | " np.add,\n", 424 | " [\n", 425 | " lambda upstream, result, x, y: upstream, # w.r.t. x\n", 426 | " lambda upstream, result, x, y: upstream, # w.r.t. y\n", 427 | " ])\n", 428 | "\n", 429 | "register_vjp(\n", 430 | " np.subtract,\n", 431 | " [\n", 432 | " lambda upstream, result, x, y: upstream, # w.r.t. x\n", 433 | " lambda upstream, result, x, y: -upstream, # w.r.t. 
y\n", 434 | " ])\n", 435 | "\n", 436 | "register_vjp(\n", 437 | " np.negative,\n", 438 | " [\n", 439 | " lambda upstream, result, x: -upstream, # w.r.t. x\n", 440 | " ])\n", 441 | "\n", 442 | "# This is the numerical way to calculate the derivatives\n", 443 | "\n", 444 | "epsilon = 1e-4\n", 445 | "\n", 446 | "def numerical_vjp(func, arguments, wrt):\n", 447 | " args_pos = [args + epsilon/2 if i == wrt else args for i, args in enumerate(arguments)]\n", 448 | " func_pos = func(*args_pos)\n", 449 | " args_neg = [args - epsilon/2 if i == wrt else args for i, args in enumerate(arguments)]\n", 450 | " func_neg = func(*args_neg)\n", 451 | " return (func_pos - func_neg) / epsilon\n", 452 | " \n", 453 | "func = np.negative\n", 454 | "x = -6\n", 455 | "upstream = 1\n", 456 | "result = func(x)\n", 457 | "vjp = primitive_vhp[func.__name__][0]\n", 458 | "\n", 459 | "print(f\"Upstream is {upstream}\")\n", 460 | "print(f\"Operator is {func.__name__}, x is {x}, result is {result}\")\n", 461 | "print(f\"Downstream is {vjp(upstream, result, x)}\")\n", 462 | "print(f\"Numerical result is {numerical_vjp(func, [x], 0)}\")" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "### Backward Propagation" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 5, 475 | "metadata": {}, 476 | "outputs": [ 477 | { 478 | "name": "stdout", 479 | "output_type": "stream", 480 | "text": [ 481 | "Backward pass\n" 482 | ] 483 | }, 484 | { 485 | "data": { 486 | "image/png": "\n", 487 | "text/plain": [ 488 | "
" 489 | ] 490 | }, 491 | "metadata": {}, 492 | "output_type": "display_data" 493 | } 494 | ], 495 | "source": [ 496 | "import networkx as nx\n", 497 | "\n", 498 | "def backward_prop(upstream):\n", 499 | " default_graph.nodes[len(default_graph.nodes())]['node'].gradient = upstream\n", 500 | "\n", 501 | " gradient_dict = {}\n", 502 | " for node in reversed(list(nx.topological_sort(default_graph))):\n", 503 | " child_node = default_graph.nodes[node]['node']\n", 504 | " if isinstance(child_node, OperationNode):\n", 505 | " func, args, kwargs, result, arg_num = child_node.recipe\n", 506 | " upstream = child_node.gradient\n", 507 | "\n", 508 | " for i, parent in zip(range(arg_num), default_graph.predecessors(node)):\n", 509 | " vhp = primitive_vhp[func.__name__][i]\n", 510 | " downstream = vhp(upstream, result, *args, **kwargs)\n", 511 | " default_graph.nodes[parent]['node'].gradient += downstream\n", 512 | " else:\n", 513 | " gradient_dict[node] = child_node.gradient\n", 514 | "\n", 515 | " return gradient_dict\n", 516 | "\n", 517 | "print(\"Backward pass\")\n", 518 | "backward_result = backward_prop(np.ones_like(result))\n", 519 | "plot_graph(True, backward_result)" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "---\n", 527 | "## Application" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "### Simple demo\n", 535 | "\n", 536 | "We want to compute the derivatives of $x^3$, which is $3x^2$." 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 6, 542 | "metadata": {}, 543 | "outputs": [ 544 | { 545 | "data": { 546 | "image/png": "\n", 547 | "text/plain": [ 548 | "
" 549 | ] 550 | }, 551 | "metadata": { 552 | "needs_background": "light" 553 | }, 554 | "output_type": "display_data" 555 | }, 556 | { 557 | "data": { 558 | "text/plain": [ 559 | "
" 560 | ] 561 | }, 562 | "metadata": {}, 563 | "output_type": "display_data" 564 | } 565 | ], 566 | "source": [ 567 | "%run examples/simple/poly_test.py" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": {}, 573 | "source": [ 574 | "Blue line is $x^3$, and the orange line is $3x^2$." 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "### Boston house dataset – linear regression with multiple variables\n", 582 | "\n", 583 | "Description: 13 numerical attributes, and our target is predicting house price.\n", 584 | "\n", 585 | "Criterion: Mean Square Error" 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": 7, 591 | "metadata": {}, 592 | "outputs": [ 593 | { 594 | "name": "stderr", 595 | "output_type": "stream", 596 | "text": [ 597 | "100%|██████████| 200/200 [00:00<00:00, 1747.03it/s]\n" 598 | ] 599 | }, 600 | { 601 | "data": { 602 | "image/png": "\n", 603 | "text/plain": [ 604 | "
" 605 | ] 606 | }, 607 | "metadata": { 608 | "needs_background": "light" 609 | }, 610 | "output_type": "display_data" 611 | } 612 | ], 613 | "source": [ 614 | "%run examples/regression/boston.py" 615 | ] 616 | }, 617 | { 618 | "cell_type": "markdown", 619 | "metadata": {}, 620 | "source": [ 621 | "### Iris dataset – classification \n", 622 | "\n", 623 | "Description: 4 attributes, predict one of class over 3 classes.\n", 624 | "\n", 625 | "Criterion: Cross Entropy" 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": 8, 631 | "metadata": {}, 632 | "outputs": [ 633 | { 634 | "name": "stderr", 635 | "output_type": "stream", 636 | "text": [ 637 | "100%|██████████| 300/300 [00:00<00:00, 1527.80it/s]\n" 638 | ] 639 | }, 640 | { 641 | "data": { 642 | "image/png": "\n", 643 | "text/plain": [ 644 | "
" 645 | ] 646 | }, 647 | "metadata": { 648 | "needs_background": "light" 649 | }, 650 | "output_type": "display_data" 651 | } 652 | ], 653 | "source": [ 654 | "%run examples/classification/iris.py" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": {}, 660 | "source": [ 661 | "### Hand-written digits dataset – classification\n", 662 | "\n", 663 | "Description: 8 * 8 image attribute, predict which digit (0-9) is\n", 664 | "\n", 665 | "Criterion: Cross Entropy" 666 | ] 667 | }, 668 | { 669 | "cell_type": "code", 670 | "execution_count": 9, 671 | "metadata": {}, 672 | "outputs": [ 673 | { 674 | "name": "stderr", 675 | "output_type": "stream", 676 | "text": [ 677 | "100%|██████████| 300/300 [00:00<00:00, 1615.85it/s]\n" 678 | ] 679 | }, 680 | { 681 | "data": { 682 | "image/png": "\n", 683 | "text/plain": [ 684 | "
" 685 | ] 686 | }, 687 | "metadata": { 688 | "needs_background": "light" 689 | }, 690 | "output_type": "display_data" 691 | } 692 | ], 693 | "source": [ 694 | "%run examples/classification/digits.py" 695 | ] 696 | }, 697 | { 698 | "cell_type": "markdown", 699 | "metadata": {}, 700 | "source": [ 701 | "---\n", 702 | "## Software engineering" 703 | ] 704 | }, 705 | { 706 | "cell_type": "markdown", 707 | "metadata": {}, 708 | "source": [ 709 | "### Unit Test\n", 710 | "\n", 711 | "Reference the test case from google/jax. So far, We have already covered the test for value and vjp.\n", 712 | "\n", 713 | "We test different shape including scalar, vector and matrix." 714 | ] 715 | }, 716 | { 717 | "cell_type": "code", 718 | "execution_count": 10, 719 | "metadata": {}, 720 | "outputs": [ 721 | { 722 | "name": "stdout", 723 | "output_type": "stream", 724 | "text": [ 725 | "\u001b[1m============================= test session starts ==============================\u001b[0m\n", 726 | "platform linux -- Python 3.7.4, pytest-5.2.1, py-1.8.0, pluggy-0.13.0\n", 727 | "rootdir: /home/chenyee/AutoDiff-from-scratch\n", 728 | "plugins: arraydiff-0.3, remotedata-0.3.2, openfiles-0.4.0, doctestplus-0.4.0, forked-1.1.3, xdist-1.30.0\n", 729 | "collected 78 items \u001b[0m\n", 730 | "\n", 731 | "autodiff/tests/autodiff_test.py 
\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[36m [ 50%]\u001b[0m\n", 732 | "autodiff/tests/numpy_test.py \u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[36m [100%]\u001b[0m\n", 733 | "\n", 734 | "\u001b[32m\u001b[1m============================== 78 passed in 0.35s ==============================\u001b[0m\n" 735 | ] 736 | } 737 | ], 738 | "source": [ 739 | "! 
pytest autodiff/tests" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "### CI/CD\n", 747 | "\n", 748 | "- Use GitHub Actions\n", 749 | "- Also tested in the container provided by the TA (on the `dev` branch)\n", 750 | "\n", 751 | "![actions](img/actions.png)" 752 | ] 753 | }, 754 | { 755 | "cell_type": "markdown", 756 | "metadata": {}, 757 | "source": [ 758 | "## Misc" 759 | ] 760 | }, 761 | { 762 | "cell_type": "markdown", 763 | "metadata": {}, 764 | "source": [ 765 | "### C++ building\n", 766 | "\n", 767 | "- I tried building the VJPs in C++ on the `dev` branch, but the efficiency gain is questionable.\n", 768 | "- Another potential bottleneck worth improving is graph construction (currently done with networkx).\n", 769 | "\n", 770 | "```c++\n", 771 | "typedef py::array_t tensor;\n", 772 | "\n", 773 | "tensor negative_vjp(tensor upstream, tensor result, tensor x)\n", 774 | "{\n", 775 | " return -upstream;\n", 776 | "}\n", 777 | "\n", 778 | "tensor reciprocal_vjp(tensor upstream, tensor result, tensor x)\n", 779 | "{\n", 780 | " py::object np_pow = py::module::import(\"numpy\").attr(\"power\");\n", 781 | " return -upstream / np_pow(x, 2);\n", 782 | "}\n", 783 | "```" 784 | ] 785 | }, 786 | { 787 | "cell_type": "markdown", 788 | "metadata": {}, 789 | "source": [ 790 | "### Open Source contributions\n", 791 | "\n", 792 | "Found redundant calculations in the derivative of the power function (see the pull requests below).\n", 793 | "1. [PyTorch](https://github.com/pytorch/pytorch/pull/28651)\n", 794 | "2. [JAX](https://github.com/google/jax/pull/157)\n", 795 | "3. 
[Autograd](https://github.com/HIPS/autograd/pull/541)" 796 | ] 797 | } 798 | ], 799 | "metadata": { 800 | "kernelspec": { 801 | "display_name": "Python 3", 802 | "language": "python", 803 | "name": "python3" 804 | }, 805 | "language_info": { 806 | "codemirror_mode": { 807 | "name": "ipython", 808 | "version": 3 809 | }, 810 | "file_extension": ".py", 811 | "mimetype": "text/x-python", 812 | "name": "python", 813 | "nbconvert_exporter": "python", 814 | "pygments_lexer": "ipython3", 815 | "version": "3.7.4" 816 | } 817 | }, 818 | "nbformat": 4, 819 | "nbformat_minor": 2 820 | } 821 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Basic neural network library with auto-differentiation support 2 | 3 | ## Introduction 4 | 5 | This is a very simple neural network library that supports auto-differentiation. After studying prior work (see *Reference*), I provide an implementation that is simple yet capable of tackling real-world problems (regression and classification). 6 | 7 | ## Documents 8 | 9 | ### Forward-prop and backward-prop 10 | 11 | Source: 12 | 13 | - `autodiff/autodiff/core.py` 14 | - `autodiff/autodiff/diff.py` 15 | 16 | We initialize the graph in the forward prop and call the wrapped operations (See *Wrapped operation and VJP*). Out of laziness, I use the DiGraph in networkx as the computational graph, although I only use a few of its features (graph building and topological sort). 17 | 18 | Note that unlike `autograd` and `jax`, this library can only calculate the first-order derivative of the given function. 19 | 20 | ### Wrapped operation and VJP 21 | 22 | Source: 23 | 24 | - `autodiff/autodiff/numpy_grad/wrapper.py` 25 | - `autodiff/autodiff/numpy_grad/vjps.py` 26 | 27 | In the forward-prop, we construct the computational graph (See *Computational graph*) with the wrapped numpy functions in the wrapper module. 
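The wrapping idea can be illustrated with a minimal sketch. The names `trace` and `primitive` here are hypothetical and chosen only for illustration; the actual wrapper module may be organized differently. The sketch only records which numpy function was called and with how many arguments, which is the raw information a graph builder would need.

```python
import numpy as np

# Hypothetical names for illustration only; not the repository's actual API.
trace = []  # (function name, number of positional arguments), in call order


def primitive(func):
    """Wrap a numpy function so that every call is recorded after it runs."""
    def wrapped(*args, **kwargs):
        result = func(*args, **kwargs)
        trace.append((func.__name__, len(args)))
        return result
    return wrapped


add = primitive(np.add)
subtract = primitive(np.subtract)

# Python evaluates the inner call first, so `subtract` is recorded before `add`.
value = add(1, subtract(3, 2))
```

After this runs, `trace` lists `subtract` before `add`, which is the evaluation order a tracer can rely on when connecting nodes.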
28 | 29 | In the backward-prop, we need to look up the VJP of the given operation, so we register the VJPs at the very beginning. Only a few functions are supported, but they are enough to build the higher-level implementation (See *High level usage*). 30 | 31 | ### Computational graph 32 | 33 | Source: 34 | 35 | - `autodiff/autodiff/graph/tracer.py` (wrapped operation and graph building) 36 | - `autodiff/autodiff/graph/node.py` (node classes) 37 | - `autodiff/autodiff/graph/manager.py` (Context manager for graph) 38 | 39 | Notice that we don't have to worry about the order in which the nodes are built (order of execution), because the Python interpreter takes care of it. For example 40 | 41 | ```python 42 | ad.add(ad.Variable(x), ad.Variable(y)) 43 | ``` 44 | 45 | The interpreter knows that it must construct the Variable of x and then the Variable of y first, and only after that perform the add operation. 46 | 47 | By leveraging this evaluation order, we can push variable nodes onto a stack, pop them in the primitive operation, and connect them to the operation node with edges. 48 | 49 | Note that in the current implementation, I use `id(x)` to identify different nodes in the graph (which is bad but simple). Because equal small integers can share the same object, and therefore the same `id`, the program may not work as expected; there is still a lot to do here. 50 | 51 | ### High level usage 52 | 53 | Source: 54 | 55 | - `autodiff/nn/layer.py` (Module and Layer classes) 56 | - `autodiff/nn/criterion.py` (Criterion for different problems) 57 | - `autodiff/nn/optimizer.py` (Optimizers) 58 | 59 | The high-level implementation imitates the PyTorch syntax, but for now it only supports the multi-layer perceptron, with slightly unusual syntax, as shown below. 
60 | 61 | ```python 62 | class SimpleModel(Module): 63 | def __init__(self, num_features, num_classes): 64 | super().__init__() 65 | self.linear1 = Linear(num_features, 5) 66 | self.linear2 = Linear(5, num_classes) 67 | 68 | def forward(self, x): 69 | x = self.linear1(x, bridge=False) 70 | x = self.linear2(x) 71 | return x 72 | ``` 73 | 74 | This model solves a classification problem with two layers. Notice that in the `forward` method, the first layer MUST set `bridge` to `False` so that a new `Placeholder` is generated for `x`. 75 | 76 | ### Example 77 | 78 | Source: 79 | 80 | - `examples/*` 81 | 82 | The examples cover real-world problems, including regression and classification. I use the MSE criterion for regression and cross-entropy loss for classification. 83 | 84 | ### Test 85 | 86 | Source: 87 | 88 | - `autodiff/tests/*` 89 | 90 | So far, the tests cover the values of functions and their derivatives. 91 | 92 | ## Reference 93 | 94 | ### Implementation 95 | 96 | - Core idea from [Autograd](https://github.com/HIPS/autograd) 97 | - Test procedure and test cases from [JAX](https://github.com/google/jax) 98 | - Variable, Placeholder and Constant syntax from [TensorFlow](https://github.com/tensorflow/tensorflow) 99 | - Layer syntax and cross entropy from [PyTorch](https://github.com/pytorch/pytorch) 100 | 101 | 102 | ### Source code 103 | 104 | - [Autograd](https://github.com/HIPS/autograd) 105 | - [Autodidact](https://github.com/mattjj/autodidact) 106 | - [PyTorch](https://github.com/pytorch/pytorch) 107 | - [TensorFlow](https://github.com/tensorflow/tensorflow) 108 | - [Caffe](https://github.com/BVLC/caffe) 109 | - [Caffe2](https://github.com/pytorch/pytorch/tree/master/caffe2) 110 | - [JAX](https://github.com/google/jax) 111 | 112 | ### Lecture 113 | 114 | - [Backpropagation, Toronto CSC321](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec06.pdf) 115 | - [Automatic Differentiation, Toronto
CSC321](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf) 116 | - [Backpropagation, Stanford CS224N](https://www.youtube.com/watch?v=yLYHDSv-288&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z&index=5&t=2177s) 117 | - [Introduction to Neural Networks, Stanford CS231N](https://www.youtube.com/watch?v=d14TUNcbn1k&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv&index=4) 118 | - [Backpropagation: Find Partial Derivatives, MIT 18.065](https://www.youtube.com/watch?v=lZrIPRnoGQQ&list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k&index=30&t=0s) 119 | 120 | ### Documents 121 | 122 | - [JAX official documentation](https://jax.readthedocs.io/en/latest/index.html) 123 | - [PhD thesis by Dougal Maclaurin (one of the Autograd authors)](https://dougalmaclaurin.com/phd-thesis.pdf) 124 | -------------------------------------------------------------------------------- /autodiff/.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ -------------------------------------------------------------------------------- /autodiff/__init__.py: -------------------------------------------------------------------------------- 1 | from . import autodiff # register the VJPs of the primitive functions 2 | import autodiff.autodiff.numpy_grad.wrapper as np # export the primitive functions 3 | #TODO need refactor, may use __all__? 4 | from autodiff.autodiff.core import forward_prop, backward_prop, zero_grad 5 | from autodiff.autodiff.global_vars import set_forwarded, set_parameters 6 | 7 | # export the gradient-related functions 8 | from autodiff.autodiff.diff import * 9 | globals().update(np.__dict__) 10 | 11 | for func in [ 12 | set_parameters, 13 | set_forwarded, 14 | forward_prop, 15 | backward_prop, 16 | zero_grad, 17 | ]: 18 | globals()[func.__name__] = func 19 | -------------------------------------------------------------------------------- /autodiff/autodiff/__init__.py: -------------------------------------------------------------------------------- 1 | from . 
import numpy_grad, graph 2 | -------------------------------------------------------------------------------- /autodiff/autodiff/core.py: -------------------------------------------------------------------------------- 1 | from collections import defaultdict 2 | 3 | import networkx as nx 4 | 5 | from .graph.node import OperationNode, VariableNode, PlaceholderNode 6 | from .global_vars import register_graph, pop_graph 7 | 8 | primitive_vhp = defaultdict(dict) 9 | 10 | 11 | def is_wrt(node): 12 | return type(node) in [VariableNode, PlaceholderNode] 13 | 14 | 15 | def forward_prop(func, provided_graph=None): 16 | def forward_wrap(*args, **kwargs): 17 | graph = nx.DiGraph() if provided_graph is None else provided_graph 18 | register_graph(graph) 19 | return func(*args, **kwargs) 20 | 21 | return forward_wrap 22 | 23 | 24 | def backward_prop(upstream): 25 | graph = pop_graph() 26 | graph.nodes[len(graph.nodes())]['node'].gradient = upstream 27 | # print("Set gradient to ", upstream, len(graph.nodes())) 28 | gradient_dict = {} 29 | for node in reversed(list(nx.topological_sort(graph))): 30 | child_node: OperationNode = graph.nodes[node]['node'] 31 | if isinstance(child_node, OperationNode): 32 | func, args, kwargs, result, arg_num = child_node.recipe 33 | upstream = child_node.gradient 34 | # print(func.__name__, node, upstream, args, arg_num) 35 | 36 | for i, parent in zip(range(arg_num), graph.predecessors(node)): 37 | vhp = primitive_vhp[func.__name__][i] 38 | downstream = vhp(upstream, result, *args, **kwargs) 39 | # print(i, "downstream size is", downstream.shape) 40 | graph.nodes[parent]['node'].gradient += downstream 41 | 42 | elif is_wrt(child_node): 43 | gradient_dict[child_node.var] = child_node.gradient 44 | 45 | return gradient_dict 46 | 47 | 48 | def zero_grad(graph): 49 | for node_index in graph.nodes: 50 | graph.nodes[node_index]['node'].gradient = 0 51 | 52 | 53 | def register_vjp(func, vhp_list): 54 | for i, downstream in enumerate(vhp_list): 55 | 
primitive_vhp[func.__name__][i] = downstream 56 | -------------------------------------------------------------------------------- /autodiff/autodiff/diff.py: -------------------------------------------------------------------------------- 1 | import numpy as onp 2 | 3 | from .core import forward_prop, backward_prop 4 | 5 | 6 | def value(func): 7 | def valueWrapped(*args, **kwargs): 8 | forward_func = forward_prop(func) 9 | return forward_func(*args, **kwargs) 10 | 11 | return valueWrapped 12 | 13 | 14 | def grad(func, wrt=None, upstream=None): 15 | def gradVal(*args, **kwargs): 16 | forward_func = forward_prop(func) 17 | end_value = forward_func(*args, **kwargs) 18 | g = onp.ones_like(end_value) if upstream is None else upstream 19 | grad = backward_prop(g) 20 | return grad if wrt is None else grad[wrt] 21 | 22 | return gradVal 23 | 24 | 25 | def value_and_grad(func, wrt=None): 26 | def gradVal(*args, **kwargs): 27 | forward_func = forward_prop(func) 28 | end_value = forward_func(*args, **kwargs) 29 | grad = backward_prop(onp.ones_like(end_value)) 30 | return (end_value, grad) if wrt is None else (end_value, grad[wrt]) 31 | 32 | return gradVal 33 | -------------------------------------------------------------------------------- /autodiff/autodiff/global_vars.py: -------------------------------------------------------------------------------- 1 | from collections import defaultdict 2 | 3 | 4 | class GraphInfo: 5 | def __init__(self): 6 | self.stack = [] 7 | self.vars = dict() 8 | self.places = dict() 9 | self.forwarded = False 10 | 11 | 12 | class var: 13 | pass 14 | 15 | 16 | global_vars = var() 17 | global_vars._graph_stack = [] 18 | global_vars._graph_info_dict = defaultdict(GraphInfo) 19 | 20 | 21 | def register_graph(graph): 22 | global_vars._graph_stack.append(graph) 23 | 24 | 25 | def pop_graph(): 26 | return global_vars._graph_stack.pop() 27 | 28 | 29 | def set_forwarded(graph): 30 | global_vars._graph_info_dict[graph].forwarded = True 31 | 32 | 33 | def 
set_parameters(parameters, graph): 34 | for array_id, node_index in global_vars._graph_info_dict[graph].vars.items( 35 | ): 36 | parameters.append({ 37 | "array_id": array_id, 38 | "variables": graph.nodes[node_index]['node'].content 39 | }) 40 | 41 | 42 | def get_graph_info(graph): 43 | return global_vars._graph_info_dict.get(graph, GraphInfo()) 44 | 45 | 46 | def update_graph_info(graph, graph_info): 47 | global_vars._graph_info_dict[graph] = graph_info 48 | -------------------------------------------------------------------------------- /autodiff/autodiff/graph/__init__.py: -------------------------------------------------------------------------------- 1 | from . import tracer -------------------------------------------------------------------------------- /autodiff/autodiff/graph/manager.py: -------------------------------------------------------------------------------- 1 | import networkx as nx 2 | 3 | from ..global_vars import get_graph_info, update_graph_info, register_graph, pop_graph 4 | 5 | 6 | class GraphManager: 7 | def __init__(self): 8 | pass 9 | 10 | def __enter__(self): 11 | self.graph: nx.DiGraph = pop_graph() 12 | self.graph_info = get_graph_info(self.graph) 13 | return self.graph, self.graph_info 14 | 15 | def __exit__(self, *args): 16 | register_graph(self.graph) 17 | update_graph_info(self.graph, self.graph_info) 18 | 19 | 20 | def add_node(graph, node): 21 | node_index = len(graph.nodes()) + 1 22 | graph.add_node(node_index, node=node) 23 | return node_index -------------------------------------------------------------------------------- /autodiff/autodiff/graph/node.py: -------------------------------------------------------------------------------- 1 | only_take_lhs_grad = [ 2 | "reshape", 3 | "__getitem__", 4 | ] 5 | 6 | class Node: 7 | def __init__(self): 8 | self.gradient = 0 9 | 10 | 11 | class OperationNode(Node): 12 | def __init__(self, func, args, kwargs, result): 13 | super().__init__() 14 | self.recipe = (func, args, kwargs, 
result, 15 | len(args) if func.__name__ not in only_take_lhs_grad else 1) 16 | 17 | 18 | class VariableNode(Node): 19 | def __init__(self, var_id, content): 20 | super().__init__() 21 | self.var = var_id 22 | self.content = content 23 | 24 | 25 | class PlaceholderNode(Node): 26 | def __init__(self, var): 27 | super().__init__() 28 | self.var = var 29 | 30 | 31 | class ConstantNode(Node): 32 | def __init__(self, constant): 33 | super().__init__() 34 | self.constant = constant -------------------------------------------------------------------------------- /autodiff/autodiff/graph/tracer.py: -------------------------------------------------------------------------------- 1 | from functools import wraps 2 | 3 | import networkx as nx 4 | 5 | from .node import ConstantNode, OperationNode, VariableNode, PlaceholderNode 6 | from .manager import GraphManager, add_node 7 | 8 | #TODO refactor this part 9 | def constant(array): 10 | def const_wrapped(content): 11 | with GraphManager() as (graph, info): 12 | if not info.forwarded: 13 | node = ConstantNode(content) 14 | node_index = add_node(graph, node) 15 | info.stack.append(node_index) 16 | # print('const', node_index, content) 17 | return content if isinstance(content, tuple) else array(content) 18 | 19 | return const_wrapped 20 | 21 | 22 | def variable(array): 23 | def var_wrapped(content): 24 | with GraphManager() as (graph, info): 25 | if not info.forwarded: 26 | var_id = id(content) 27 | if var_id not in info.vars: 28 | node = VariableNode(var_id, content) 29 | node_index = add_node(graph, node) 30 | info.vars[var_id] = node_index 31 | else: 32 | node_index = info.vars[var_id] 33 | info.stack.append(node_index) 34 | # print('var', node_index, content.shape) 35 | return array(content) 36 | 37 | return var_wrapped 38 | 39 | 40 | def placeholder(array): 41 | def place_wrapped(content): 42 | with GraphManager() as (graph, info): 43 | if not info.forwarded: 44 | place_id = id(content) 45 | if place_id not in info.places: 46 
| node = PlaceholderNode(place_id) 47 | node_index = add_node(graph, node) 48 | info.places[place_id] = node_index 49 | else: 50 | node_index = info.places[place_id] 51 | info.stack.append(node_index) 52 | # print('place', node_index, content.shape) 53 | return array(content) 54 | 55 | return place_wrapped 56 | 57 | 58 | def primitive(func): 59 | @wraps(func) 60 | def func_wrapped(*args, **kwargs): 61 | result = func(*args, **kwargs) 62 | with GraphManager() as (graph, info): 63 | if not info.forwarded: 64 | node = OperationNode(func, args, kwargs, result) 65 | node_index = add_node(graph, node) 66 | 67 | parents = info.stack[-len(args):] 68 | for parent in parents: 69 | graph.add_edge(parent, node_index) 70 | info.stack.pop() 71 | 72 | # print('fun', node_index, func.__name__) 73 | 74 | info.stack.append(node_index) 75 | return result 76 | 77 | return func_wrapped 78 | -------------------------------------------------------------------------------- /autodiff/autodiff/numpy_grad/.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ -------------------------------------------------------------------------------- /autodiff/autodiff/numpy_grad/__init__.py: -------------------------------------------------------------------------------- 1 | from . import vjps -------------------------------------------------------------------------------- /autodiff/autodiff/numpy_grad/vjps.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import numpy as onp 3 | 4 | from ..numpy_grad import wrapper as wnp 5 | from ..core import register_vjp 6 | 7 | # print(__name__) 8 | """ 9 | Unary operation 10 | """ 11 | register_vjp( 12 | wnp.negative, 13 | [ 14 | lambda upstream, result, x: -upstream, # w.r.t. x 15 | ]) 16 | 17 | register_vjp( 18 | wnp.reciprocal, 19 | [ 20 | lambda upstream, result, x: upstream * (-1 / x**2), # w.r.t. 
x 21 | ]) 22 | 23 | register_vjp( 24 | wnp.exp, 25 | [ 26 | lambda upstream, result, x: upstream * result, # w.r.t. x 27 | ]) 28 | 29 | register_vjp( 30 | wnp.log, 31 | [ 32 | lambda upstream, result, x: upstream / x, # w.r.t. x 33 | ]) 34 | 35 | register_vjp( 36 | wnp.sin, 37 | [ 38 | lambda upstream, result, x: upstream * onp.cos(x), # w.r.t. x 39 | ]) 40 | 41 | register_vjp( 42 | wnp.cos, 43 | [ 44 | lambda upstream, result, x: upstream * -onp.sin(x), # w.r.t. x 45 | ]) 46 | """ 47 | Binary operation 48 | """ 49 | register_vjp( 50 | wnp.add, 51 | [ 52 | lambda upstream, result, x, y: unbroadcast(x, upstream), # w.r.t. x 53 | lambda upstream, result, x, y: unbroadcast(y, upstream), # w.r.t. y 54 | ]) 55 | 56 | register_vjp( 57 | wnp.subtract, 58 | [ 59 | lambda upstream, result, x, y: unbroadcast(x, upstream, other=y 60 | ), # w.r.t. x 61 | lambda upstream, result, x, y: unbroadcast(y, -upstream, other=x 62 | ), # w.r.t. y 63 | ]) 64 | 65 | register_vjp( 66 | wnp.multiply, 67 | [ 68 | lambda upstream, result, x, y: unbroadcast(x, upstream * y 69 | ), # w.r.t. x 70 | lambda upstream, result, x, y: unbroadcast(y, upstream * x 71 | ), # w.r.t. y 72 | ]) 73 | 74 | register_vjp( 75 | wnp.true_divide, 76 | [ 77 | lambda upstream, result, x, y: unbroadcast(x, upstream / y 78 | ), # w.r.t. x 79 | lambda upstream, result, x, y: unbroadcast(y, upstream * 80 | (-x / y**2)), # w.r.t. y 81 | ]) 82 | 83 | register_vjp( 84 | wnp.maximum, 85 | [ 86 | lambda upstream, result, x, y: upstream * balanced_eq(x, result, y 87 | ), # w.r.t. x 88 | lambda upstream, result, x, y: upstream * balanced_eq(y, result, x 89 | ), # w.r.t. y 90 | ]) 91 | 92 | register_vjp( 93 | wnp.minimum, 94 | [ 95 | lambda upstream, result, x, y: upstream * balanced_eq(x, result, y 96 | ), # w.r.t. x 97 | lambda upstream, result, x, y: upstream * balanced_eq(y, result, x 98 | ), # w.r.t. 
y 99 | ]) 100 | 101 | register_vjp( 102 | wnp.power, 103 | [ 104 | lambda upstream, result, x, y: unbroadcast( 105 | x, upstream * (y * x**(y - 1))), # w.r.t. x 106 | lambda upstream, result, x, y: unbroadcast( 107 | y, upstream * (result * onp.log(replace_zero(x, 1.)))), # w.r.t. y 108 | ]) 109 | 110 | 111 | # shamelessly taken from autograd 112 | def replace_zero(x, val): 113 | return onp.where(x, x, val) 114 | 115 | 116 | def unbroadcast(target, g, broadcast_idx=0, other=None): 117 | """ 118 | Let downstream have the same shape as target 119 | """ 120 | # if onp.ndim(g) == 2: 121 | # g = g.diagonal()[:,None] 122 | # print("-" * 10) 123 | # print(onp.ndim(g) > onp.ndim(target), g.shape, target.shape) 124 | # print(other) 125 | # print(target) 126 | # print("Before", g) 127 | # print(g, target) 128 | while onp.ndim(g) > onp.ndim(target): 129 | g = onp.sum(g, axis=broadcast_idx) 130 | # print("Sum", g) 131 | 132 | if onp.ndim(g) == onp.ndim(target): 133 | for axis, size in enumerate(onp.shape(target)): 134 | if size == 1: 135 | g = onp.sum(g, axis=axis, keepdims=True) 136 | # print("After", g) 137 | # print("-" * 10) 138 | return g 139 | 140 | 141 | def balanced_eq(x, z, y): 142 | return (x == z) / (1.0 + (x == y)) 143 | 144 | 145 | """ 146 | Matrix calculation 147 | """ 148 | 149 | 150 | def dot_vjp_first(upstream, result, x, y): 151 | # print("first: ", upstream, result.shape, x.shape, y.shape) 152 | 153 | if not (onp.ndim(x) == onp.ndim(y) == 2): 154 | raise NotImplementedError("Only care about MM or MV product!") 155 | 156 | # Take the derivative of output respect to x (input) 157 | downstream = onp.dot(upstream, y.T) 158 | 159 | assert downstream.shape == x.shape 160 | 161 | return downstream 162 | 163 | 164 | def dot_vjp_second(upstream, result, x, y): 165 | # print("second: ", upstream, result.shape, x.shape, y.shape) 166 | 167 | if not (onp.ndim(x) == onp.ndim(y) == 2): 168 | raise NotImplementedError("Only care about MM or MV product!") 169 | 170 | # Take 
the derivative of output with respect to y (weight) 171 | downstream = onp.dot(x.T, upstream) 172 | 173 | assert downstream.shape == y.shape 174 | 175 | return downstream 176 | 177 | 178 | register_vjp( 179 | wnp.dot, 180 | [ 181 | dot_vjp_first, # w.r.t. x 182 | dot_vjp_second, # w.r.t. y 183 | ]) 184 | 185 | # Special operator 186 | 187 | register_vjp(wnp.reshape, [ 188 | lambda upstream, result, x, shape, order=None: onp.reshape( 189 | upstream, onp.shape(x), order=order) 190 | ]) 191 | 192 | 193 | def sum_vjp(upstream, result, x, axis=None, keepdims=False, dtype=None): # axis=None matches the default of numpy.sum 194 | shape = onp.shape(x) 195 | 196 | if not shape: 197 | return upstream 198 | 199 | # print(result, x, axis) 200 | axis = list(axis) if isinstance(axis, tuple) else axis 201 | new_shape = onp.array(shape) 202 | new_shape[axis] = 1 203 | # print(onp.reshape(upstream, new_shape)) 204 | return onp.reshape(upstream, new_shape) + onp.zeros(shape, dtype=dtype) 205 | 206 | 207 | register_vjp(wnp.sum, [sum_vjp]) 208 | 209 | 210 | def getitem_vjp(upstream, result, x, index): 211 | downstream = onp.zeros_like(x) # scatter into a fresh zero array instead of mutating x 212 | onp.add.at(downstream, index, upstream) 213 | return downstream 214 | 215 | register_vjp(wnp.__getitem__, [getitem_vjp]) 216 | 217 | -------------------------------------------------------------------------------- /autodiff/autodiff/numpy_grad/wrapper.py: -------------------------------------------------------------------------------- 1 | import types 2 | 3 | import numpy as _np 4 | 5 | from ..graph.tracer import primitive, constant, variable, placeholder 6 | 7 | nograd_functions = [ 8 | _np.ndim, _np.shape, _np.iscomplexobj, _np.result_type, _np.zeros_like, 9 | _np.ones_like, _np.floor, _np.ceil, _np.round, _np.rint, _np.around, 10 | _np.fix, _np.trunc, _np.all, _np.any, _np.argmax, _np.argmin, 11 | _np.argpartition, _np.argsort, _np.argwhere, _np.nonzero, _np.flatnonzero, 12 | _np.count_nonzero, _np.searchsorted, _np.sign, _np.ndim, _np.shape, 13 | _np.floor_divide, _np.logical_and, _np.logical_or, _np.logical_not,
14 | _np.logical_xor, _np.isfinite, _np.isinf, _np.isnan, _np.isneginf, 15 | _np.isposinf, _np.allclose, _np.isclose, _np.array_equal, _np.array_equiv, 16 | _np.greater, _np.greater_equal, _np.less, _np.less_equal, _np.equal, 17 | _np.not_equal, _np.iscomplexobj, _np.iscomplex, _np.size, _np.isscalar, 18 | _np.isreal, _np.zeros_like, _np.ones_like, _np.result_type 19 | ] 20 | 21 | excluded_function = [ 22 | _np.linspace, 23 | _np.arange, 24 | ] 25 | 26 | 27 | def wrap_func(numpy, local): 28 | # Wrap numpy primitive function 29 | function_types = {_np.ufunc, types.FunctionType, types.BuiltinFunctionType} 30 | for name, func in numpy.items(): 31 | if func in nograd_functions or func in excluded_function: 32 | local[name] = func 33 | elif type(func) in function_types: 34 | local[name] = primitive(func) 35 | elif isinstance(func, type) and _np.issubdtype(func, _np.integer): 36 | local[name] = func 37 | 38 | # Wrap numpy array member function 39 | for func in [_np.ndarray.__getitem__, _np.ndarray.__len__, _np.ndarray.__contains__]: 40 | local[func.__name__] = primitive(func) 41 | 42 | 43 | wrap_func(_np.__dict__, globals()) 44 | globals()['Constant'] = constant(_np.array) 45 | globals()['Variable'] = variable(_np.array) 46 | globals()['Placeholder'] = placeholder(_np.array) 47 | -------------------------------------------------------------------------------- /autodiff/nn/criterion.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | 3 | import numpy as _np 4 | 5 | import autodiff as ad 6 | from autodiff import value_and_grad, value, grad 7 | 8 | 9 | class Criterion: 10 | def __init__(self): 11 | pass 12 | 13 | def __call__(self, true, predicted): 14 | value, loss_grad = value_and_grad(self.loss_func, 15 | id(predicted))(feed_dict={ 16 | 'predicted_y': predicted, 17 | 'true_y': true 18 | }) 19 | return _np.average(value), loss_grad 20 | 21 | def loss_func(self): 22 | pass 23 | 24 | 25 | class 
MSE(Criterion): 26 | def __init__(self): 27 | super().__init__() 28 | 29 | def loss_func(self, feed_dict={}): 30 | diff = ad.subtract(ad.Placeholder(feed_dict['predicted_y']), 31 | ad.Placeholder(feed_dict['true_y'])) 32 | return ad.multiply(ad.power(diff, ad.Constant(2)), ad.Constant(1 / 2)) 33 | 34 | 35 | class CrossEntropy(Criterion): 36 | def __init__(self): 37 | super().__init__() 38 | 39 | def loss_func(self, feed_dict={}): 40 | batch_size = feed_dict["predicted_y"].shape[0] 41 | return ad.add( 42 | ad.negative( 43 | ad.__getitem__( 44 | ad.Placeholder(feed_dict["predicted_y"]), 45 | ad.Constant( 46 | tuple([range(batch_size), 47 | feed_dict["true_y"].ravel()])))), 48 | ad.log( 49 | ad.sum(ad.exp(ad.Placeholder(feed_dict["predicted_y"])), 50 | axis=1))) 51 | -------------------------------------------------------------------------------- /autodiff/nn/layer.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import numpy as _np 3 | import networkx as nx 4 | 5 | import autodiff as ad 6 | 7 | 8 | class Module: 9 | def __init__(self): 10 | self._graph = nx.DiGraph() 11 | self._parameters = [] 12 | 13 | def __call__(self, *args): 14 | result = ad.forward_prop(self.forward, provided_graph=self._graph)(*args) 15 | # only set parameters at the first forward pass 16 | if not self._parameters: 17 | ad.set_parameters(self._parameters, self._graph) 18 | ad.set_forwarded(self._graph) 19 | 20 | return result 21 | 22 | def forward(self, *args): 23 | return 24 | 25 | def zero_grad(self): 26 | ad.zero_grad(self._graph) 27 | 28 | def backward(self, upstream): 29 | return ad.backward_prop(upstream) 30 | 31 | def parameters(self): 32 | return self._parameters 33 | 34 | 35 | class Layer: 36 | def __init__(self): 37 | pass 38 | 39 | def __call__(self): 40 | pass 41 | 42 | 43 | class Linear(Layer): 44 | def __init__(self, in_features, out_features): 45 | super().__init__() 46 | self.weight = 
_np.random.random((in_features, out_features)) 47 | 48 | def __call__(self, features, bridge=True): 49 | features = features if bridge else ad.Placeholder(features) 50 | return ad.dot(features, ad.Variable(self.weight)) 51 | -------------------------------------------------------------------------------- /autodiff/nn/optimizer.py: -------------------------------------------------------------------------------- 1 | import numpy as _np 2 | 3 | import autodiff as ad 4 | from autodiff import value_and_grad, value, grad 5 | 6 | 7 | class Optimizer: 8 | def __init__(self, lr, parameters): 9 | self.lr = lr 10 | self.parameters = parameters 11 | 12 | def step(self): 13 | pass 14 | 15 | 16 | class GradientDescent(Optimizer): 17 | def __init__(self, lr, parameter): 18 | super().__init__(lr, parameter) 19 | 20 | def step(self, model_grad): 21 | for i, para_info in reversed(list(enumerate(self.parameters))): 22 | self.parameters[i]["variables"] -= self.lr * model_grad[para_info["array_id"]] 23 | -------------------------------------------------------------------------------- /autodiff/tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/autodiff/tests/__init__.py -------------------------------------------------------------------------------- /autodiff/tests/autodiff_test.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import unittest 3 | 4 | from absl.testing import parameterized 5 | 6 | import autodiff as ad 7 | from autodiff.autodiff.core import primitive_vhp 8 | import autodiff.utils.test_utils as adu 9 | 10 | 11 | class TestJVPMethods(adu.AutoDiffTestCase): 12 | @parameterized.named_parameters([{ 13 | "testcase_name": 14 | adu.format_test_name(record.name, shape), 15 | "ad_func": 16 | record.ad_func, 17 | "np_func": 18 | record.np_func, 19 | "rng": 
20 | record.rng, 21 | "shape": 22 | shape, 23 | "nargs": 24 | record.nargs 25 | } for shape in adu.tested_shapes for record in adu.OP_RECORDS]) 26 | def test_op(self, ad_func, np_func, rng, shape, nargs): 27 | args = [rng(shape) for _ in range(nargs)] 28 | adu.check_vjp(np_func, ad_func, args) 29 | 30 | 31 | if __name__ == "__main__": 32 | unittest.main() -------------------------------------------------------------------------------- /autodiff/tests/numpy_test.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import unittest 3 | import functools 4 | import itertools 5 | from collections import namedtuple 6 | 7 | import numpy as np 8 | import numpy.random as npr 9 | from absl.testing import parameterized 10 | 11 | import autodiff as ad 12 | from autodiff.autodiff.core import primitive_vhp 13 | import autodiff.utils.test_utils as adu 14 | 15 | 16 | class TestNumpyMethods(adu.AutoDiffTestCase): 17 | @parameterized.named_parameters([{ 18 | "testcase_name": 19 | adu.format_test_name(record.name, shape), 20 | "ad_func": 21 | record.ad_func, 22 | "np_func": 23 | record.np_func, 24 | "rng": 25 | record.rng, 26 | "shape": 27 | shape, 28 | "nargs": 29 | record.nargs 30 | } for shape in adu.tested_shapes for record in adu.OP_RECORDS]) 31 | def test_op(self, ad_func, np_func, rng, shape, nargs): 32 | args = [rng(shape) for _ in range(nargs)] 33 | adu.check_value(np_func, ad_func, args) 34 | 35 | 36 | if __name__ == "__main__": 37 | unittest.main() -------------------------------------------------------------------------------- /autodiff/utils/model_utils.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | from tqdm import trange 4 | 5 | class Dataset: 6 | def __init__(self, X, Y): 7 | self.X = X 8 | self.Y = Y 9 | 10 | def __len__(self): 11 | return len(self.X) 12 | 13 | def __getitem__(self, idx): 14 | return self.X[idx], self.Y[idx] 15 | 16 | 17 | 
class DataLoader: 18 | def __init__(self, dataset: Dataset): 19 | self.dataset = dataset 20 | 21 | def next_batch(self, num): 22 | candidate = list(range(len(self.dataset))) 23 | sample_index = random.sample(candidate, num) 24 | return self.dataset[sample_index] 25 | 26 | 27 | def train_procedure(epoch): 28 | for i in trange(epoch): 29 | yield i -------------------------------------------------------------------------------- /autodiff/utils/test_utils.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import unittest 3 | from functools import partial 4 | from collections import namedtuple 5 | 6 | from absl.testing import parameterized 7 | import numpy as np 8 | import numpy.random as npr 9 | 10 | import autodiff as ad 11 | from autodiff import value_and_grad, value, grad 12 | """ 13 | This numerical VJP checking is greatly referenced from google/jax. 14 | """ 15 | """ 16 | For random generator 17 | """ 18 | 19 | 20 | def _rand_type(rand, shape, dtype=np.float64, scale=1., post=lambda x: x): 21 | r = lambda: np.asarray(scale * rand(*shape), dtype=dtype) 22 | vals = r() 23 | return post(vals) 24 | 25 | 26 | def rand_default(): 27 | rand = npr.RandomState(0).randn 28 | return partial(_rand_type, rand, scale=3) 29 | 30 | 31 | def rand_positive(): 32 | post = lambda x: np.where(x < 0, -x, x) 33 | rand = npr.RandomState(0).randn 34 | return partial(_rand_type, rand, scale=2, post=post) 35 | 36 | 37 | def rand_small(): 38 | rand = npr.RandomState(0).randn 39 | return partial(_rand_type, rand, scale=1e-3) 40 | 41 | 42 | def rand_not_small(): 43 | post = lambda x: x + np.where(x > 0, 10.0, -10.0) 44 | rand = npr.RandomState(0).randn 45 | return partial(_rand_type, rand, scale=3., post=post) 46 | 47 | 48 | """ 49 | Format string 50 | """ 51 | 52 | 53 | def format_test_name(func, shape): 54 | return f"{func}_{shape}" 55 | 56 | 57 | """ 58 | For grad check 59 | """ 60 | epsilon = 1e-4 61 | 62 | add = lambda 
wrt, argument: [ 63 | np.add(arg, epsilon / 2) if i == wrt else arg 64 | for i, arg in enumerate(argument) 65 | ] 66 | sub = lambda wrt, argument: [ 67 | np.subtract(arg, epsilon / 2) if i == wrt else arg 68 | for i, arg in enumerate(argument) 69 | ] 70 | 71 | 72 | def numerical_jvp(func, arguments, wrt): 73 | func_pos = func(*add(wrt, arguments)) 74 | func_neg = func(*sub(wrt, arguments)) 75 | return (func_pos - func_neg) / epsilon 76 | 77 | 78 | default_tol = 1e-2 79 | 80 | 81 | def check_vjp(func, func_vjp, args): 82 | for i in range(len(args)): 83 | out = grad(func_vjp, wrt=id(args[i]))(*args) 84 | expected = numerical_jvp(func, args, i) 85 | np.testing.assert_allclose(out, 86 | expected, 87 | atol=default_tol, 88 | rtol=default_tol) 89 | 90 | 91 | """ 92 | For value check 93 | """ 94 | 95 | 96 | def check_value(func, func_value, args): 97 | out = value(func_value)(*args) 98 | expected = func(*args) # reference value from the plain NumPy function 99 | np.testing.assert_allclose(out, expected) 100 | 101 | 102 | def func_helper(func): 103 | def wrapped(*args, **kwargs): 104 | arg_list = (ad.Variable(arg) for arg in args) 105 | return ad.__dict__[func](*arg_list) 106 | 107 | return wrapped 108 | 109 | 110 | class AutoDiffTestCase(parameterized.TestCase): 111 | def assert_all_close(self): 112 | pass 113 | 114 | """ 115 | Test cases 116 | """ 117 | 118 | tested_shapes = [(), (3, ), (3, 2)] 119 | record = namedtuple("TestRecord", 120 | ["name", "np_func", "ad_func", "nargs", "rng"]) 121 | 122 | 123 | def record_factory(name, nargs, rng): 124 | np_func = np.__dict__[name] 125 | ad_func = func_helper(name) 126 | return record(name, np_func, ad_func, nargs, rng) 127 | 128 | 129 | OP_RECORDS = [ 130 | record_factory("negative", 1, rand_default()), 131 | record_factory("reciprocal", 1, rand_positive()), 132 | record_factory("exp", 1, rand_small()), 133 | record_factory("log", 1, rand_positive()), 134 | record_factory("sin", 1, rand_default()), 135 | record_factory("cos", 1, rand_default()), 136 | 
record_factory("add", 2, rand_default()), 137 | record_factory("subtract", 2, rand_default()), 138 | record_factory("multiply", 2, rand_default()), 139 | record_factory("true_divide", 2, rand_not_small()), 140 | record_factory("maximum", 2, rand_default()), 141 | record_factory("minimum", 2, rand_default()), 142 | record_factory("power", 2, rand_positive()), 143 | ] 144 | -------------------------------------------------------------------------------- /examples/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/__init__.py -------------------------------------------------------------------------------- /examples/classification/Digits.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/classification/Digits.png -------------------------------------------------------------------------------- /examples/classification/Iris.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/classification/Iris.png -------------------------------------------------------------------------------- /examples/classification/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/classification/__init__.py -------------------------------------------------------------------------------- /examples/classification/digits.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import warnings 3 | 4 | from sklearn.datasets import 
load_digits 5 | import matplotlib.pyplot as plt 6 | import matplotlib.cbook 7 | 8 | import autodiff as ad 9 | from autodiff.utils.model_utils import Dataset, DataLoader, train_procedure 10 | from autodiff.nn.optimizer import GradientDescent 11 | from autodiff.nn.criterion import CrossEntropy 12 | from autodiff.nn.layer import Module, Linear 13 | 14 | warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation) 15 | warnings.filterwarnings("ignore", category=RuntimeWarning) 16 | 17 | 18 | class SimpleModel(Module): 19 | def __init__(self, num_features, num_classes): 20 | super().__init__() 21 | self.linear1 = Linear(num_features, 40) 22 | self.linear2 = Linear(40, num_classes) 23 | 24 | def forward(self, x): 25 | x = self.linear1(x, bridge=False) 26 | x = self.linear2(x) 27 | return x 28 | 29 | 30 | X, Y = load_digits(return_X_y=True) 31 | X = X / 16 32 | Y = Y.reshape(-1, 1) 33 | data_size, num_features = X.shape 34 | num_classes = 10 35 | 36 | model = SimpleModel(num_features, num_classes) 37 | dataset = Dataset(X, Y) 38 | dataloader = DataLoader(dataset) 39 | 40 | lr = 1e-5 41 | opt = GradientDescent(lr, model.parameters()) 42 | loss_func = CrossEntropy() 43 | epoch = 300 44 | batch_size = 32 45 | loss_list = [] 46 | for _ in train_procedure(epoch): 47 | x, y = dataloader.next_batch(batch_size) 48 | predicted_y = model(x) 49 | v, loss_grad = loss_func(y, predicted_y) 50 | model_grad = model.backward(loss_grad) 51 | opt.step(model_grad) 52 | model.zero_grad() 53 | loss_list.append(v) 54 | 55 | plt.plot(range(epoch), loss_list) 56 | plt.savefig('Digits.png') 57 | plt.show() -------------------------------------------------------------------------------- /examples/classification/iris.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import warnings 3 | 4 | from sklearn.datasets import load_iris 5 | import matplotlib.pyplot as plt 6 | import matplotlib.cbook 7 | 8 | import autodiff 
as ad 9 | from autodiff.utils.model_utils import Dataset, DataLoader, train_procedure 10 | from autodiff.nn.optimizer import GradientDescent 11 | from autodiff.nn.criterion import CrossEntropy 12 | from autodiff.nn.layer import Module, Linear 13 | 14 | warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation) 15 | warnings.filterwarnings("ignore", category=RuntimeWarning) 16 | 17 | 18 | class SimpleModel(Module): 19 | def __init__(self, num_features, num_classes): 20 | super().__init__() 21 | self.linear1 = Linear(num_features, 5) 22 | self.linear2 = Linear(5, num_classes) 23 | 24 | def forward(self, x): 25 | x = self.linear1(x, bridge=False) 26 | x = self.linear2(x) 27 | return x 28 | 29 | 30 | X, Y = load_iris(return_X_y=True) 31 | Y = Y.reshape(-1, 1) 32 | data_size, num_features = X.shape 33 | num_classes = 3 34 | 35 | model = SimpleModel(num_features, num_classes) 36 | dataset = Dataset(X, Y) 37 | dataloader = DataLoader(dataset) 38 | 39 | lr = 1e-5 40 | opt = GradientDescent(lr, model.parameters()) 41 | loss_func = CrossEntropy() 42 | epoch = 300 43 | batch_size = 32 44 | loss_list = [] 45 | for i in train_procedure(epoch): 46 | x, y = dataloader.next_batch(batch_size) 47 | predicted_y = model(x) 48 | v, loss_grad = loss_func(y, predicted_y) 49 | model_grad = model.backward(loss_grad) 50 | opt.step(model_grad) 51 | model.zero_grad() 52 | loss_list.append(v) 53 | 54 | plt.plot(range(epoch), loss_list) 55 | plt.savefig('Iris.png') 56 | plt.show() -------------------------------------------------------------------------------- /examples/regression/Boston.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/regression/Boston.png -------------------------------------------------------------------------------- /examples/regression/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/regression/__init__.py -------------------------------------------------------------------------------- /examples/regression/boston.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import warnings 3 | 4 | from sklearn.datasets import load_boston 5 | import matplotlib.pyplot as plt 6 | import matplotlib.cbook 7 | 8 | import autodiff as ad 9 | from autodiff.utils.model_utils import Dataset, DataLoader, train_procedure 10 | from autodiff.nn.optimizer import GradientDescent 11 | from autodiff.nn.criterion import MSE 12 | from autodiff.nn.layer import Module, Linear 13 | 14 | warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation) 15 | warnings.filterwarnings("ignore", category=RuntimeWarning) 16 | 17 | 18 | class SimpleModel(Module): 19 | def __init__(self, num_features): 20 | super().__init__() 21 | self.linear1 = Linear(num_features, 5) 22 | self.linear2 = Linear(5, 1) 23 | 24 | def forward(self, x): 25 | x = self.linear1(x, bridge=False) 26 | x = self.linear2(x) 27 | return x 28 | 29 | 30 | X, Y = load_boston(return_X_y=True) 31 | Y = Y.reshape(-1, 1) 32 | data_size, num_features = X.shape 33 | 34 | model = SimpleModel(num_features) 35 | dataset = Dataset(X, Y) 36 | dataloader = DataLoader(dataset) 37 | 38 | lr = 1e-8 39 | opt = GradientDescent(lr, model.parameters()) 40 | loss_func = MSE() 41 | epoch = 200 42 | batch_size = 16 43 | loss_list = [] 44 | for i in train_procedure(epoch): 45 | x, y = dataloader.next_batch(batch_size) 46 | predicted_y = model(x) 47 | v, loss_grad = loss_func(y, predicted_y) 48 | model_grad = model.backward(loss_grad) 49 | opt.step(model_grad) 50 | model.zero_grad() 51 | loss_list.append(v) 52 | 53 | plt.plot(range(epoch), loss_list) 54 | 
plt.savefig('Boston.png') 55 | plt.show() -------------------------------------------------------------------------------- /examples/regression/regression.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/regression/regression.png -------------------------------------------------------------------------------- /examples/regression/regression.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import warnings 3 | 4 | import matplotlib.pyplot as plt 5 | import matplotlib.cbook 6 | import numpy as _np 7 | 8 | import autodiff as ad 9 | from autodiff.utils.model_utils import Dataset, DataLoader, train_procedure 10 | from autodiff.nn.optimizer import GradientDescent 11 | from autodiff.nn.criterion import MSE 12 | from autodiff.nn.layer import Module, Linear 13 | 14 | warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation) 15 | warnings.filterwarnings("ignore", category=RuntimeWarning) 16 | 17 | 18 | class SimpleModel(Module): 19 | def __init__(self, num_features): 20 | super().__init__() 21 | self.linear = Linear(num_features, 1) 22 | 23 | def forward(self, x): 24 | return self.linear(x, bridge=False) 25 | 26 | data_size = 300 27 | X = 2 * _np.random.rand(data_size, 1) 28 | aug_X = _np.c_[_np.ones(data_size), X] 29 | 30 | # True parameters 31 | theta = _np.array([[5], [10]]) 32 | Y = aug_X @ theta + _np.random.randn(data_size, 1) 33 | print("training shape", X.shape, Y.shape) 34 | 35 | model = SimpleModel(2) 36 | dataset = Dataset(aug_X, Y) 37 | dataloader = DataLoader(dataset) 38 | 39 | lr = 0.001 40 | opt = GradientDescent(lr, model.parameters()) 41 | loss_func = MSE() 42 | epoch = 300 43 | for i in train_procedure(epoch): 44 | x, y = dataloader.next_batch(10) 45 | predicted_y = model(x) 46 | v, loss_grad = loss_func(y, predicted_y) 
47 | model_grad = model.backward(loss_grad) 48 | opt.step(model_grad) 49 | model.zero_grad() 50 | 51 | predict = model(aug_X) 52 | 53 | plt.scatter( 54 | X, 55 | Y, 56 | ) 57 | plt.plot(X, predict, 'r') 58 | # plt.axis('off') 59 | plt.savefig('regression.png') 60 | plt.show() 61 | -------------------------------------------------------------------------------- /examples/simple/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/simple/__init__.py -------------------------------------------------------------------------------- /examples/simple/poly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/examples/simple/poly.png -------------------------------------------------------------------------------- /examples/simple/poly_test.py: -------------------------------------------------------------------------------- 1 | #pylint: disable=no-member 2 | import warnings 3 | 4 | import networkx as nx 5 | import matplotlib.pyplot as plt 6 | import matplotlib.cbook 7 | 8 | import autodiff as ad 9 | from autodiff import value_and_grad, value, grad 10 | 11 | warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation) 12 | warnings.filterwarnings("ignore", category=RuntimeWarning) 13 | 14 | 15 | def tanh(x): 16 | return ad.divide( 17 | ad.subtract(ad.Constant(1), ad.exp(ad.negative(ad.Variable(x)))), 18 | ad.add(ad.Constant(1), ad.exp(ad.negative(ad.Variable(x))))) 19 | 20 | 21 | def test(x, y, z, w): 22 | return ad.multiply( 23 | ad.add(ad.multiply(ad.Variable(x), ad.Variable(y)), 24 | ad.maximum(ad.Variable(z), ad.Variable(w))), ad.Constant(2)) 25 | 26 | 27 | def test2(x, y, z): 28 | return ad.multiply(ad.add(ad.Variable(x), ad.Variable(y)), 29 | 
ad.maximum(ad.Variable(y), ad.Variable(z))) 30 | 31 | 32 | def test3(x): 33 | return ad.power(ad.Variable(x), ad.Constant(3)) 34 | 35 | 36 | def test4(x): 37 | return ad.multiply(ad.power(ad.Variable(x), ad.Constant(2)), 38 | ad.Constant(1 / 2)) 39 | 40 | 41 | def power_demo(): 42 | x_list = ad.linspace(-7, 7, 200) 43 | y_list, dy_list = value_and_grad(test3, id(x_list))(x=x_list) 44 | 45 | plt.plot(x_list, y_list, x_list, dy_list) 46 | # plt.axis('off') 47 | plt.savefig('poly.png') 48 | plt.show() 49 | 50 | 51 | if __name__ == "__main__": 52 | power_demo() -------------------------------------------------------------------------------- /img/actions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/img/actions.png -------------------------------------------------------------------------------- /img/backward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/img/backward.png -------------------------------------------------------------------------------- /img/forward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/titaneric/AutoDiff-from-scratch/993d70ccb83122889a04f93d3e884f93f7e1fb84/img/forward.png -------------------------------------------------------------------------------- /proposal.md: -------------------------------------------------------------------------------- 1 | # Basic neural network library supporting auto-differentiation 2 | 3 | ## Motivation 4 | 5 | Most deep learning courses aim to teach the math behind networks, their architectures, and their applications. 6 | 7 | However, courses seldom cover how to design and implement a deep learning library from scratch. 
8 | 9 | The goal is to implement this kind of library and, while developing the final project, learn how and why prior libraries (TensorFlow, PyTorch, etc.) are designed the way they are. 10 | 11 | ## Target 12 | 13 | Based on the [Autograd](https://github.com/HIPS/autograd) project, build a similar library in which the user simply defines a function and the library automatically computes its derivative. 14 | 15 | Build the computational graph when the function is called, then run backpropagation with respect to a variable or placeholder (in TensorFlow terms). 16 | 17 | Provide a benchmark comparison against TensorFlow and PyTorch. 18 | 19 | If time allows, provide a simple multi-layer perceptron (neural network) interface, criterion, optimizer, dataset, and data loader like those of the prior libraries. 20 | 21 | ## Project Link 22 | 23 | [AutoDiff-from-scratch](https://github.com/titaneric/AutoDiff-from-scratch) 24 | 25 | ## TODO 26 | 27 | - benchmark 28 | - documents 29 | 30 | ## Reference 31 | 32 | ### Source code 33 | 34 | - [Autograd](https://github.com/HIPS/autograd) 35 | - [Autodidact](https://github.com/mattjj/autodidact) 36 | - [PyTorch](https://github.com/pytorch/pytorch) 37 | - [TensorFlow](https://github.com/tensorflow/tensorflow) 38 | - [Caffe](https://github.com/BVLC/caffe) 39 | - [Caffe2](https://github.com/pytorch/pytorch/tree/master/caffe2) 40 | 41 | ### Lecture 42 | 43 | - [Backpropagation, Toronto CSC321](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec06.pdf) 44 | - [Automatic Differentiation, Toronto CSC321](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf) 45 | - [Backpropagation, Stanford CS224N](https://www.youtube.com/watch?v=yLYHDSv-288&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z&index=5&t=2177s) 46 | - [Introduction to Neural Networks, Stanford CS231N](https://www.youtube.com/watch?v=d14TUNcbn1k&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv&index=4) 47 | - [Backpropagation: Find Partial Derivatives, MIT 
18.065](https://www.youtube.com/watch?v=lZrIPRnoGQQ&list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k&index=30&t=0s) 48 | 49 | ### Documents 50 | 51 | - [JAX official documentation](https://jax.readthedocs.io/en/latest/index.html) 52 | - [PhD thesis by Dougal Maclaurin (one of the Autograd authors)](https://dougalmaclaurin.com/phd-thesis.pdf) 53 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | networkx==2.3 2 | numpy==1.17.2 3 | absl-py==0.8.1 4 | --------------------------------------------------------------------------------