├── Neural Ordinary Differential Networks.ipynb ├── README.md └── requirements.txt /Neural Ordinary Differential Networks.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Install requirements" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "!pip install -r requirements.txt" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "# Intro to ODEs" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## Rabbit population\n", 31 | "Imagine that some rabbits make their way onto an island that doesn't have any predators. We initially have N rabbits and after a month they make K more. After another month, those N+K rabbits make L rabbits and we observe that $\\frac{K}{N} = \\frac{L}{N+K}$, that is, the number of new-born rabbits is proportional to the number of rabbits currently on the island. If we denote time with the variable t, we've observed the following relationship\n", 32 | "$$\n", 33 | "\\frac{\\partial N(t)}{\\partial t} = k N(t),\n", 34 | "$$\n", 35 | "that is, the rate of change of the population is proportional to the population. You may recognise this as the continuous version of the geometric progression $x_n = q x_{n-1}$. This equation is simple enough that we can solve it (see https://www.mathsisfun.com/calculus/differential-equations.html) and obtain the explicit representation $N(t)=Ce^{kt}$, where the constant $C$ is the initial population $N(0)$. 
Most commonly, though, it is either very difficult or impossible to find an explicit solution for an equation of this kind. For example, it is unclear how to solve (if it is possible at all)\n", 36 | "$$\n", 37 | "\\left(\\frac{\\partial N(t)}{\\partial t}\\right)^3 + y^2 = N(t) y\n", 38 | "$$\n", 39 | "without using some advanced methods. Even without knowing the exact representation, we can still do interesting things with these equations. For example, we can re-arrange the equation, start from some initial value N(0) and (approximately) simulate how the population changes in time by iteratively applying the equation below for some small time difference $\\Delta t$\n", 40 | "$$\n", 41 | "N(t + \\Delta t) = N(t) + \\sqrt[3]{N(t) y - y^2}\\Delta t.\n", 42 | "$$\n", 43 | "This is known as the Euler method (https://en.wikipedia.org/wiki/Euler_method) and while it doesn't give great results due to the accumulation of errors, it shows how we can avoid requiring an explicit representation of N(t)." 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "# Making the ODE \"Neural\"\n", 51 | "\n", 52 | "Looking at the previous section, we are inspired to ask ourselves the question \"what happens if we try to model the derivative (with respect to time $t$) of the function $z(x)$ taking our inputs $x$ into our outputs $y$ with a neural network?\". That is, we imagine that our function $z$ is some continuous transformation that starts at $x$ at time $t=0$ and arrives at $y$ at time $t=1$, and we are interested in how it changes as we vary $t$ from 0 to 1. If we're fitting to data anyway, we'll learn some very complex and inscrutable function, so does this provide any advantages over trying to fit the function $z$ itself? The answer, as you may expect, is yes, and we will spend the rest of this tutorial looking at various ways in which this is helpful. 
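As a rough illustration of this idea (a sketch only, not part of the tutorial's own code): below, a hand-written function stands in for the network that models the derivative, and the Euler update from the previous section carries an input x at t=0 to an output at t=1. The names f and euler_flow are hypothetical, and the dynamics dz/dt = -z are chosen only because the exact answer, x * exp(-1), is known.

```python
import math

def f(z, t):
    # Stand-in for a learned derivative network; here dz/dt = -z.
    return -z

def euler_flow(x, f, t0=0.0, t1=1.0, n_steps=1000):
    # Carry z from time t0 to time t1 with fixed-size Euler steps.
    h = (t1 - t0) / n_steps
    z, t = x, t0
    for _ in range(n_steps):
        z = z + h * f(z, t)
        t = t + h
    return z

x = 2.0
y_hat = euler_flow(x, f)        # approximate z(1)
exact = x * math.exp(-1.0)      # closed-form z(1) for dz/dt = -z
print(y_hat, exact)
```

In a Neural ODE the hand-written f is replaced by a network with parameters that are fitted to data, while the integration loop plays the role of the forward pass.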
" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "Firstly, though, let's briefly talk about exactly how we can learn the parameters $\\theta$ of our network $f_{\\theta}$ under this new setting. We will still employ gradient-based optimisation, which means that we need to find the quantity\n", 60 | "$$\n", 61 | "\\frac{\\partial L(z(1), y)}{\\partial \\theta}\n", 62 | "$$\n", 63 | "where $L$ is the loss function (e.g. least squares), and $z(t)$ is the aforementioned continuous process, with $z(0) = x$ and $z(1) = \\hat{y}$, that is, our prediction. Now, we know that $z(T) = z(0) + \\int_0^T f_{\\theta}(z(t), t) dt$ for any $0 \\le T \\le 1$. This is exactly us using our learnt derivative to find the value at time $t=T$, and it is the analogue of running a network forward. Notice how we can set $T$ to be any real value in this range; this is why we interpret Neural ODEs as having infinitely many hidden layers. As you may guess at this point, in order to fit our weights, we will need to do the equivalent of back-propagation through these infinite layers as well. This is where the concept of the adjoint state $a_z(t) = \\frac{\\partial L}{\\partial z(t)}$ comes in - this is similar to the error signal $\\delta$ in the normal neural network case. From here on out, with a bit of maths, we find the derivative of this adjoint state\n", 64 | "$$\n", 65 | "\\frac{\\partial a_z(t)}{\\partial t} = -a_z(t)\\frac{\\partial f_{\\theta}(z(t),t)}{\\partial z(t)}.\n", 66 | "$$\n", 67 | "Just like having the derivative of $z(t)$ allowed us to calculate $z(T)$ for any $T$, we can now calculate $a_z(T)$ as well. Note that this computation is \"backwards in time\" - we start from the known quantity $a_z(1)$ and go back towards $a_z(T)$. Finally, by a similar argument to the above, we can define other adjoints $a_{\\theta}(t)$ and $a_t(t)$ to find each of $\\frac{\\partial L}{\\partial \\theta}$ and $\\frac{\\partial L}{\\partial t}$. 
Unsurprisingly, we get\n", 68 | "$$\n", 69 | "\\frac{\\partial a_{\\theta}}{\\partial t} = -a_z(t)\\frac{\\partial f_{\\theta}(z(t),t)}{\\partial \\theta}, \\\\\n", 70 | "\\frac{\\partial a_t}{\\partial t} = -a_z(t)\\frac{\\partial f_{\\theta}(z(t),t)}{\\partial t},\n", 71 | "$$\n", 72 | "where again, the first line is reminiscent of how we compute the gradient with respect to $\\theta$ given the error signal $\\delta$ and the current hidden state $h_t = f_{\\theta}(z(t), t)$, and the last line follows the functional form of the other two. One final note is that we know $\\frac{\\partial L}{\\partial t}$ at time $t=1$ exactly (it is $a_z(1)f_{\\theta}(z(1), 1)$).\n", 73 | "With the gradients of $L$ with respect to its input parameters known, we can now minimise the function given some data.\n", 74 | "\n", 75 | "More detail on the maths can be found here: https://ml.berkeley.edu/blog/posts/neural-odes/#training-odenets" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "# Implementation\n", 83 | "\n", 84 | "We use PyTorch to define the ODENet. We will go over the implementation from https://github.com/msurtsukov/neural-ode as it is slightly more concise than the one in the original paper. 
First some boilerplate code" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 1, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "import math\n", 94 | "import numpy as np\n", 95 | "from IPython.display import clear_output\n", 96 | "from tqdm import tqdm_notebook as tqdm\n", 97 | "\n", 98 | "import matplotlib as mpl\n", 99 | "import matplotlib.pyplot as plt\n", 100 | "%matplotlib inline\n", 101 | "import seaborn as sns\n", 102 | "sns.color_palette(\"bright\")\n", 103 | "import matplotlib as mpl\n", 104 | "import matplotlib.cm as cm\n", 105 | "\n", 106 | "import torch\n", 107 | "from torch import Tensor\n", 108 | "from torch import nn\n", 109 | "from torch.nn import functional as F \n", 110 | "from torch.autograd import Variable\n", 111 | "\n", 112 | "use_cuda = torch.cuda.is_available()\n", 113 | "\n", 114 | "def ode_solve(z0, t0, t1, f):\n", 115 | " \"\"\"\n", 116 | " Simplest Euler ODE initial value solver\n", 117 | " \"\"\"\n", 118 | " h_max = 0.05\n", 119 | " n_steps = math.ceil((abs(t1 - t0)/h_max).max().item())\n", 120 | "\n", 121 | " h = (t1 - t0)/n_steps\n", 122 | " t = t0\n", 123 | " z = z0\n", 124 | "\n", 125 | " for i_step in range(n_steps):\n", 126 | " z = z + h * f(z, t)\n", 127 | " t = t + h\n", 128 | " return z" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "We will use the following trick several times from here on. If we want to solve several ODEs (in our case one for $a_z, a_{\\theta}, a_t$ each) at the same time, we can concatenate the states of each separate ODE into a single augmented state (let's call that $a_{aug}$), and taking into account the Jacobian matrix, we can find $\\frac{\\partial a_{aug}(t)}{\\partial t}$. This allows us to run an ODE solver on the augmented state and solve for the three variables at the same time. 
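A minimal sketch of this concatenation trick, under simplified assumptions: two unrelated scalar ODEs, dz/dt = -z and da/dt = 2a, are packed into one list-valued state and advanced by a single Euler solver. The helper names euler_solve and aug_dynamics are hypothetical and are not the functions defined in this notebook.

```python
import math

def euler_solve(state, t0, t1, dynamics, n_steps=2000):
    # Fixed-step Euler solver acting on a list-valued state.
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        deriv = dynamics(state, t)
        state = [s + h * d for s, d in zip(state, deriv)]
        t += h
    return state

def aug_dynamics(aug, t):
    # Unpack the augmented state, evaluate each ODE, and repack the derivatives.
    z, a = aug
    return [-z, 2.0 * a]

# One solver call advances both variables from t=0 to t=1.
z1, a1 = euler_solve([1.0, 0.5], 0.0, 1.0, aug_dynamics)
print(z1, math.exp(-1.0))        # numerical vs exact value of z(1)
print(a1, 0.5 * math.exp(2.0))   # numerical vs exact value of a(1)
```

The adjoint code in this notebook applies the same packing to the state, its adjoint, and the parameter and time adjoints, with tensors in place of Python lists.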
We define a function that performs the computation of the forward pass and the adjoint derivatives first" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 2, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "class ODEF(nn.Module):\n", 145 | " def forward_with_grad(self, z, t, grad_outputs):\n", 146 | " \"\"\"Compute f and a df/dz, a df/dp, a df/dt\"\"\"\n", 147 | " batch_size = z.shape[0]\n", 148 | "\n", 149 | " out = self.forward(z, t)\n", 150 | "\n", 151 | " # a_z in the description\n", 152 | " a = grad_outputs\n", 153 | " # Computes a_z [df/dz, df/dt, df/theta] using the augmented adjoint state [a_z, a_t, a_theta]\n", 154 | " adfdz, adfdt, *adfdp = torch.autograd.grad(\n", 155 | " (out,), (z, t) + tuple(self.parameters()), grad_outputs=(a),\n", 156 | " allow_unused=True, retain_graph=True\n", 157 | " )\n", 158 | " # grad method automatically sums gradients for batch items, we have to expand them back \n", 159 | " if adfdp is not None:\n", 160 | " adfdp = torch.cat([p_grad.flatten() for p_grad in adfdp]).unsqueeze(0)\n", 161 | " adfdp = adfdp.expand(batch_size, -1) / batch_size\n", 162 | " if adfdt is not None:\n", 163 | " adfdt = adfdt.expand(batch_size, 1) / batch_size\n", 164 | " return out, adfdz, adfdt, adfdp\n", 165 | "\n", 166 | " def flatten_parameters(self):\n", 167 | " p_shapes = []\n", 168 | " flat_parameters = []\n", 169 | " for p in self.parameters():\n", 170 | " p_shapes.append(p.size())\n", 171 | " flat_parameters.append(p.flatten())\n", 172 | " return torch.cat(flat_parameters)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "Next, we define a function that allows us to repeat the process described above for a series of times $[t_0, t_1, ..., t_N]$. This will come in useful in the next section, where we do sequence modelling." 
180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 3, 185 | "metadata": {}, 186 | "outputs": [], 187 | "source": [ 188 | "class ODEAdjoint(torch.autograd.Function):\n", 189 | " @staticmethod\n", 190 | " def forward(ctx, z0, t, flat_parameters, func):\n", 191 | " assert isinstance(func, ODEF)\n", 192 | " bs, *z_shape = z0.size()\n", 193 | " time_len = t.size(0)\n", 194 | "\n", 195 | " with torch.no_grad():\n", 196 | " z = torch.zeros(time_len, bs, *z_shape).to(z0)\n", 197 | " z[0] = z0\n", 198 | " for i_t in range(time_len - 1):\n", 199 | " z0 = ode_solve(z0, t[i_t], t[i_t+1], func)\n", 200 | " z[i_t+1] = z0\n", 201 | "\n", 202 | " ctx.func = func\n", 203 | " ctx.save_for_backward(t, z.clone(), flat_parameters)\n", 204 | " return z\n", 205 | "\n", 206 | " @staticmethod\n", 207 | " def backward(ctx, dLdz):\n", 208 | " \"\"\"\n", 209 | " dLdz shape: time_len, batch_size, *z_shape\n", 210 | " \"\"\"\n", 211 | " func = ctx.func\n", 212 | " t, z, flat_parameters = ctx.saved_tensors\n", 213 | " time_len, bs, *z_shape = z.size()\n", 214 | " n_dim = np.prod(z_shape)\n", 215 | " n_params = flat_parameters.size(0)\n", 216 | "\n", 217 | " # Dynamics of augmented system to be calculated backwards in time\n", 218 | " def augmented_dynamics(aug_z_i, t_i):\n", 219 | " \"\"\"\n", 220 | " tensors here are temporal slices\n", 221 | " t_i - is tensor with size: bs, 1\n", 222 | " aug_z_i - is tensor with size: bs, n_dim*2 + n_params + 1\n", 223 | " \"\"\"\n", 224 | " z_i, a = aug_z_i[:, :n_dim], aug_z_i[:, n_dim:2*n_dim] # ignore parameters and time\n", 225 | "\n", 226 | " # Unflatten z and a\n", 227 | " z_i = z_i.view(bs, *z_shape)\n", 228 | " a = a.view(bs, *z_shape)\n", 229 | " with torch.set_grad_enabled(True):\n", 230 | " t_i = t_i.detach().requires_grad_(True)\n", 231 | " z_i = z_i.detach().requires_grad_(True)\n", 232 | " func_eval, adfdz, adfdt, adfdp = func.forward_with_grad(z_i, t_i, grad_outputs=a) # bs, *z_shape\n", 233 | " adfdz = adfdz.to(z_i) if 
adfdz is not None else torch.zeros(bs, *z_shape).to(z_i)\n", 234 | " adfdp = adfdp.to(z_i) if adfdp is not None else torch.zeros(bs, n_params).to(z_i)\n", 235 | " adfdt = adfdt.to(z_i) if adfdt is not None else torch.zeros(bs, 1).to(z_i)\n", 236 | "\n", 237 | " # Flatten f and adfdz\n", 238 | " func_eval = func_eval.view(bs, n_dim)\n", 239 | " adfdz = adfdz.view(bs, n_dim) \n", 240 | " return torch.cat((func_eval, -adfdz, -adfdp, -adfdt), dim=1)\n", 241 | "\n", 242 | " dLdz = dLdz.view(time_len, bs, n_dim) # flatten dLdz for convenience\n", 243 | " with torch.no_grad():\n", 244 | " ## Create placeholders for output gradients\n", 245 | " # Prev computed backwards adjoints to be adjusted by direct gradients\n", 246 | " adj_z = torch.zeros(bs, n_dim).to(dLdz)\n", 247 | " adj_p = torch.zeros(bs, n_params).to(dLdz)\n", 248 | " # In contrast to z and p we need to return gradients for all times\n", 249 | " adj_t = torch.zeros(time_len, bs, 1).to(dLdz)\n", 250 | "\n", 251 | " for i_t in range(time_len-1, 0, -1):\n", 252 | " z_i = z[i_t]\n", 253 | " t_i = t[i_t]\n", 254 | " f_i = func(z_i, t_i).view(bs, n_dim)\n", 255 | "\n", 256 | " # Compute direct gradients\n", 257 | " dLdz_i = dLdz[i_t]\n", 258 | " dLdt_i = torch.bmm(torch.transpose(dLdz_i.unsqueeze(-1), 1, 2), f_i.unsqueeze(-1))[:, 0]\n", 259 | "\n", 260 | " # Adjusting adjoints with direct gradients\n", 261 | " adj_z += dLdz_i\n", 262 | " adj_t[i_t] = adj_t[i_t] - dLdt_i\n", 263 | "\n", 264 | " # Pack augmented variable\n", 265 | " aug_z = torch.cat((z_i.view(bs, n_dim), adj_z, torch.zeros(bs, n_params).to(z), adj_t[i_t]), dim=-1)\n", 266 | "\n", 267 | " # Solve augmented system backwards\n", 268 | " aug_ans = ode_solve(aug_z, t_i, t[i_t-1], augmented_dynamics)\n", 269 | "\n", 270 | " # Unpack solved backwards augmented system\n", 271 | " adj_z[:] = aug_ans[:, n_dim:2*n_dim]\n", 272 | " adj_p[:] += aug_ans[:, 2*n_dim:2*n_dim + n_params]\n", 273 | " adj_t[i_t-1] = aug_ans[:, 2*n_dim + n_params:]\n", 274 | "\n", 275 | " 
del aug_z, aug_ans\n", 276 | "\n", 277 | " ## Adjust 0 time adjoint with direct gradients\n", 278 | " # Compute direct gradients, evaluating the dynamics at the initial state\n", 279 | " dLdz_0 = dLdz[0]\n", 280 | " dLdt_0 = torch.bmm(torch.transpose(dLdz_0.unsqueeze(-1), 1, 2), func(z[0], t[0]).view(bs, n_dim).unsqueeze(-1))[:, 0]\n", 281 | "\n", 282 | " # Adjust adjoints\n", 283 | " adj_z += dLdz_0\n", 284 | " adj_t[0] = adj_t[0] - dLdt_0\n", 285 | " return adj_z.view(bs, *z_shape), adj_t, adj_p, None" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "Finally, we define a neural network module that wraps the function for more convenient use." 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 4, 298 | "metadata": {}, 299 | "outputs": [], 300 | "source": [ 301 | "class NeuralODE(nn.Module):\n", 302 | " def __init__(self, func):\n", 303 | " super(NeuralODE, self).__init__()\n", 304 | " assert isinstance(func, ODEF)\n", 305 | " self.func = func\n", 306 | "\n", 307 | " def forward(self, z0, t=Tensor([0., 1.]), return_whole_sequence=False):\n", 308 | " t = t.to(z0)\n", 309 | " z = ODEAdjoint.apply(z0, t, self.func.flatten_parameters(), self.func)\n", 310 | " if return_whole_sequence:\n", 311 | " return z\n", 312 | " else:\n", 313 | " return z[-1]" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "# Examples\n", 321 | "\n", 322 | "Let's look at a couple of examples of how we can apply this architecture to problems." 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "## Continuous-time sequence models\n", 330 | "\n", 331 | "In this section, let's look at the first two examples in https://github.com/msurtsukov/neural-ode. First, we set up some boilerplate code."
332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 5, 337 | "metadata": {}, 338 | "outputs": [], 339 | "source": [ 340 | "def to_np(x):\n", 341 | " return x.detach().cpu().numpy()\n", 342 | "\n", 343 | "def plot_trajectories(obs=None, times=None, trajs=None, save=None, figsize=(16, 8)):\n", 344 | " plt.figure(figsize=figsize)\n", 345 | " if obs is not None:\n", 346 | " if times is None:\n", 347 | " times = [None] * len(obs)\n", 348 | " for o, t in zip(obs, times):\n", 349 | " o, t = to_np(o), to_np(t)\n", 350 | " for b_i in range(o.shape[1]):\n", 351 | " plt.scatter(o[:, b_i, 0], o[:, b_i, 1], c=t[:, b_i, 0], cmap=cm.plasma)\n", 352 | "\n", 353 | " if trajs is not None: \n", 354 | " for z in trajs:\n", 355 | " z = to_np(z)\n", 356 | " plt.plot(z[:, 0, 0], z[:, 0, 1], lw=1.5)\n", 357 | " if save is not None:\n", 358 | " plt.savefig(save)\n", 359 | " plt.show()\n", 360 | " \n", 361 | "def conduct_experiment(ode_true, ode_trained, n_steps, name, plot_freq=10):\n", 362 | " # Create data\n", 363 | " z0 = Variable(torch.Tensor([[0.6, 0.3]]))\n", 364 | "\n", 365 | " t_max = 6.29*5\n", 366 | " n_points = 200\n", 367 | "\n", 368 | " index_np = np.arange(0, n_points, 1, dtype=int)\n", 369 | " index_np = np.hstack([index_np[:, None]])\n", 370 | " times_np = np.linspace(0, t_max, num=n_points)\n", 371 | " times_np = np.hstack([times_np[:, None]])\n", 372 | "\n", 373 | " times = torch.from_numpy(times_np[:, :, None]).to(z0)\n", 374 | " obs = ode_true(z0, times, return_whole_sequence=True).detach()\n", 375 | " obs = obs + torch.randn_like(obs) * 0.01\n", 376 | "\n", 377 | " # Get trajectory of random timespan \n", 378 | " min_delta_time = 1.0\n", 379 | " max_delta_time = 5.0\n", 380 | " max_points_num = 32\n", 381 | " def create_batch():\n", 382 | " t0 = np.random.uniform(0, t_max - max_delta_time)\n", 383 | " t1 = t0 + np.random.uniform(min_delta_time, max_delta_time)\n", 384 | "\n", 385 | " idx = sorted(np.random.permutation(index_np[(times_np > 
t0) & (times_np < t1)])[:max_points_num])\n", 386 | "\n", 387 | " obs_ = obs[idx]\n", 388 | " ts_ = times[idx]\n", 389 | " return obs_, ts_\n", 390 | "\n", 391 | " # Train Neural ODE\n", 392 | " optimizer = torch.optim.Adam(ode_trained.parameters(), lr=0.01)\n", 393 | " for i in range(n_steps):\n", 394 | " obs_, ts_ = create_batch()\n", 395 | "\n", 396 | " z_ = ode_trained(obs_[0], ts_, return_whole_sequence=True)\n", 397 | " loss = F.mse_loss(z_, obs_.detach())\n", 398 | "\n", 399 | " optimizer.zero_grad()\n", 400 | " loss.backward(retain_graph=True)\n", 401 | " optimizer.step()\n", 402 | "\n", 403 | " if i % plot_freq == 0:\n", 404 | " z_p = ode_trained(z0, times, return_whole_sequence=True)\n", 405 | "\n", 406 | " plot_trajectories(obs=[obs], times=[times], trajs=[z_p], save=f\"assets/imgs/{name}/{i}.png\")\n", 407 | " clear_output(wait=True)" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "### Simple linear ODE\n", 415 | "We are given a two-dimensional $\\mathbf{z}(t)$, which changes according to the equation\n", 416 | "$$\n", 417 | "\\frac{\\partial \\mathbf{z}}{\\partial t} = \\begin{bmatrix}\n", 418 | "-0.1 z_1 - z_2 \\\\\n", 419 | "z_1 - 0.1 z_2 \\\\\n", 420 | "\\end{bmatrix}.\n", 421 | "$$\n", 422 | "This gives us a spiral that starts at the initial point and winds closer and closer to the origin. 
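The spiral behaviour can be checked without any of the machinery above. The sketch below, which is an illustration and not the notebook's training code, integrates the same linear ODE with small Euler steps in plain Python and tracks the distance to the origin. The names spiral_rhs and simulate are hypothetical. Because the eigenvalues of the system matrix have real part -0.1, the distance should shrink roughly like exp(-0.1 t).

```python
import math

def spiral_rhs(z1, z2):
    # Right-hand side of the linear ODE above.
    return -0.1 * z1 - z2, z1 - 0.1 * z2

def simulate(z1, z2, t_max, h=0.001):
    # Integrate with small Euler steps and record the distance to the origin.
    radii = [math.hypot(z1, z2)]
    for _ in range(int(round(t_max / h))):
        d1, d2 = spiral_rhs(z1, z2)
        z1, z2 = z1 + h * d1, z2 + h * d2
        radii.append(math.hypot(z1, z2))
    return radii

# Same starting point as the experiment code: z(0) = (0.6, 0.3).
radii = simulate(0.6, 0.3, t_max=10.0)
expected_final = radii[0] * math.exp(-0.1 * 10.0)
print(radii[0], radii[-1], expected_final)
```

The small gap between the simulated and expected final radius comes from the Euler discretisation and shrinks as h is reduced.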
" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 6, 428 | "metadata": {}, 429 | "outputs": [], 430 | "source": [ 431 | "# Restrict ODE to a linear function\n", 432 | "class LinearODEF(ODEF):\n", 433 | " def __init__(self, W):\n", 434 | " super(LinearODEF, self).__init__()\n", 435 | " self.lin = nn.Linear(2, 2, bias=False)\n", 436 | " self.lin.weight = nn.Parameter(W)\n", 437 | "\n", 438 | " def forward(self, x, t):\n", 439 | " return self.lin(x)\n", 440 | "\n", 441 | "# True function\n", 442 | "class SpiralFunctionExample(LinearODEF):\n", 443 | " def __init__(self):\n", 444 | " super(SpiralFunctionExample, self).__init__(Tensor([[-0.1, -1.], [1., -0.1]]))\n", 445 | " \n", 446 | "# Random initial guess for function\n", 447 | "class RandomLinearODEF(LinearODEF):\n", 448 | " def __init__(self):\n", 449 | " super(RandomLinearODEF, self).__init__(torch.randn(2, 2)/2.)" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 7, 455 | "metadata": {}, 456 | "outputs": [ 457 | { 458 | "data": { 459 | "image/png": "\n", 460 | "text/plain": [ 461 | "
" 462 | ] 463 | }, 464 | "metadata": { 465 | "needs_background": "light" 466 | }, 467 | "output_type": "display_data" 468 | } 469 | ], 470 | "source": [ 471 | "ode_true = NeuralODE(SpiralFunctionExample())\n", 472 | "ode_trained = NeuralODE(RandomLinearODEF())\n", 473 | "\n", 474 | "conduct_experiment(ode_true, ode_trained, 500, \"linear\")" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "### More complex ODE\n", 482 | "\n", 483 | "Next we set up an ODE with more complicated dynamics. In this particular case, we will use a 2-layer neural network to produce the dynamics. That is, we have\n", 484 | "$$\n", 485 | "\\frac{\\partial \\mathbf{z}}{\\partial t} = f_{true}(\\mathbf{z}(t), t)\n", 486 | "$$\n", 487 | "for some 2-layer neural network $f_{true}$." 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": 8, 493 | "metadata": {}, 494 | "outputs": [], 495 | "source": [ 496 | "# True 2-layer neural network\n", 497 | "class TestODEF(ODEF):\n", 498 | " def __init__(self, A, B, x0):\n", 499 | " super(TestODEF, self).__init__()\n", 500 | " self.A = nn.Linear(2, 2, bias=False)\n", 501 | " self.A.weight = nn.Parameter(A)\n", 502 | " self.B = nn.Linear(2, 2, bias=False)\n", 503 | " self.B.weight = nn.Parameter(B)\n", 504 | " self.x0 = nn.Parameter(x0)\n", 505 | "\n", 506 | " def forward(self, x, t):\n", 507 | " xTx0 = torch.sum(x*self.x0, dim=1)\n", 508 | " dxdt = torch.sigmoid(xTx0) * self.A(x - self.x0) + torch.sigmoid(-xTx0) * self.B(x + self.x0)\n", 509 | " return dxdt\n", 510 | "\n", 511 | "# Neural network to learn the dynamics\n", 512 | "class NNODEF(ODEF):\n", 513 | " def __init__(self, in_dim, hid_dim, time_invariant=False):\n", 514 | " super(NNODEF, self).__init__()\n", 515 | " self.time_invariant = time_invariant\n", 516 | "\n", 517 | " if time_invariant:\n", 518 | " self.lin1 = nn.Linear(in_dim, hid_dim)\n", 519 | " else:\n", 520 | " self.lin1 = nn.Linear(in_dim+1, hid_dim)\n", 521 | " 
self.lin2 = nn.Linear(hid_dim, hid_dim)\n", 522 | " self.lin3 = nn.Linear(hid_dim, in_dim)\n", 523 | " self.elu = nn.ELU(inplace=True)\n", 524 | "\n", 525 | " def forward(self, x, t):\n", 526 | " if not self.time_invariant:\n", 527 | " x = torch.cat((x, t), dim=-1)\n", 528 | "\n", 529 | " h = self.elu(self.lin1(x))\n", 530 | " h = self.elu(self.lin2(h))\n", 531 | " out = self.lin3(h)\n", 532 | " return out" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 9, 538 | "metadata": {}, 539 | "outputs": [ 540 | { 541 | "data": { 542 | "image/png": "\n", 543 | "text/plain": [ 544 | "
" 545 | ] 546 | }, 547 | "metadata": { 548 | "needs_background": "light" 549 | }, 550 | "output_type": "display_data" 551 | } 552 | ], 553 | "source": [ 554 | "func = TestODEF(Tensor([[-0.1, -0.5], [0.5, -0.1]]), Tensor([[0.2, 1.], [-1, 0.2]]), Tensor([[-1., 0.]]))\n", 555 | "ode_true = NeuralODE(func)\n", 556 | "\n", 557 | "func = NNODEF(2, 16, time_invariant=True)\n", 558 | "ode_trained = NeuralODE(func)\n", 559 | "\n", 560 | "conduct_experiment(ode_true, ode_trained, 3000, \"comp\", plot_freq=30)" 561 | ] 562 | }, 563 | { 564 | "cell_type": "markdown", 565 | "metadata": {}, 566 | "source": [ 567 | "## Normalizing flows\n", 568 | "\n", 569 | "The original, discrete normalizing flows approximate a probability density over a random variable $\\mathbf{z}$ with a series of transformations of a simple, known density over $\\mathbf{z}_0$. That is, let's say that $\\mathbf{z}_0 $ is drawn from the standard Gaussian distribution $\\mathcal{N}(\\mathbf{0}, I)$. We define $\\mathbf{z}_1 = r_{\\theta}(\\mathbf{z}_0)$ for some invertible, differentiable function $r$ with parameters $\\theta$ and can use the change-of-variables rule for probability densities to get\n", 570 | "$$\n", 571 | "p_{\\mathbf{z}_1}(\\mathbf{z}_1) = p_{\\mathbf{z}_0}(\\mathbf{z}_0) \\left |\\det (J_{z_1z_0}) \\right|^{-1}\n", 572 | "$$\n", 573 | "where $\\det$ is the determinant operator and $J_{z_1z_0} = \\frac{\\partial \\mathbf{z}_1}{\\partial \\mathbf{z}_0}$ is the Jacobian of $r_{\\theta}$ evaluated at $\\mathbf{z}_0$ (a more thorough definition of this can be found here https://stats.libretexts.org/Bookshelves/Probability_Theory/Book%3A_Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/03%3A_Distributions/3.07%3A_Transformations_of_Random_Variables).\n", 574 | "\n", 575 | "We can repeat the above process over and over again to get more and more complex densities $p_{\\mathbf{z}_i}$. 
Normalizing flows allow us to fit the parameters $\\theta$ of each transformation; however, they require that we compute the determinant of the Jacobian at each transformation. Unless the Jacobians have special structure, which restricts the type of transformations we can apply, this computation costs $\\mathcal{O}(D^3)$, where $D$ is the number of dimensions of $\\mathbf{z}$, and this can be prohibitively costly.\n", 576 | "\n", 577 | "Luckily, if we make these transformations continuous, we no longer need to compute the determinant of the Jacobian, but instead only the sum of the diagonal elements (the trace) of the continuous equivalent. Summing the diagonal is an $\\mathcal{O}(D)$ operation, instead of the cubic one above. The full mathematical details are a bit involved, but are explained well in the original paper (https://arxiv.org/pdf/1806.07366.pdf). This means we can perform this density estimation a lot more efficiently than in the discrete case.\n", 578 | "\n", 579 | "For this we will use the original implementation (https://github.com/rtqichen/torchdiffeq), which is harder to understand, but more optimised. 
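The trace claim above can be sanity-checked on a linear flow dz/dt = A z, where everything is computable by hand. The sketch below is an illustration in plain Python with an arbitrarily chosen 2x2 matrix A, not the torchdiffeq code used in the rest of this section. It composes many small Euler step maps and takes the log-determinant of the result (the discrete view), then compares it with the time integral of the trace of A (the continuous view). The log-density along the flow changes by minus this quantity.

```python
import math

A = [[0.3, -1.0], [2.0, -0.5]]  # arbitrary 2x2 dynamics matrix, dz/dt = A z

def matmul(P, Q):
    # Product of two 2x2 matrices stored as nested lists.
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(P):
    # Determinant of a 2x2 matrix.
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

T, n_steps = 1.0, 10000
h = T / n_steps
# One Euler step is the linear map z -> (I + hA) z.
step = [[1.0 + h * A[0][0], h * A[0][1]],
        [h * A[1][0], 1.0 + h * A[1][1]]]

flow = [[1.0, 0.0], [0.0, 1.0]]
trace_integral = 0.0
for _ in range(n_steps):
    flow = matmul(step, flow)                   # discrete view: compose the maps
    trace_integral += h * (A[0][0] + A[1][1])   # continuous view: integrate the trace

log_det_flow = math.log(abs(det(flow)))
print(log_det_flow, trace_integral)
```

The two numbers agree up to discretisation error, which is why a continuous flow only needs the trace of the Jacobian of the dynamics rather than a determinant.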
First some boilerplate code" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 33, 585 | "metadata": {}, 586 | "outputs": [], 587 | "source": [ 588 | "import os\n", 589 | "import glob\n", 590 | "from PIL import Image\n", 591 | "import numpy as np\n", 592 | "import matplotlib\n", 593 | "matplotlib.use('agg')\n", 594 | "import matplotlib.pyplot as plt\n", 595 | "from sklearn.datasets import make_circles\n", 596 | "import torch\n", 597 | "import torch.nn as nn\n", 598 | "import torch.optim as optim\n", 599 | "\n", 600 | "args = lambda: None\n", 601 | "args.adjoint = False\n", 602 | "args.viz = True\n", 603 | "args.niters = 1000\n", 604 | "args.lr = 1e-3\n", 605 | "args.num_samples = 512\n", 606 | "args.width = 64\n", 607 | "args.hidden_dim = 32\n", 608 | "args.gpu = 0\n", 609 | "args.train_dir = None\n", 610 | "args.results_dir = \"./results\"\n", 611 | "\n", 612 | "if args.adjoint:\n", 613 | " from torchdiffeq import odeint_adjoint as odeint\n", 614 | "else:\n", 615 | " from torchdiffeq import odeint\n", 616 | " \n", 617 | "class RunningAverageMeter(object):\n", 618 | " \"\"\"Computes and stores the average and current value\"\"\"\n", 619 | "\n", 620 | " def __init__(self, momentum=0.99):\n", 621 | " self.momentum = momentum\n", 622 | " self.reset()\n", 623 | "\n", 624 | " def reset(self):\n", 625 | " self.val = None\n", 626 | " self.avg = 0\n", 627 | "\n", 628 | " def update(self, val):\n", 629 | " if self.val is None:\n", 630 | " self.avg = val\n", 631 | " else:\n", 632 | " self.avg = self.avg * self.momentum + val * (1 - self.momentum)\n", 633 | " self.val = val\n", 634 | "\n", 635 | "\n", 636 | "def get_batch(num_samples):\n", 637 | " points, _ = make_circles(n_samples=num_samples, noise=0.06, factor=0.5)\n", 638 | " x = torch.tensor(points).type(torch.float32).to(device)\n", 639 | " logp_diff_t1 = torch.zeros(num_samples, 1).type(torch.float32).to(device)\n", 640 | "\n", 641 | " return(x, logp_diff_t1)\n", 642 | "\n", 643 | "\n", 644 | 
"def trace_df_dz(f, z):\n", 645 | " \"\"\"Calculates the trace of the Jacobian df/dz.\n", 646 | " Stolen from: https://github.com/rtqichen/ffjord/blob/master/lib/layers/odefunc.py#L13\n", 647 | " \"\"\"\n", 648 | " sum_diag = 0.\n", 649 | " for i in range(z.shape[1]):\n", 650 | " sum_diag += torch.autograd.grad(f[:, i].sum(), z, create_graph=True)[0].contiguous()[:, i].contiguous()\n", 651 | "\n", 652 | " return sum_diag.contiguous()" 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": {}, 658 | "source": [ 659 | "We require that our neural network $f(z(t), t)$ varies with time. This is implemented with the HyperNetwork below." 660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": 28, 665 | "metadata": {}, 666 | "outputs": [], 667 | "source": [ 668 | "class HyperNetwork(nn.Module):\n", 669 | " \"\"\"Hyper-network allowing f(z(t), t) to change with time.\n", 670 | " Adapted from the NumPy implementation at:\n", 671 | " https://gist.github.com/rtqichen/91924063aa4cc95e7ef30b3a5491cc52\n", 672 | " \"\"\"\n", 673 | " def __init__(self, in_out_dim, hidden_dim, width):\n", 674 | " super().__init__()\n", 675 | "\n", 676 | " blocksize = width * in_out_dim\n", 677 | "\n", 678 | " self.fc1 = nn.Linear(1, hidden_dim)\n", 679 | " self.fc2 = nn.Linear(hidden_dim, hidden_dim)\n", 680 | " self.fc3 = nn.Linear(hidden_dim, 3 * blocksize + width)\n", 681 | "\n", 682 | " self.in_out_dim = in_out_dim\n", 683 | " self.hidden_dim = hidden_dim\n", 684 | " self.width = width\n", 685 | " self.blocksize = blocksize\n", 686 | "\n", 687 | " def forward(self, t):\n", 688 | " # predict params\n", 689 | " params = t.reshape(1, 1)\n", 690 | " params = torch.tanh(self.fc1(params))\n", 691 | " params = torch.tanh(self.fc2(params))\n", 692 | " params = self.fc3(params)\n", 693 | "\n", 694 | " # restructure\n", 695 | " params = params.reshape(-1)\n", 696 | " W = params[:self.blocksize].reshape(self.width, self.in_out_dim, 1)\n", 697 | "\n", 698 | " U = 
params[self.blocksize:2 * self.blocksize].reshape(self.width, 1, self.in_out_dim)\n", 699 | "\n", 700 | " G = params[2 * self.blocksize:3 * self.blocksize].reshape(self.width, 1, self.in_out_dim)\n", 701 | " U = U * torch.sigmoid(G)\n", 702 | "\n", 703 | " B = params[3 * self.blocksize:].reshape(self.width, 1, 1)\n", 704 | " return [W, B, U]\n", 705 | "\n", 706 | "\n", 707 | "class CNF(nn.Module):\n", 708 | " \"\"\"Adapted from the NumPy implementation at:\n", 709 | " https://gist.github.com/rtqichen/91924063aa4cc95e7ef30b3a5491cc52\n", 710 | " \"\"\"\n", 711 | " def __init__(self, in_out_dim, hidden_dim, width):\n", 712 | " super().__init__()\n", 713 | " self.in_out_dim = in_out_dim\n", 714 | " self.hidden_dim = hidden_dim\n", 715 | " self.width = width\n", 716 | " self.hyper_net = HyperNetwork(in_out_dim, hidden_dim, width)\n", 717 | "\n", 718 | " def forward(self, t, states):\n", 719 | " z = states[0]\n", 720 | " logp_z = states[1]\n", 721 | "\n", 722 | " batchsize = z.shape[0]\n", 723 | "\n", 724 | " with torch.set_grad_enabled(True):\n", 725 | " z.requires_grad_(True)\n", 726 | "\n", 727 | " W, B, U = self.hyper_net(t)\n", 728 | "\n", 729 | " Z = torch.unsqueeze(z, 0).repeat(self.width, 1, 1)\n", 730 | "\n", 731 | " h = torch.tanh(torch.matmul(Z, W) + B)\n", 732 | " dz_dt = torch.matmul(h, U).mean(0)\n", 733 | "\n", 734 | " dlogp_z_dt = -trace_df_dz(dz_dt, z).view(batchsize, 1)\n", 735 | "\n", 736 | " return (dz_dt, dlogp_z_dt)" 737 | ] 738 | }, 739 | { 740 | "cell_type": "markdown", 741 | "metadata": {}, 742 | "source": [ 743 | "We next train our network" 744 | ] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "execution_count": null, 749 | "metadata": {}, 750 | "outputs": [], 751 | "source": [ 752 | "t0 = 0\n", 753 | "t1 = 10\n", 754 | "device = torch.device('cuda:' + str(args.gpu)\n", 755 | " if torch.cuda.is_available() else 'cpu')\n", 756 | "\n", 757 | "# model\n", 758 | "func = CNF(in_out_dim=2, hidden_dim=args.hidden_dim, 
width=args.width).to(device)\n", 759 | "optimizer = optim.Adam(func.parameters(), lr=args.lr)\n", 760 | "p_z0 = torch.distributions.MultivariateNormal(\n", 761 | " loc=torch.tensor([0.0, 0.0]).to(device),\n", 762 | " covariance_matrix=torch.tensor([[0.1, 0.0], [0.0, 0.1]]).to(device)\n", 763 | ")\n", 764 | "loss_meter = RunningAverageMeter()\n", 765 | "\n", 766 | "if args.train_dir is not None:\n", 767 | " if not os.path.exists(args.train_dir):\n", 768 | " os.makedirs(args.train_dir)\n", 769 | " ckpt_path = os.path.join(args.train_dir, 'ckpt.pth')\n", 770 | " if os.path.exists(ckpt_path):\n", 771 | " checkpoint = torch.load(ckpt_path)\n", 772 | " func.load_state_dict(checkpoint['func_state_dict'])\n", 773 | " optimizer.load_state_dict(checkpoint['optimizer_state_dict'])\n", 774 | " print('Loaded ckpt from {}'.format(ckpt_path))\n", 775 | "\n", 776 | "try:\n", 777 | " for itr in range(1, args.niters + 1):\n", 778 | " optimizer.zero_grad()\n", 779 | "\n", 780 | " x, logp_diff_t1 = get_batch(args.num_samples)\n", 781 | "\n", 782 | " z_t, logp_diff_t = odeint(\n", 783 | " func,\n", 784 | " (x, logp_diff_t1),\n", 785 | " torch.tensor([t1, t0]).type(torch.float32).to(device),\n", 786 | " atol=1e-5,\n", 787 | " rtol=1e-5,\n", 788 | " method='dopri5',\n", 789 | " )\n", 790 | "\n", 791 | " z_t0, logp_diff_t0 = z_t[-1], logp_diff_t[-1]\n", 792 | "\n", 793 | " logp_x = p_z0.log_prob(z_t0).to(device) - logp_diff_t0.view(-1)\n", 794 | " loss = -logp_x.mean(0)\n", 795 | "\n", 796 | " loss.backward()\n", 797 | " optimizer.step()\n", 798 | "\n", 799 | " loss_meter.update(loss.item())\n", 800 | "\n", 801 | " print('Iter: {}, running avg loss: {:.4f}'.format(itr, loss_meter.avg))\n", 802 | "\n", 803 | "except KeyboardInterrupt:\n", 804 | " if args.train_dir is not None:\n", 805 | " ckpt_path = os.path.join(args.train_dir, 'ckpt.pth')\n", 806 | " torch.save({\n", 807 | " 'func_state_dict': func.state_dict(),\n", 808 | " 'optimizer_state_dict': optimizer.state_dict(),\n", 809 | " }, 
ckpt_path)\n", 810 | " print('Stored ckpt at {}'.format(ckpt_path))\n", 811 | "print('Training complete after {} iters.'.format(itr))" 812 | ] 813 | }, 814 | { 815 | "cell_type": "markdown", 816 | "metadata": {}, 817 | "source": [ 818 | "And finally, we visualise the trained model." 819 | ] 820 | }, 821 | { 822 | "cell_type": "code", 823 | "execution_count": 31, 824 | "metadata": {}, 825 | "outputs": [ 826 | { 827 | "name": "stdout", 828 | "output_type": "stream", 829 | "text": [ 830 | "Saved visualization animation at ./results\\cnf-viz.gif\n" 831 | ] 832 | } 833 | ], 834 | "source": [ 835 | "if args.viz:\n", 836 | " viz_samples = 30000\n", 837 | " viz_timesteps = 41\n", 838 | " target_sample, _ = get_batch(viz_samples)\n", 839 | "\n", 840 | " if not os.path.exists(args.results_dir):\n", 841 | " os.makedirs(args.results_dir)\n", 842 | " with torch.no_grad():\n", 843 | " # Generate evolution of samples\n", 844 | " z_t0 = p_z0.sample([viz_samples]).to(device)\n", 845 | " logp_diff_t0 = torch.zeros(viz_samples, 1).type(torch.float32).to(device)\n", 846 | "\n", 847 | " z_t_samples, _ = odeint(\n", 848 | " func,\n", 849 | " (z_t0, logp_diff_t0),\n", 850 | " torch.tensor(np.linspace(t0, t1, viz_timesteps)).to(device),\n", 851 | " atol=1e-5,\n", 852 | " rtol=1e-5,\n", 853 | " method='dopri5',\n", 854 | " )\n", 855 | "\n", 856 | " # Generate evolution of density\n", 857 | " x = np.linspace(-1.5, 1.5, 100)\n", 858 | " y = np.linspace(-1.5, 1.5, 100)\n", 859 | " points = np.vstack(np.meshgrid(x, y)).reshape([2, -1]).T\n", 860 | "\n", 861 | " z_t1 = torch.tensor(points).type(torch.float32).to(device)\n", 862 | " logp_diff_t1 = torch.zeros(z_t1.shape[0], 1).type(torch.float32).to(device)\n", 863 | "\n", 864 | " z_t_density, logp_diff_t = odeint(\n", 865 | " func,\n", 866 | " (z_t1, logp_diff_t1),\n", 867 | " torch.tensor(np.linspace(t1, t0, viz_timesteps)).to(device),\n", 868 | " atol=1e-5,\n", 869 | " rtol=1e-5,\n", 870 | " method='dopri5',\n", 871 | " )\n", 872 | "\n", 873 
| " # Create plots for each timestep\n", 874 | " for (t, z_sample, z_density, logp_diff) in zip(\n", 875 | " np.linspace(t0, t1, viz_timesteps),\n", 876 | " z_t_samples, z_t_density, logp_diff_t\n", 877 | " ):\n", 878 | " fig = plt.figure(figsize=(12, 4), dpi=200)\n", 879 | " plt.tight_layout()\n", 880 | " plt.axis('off')\n", 881 | " plt.margins(0, 0)\n", 882 | " fig.suptitle(f'{t:.2f}s')\n", 883 | "\n", 884 | " ax1 = fig.add_subplot(1, 3, 1)\n", 885 | " ax1.set_title('Target')\n", 886 | " ax1.get_xaxis().set_ticks([])\n", 887 | " ax1.get_yaxis().set_ticks([])\n", 888 | " ax2 = fig.add_subplot(1, 3, 2)\n", 889 | " ax2.set_title('Samples')\n", 890 | " ax2.get_xaxis().set_ticks([])\n", 891 | " ax2.get_yaxis().set_ticks([])\n", 892 | " ax3 = fig.add_subplot(1, 3, 3)\n", 893 | " ax3.set_title('Log Probability')\n", 894 | " ax3.get_xaxis().set_ticks([])\n", 895 | " ax3.get_yaxis().set_ticks([])\n", 896 | "\n", 897 | " ax1.hist2d(*target_sample.detach().cpu().numpy().T, bins=300, density=True,\n", 898 | " range=[[-1.5, 1.5], [-1.5, 1.5]])\n", 899 | "\n", 900 | " ax2.hist2d(*z_sample.detach().cpu().numpy().T, bins=300, density=True,\n", 901 | " range=[[-1.5, 1.5], [-1.5, 1.5]])\n", 902 | "\n", 903 | " logp = p_z0.log_prob(z_density) - logp_diff.view(-1)\n", 904 | " ax3.tricontourf(*z_t1.detach().cpu().numpy().T,\n", 905 | " np.exp(logp.detach().cpu().numpy()), 200)\n", 906 | "\n", 907 | " plt.savefig(os.path.join(args.results_dir, f\"cnf-viz-{int(t*1000):05d}.jpg\"),\n", 908 | " pad_inches=0.2, bbox_inches='tight')\n", 909 | " plt.close()\n", 910 | "\n", 911 | " img, *imgs = [Image.open(f) for f in sorted(glob.glob(os.path.join(args.results_dir, f\"cnf-viz-*.jpg\")))]\n", 912 | " img.save(fp=os.path.join(args.results_dir, \"cnf-viz.gif\"), format='GIF', append_images=imgs,\n", 913 | " save_all=True, duration=250, loop=0)\n", 914 | "\n", 915 | " print('Saved visualization animation at {}'.format(os.path.join(args.results_dir, \"cnf-viz.gif\")))" 916 | ] 917 | }, 918 | { 
919 | "cell_type": "markdown", 920 | "metadata": {}, 921 | "source": [ 922 | "# Notes and further steps" 923 | ] 924 | }, 925 | { 926 | "cell_type": "markdown", 927 | "metadata": {}, 928 | "source": [ 929 | "It turns out vanilla Neural ODEs are limited in the types of functions they can express. In particular, they struggle to fit functions like this\n", 930 | "\n", 931 | "![title](assets/notebook_imgs/image_8.jpg)\n", 932 | "(image from https://ml.berkeley.edu/blog/posts/neural-odes/#training-odenets)\n", 933 | "\n", 934 | "because we're working in terms of the derivative. Think about what the derivative should be at the intersection of the blue and red curves. On one hand, it needs to be positive so the blue function can increase, but on the other hand, it needs to be negative so the red line can decrease, and a single function cannot take both values at the same point. One way to overcome this issue is to introduce some extra \"fictitious\" dimensions, an approach described in the paper Augmented Neural ODEs (ANODEs). **In general, it is recommended that you use ANODEs instead of vanilla NODEs**. A link to the GitHub repository can be found here: https://github.com/EmilienDupont/augmented-neural-odes\n", 935 | "\n", 936 | "Here we looked only at first-order ODEs (the order is the highest derivative involved in expressing the dynamics of the system). 
If you are interested in exploring ODEs of higher order, for example because you want to model a physical system whose known dynamics are of higher order, you can look at second-order neural ODEs (the work also briefly discusses higher orders), which are described here - https://github.com/a-norcliffe/sonode.\n", 937 | "\n", 938 | "If you are interested in making the density estimation faster and thus more scalable, it is recommended that you refer to the follow-up paper Free-form Jacobian of Reversible Dynamics (FFJORD) found here - https://github.com/rtqichen/ffjord.\n", 939 | "\n", 940 | "Finally, if you are interested in whether this approach extends to Stochastic Differential Equations, you can refer to http://proceedings.mlr.press/v118/li20a/li20a.pdf with a (currently ongoing) implementation at https://github.com/google-research/torchsde." 941 | ] 942 | } 943 | ], 944 | "metadata": { 945 | "kernelspec": { 946 | "display_name": "Python 3", 947 | "language": "python", 948 | "name": "python3" 949 | }, 950 | "language_info": { 951 | "codemirror_mode": { 952 | "name": "ipython", 953 | "version": 3 954 | }, 955 | "file_extension": ".py", 956 | "mimetype": "text/x-python", 957 | "name": "python", 958 | "nbconvert_exporter": "python", 959 | "pygments_lexer": "ipython3", 960 | "version": "3.8.3" 961 | } 962 | }, 963 | "nbformat": 4, 964 | "nbformat_minor": 4 965 | } 966 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # neural-ode-tutorial 2 | Repository for tutorial on Neural ODEs prepared for the UCL AI Society 3 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torchdiffeq==0.2.1 2 | matplotlib 3 | scikit-learn 4 | torch==1.7.1 5 | numpy==1.20.1 6 | tqdm==4.54.1 7 | seaborn 8 | 
--------------------------------------------------------------------------------