├── Compute_Jacobian.py ├── PINNsNTK_Poisson1D.ipynb ├── PINNsNTK_Wave1D.ipynb └── README.md /Compute_Jacobian.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Sat Jul 11 17:45:07 2020 4 | 5 | @author: sifan 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | import tensorflow as tf 12 | from tensorflow.python.framework import ops 13 | from tensorflow.python.ops import array_ops 14 | from tensorflow.python.ops import check_ops 15 | from tensorflow.python.ops import gradients_impl as gradient_ops 16 | from tensorflow.python.ops.parallel_for import control_flow_ops 17 | from tensorflow.python.util import nest 18 | 19 | def jacobian(output, inputs, use_pfor=True, parallel_iterations=None): 20 | """Computes jacobian of `output` w.r.t. `inputs`. 21 | Args: 22 | output: A tensor. 23 | inputs: A tensor or a nested structure of tensor objects. 24 | use_pfor: If true, uses pfor for computing the jacobian. Else uses 25 | tf.while_loop. 26 | parallel_iterations: A knob to control how many iterations are dispatched in 27 | parallel. This knob can be used to control the total memory usage. 28 | Returns: 29 | A tensor or a nested structure of tensors with the same structure as 30 | `inputs`. Each entry is the jacobian of `output` w.r.t. the corresponding 31 | value in `inputs`. If output has shape [y_1, ..., y_n] and inputs_i has 32 | shape [x_1, ..., x_m], the corresponding jacobian has shape 33 | [y_1, ..., y_n, x_1, ..., x_m]. Note that in cases where the gradient is 34 | sparse (IndexedSlices), jacobian function currently makes it dense and 35 | returns a Tensor instead. This may change in the future. 
36 | """ 37 | flat_inputs = nest.flatten(inputs) 38 | output_tensor_shape = output.shape 39 | output_shape = array_ops.shape(output) 40 | output = array_ops.reshape(output, [-1]) 41 | 42 | def loop_fn(i): 43 | y = array_ops.gather(output, i) 44 | return gradient_ops.gradients(y, flat_inputs, unconnected_gradients=tf.UnconnectedGradients.ZERO) 45 | 46 | try: 47 | output_size = int(output.shape[0]) 48 | except TypeError: 49 | output_size = array_ops.shape(output)[0] 50 | 51 | if use_pfor: 52 | pfor_outputs = control_flow_ops.pfor( 53 | loop_fn, output_size, parallel_iterations=parallel_iterations) 54 | else: 55 | pfor_outputs = control_flow_ops.for_loop( 56 | loop_fn, 57 | [output.dtype] * len(flat_inputs), 58 | output_size, 59 | parallel_iterations=parallel_iterations) 60 | 61 | for i, out in enumerate(pfor_outputs): 62 | if isinstance(out, ops.Tensor): 63 | new_shape = array_ops.concat( 64 | [output_shape, array_ops.shape(out)[1:]], axis=0) 65 | out = array_ops.reshape(out, new_shape) 66 | out.set_shape(output_tensor_shape.concatenate(flat_inputs[i].shape)) 67 | pfor_outputs[i] = out 68 | 69 | return nest.pack_sequence_as(inputs, pfor_outputs) -------------------------------------------------------------------------------- /PINNsNTK_Poisson1D.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "PINNsNTK_Poisson.ipynb", 7 | "provenance": [] 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | }, 16 | "accelerator": "GPU" 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "code", 21 | "metadata": { 22 | "colab": { 23 | "base_uri": "https://localhost:8080/" 24 | }, 25 | "id": "OybZJApDYGsi", 26 | "outputId": "dc7cdffa-8613-42ee-86cc-0bc5000b2998" 27 | }, 28 | "source": [ 29 | "# Switch to tensorflow 1.x\n", 30 | "%tensorflow_version 1.x" 31 | ], 32 | 
"execution_count": null, 33 | "outputs": [ 34 | { 35 | "output_type": "stream", 36 | "name": "stdout", 37 | "text": [ 38 | "TensorFlow 1.x selected.\n" 39 | ] 40 | } 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "metadata": { 46 | "id": "7WkCgnRiYQSY" 47 | }, 48 | "source": [ 49 | "import tensorflow as tf\n", 50 | "from Compute_Jacobian import jacobian # Please download 'Compute_Jacobian.py' in the repository \n", 51 | "import numpy as np\n", 52 | "import timeit\n", 53 | "from scipy.interpolate import griddata\n", 54 | "import seaborn as sns\n", 55 | "import matplotlib.pyplot as plt\n", 56 | "import pandas as pd\n", 57 | "import os" 58 | ], 59 | "execution_count": null, 60 | "outputs": [] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "metadata": { 65 | "id": "-y7cHTcJfBTR" 66 | }, 67 | "source": [ 68 | "class Sampler:\n", 69 | " # Initialize the class\n", 70 | " def __init__(self, dim, coords, func, name=None):\n", 71 | " self.dim = dim\n", 72 | " self.coords = coords\n", 73 | " self.func = func\n", 74 | " self.name = name\n", 75 | "\n", 76 | " def sample(self, N):\n", 77 | " x = self.coords[0:1, :] + (self.coords[1:2, :] - self.coords[0:1, :]) * np.random.rand(N, self.dim)\n", 78 | " y = self.func(x)\n", 79 | " return x, y" 80 | ], 81 | "execution_count": null, 82 | "outputs": [] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "metadata": { 87 | "id": "SDqDWN3nfSAg" 88 | }, 89 | "source": [ 90 | "class PINN:\n", 91 | " def __init__(self, layers, X_u, Y_u, X_r, Y_r):\n", 92 | " self.mu_X, self.sigma_X = X_r.mean(0), X_r.std(0)\n", 93 | " self.mu_x, self.sigma_x = self.mu_X[0], self.sigma_X[0]\n", 94 | "\n", 95 | " # Normalize\n", 96 | " self.X_u = (X_u - self.mu_X) / self.sigma_X\n", 97 | " self.Y_u = Y_u\n", 98 | " self.X_r = (X_r - self.mu_X) / self.sigma_X\n", 99 | " self.Y_r = Y_r\n", 100 | "\n", 101 | " # Initialize network weights and biases\n", 102 | " self.layers = layers\n", 103 | " self.weights, self.biases = self.initialize_NN(layers)\n", 104 | " 
\n", 105 | " # Define the size of the Kernel\n", 106 | " self.kernel_size = X_u.shape[0]\n", 107 | " \n", 108 | " # Define Tensorflow session\n", 109 | " self.sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))\n", 110 | "\n", 111 | " # Define placeholders and computational graph\n", 112 | " self.x_u_tf = tf.placeholder(tf.float32, shape=(None, 1))\n", 113 | " self.u_tf = tf.placeholder(tf.float32, shape=(None, 1))\n", 114 | "\n", 115 | " self.x_bc_tf = tf.placeholder(tf.float32, shape=(None, 1))\n", 116 | " self.u_bc_tf = tf.placeholder(tf.float32, shape=(None, 1))\n", 117 | "\n", 118 | " self.x_r_tf = tf.placeholder(tf.float32, shape=(None, 1))\n", 119 | " self.r_tf = tf.placeholder(tf.float32, shape=(None, 1))\n", 120 | " \n", 121 | " self.x_u_ntk_tf = tf.placeholder(tf.float32, shape=(self.kernel_size, 1))\n", 122 | " self.x_r_ntk_tf = tf.placeholder(tf.float32, shape=(self.kernel_size, 1))\n", 123 | "\n", 124 | "\n", 125 | " # Evaluate predictions\n", 126 | " self.u_bc_pred = self.net_u(self.x_bc_tf)\n", 127 | "\n", 128 | " self.u_pred = self.net_u(self.x_u_tf)\n", 129 | " self.r_pred = self.net_r(self.x_r_tf)\n", 130 | " \n", 131 | " self.u_ntk_pred = self.net_u(self.x_u_ntk_tf)\n", 132 | " self.r_ntk_pred = self.net_r(self.x_r_ntk_tf)\n", 133 | " \n", 134 | " # Boundary loss\n", 135 | " self.loss_bcs = tf.reduce_mean(tf.square(self.u_bc_pred - self.u_bc_tf))\n", 136 | "\n", 137 | " # Residual loss \n", 138 | " self.loss_res = tf.reduce_mean(tf.square(self.r_tf - self.r_pred))\n", 139 | " \n", 140 | " # Total loss\n", 141 | " self.loss = self.loss_res + self.loss_bcs\n", 142 | "\n", 143 | " # Define optimizer with learning rate schedule\n", 144 | " self.global_step = tf.Variable(0, trainable=False)\n", 145 | " starter_learning_rate = 1e-5\n", 146 | " self.learning_rate = tf.train.exponential_decay(starter_learning_rate, self.global_step,\n", 147 | " 1000, 0.9, staircase=False)\n", 148 | " # Passing global_step to minimize() will increment it 
at each step.\n", 149 | " # To compute the NTK, it is better to use the plain gradient descent optimizer,\n", 150 | " # since the gradient flow induced by other optimizers (e.g. Adam) is not exactly the same.\n", 151 | " self.train_op = tf.train.GradientDescentOptimizer(starter_learning_rate).minimize(self.loss)\n", 152 | " # self.train_op = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss, global_step=self.global_step)\n", 153 | "\n", 154 | "\n", 155 | " # Initialize Tensorflow variables\n", 156 | " init = tf.global_variables_initializer()\n", 157 | " self.sess.run(init)\n", 158 | " \n", 159 | " self.saver = tf.train.Saver()\n", 160 | " \n", 161 | " # Compute the Jacobian for weights and biases in each hidden layer \n", 162 | " self.J_u = self.compute_jacobian(self.u_ntk_pred) \n", 163 | " self.J_r = self.compute_jacobian(self.r_ntk_pred)\n", 164 | " \n", 165 | " # The empirical NTK = J J^T, compute NTK of PINNs \n", 166 | " self.K_uu = self.compute_ntk(self.J_u, self.x_u_ntk_tf, self.J_u, self.x_u_ntk_tf)\n", 167 | " self.K_ur = self.compute_ntk(self.J_u, self.x_u_ntk_tf, self.J_r, self.x_r_ntk_tf)\n", 168 | " self.K_rr = self.compute_ntk(self.J_r, self.x_r_ntk_tf, self.J_r, self.x_r_ntk_tf)\n", 169 | " \n", 170 | " # Logger\n", 171 | " # Loss logger\n", 172 | " self.loss_bcs_log = []\n", 173 | " self.loss_res_log = []\n", 174 | "\n", 175 | " # NTK logger \n", 176 | " self.K_uu_log = []\n", 177 | " self.K_rr_log = []\n", 178 | " self.K_ur_log = []\n", 179 | " \n", 180 | " # Weights logger \n", 181 | " self.weights_log = []\n", 182 | " self.biases_log = []\n", 183 | " \n", 184 | " # Xavier initialization\n", 185 | " def xavier_init(self, size):\n", 186 | " in_dim = size[0]\n", 187 | " out_dim = size[1]\n", 188 | " xavier_stddev = 1. 
/ np.sqrt((in_dim + out_dim) / 2.)\n", 189 | " return tf.Variable(tf.random.normal([in_dim, out_dim], dtype=tf.float32) * xavier_stddev,\n", 190 | " dtype=tf.float32)\n", 191 | " \n", 192 | " # NTK initialization\n", 193 | " def NTK_init(self, size):\n", 194 | " in_dim = size[0]\n", 195 | " out_dim = size[1]\n", 196 | " std = 1. / np.sqrt(in_dim)\n", 197 | " return tf.Variable(tf.random.normal([in_dim, out_dim], dtype=tf.float32) * std,\n", 198 | " dtype=tf.float32)\n", 199 | "\n", 200 | " # Initialize network weights with NTK initialization (std = 1/sqrt(in_dim)) and biases from a standard normal\n", 201 | " def initialize_NN(self, layers):\n", 202 | " weights = []\n", 203 | " biases = []\n", 204 | " num_layers = len(layers)\n", 205 | " for l in range(0, num_layers - 1):\n", 206 | " W = self.NTK_init(size=[layers[l], layers[l + 1]])\n", 207 | " b = tf.Variable(tf.random.normal([1, layers[l + 1]], dtype=tf.float32), dtype=tf.float32)\n", 208 | " weights.append(W)\n", 209 | " biases.append(b)\n", 210 | " return weights, biases\n", 211 | "\n", 212 | " # Evaluates the forward pass\n", 213 | " def forward_pass(self, H):\n", 214 | " num_layers = len(self.layers)\n", 215 | " for l in range(0, num_layers - 2):\n", 216 | " W = self.weights[l]\n", 217 | " b = self.biases[l]\n", 218 | " H = tf.nn.tanh(tf.add(tf.matmul(H, W), b))\n", 219 | " W = self.weights[-1]\n", 220 | " b = self.biases[-1]\n", 221 | " H = tf.add(tf.matmul(H, W), b)\n", 222 | " return H\n", 223 | "\n", 224 | " # Evaluates the PDE solution\n", 225 | " def net_u(self, x):\n", 226 | " u = self.forward_pass(x)\n", 227 | " return u\n", 228 | "\n", 229 | " # Forward pass for the residual\n", 230 | " def net_r(self, x):\n", 231 | " u = self.net_u(x)\n", 232 | "\n", 233 | " u_x = tf.gradients(u, x)[0] / self.sigma_x\n", 234 | " u_xx = tf.gradients(u_x, x)[0] / self.sigma_x\n", 235 | "\n", 236 | " res_u = u_xx\n", 237 | " return res_u\n", 238 | " \n", 239 | " # Compute the Jacobian w.r.t. the weights and biases of each layer and return a list\n", 240 | " def 
compute_jacobian(self, f):\n", 241 | " J_list =[]\n", 242 | " L = len(self.weights) \n", 243 | " for i in range(L):\n", 244 | " J_w = jacobian(f, self.weights[i])\n", 245 | " J_list.append(J_w)\n", 246 | " \n", 247 | " for i in range(L):\n", 248 | " J_b = jacobian(f, self.biases[i])\n", 249 | " J_list.append(J_b)\n", 250 | " return J_list\n", 251 | " \n", 252 | " # Compute the empirical NTK = J J^T\n", 253 | " def compute_ntk(self, J1_list, x1, J2_list, x2):\n", 254 | " D = x1.shape[0]\n", 255 | " N = len(J1_list)\n", 256 | " \n", 257 | " Ker = tf.zeros((D,D))\n", 258 | " for k in range(N):\n", 259 | " J1 = tf.reshape(J1_list[k], shape=(D,-1))\n", 260 | " J2 = tf.reshape(J2_list[k], shape=(D,-1))\n", 261 | " \n", 262 | " K = tf.matmul(J1, tf.transpose(J2))\n", 263 | " Ker = Ker + K\n", 264 | " return Ker\n", 265 | " \n", 266 | " # Trains the model by minimizing the MSE loss\n", 267 | " def train(self, nIter=10000, batch_size=128, log_NTK=True, log_weights=True):\n", 268 | "\n", 269 | " start_time = timeit.default_timer()\n", 270 | " for it in range(nIter):\n", 271 | " # Fetch boundary mini-batches\n", 272 | " # Define a dictionary for associating placeholders with data\n", 273 | " tf_dict = {self.x_bc_tf: self.X_u, self.u_bc_tf: self.Y_u,\n", 274 | " self.x_u_tf: self.X_u, self.x_r_tf: self.X_r,\n", 275 | " self.r_tf: self.Y_r\n", 276 | " }\n", 277 | " \n", 278 | " # Run the Tensorflow session to minimize the loss\n", 279 | " self.sess.run(self.train_op, tf_dict)\n", 280 | "\n", 281 | " # Print\n", 282 | " if it % 100 == 0:\n", 283 | " elapsed = timeit.default_timer() - start_time\n", 284 | " loss_value = self.sess.run(self.loss, tf_dict)\n", 285 | " loss_bcs_value, loss_res_value = self.sess.run([self.loss_bcs, self.loss_res], tf_dict)\n", 286 | " self.loss_bcs_log.append(loss_bcs_value)\n", 287 | " self.loss_res_log.append(loss_res_value)\n", 288 | "\n", 289 | " print('It: %d, Loss: %.3e, Loss_bcs: %.3e, Loss_res: %.3e ,Time: %.2f' %\n", 290 | " (it, loss_value, 
loss_bcs_value, loss_res_value, elapsed))\n", 291 | " \n", 292 | "\n", 293 | " start_time = timeit.default_timer()\n", 294 | "\n", 295 | " if log_NTK:\n", 296 | " # provide x, x' for NTK\n", 297 | " if it % 100 == 0:\n", 298 | " print(\"Compute NTK...\")\n", 299 | " tf_dict = {self.x_u_ntk_tf: self.X_u, self.x_r_ntk_tf: self.X_r}\n", 300 | " K_uu_value, K_ur_value, K_rr_value = self.sess.run([self.K_uu,\n", 301 | " self.K_ur,\n", 302 | " self.K_rr], tf_dict)\n", 303 | " self.K_uu_log.append(K_uu_value)\n", 304 | " self.K_ur_log.append(K_ur_value)\n", 305 | " self.K_rr_log.append(K_rr_value)\n", 306 | " \n", 307 | " if log_weights:\n", 308 | " if it % 100 ==0:\n", 309 | " print(\"Weights stored...\")\n", 310 | " weights = self.sess.run(self.weights)\n", 311 | " biases = self.sess.run(self.biases)\n", 312 | " \n", 313 | " self.weights_log.append(weights)\n", 314 | " self.biases_log.append(biases)\n", 315 | " \n", 316 | " # Evaluates predictions at test points\n", 317 | " def predict_u(self, X_star):\n", 318 | " X_star = (X_star - self.mu_X) / self.sigma_X\n", 319 | " tf_dict = {self.x_u_tf: X_star}\n", 320 | " u_star = self.sess.run(self.u_pred, tf_dict)\n", 321 | " return u_star\n", 322 | "\n", 323 | " # Evaluates predictions at test points\n", 324 | " def predict_r(self, X_star):\n", 325 | " X_star = (X_star - self.mu_X) / self.sigma_X\n", 326 | " tf_dict = {self.x_r_tf: X_star}\n", 327 | " r_star = self.sess.run(self.r_pred, tf_dict)\n", 328 | " return r_star\n" 329 | ], 330 | "execution_count": null, 331 | "outputs": [] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "metadata": { 336 | "id": "FN1jEdRwY90i" 337 | }, 338 | "source": [ 339 | "# Define solution and its Laplace\n", 340 | "a = 4\n", 341 | "\n", 342 | "def u(x, a):\n", 343 | " return np.sin(np.pi * a * x)\n", 344 | "\n", 345 | "def u_xx(x, a):\n", 346 | " return -(np.pi * a)**2 * np.sin(np.pi * a * x)" 347 | ], 348 | "execution_count": null, 349 | "outputs": [] 350 | }, 351 | { 352 | "cell_type": 
"code", 353 | "metadata": { 354 | "id": "YGFobW0EatXj" 355 | }, 356 | "source": [ 357 | "# Define computional domain\n", 358 | "bc1_coords = np.array([[0.0],\n", 359 | " [0.0]])\n", 360 | "\n", 361 | "bc2_coords = np.array([[1.0],\n", 362 | " [1.0]])\n", 363 | "\n", 364 | "dom_coords = np.array([[0.0],\n", 365 | " [1.0]])\n", 366 | "\n", 367 | "# Training data on u(x) -- Dirichlet boundary conditions\n", 368 | "\n", 369 | "nn = 100\n", 370 | "\n", 371 | "X_bc1 = dom_coords[0, 0] * np.ones((nn // 2, 1))\n", 372 | "X_bc2 = dom_coords[1, 0] * np.ones((nn // 2, 1))\n", 373 | "X_u = np.vstack([X_bc1, X_bc2])\n", 374 | "Y_u = u(X_u, a)\n", 375 | "\n", 376 | "X_r = np.linspace(dom_coords[0, 0],\n", 377 | " dom_coords[1, 0], nn)[:, None]\n", 378 | "Y_r = u_xx(X_r, a)" 379 | ], 380 | "execution_count": null, 381 | "outputs": [] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "metadata": { 386 | "colab": { 387 | "base_uri": "https://localhost:8080/" 388 | }, 389 | "id": "jZtWEM9-brXF", 390 | "outputId": "cafb2cca-8f1d-4370-8315-a739987c2838" 391 | }, 392 | "source": [ 393 | "# Define model\n", 394 | "layers = [1, 512, 1] \n", 395 | "# layers = [1, 512, 512, 512, 1] \n", 396 | "model = PINN(layers, X_u, Y_u, X_r, Y_r) " 397 | ], 398 | "execution_count": null, 399 | "outputs": [ 400 | { 401 | "output_type": "stream", 402 | "name": "stdout", 403 | "text": [ 404 | "Device mapping:\n", 405 | "/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device\n", 406 | "/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device\n", 407 | "/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0\n", 408 | "\n" 409 | ] 410 | } 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "metadata": { 416 | "id": "vF1hwPUobyPE" 417 | }, 418 | "source": [ 419 | "# Train model\n", 420 | "model.train(nIter=40001, batch_size=100, log_NTK=True, log_weights=True)" 421 | ], 422 | 
"execution_count": null, 423 | "outputs": [] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": { 428 | "id": "oiyikOwBjRoZ" 429 | }, 430 | "source": [ 431 | "**Training Loss**" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "metadata": { 437 | "colab": { 438 | "base_uri": "https://localhost:8080/", 439 | "height": 369 440 | }, 441 | "id": "Fw807UNzhu5z", 442 | "outputId": "929ae89c-3e10-4c56-e349-051441e8ab32" 443 | }, 444 | "source": [ 445 | "loss_bcs = model.loss_bcs_log\n", 446 | "loss_res = model.loss_res_log\n", 447 | "\n", 448 | "fig = plt.figure(figsize=(6,5))\n", 449 | "plt.plot(loss_res, label='$\\mathcal{L}_{r}$')\n", 450 | "plt.plot(loss_bcs, label='$\\mathcal{L}_{b}$')\n", 451 | "plt.yscale('log')\n", 452 | "plt.xlabel('iterations')\n", 453 | "plt.ylabel('Loss')\n", 454 | "plt.legend()\n", 455 | "plt.tight_layout()\n", 456 | "plt.show()" 457 | ], 458 | "execution_count": null, 459 | "outputs": [ 460 | { 461 | "output_type": "display_data", 462 | "data": { 463 | "image/png": "\n", 464 | "text/plain": [ 465 | "
" 466 | ] 467 | }, 468 | "metadata": { 469 | "needs_background": "light" 470 | } 471 | } 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": { 477 | "id": "TFLIBq5xjZ3v" 478 | }, 479 | "source": [ 480 | "**Model Prediction**" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "metadata": { 486 | "id": "To0PDN17cc0v", 487 | "colab": { 488 | "base_uri": "https://localhost:8080/" 489 | }, 490 | "outputId": "7284b31e-f2fe-41ab-93a9-4c2c91f94cde" 491 | }, 492 | "source": [ 493 | "nn = 1000\n", 494 | "X_star = np.linspace(dom_coords[0, 0], dom_coords[1, 0], nn)[:, None]\n", 495 | "u_star = u(X_star, a)\n", 496 | "r_star = u_xx(X_star, a)\n", 497 | "\n", 498 | "# Predictions\n", 499 | "u_pred = model.predict_u(X_star)\n", 500 | "r_pred = model.predict_r(X_star)\n", 501 | "error_u = np.linalg.norm(u_star - u_pred, 2) / np.linalg.norm(u_star, 2)\n", 502 | "error_r = np.linalg.norm(r_star - r_pred, 2) / np.linalg.norm(r_star, 2)\n", 503 | "\n", 504 | "print('Relative L2 error_u: {:.2e}'.format(error_u))\n", 505 | "print('Relative L2 error_r: {:.2e}'.format(error_r))" 506 | ], 507 | "execution_count": null, 508 | "outputs": [ 509 | { 510 | "output_type": "stream", 511 | "name": "stdout", 512 | "text": [ 513 | "Relative L2 error_u: 4.21e-02\n", 514 | "Relative L2 error_r: 4.88e-03\n" 515 | ] 516 | } 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "metadata": { 522 | "colab": { 523 | "base_uri": "https://localhost:8080/", 524 | "height": 369 525 | }, 526 | "id": "K428lOuXhdc8", 527 | "outputId": "b1e23055-178c-400c-8972-f3e7987e0892" 528 | }, 529 | "source": [ 530 | "fig = plt.figure(figsize=(12, 5))\n", 531 | "plt.subplot(1,2,1)\n", 532 | "plt.plot(X_star, u_star, label='Exact')\n", 533 | "plt.plot(X_star, u_pred, '--', label='Predicted')\n", 534 | "plt.xlabel('$x$')\n", 535 | "plt.ylabel('$y$')\n", 536 | "plt.legend(loc='upper right')\n", 537 | "\n", 538 | "plt.subplot(1,2,2)\n", 539 | "plt.plot(X_star, np.abs(u_star - u_pred), 
label='Error')\n", 540 | "plt.yscale('log')\n", 541 | "plt.xlabel('$x$')\n", 542 | "plt.ylabel('Point-wise error')\n", 543 | "plt.tight_layout()\n", 544 | "plt.show()" 545 | ], 546 | "execution_count": null, 547 | "outputs": [ 548 | { 549 | "output_type": "display_data", 550 | "data": { 551 | "image/png": "\n", 552 | "text/plain": [ 553 | "
" 554 | ] 555 | }, 556 | "metadata": { 557 | "needs_background": "light" 558 | } 559 | } 560 | ] 561 | }, 562 | { 563 | "cell_type": "markdown", 564 | "metadata": { 565 | "id": "9EYdfKGLj6h0" 566 | }, 567 | "source": [ 568 | "**NTK Eigenvalues**" 569 | ] 570 | }, 571 | { 572 | "cell_type": "code", 573 | "metadata": { 574 | "id": "e3dByeQjhBYj" 575 | }, 576 | "source": [ 577 | "# Create empty lists for storing the eigenvalues of NTK\n", 578 | "lambda_K_log = []\n", 579 | "lambda_K_uu_log = []\n", 580 | "lambda_K_ur_log = []\n", 581 | "lambda_K_rr_log = []\n", 582 | "\n", 583 | "# Restore the NTK\n", 584 | "K_uu_list = model.K_uu_log\n", 585 | "K_ur_list = model.K_ur_log\n", 586 | "K_rr_list = model.K_rr_log\n", 587 | "K_list = []\n", 588 | " \n", 589 | "for k in range(len(K_uu_list)):\n", 590 | " K_uu = K_uu_list[k]\n", 591 | " K_ur = K_ur_list[k]\n", 592 | " K_rr = K_rr_list[k]\n", 593 | " \n", 594 | " K = np.concatenate([np.concatenate([K_uu, K_ur], axis = 1),\n", 595 | " np.concatenate([K_ur.T, K_rr], axis = 1)], axis = 0)\n", 596 | " K_list.append(K)\n", 597 | "\n", 598 | " # Compute eigenvalues\n", 599 | " lambda_K, _ = np.linalg.eig(K)\n", 600 | " lambda_K_uu, _ = np.linalg.eig(K_uu)\n", 601 | " lambda_K_rr, _ = np.linalg.eig(K_rr)\n", 602 | " \n", 603 | " # Sort in descresing order\n", 604 | " lambda_K = np.sort(np.real(lambda_K))[::-1]\n", 605 | " lambda_K_uu = np.sort(np.real(lambda_K_uu))[::-1]\n", 606 | " lambda_K_rr = np.sort(np.real(lambda_K_rr))[::-1]\n", 607 | " \n", 608 | " # Store eigenvalues\n", 609 | " lambda_K_log.append(lambda_K)\n", 610 | " lambda_K_uu_log.append(lambda_K_uu)\n", 611 | " lambda_K_rr_log.append(lambda_K_rr)" 612 | ], 613 | "execution_count": null, 614 | "outputs": [] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "metadata": { 619 | "colab": { 620 | "base_uri": "https://localhost:8080/", 621 | "height": 369 622 | }, 623 | "id": "vSn3Q_1IhisN", 624 | "outputId": "4c713f42-11b2-4de9-8698-085eb54c164d" 625 | }, 626 | "source": [ 
627 | "fig = plt.figure(figsize=(18, 5))\n", 628 | "plt.subplot(1,3,1)\n", 629 | "for i in range(1, len(lambda_K_log), 10):\n", 630 | " plt.plot(lambda_K_log[i], '--')\n", 631 | "plt.xscale('log')\n", 632 | "plt.yscale('log')\n", 633 | "plt.title(r'Eigenvalues of ${K}$')\n", 634 | "plt.tight_layout()\n", 635 | "\n", 636 | "plt.subplot(1,3,2)\n", 637 | "for i in range(1, len(lambda_K_uu_log), 10):\n", 638 | " plt.plot(lambda_K_uu_log[i], '--')\n", 639 | "plt.xscale('log')\n", 640 | "plt.yscale('log')\n", 641 | "plt.title(r'Eigenvalues of ${K}_{uu}$')\n", 642 | "plt.tight_layout()\n", 643 | "\n", 644 | "plt.subplot(1,3,3)\n", 645 | "for i in range(1, len(lambda_K_log), 10):\n", 646 | " plt.plot(lambda_K_rr_log[i], '--')\n", 647 | "plt.xscale('log')\n", 648 | "plt.yscale('log')\n", 649 | "plt.title(r'Eigenvalues of ${K}_{rr}$')\n", 650 | "plt.tight_layout()\n", 651 | "plt.show()" 652 | ], 653 | "execution_count": null, 654 | "outputs": [ 655 | { 656 | "output_type": "display_data", 657 | "data": { 658 | "image/png": "\n", 659 | "text/plain": [ 660 | "
" 661 | ] 662 | }, 663 | "metadata": { 664 | "needs_background": "light" 665 | } 666 | } 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": { 672 | "id": "pIS5UH81kOxT" 673 | }, 674 | "source": [ 675 | "**Change of NTK**" 676 | ] 677 | }, 678 | { 679 | "cell_type": "code", 680 | "metadata": { 681 | "id": "wF4Q_iZshQ-0" 682 | }, 683 | "source": [ 684 | "# Change of the NTK\n", 685 | "NTK_change_list = []\n", 686 | "K0 = K_list[0]\n", 687 | "for K in K_list:\n", 688 | " diff = np.linalg.norm(K - K0) / np.linalg.norm(K0) \n", 689 | " NTK_change_list.append(diff)" 690 | ], 691 | "execution_count": null, 692 | "outputs": [] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "metadata": { 697 | "colab": { 698 | "base_uri": "https://localhost:8080/", 699 | "height": 338 700 | }, 701 | "id": "E-_gPGpCkF4n", 702 | "outputId": "9893e038-907d-4425-bb6e-8ecdb2ad497d" 703 | }, 704 | "source": [ 705 | "fig = plt.figure(figsize=(6,5))\n", 706 | "plt.plot(NTK_change_list)" 707 | ], 708 | "execution_count": null, 709 | "outputs": [ 710 | { 711 | "output_type": "execute_result", 712 | "data": { 713 | "text/plain": [ 714 | "[]" 715 | ] 716 | }, 717 | "metadata": {}, 718 | "execution_count": 15 719 | }, 720 | { 721 | "output_type": "display_data", 722 | "data": { 723 | "image/png": "\n", 724 | "text/plain": [ 725 | "
" 726 | ] 727 | }, 728 | "metadata": { 729 | "needs_background": "light" 730 | } 731 | } 732 | ] 733 | }, 734 | { 735 | "cell_type": "markdown", 736 | "metadata": { 737 | "id": "Mg0ZGHbAkW6N" 738 | }, 739 | "source": [ 740 | "\n", 741 | "**Change of NN Params**" 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "metadata": { 747 | "id": "LLGv9JUuioVZ" 748 | }, 749 | "source": [ 750 | "# Change of the weights and biases\n", 751 | "def compute_weights_diff(weights_1, weights_2):\n", 752 | " weights = []\n", 753 | " N = len(weights_1)\n", 754 | " for k in range(N):\n", 755 | " weight = weights_1[k] - weights_2[k]\n", 756 | " weights.append(weight)\n", 757 | " return weights\n", 758 | "\n", 759 | "def compute_weights_norm(weights, biases):\n", 760 | " norm = 0\n", 761 | " for w in weights:\n", 762 | " norm = norm + np.sum(np.square(w))\n", 763 | " for b in biases:\n", 764 | " norm = norm + np.sum(np.square(b))\n", 765 | " norm = np.sqrt(norm)\n", 766 | " return norm\n", 767 | "\n", 768 | "# Restore the list weights and biases\n", 769 | "weights_log = model.weights_log\n", 770 | "biases_log = model.biases_log\n", 771 | "\n", 772 | "weights_0 = weights_log[0]\n", 773 | "biases_0 = biases_log[0]\n", 774 | "\n", 775 | "# Norm of the weights at initialization\n", 776 | "weights_init_norm = compute_weights_norm(weights_0, biases_0)\n", 777 | "\n", 778 | "weights_change_list = []\n", 779 | "\n", 780 | "N = len(weights_log)\n", 781 | "for k in range(N):\n", 782 | " weights_diff = compute_weights_diff(weights_log[k], weights_log[0])\n", 783 | " biases_diff = compute_weights_diff(biases_log[k], biases_log[0])\n", 784 | " \n", 785 | " weights_diff_norm = compute_weights_norm(weights_diff, biases_diff)\n", 786 | " weights_change = weights_diff_norm / weights_init_norm\n", 787 | " weights_change_list.append(weights_change)" 788 | ], 789 | "execution_count": null, 790 | "outputs": [] 791 | }, 792 | { 793 | "cell_type": "code", 794 | "metadata": { 795 | "colab": { 796 | 
"base_uri": "https://localhost:8080/", 797 | "height": 338 798 | }, 799 | "id": "5NLsAxgzi4KH", 800 | "outputId": "74d92bf9-0dde-438e-e9b3-a551904f4e2f" 801 | }, 802 | "source": [ 803 | "fig = plt.figure(figsize=(6,5))\n", 804 | "plt.plot(weights_change_list)" 805 | ], 806 | "execution_count": null, 807 | "outputs": [ 808 | { 809 | "output_type": "execute_result", 810 | "data": { 811 | "text/plain": [ 812 | "[]" 813 | ] 814 | }, 815 | "metadata": {}, 816 | "execution_count": 17 817 | }, 818 | { 819 | "output_type": "display_data", 820 | "data": { 821 | "image/png": "\n", 822 | "text/plain": [ 823 | "
" 824 | ] 825 | }, 826 | "metadata": { 827 | "needs_background": "light" 828 | } 829 | } 830 | ] 831 | }, 832 | { 833 | "cell_type": "code", 834 | "metadata": { 835 | "id": "MYbzkhfMjJ8k" 836 | }, 837 | "source": [ 838 | "" 839 | ], 840 | "execution_count": null, 841 | "outputs": [] 842 | } 843 | ] 844 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## When and why PINNs fail to train: A neural tangent kernel perspective 2 | 3 | Code and data (available upon request) accompanying the manuscript titled "When and why PINNs fail to train: A neural tangent kernel perspective", authored by Sifan Wang, Xinling Yu, and Paris Perdikaris. 4 | 5 | ## Abstract 6 | 7 | Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations. However, despite their noticeable empirical success, little is known about how such constrained neural networks behave during their training via gradient descent. More importantly, even less is known about why such models sometimes fail to train at all. 8 | In this work, we aim to investigate these questions through the lens of the Neural Tangent Kernel (NTK); a kernel that captures the behavior of fully-connected neural networks in the infinite width limit during training via gradient descent. Specifically, we derive the NTK 9 | of PINNs and prove that, under appropriate conditions, it converges to a deterministic kernel that stays constant during training in the infinite-width limit. This allows us to analyze the training dynamics of PINNs through the lens of their limiting NTK and find a remarkable discrepancy in the convergence rate of the different loss components contributing to the total training error. 
To address this fundamental pathology, we propose a novel gradient descent algorithm that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error. Finally, we perform a series of numerical experiments to verify the correctness of our theory and the practical effectiveness of the proposed algorithms. 10 | 11 | ## Citation 12 | 13 | @article{wang2021and, 14 | title={When and why PINNs fail to train: A neural tangent kernel perspective}, 15 | author={Wang, Sifan and Yu, Xinling and Perdikaris, Paris}, 16 | journal={Journal of Computational Physics}, 17 | pages={110768}, 18 | year={2021}, 19 | publisher={Elsevier} 20 | } 21 | --------------------------------------------------------------------------------
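The abstract describes using the NTK to calibrate the convergence rates of the loss components, but the Poisson notebook in this dump only logs the kernel blocks and does not apply any weighting. The snippet below is a minimal NumPy sketch of how that could look; it is not the authors' implementation. It assumes a trace-based rule, `lambda_b = Tr(K) / Tr(K_uu)` and `lambda_r = Tr(K) / Tr(K_rr)`, and the helper names `empirical_ntk`, `assemble_ntk` and `ntk_loss_weights` are hypothetical. The Jacobians are random stand-ins rather than values taken from a trained PINN.

```python
import numpy as np


def empirical_ntk(J1, J2):
    """Empirical NTK block J1 @ J2.T for Jacobians of shape (n_points, n_params)."""
    return J1 @ J2.T


def assemble_ntk(J_u, J_r):
    """Build the full kernel [[K_uu, K_ur], [K_ur^T, K_rr]], as the notebook does."""
    K_uu = empirical_ntk(J_u, J_u)
    K_ur = empirical_ntk(J_u, J_r)
    K_rr = empirical_ntk(J_r, J_r)
    K = np.block([[K_uu, K_ur], [K_ur.T, K_rr]])
    return K, K_uu, K_rr


def ntk_loss_weights(K_uu, K_rr):
    """Hypothetical helper: trace-based weights for the boundary and residual losses.

    Assumes lambda_b = Tr(K) / Tr(K_uu) and lambda_r = Tr(K) / Tr(K_rr),
    where Tr(K) = Tr(K_uu) + Tr(K_rr) for the block kernel above.
    """
    trace_K = np.trace(K_uu) + np.trace(K_rr)
    return trace_K / np.trace(K_uu), trace_K / np.trace(K_rr)


# Stand-in Jacobians (random, for illustration only): 5 boundary points,
# 8 residual points, 20 parameters. The residual rows are scaled up so that
# K_rr has a much larger trace than K_uu.
rng = np.random.default_rng(0)
J_u = rng.standard_normal((5, 20))
J_r = 10.0 * rng.standard_normal((8, 20))

K, K_uu, K_rr = assemble_ntk(J_u, J_r)

# K is symmetric positive semi-definite, so eigvalsh applies; sort descending.
eigenvalues = np.linalg.eigvalsh(K)[::-1]

lambda_b, lambda_r = ntk_loss_weights(K_uu, K_rr)
print("largest eigenvalue of K:", eigenvalues[0])
print("lambda_b:", lambda_b, "lambda_r:", lambda_r)
```

Under this assumed rule, the weighted loss would read `lambda_b * loss_bcs + lambda_r * loss_res`, with the weights recomputed periodically from the logged `K_uu` and `K_rr`. The block with the smaller trace receives the larger weight.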