├── .ipynb_checkpoints
│   ├── A1_vectors_matrices_in_numpy-checkpoint.ipynb
│   ├── B4a_discrete-choice-estimation-checkpoint.ipynb
│   ├── B4b_characteristics-models-checkpoint.ipynb
│   ├── SL1_network-flow-problem-checkpoint.ipynb
│   └── SL3_quantiles-regression-checkpoint.ipynb
├── L01_optimal-assignment.ipynb
├── L02_onedimensionaltransport.ipynb
├── L03_semi-discrete-optimal-transport.ipynb
├── L04_regularized-optimal-transport.ipynb
├── L05_matching-estimation.ipynb
├── L06_gravity_equation.ipynb
├── README.md
└── docker
    ├── mec_optim.Dockerfile
    └── setup_mec_optim.pdf

/.ipynb_checkpoints/A1_vectors_matrices_in_numpy-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Crash course 1: Vectors and matrices in numpy and scipy\n",
8 | "### Alfred Galichon (NYU & ScPo) and Clément Montes (ScPo)\n",
9 | "## 'math+econ+code' masterclass on optimal transport and economic applications\n",
10 | "#### With python code examples\n",
11 | "© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam).\n",
12 | "\n",
13 | "**If you reuse material from this masterclass, please cite as:**\n",
14 | "Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "# Introducing NumPy\n",
22 | "\n",
23 | "* Unlike R or Matlab, Python has no built-in matrix algebra interface. Fortunately, the NumPy library provides powerful matrix capabilities, on par with R or Matlab. Here is a quick introduction to vectorization, operations on vectors and matrices, higher-dimensional arrays, Kronecker products and sparse matrices, etc. in NumPy.\n",
24 | "\n",
25 | "* This is *not* a tutorial on Python itself. There are plenty of good ones available on the web.\n",
26 | "\n",
27 | "* First, we load numpy (with its widely used alias):"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "import numpy as np"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "In NumPy, an `array` is built from a list as follows:"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": null,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "u = np.array([1,2,3])\n",
53 | "print(u)\n",
54 | "v = np.array([3,2,5])\n",
55 | "print(v)"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "One can then add arrays as:"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": null,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": [
71 | "print(np.array([1,2,3])+np.array([3,2,5]))"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "Note the difference between the + operator when applied to numpy arrays vs. when applied to lists:"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": null,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": [
87 | "[1,2,3]+[3,2,5]"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "In the latter case, the `+` operator performs list concatenation."
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "To input matrices in NumPy, one simply inputs a list of rows, which are themselves represented as lists."
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": [
110 | "A = np.array([[11,12],[21,22],[31,32]])\n",
111 | "A"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "The `shape` attribute of an array indicates the dimensions of that array."
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": null,
124 | "metadata": {},
125 | "outputs": [],
126 | "source": [
127 | "A.shape"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## Vectorization and memory order\n",
135 | "\n",
136 | "* Matrices in all mathematical software are represented in a *vectorized* way, as a sequence of numbers in the computer's memory. This representation can involve either stacking the rows or stacking the columns.\n",
137 | "\n",
138 | "* Different programming languages can use either of the two stacking conventions:\n",
139 | "    + Stacking the rows (row-major order) is used by `C`, and is the default convention for Python (NumPy). A matrix $M$ is represented by varying the last index first, i.e. a $2\\times2$ matrix will be represented as $vec_C\\left(M\\right) = \\left(M_{11}, M_{12}, M_{21}, M_{22}\\right).$ \n",
140 | "    + Stacking the columns (column-major order) is used by `Fortran`, `Matlab`, `R`, and most underlying core linear algebra libraries (like BLAS). An array is represented by varying the first index first, then the second, i.e. a $2\\times2\\times2$ 3-dimensional array $A$ will be represented as $vec_F\\left(A\\right) = \\left( A_{111}, A_{211}, A_{121}, A_{221}, A_{112}, A_{212}, A_{122}, A_{222} \\right)$. "
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | "The command `flatten()` provides the vectorized representation of a matrix."
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": null,
153 | "metadata": {},
154 | "outputs": [],
155 | "source": [
156 | "A.flatten()"
157 | ]
158 | },
159 | {
160 | "cell_type": "markdown",
161 | "metadata": {},
162 | "source": [
163 | "Remember, NumPy represents matrices by **varying the last index first**."
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "In order to reshape the matrix `A`, one modifies its `shape` attribute. The following reshapes the matrix `A` into a row vector. "
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": null,
176 | "metadata": {},
177 | "outputs": [],
178 | "source": [
179 | "A.shape = 1,6\n",
180 | "A"
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
187 | "The previous output shows that Python uses the row-major order: rows are stacked one after the other. \n",
188 | "To reshape the vector into a column vector, do:"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": null,
194 | "metadata": {},
195 | "outputs": [],
196 | "source": [
197 | "A.shape = 6,1\n",
198 | "A"
199 | ]
200 | },
201 | {
202 | "cell_type": "markdown",
203 | "metadata": {},
204 | "source": [
205 | "Equivalently, one could have set `A.shape=6,-1`, where Python would replace `-1` by the integer needed for the formula to make sense (in this case, `1`). \n",
206 | "Another way to reshape is to use the method `reshape`, which returns a new array (a view whenever possible) with the requested shape."
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": null,
212 | "metadata": {},
213 | "outputs": [],
214 | "source": [
215 | "A1=np.array(range(6))\n",
216 | "A2 = A1.reshape(3,2)\n",
217 | "print(\"A1=\\n\", A1)\n",
218 | "print(\"A2=\\n\",A2)"
219 | ]
220 | },
221 | {
222 | "cell_type": "markdown",
223 | "metadata": {},
224 | "source": [
225 | "Note that `NumPy` also supports the column-major order, but you have to specifically ask for it, by passing the optional argument `order='F'`, where 'F' stands for `Fortran`."
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {},
232 | "outputs": [],
233 | "source": [
234 | "A3 = np.array(range(6)).reshape(3,2, order='F')\n",
235 | "A3"
236 | ]
237 | },
238 | {
239 | "cell_type": "markdown",
240 | "metadata": {},
241 | "source": [
242 | "# Multiplication "
243 | ]
244 | },
245 | {
246 | "cell_type": "markdown",
247 | "metadata": {},
248 | "source": [
249 | "### Multiplication of arrays"
250 | ]
251 | },
252 | {
253 | "cell_type": "markdown",
254 | "metadata": {},
255 | "source": [
256 | "There are several ways to multiply two arrays using NumPy. The most commonly used is the following."
257 | ]
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": null,
262 | "metadata": {},
263 | "outputs": [],
264 | "source": [
265 | "A = np.ones((2,2))\n",
266 | "B = 3*np.eye(2)\n",
267 | "A@B #@ is left associative. If you have A@B@C, it will compute (A@B)@C"
268 | ]
269 | },
270 | {
271 | "cell_type": "markdown",
272 | "metadata": {},
273 | "source": [
274 | "Note that `np.matmul(A,B)` would give the same result as well, but it is more difficult to read `np.matmul(A,np.matmul(B,C))` than `A@B@C`."
275 | ]
276 | },
277 | {
278 | "cell_type": "markdown",
279 | "metadata": {},
280 | "source": [
281 | "### Multiplication by a scalar"
282 | ]
283 | },
284 | {
285 | "cell_type": "code",
286 | "execution_count": null,
287 | "metadata": {},
288 | "outputs": [],
289 | "source": [
290 | "4*np.eye(2)"
291 | ]
292 | },
293 | {
294 | "cell_type": "markdown",
295 | "metadata": {},
296 | "source": [
297 | "The assignment of `B` above corresponds to multiplication by a scalar. It is the simplest form of broadcasting in NumPy (one of the features that make this library more powerful, and much faster, than plain lists). More on broadcasting later in this notebook."
298 | ]
299 | },
300 | {
301 | "cell_type": "markdown",
302 | "metadata": {},
303 | "source": [
304 | "## Kronecker product\n",
305 | "\n",
306 | "A very important identity is\n",
307 | "\\begin{align*}\n",
308 | "vec_C\\left(AXB\\right) = \\left( A\\otimes B^\\top\\right) vec_C\\left(X\\right),\n",
309 | "\\end{align*}\n",
310 | "where $vec_C$ is the vectorization under the C (row-major) order, and where the Kronecker product $\\otimes$ is defined as follows for 2x2 matrices (with obvious generalization):\n",
311 | "\n",
312 | "\\begin{align*}\n",
313 | "A\\otimes B=\n",
314 | "\\begin{pmatrix}\n",
315 | "a_{11}B & a_{12}B\\\\\n",
316 | "a_{21}B & a_{22}B\n",
317 | "\\end{pmatrix}.\n",
318 | "\\end{align*}\n",
319 | "\n"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": null,
325 | "metadata": {},
326 | "outputs": [],
327 | "source": [
328 | "A = np.eye(2)\n",
329 | "\n",
330 | "kron_AB = np.kron(A, B) # the Kronecker product of A and B (B is still 3*np.eye(2) from above)\n",
331 | "print(\"A=\",A)\n",
332 | "print(\"B=\",B)\n",
333 | "print(\"A kron B=\",kron_AB)"
334 | ]
335 | },
336 | {
337 | "cell_type": "markdown",
338 | "metadata": {},
339 | "source": [
340 | "## Type broadcasting in NumPy\n"
341 | ]
342 | },
343 | {
344 | "cell_type": "markdown",
345 | "metadata": {},
346 | "source": [
347 | "The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. \n",
348 | "\n",
349 | "Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations."
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": null,
355 | "metadata": {},
356 | "outputs": [],
357 | "source": [
358 | "A = 10*np.array([[1],[2],[3]]) # simplest broadcasting\n",
359 | "B = np.array([1,2])\n",
360 | "print('A=\\n',A)\n",
361 | "print('B=\\n',B)\n",
362 | "print('A+B=\\n',A+B)"
363 | ]
364 | },
365 | {
366 | "cell_type": "markdown",
367 | "metadata": {},
368 | "source": [
369 | "The operation `v[:,np.newaxis]` inserts a new axis, turning the vector `v` below into a column vector (and `v[np.newaxis,:]` into a row vector).\n",
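"\n",
"Before demonstrating this below, here is also a quick numerical check of the Kronecker vectorization identity stated above (a minimal sketch: the small random matrices and the generator `rng` are illustrative additions, not part of the original notebook):\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"A, X, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 3)), rng.normal(size=(3, 2))\n",
"lhs = (A @ X @ B).flatten()          # vec_C(AXB), the row-major vectorization\n",
"rhs = np.kron(A, B.T) @ X.flatten()  # (A kron B^T) applied to vec_C(X)\n",
"print(np.allclose(lhs, rhs))         # True\n",
"```"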
370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "metadata": {}, 376 | "outputs": [], 377 | "source": [ 378 | "v = np.array([3,4,5])\n", 379 | "print(v)\n", 380 | "print(v[:,np.newaxis])\n", 381 | "print(v[np.newaxis,:])" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "# Arrays of larger dimensions" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": null, 394 | "metadata": {}, 395 | "outputs": [], 396 | "source": [ 397 | "a_3d_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])\n", 398 | "a_3d_array" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "Standard functions can also support arrays with more than 2 dimensions." 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": null, 411 | "metadata": {}, 412 | "outputs": [], 413 | "source": [ 414 | "a_multiarray = np.zeros((2,3,3,3))\n", 415 | "print(a_multiarray, a_multiarray.shape)" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": {}, 421 | "source": [ 422 | "# Searching for a maximum" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "### Maximum between 2 arrays" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "To compare two arrays (say $x$ and $y$) component by component, it is convenient to use `np.maximum`. It returns an array $z$ such that $ \\forall i: z[i] = \\max(x[i],y[i])$. " 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "metadata": {}, 443 | "outputs": [], 444 | "source": [ 445 | "np.maximum(np.array([2, 3, 4]), np.array([1, 5, 2]))" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "You can even broadcast." 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": null, 458 | "metadata": {}, 459 | "outputs": [], 460 | "source": [ 461 | "np.maximum(np.eye(2), [0.5, 2]) # broadcasting" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "### Highest component within an array" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "`np.max` and `np.argmax` respectively find the maximum entry of a given array along a specified axis, and its index. `np.min` and `np.argmin` perform similar functions." 
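"\n",
"One companion worth mentioning (an illustrative addition, not part of the original notebook): without an `axis` argument, `np.argmax` returns an index into the *flattened* array, which `np.unravel_index` converts back into coordinates.\n",
"```python\n",
"M = np.array([[0, 1, 3], [0, 5, 7]])          # a hypothetical example matrix\n",
"flat_index = np.argmax(M)                     # 5, an index into M.flatten()\n",
"print(np.unravel_index(flat_index, M.shape))  # (1, 2), the position of the entry 7\n",
"```"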
476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [ 484 | "A = np.array([[0, 1,3], [0, 5,7]])\n", 485 | "A" 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": null, 491 | "metadata": {}, 492 | "outputs": [], 493 | "source": [ 494 | "A.max(axis=0)" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": null, 500 | "metadata": {}, 501 | "outputs": [], 502 | "source": [ 503 | "A.argmax(axis=0)" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": null, 509 | "metadata": {}, 510 | "outputs": [], 511 | "source": [ 512 | "A.max(axis=1)" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "A.min(axis=0)" 522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": null, 527 | "metadata": {}, 528 | "outputs": [], 529 | "source": [ 530 | "A.argmin(axis=0)" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "If `axis` is not specified, the maximum will be taken over all the entries of the matrix." 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "np.max(A) " 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": {}, 552 | "source": [ 553 | "Note: if your array contains a nan, you can use `np.nanmax` in order to ignore those values while searching for the highest component." 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": {}, 559 | "source": [ 560 | "## Summing all elements of an array" 561 | ] 562 | }, 563 | { 564 | "cell_type": "markdown", 565 | "metadata": {}, 566 | "source": [ 567 | "In a similar fashion as above, `np.sum` sums the elements of an array over a given axis." 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": null, 573 | "metadata": {}, 574 | "outputs": [], 575 | "source": [ 576 | "A.sum( axis=0)" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": null, 582 | "metadata": {}, 583 | "outputs": [], 584 | "source": [ 585 | "A.sum(axis=1)" 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": {}, 591 | "source": [ 592 | "If `axis` is not specified, the sum is done over all the entries of the matrix.\n" 593 | ] 594 | }, 595 | { 596 | "cell_type": "code", 597 | "execution_count": null, 598 | "metadata": {}, 599 | "outputs": [], 600 | "source": [ 601 | "A.sum()" 602 | ] 603 | }, 604 | { 605 | "cell_type": "markdown", 606 | "metadata": {}, 607 | "source": [ 608 | "# Sparse matrices in Scipy\n", 609 | "\n", 610 | "Sparse matrices are available in the `sparse` module of the `scipy` library. 
" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": null, 616 | "metadata": {}, 617 | "outputs": [], 618 | "source": [ 619 | "import scipy.sparse as spr" 620 | ] 621 | }, 622 | { 623 | "cell_type": "code", 624 | "execution_count": null, 625 | "metadata": {}, 626 | "outputs": [], 627 | "source": [ 628 | "n = 1000\n", 629 | "\n", 630 | "print('size of sparse identity matrix of size '+str(n) +' in MB = ' + str(spr.identity(n).data.size / (1024**2)))\n", 631 | "\n", 632 | "print('size of dense identity matrix of size '+str(n) +' in MB = ' + str(spr.identity(n).todense().nbytes / (1024**2)))" 633 | ] 634 | }, 635 | { 636 | "cell_type": "markdown", 637 | "metadata": {}, 638 | "source": [ 639 | "Working with sparse matrices requires less storage. It is explained by the fact that while a dense matrix needs to encode every coefficient on a byte, sparse matrices only store the non-null coefficients. It is really convenient to work with such objects when it comes to matrices with really high sizes." 640 | ] 641 | }, 642 | { 643 | "cell_type": "code", 644 | "execution_count": null, 645 | "metadata": {}, 646 | "outputs": [], 647 | "source": [ 648 | "spr.identity(1000).data.size , spr.identity(1000).todense().nbytes " 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "## Creating sparse matrices..." 656 | ] 657 | }, 658 | { 659 | "cell_type": "markdown", 660 | "metadata": {}, 661 | "source": [ 662 | "### ... with standard forms" 663 | ] 664 | }, 665 | { 666 | "cell_type": "code", 667 | "execution_count": null, 668 | "metadata": { 669 | "scrolled": true 670 | }, 671 | "outputs": [], 672 | "source": [ 673 | "I5 = spr.identity(5)\n", 674 | "I5" 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "You can convert your sparse matrix into a dense one in order to visualise it. " 682 | ] 683 | }, 684 | { 685 | "cell_type": "code", 686 | "execution_count": null, 687 | "metadata": {}, 688 | "outputs": [], 689 | "source": [ 690 | "I5.todense()" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": {}, 696 | "source": [ 697 | "### ... from a dense matrix" 698 | ] 699 | }, 700 | { 701 | "cell_type": "markdown", 702 | "metadata": {}, 703 | "source": [ 704 | "Let's create a dense matrix and make it sparse." 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "# import uniform module to create random numbers\n", 714 | "from scipy.stats import uniform" 715 | ] 716 | }, 717 | { 718 | "cell_type": "code", 719 | "execution_count": null, 720 | "metadata": {}, 721 | "outputs": [], 722 | "source": [ 723 | "np.random.seed(seed=42)\n", 724 | "dense_matrix = uniform.rvs(size=16, loc = 0, scale=2) #List of 16 random draws between 0 and 2\n", 725 | "dense_matrix = np.reshape(dense_matrix, (4, 4))\n", 726 | "dense_matrix" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": null, 732 | "metadata": {}, 733 | "outputs": [], 734 | "source": [ 735 | "dense_matrix[dense_matrix < 1] = 0 #Arbitrar criterion\n", 736 | "dense_matrix" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": null, 742 | "metadata": {}, 743 | "outputs": [], 744 | "source": [ 745 | "sparse_matrix = spr.csr_matrix(dense_matrix)\n", 746 | "print(sparse_matrix) #It prints a tuple giving the row and columns of the non-null component and its value." 
747 | ]
748 | },
749 | {
750 | "cell_type": "markdown",
751 | "metadata": {},
752 | "source": [
753 | "### ... from scratch"
754 | ]
755 | },
756 | {
757 | "cell_type": "markdown",
758 | "metadata": {},
759 | "source": [
760 | "You can create two arrays containing respectively the row and column indices of the nonzero coefficients.\n",
761 | "A third array gives the values of these nonzero coefficients. The result is as follows:"
762 | ]
763 | },
764 | {
765 | "cell_type": "code",
766 | "execution_count": null,
767 | "metadata": {},
768 | "outputs": [],
769 | "source": [
770 | "# row indices\n",
771 | "row_ind = np.array([0, 1, 1, 3, 4])\n",
772 | "# column indices\n",
773 | "col_ind = np.array([0, 2, 4, 3, 4])\n",
774 | "# coefficients\n",
775 | "data = np.array([1, 2, 3, 4, 5], dtype=float)\n",
776 | "\n",
777 | "mat_coo = spr.coo_matrix((data, (row_ind, col_ind)))\n",
778 | "print(mat_coo)"
779 | ]
780 | },
781 | {
782 | "cell_type": "markdown",
783 | "metadata": {},
784 | "source": [
785 | "### Every common operation seen above also works with sparse matrices."
786 | ]
787 | },
788 | {
789 | "cell_type": "code",
790 | "execution_count": null,
791 | "metadata": {},
792 | "outputs": [],
793 | "source": [
794 | "I5 = spr.identity(5)\n",
795 | "I5 + np.ones((5,5))"
796 | ]
797 | },
798 | {
799 | "cell_type": "code",
800 | "execution_count": null,
801 | "metadata": {},
802 | "outputs": [],
803 | "source": [
804 | "I5 + np.diag([1.,2.,3.,4.,5.])"
805 | ]
806 | },
807 | {
808 | "cell_type": "code",
809 | "execution_count": null,
810 | "metadata": {},
811 | "outputs": [],
812 | "source": [
813 | "I5 @ np.diag([1.,2.,3.,4.,5.])"
814 | ]
815 | },
816 | {
817 | "cell_type": "code",
818 | "execution_count": null,
819 | "metadata": {},
820 | "outputs": [],
821 | "source": [
822 | "kron_product = spr.kron(I5 , 10 *np.array([[1,2],[3,4]]))"
823 | ]
824 | },
825 | {
826 | "cell_type": "code",
827 | "execution_count": null,
828 | "metadata": {},
829 | "outputs": [],
830 | "source": [
831 | "kron_product.todense()"
832 | ]
833 | },
834 | {
835 | "cell_type": "markdown",
836 | "metadata": {},
837 | "source": [
838 | "## Time comparison"
839 | ]
840 | },
841 | {
842 | "cell_type": "code",
843 | "execution_count": null,
844 | "metadata": {},
845 | "outputs": [],
846 | "source": [
847 | "import time\n",
848 | "A = np.ones((1000,1000))\n",
849 | "I5 = 3*spr.identity(1000)\n",
850 | "B = I5.todense()\n",
851 | "\n",
852 | "t0 = time.time()\n",
853 | "A@B\n",
854 | "t1 = time.time()\n",
855 | "A@I5\n",
856 | "t2 = time.time()\n",
857 | "\n",
858 | "print(t1-t0, t2-t1)"
859 | ]
860 | }
861 | ],
862 | "metadata": {
863 | "kernelspec": {
864 | "display_name": "Python 3",
865 | "language": "python",
866 | "name": "python3"
867 | },
868 | "language_info": {
869 | "codemirror_mode": {
870 | "name": "ipython",
871 | "version": 3
872 | },
873 | "file_extension": ".py",
874 | "mimetype": "text/x-python",
875 | "name": "python",
876 | "nbconvert_exporter": "python",
877 | "pygments_lexer": "ipython3",
878 | "version": "3.8.5"
879 | }
880 | },
881 | "nbformat": 4,
882 | "nbformat_minor": 2
883 | }
884 | 

--------------------------------------------------------------------------------
/.ipynb_checkpoints/B4a_discrete-choice-estimation-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Block 4a: Discrete choice estimation\n",
8 | "### Alfred Galichon (NYU & Sciences Po)\n",
9 | "## 'math+econ+code' masterclass on optimal transport and economic applications\n",
10 | "#### With python code examples\n",
11 | "© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam).\n",
12 | "\n",
13 | "**If you reuse material from this masterclass, please cite as:**\n",
14 | "Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "## References\n",
22 | "\n",
23 | "* Savage, L. (1951). The theory of statistical decision. JASA.\n",
24 | "* Bonnet, Fougère, Galichon, Poulhès (2021). Minimax estimation of hedonic models. Preprint."
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "## Loading the libraries\n",
32 | "\n",
33 | "First, let's load the libraries we shall need."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "import numpy as np\n",
43 | "import os\n",
44 | "import pandas as pd\n",
45 | "import string as str\n",
46 | "import math\n",
47 | "import sys\n",
48 | "import time\n",
49 | "import scipy.sparse as spr\n",
50 | "from scipy import optimize, special\n",
51 | "# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here\n",
52 | "import gurobipy as grb"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "## Our data\n",
60 | "We will go back to the dataset of Greene and Hensher (1997). As a reminder, 210 individuals are surveyed about their choice of travel mode between Sydney, Canberra and Melbourne, and the various costs (time and money) associated with each alternative. Therefore there are 840 = 4 x 210 observations, which we can stack into a 3-dimensional array `travelmodedataset` whose dimensions are mode × individual × (choice dummy + covariates).\n",
61 | "\n",
62 | "Let's load the dataset and represent it conveniently in a similar fashion as in block 6:"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": null,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": [
71 | "#thepath = os.path.join(os.getcwd(),'data_mec_optim/demand_travelmode/')\n",
72 | "thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/demand_travelmode/'\n",
73 | "\n",
74 | "travelmode = pd.read_csv(thepath+'travelmodedata.csv')\n",
75 | "\n",
76 | "travelmode['choice'] = np.where(travelmode['choice'] =='yes' , 1, 0)\n",
77 | "\n",
78 | "nobs = travelmode.shape[0]\n",
79 | "ncols = travelmode.shape[1]\n",
80 | "nbchoices = 4\n",
81 | "ninds = int(nobs/nbchoices)\n",
82 | "\n",
83 | "muhat_i_y = travelmode['choice'].values.reshape(ninds,nbchoices).T\n",
84 | "muhat_iy = muhat_i_y.flatten()\n",
88 | "\n",
89 | "s_y = travelmode.groupby(['mode']).mean()['choice'].to_frame().sort_index()\n",
90 | "\n",
91 | "def two_d(X):\n",
92 | "    return np.reshape(X,(X.size, 1))"
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "# Estimation with no observable heterogeneity\n",
100 | "\n",
101 | "Start by assuming that there is no observable heterogeneity, so the only observations we have at hand are the aggregate market shares $s_y$. Hence the systematic utility will be the same for every agent. However, we wish to write a parametric model for it, namely assume a known parametric form for the dependence of $U_y$ with respect to various observed characteristics associated with $y$.\n",
102 | "\n",
103 | "Assume then that the utilities are parameterized as follows: $U = \\Phi \\beta$ where $\\beta\\in\\mathbb{R}^{p}$ is a parameter, and $\\Phi$ is a $\\left\\vert \\mathcal{Y}\\right\\vert \\times p$ matrix.\n",
104 | "\n",
105 | "The log-likelihood function is given by\n",
106 | "\n",
107 | "\\begin{align*}\n",
108 | "l\\left( \\beta\\right) =N\\sum_{y}\\hat{s}_{y}\\log\\sigma_{y}\\left(\\Phi \\beta\\right)\n",
109 | "\\end{align*}\n",
110 | "\n",
111 | "A common estimation method of $\\beta$ is by maximum likelihood\n",
112 | "\n",
113 | "\\begin{align*}\n",
114 | "\\max_{\\beta}l\\left( \\beta\\right) .\n",
115 | "\\end{align*}\n",
116 | "\n",
117 | "MLE is statistically efficient; the drawback is that this optimization problem is not guaranteed to be convex, so there may be computational difficulties (e.g. local optima)."
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "### MLE, logit case\n",
125 | "\n",
126 | "In the logit case,\n",
127 | "\n",
128 | "\\begin{align*}\n",
129 | "l\\left( \\beta\\right) =N\\left\\{ \\hat{s}^{\\intercal}\\Phi\\beta-\\log\\sum_{y}\\exp\\left( \\Phi\\beta\\right) _{y}\\right\\}\n",
130 | "\\end{align*}\n",
131 | "\n",
132 | "so that the max-likelihood amounts to\n",
133 | "\n",
134 | "\\begin{align*}\n",
135 | "\\max_{\\beta}\\left\\{ \\hat{s}^{\\intercal} \\Phi \\beta-G\\left( \\Phi \\beta\\right)\n",
136 | "\\right\\}\n",
137 | "\\end{align*}\n",
138 | "\n",
139 | "whose value is the Legendre-Fenchel transform of $\\beta\\rightarrow G\\left( \\Phi \\beta\\right)$ evaluated at $\\Phi^{\\intercal}\\hat{s}$.\n",
140 | "\n",
141 | "Note that the vector $\\Phi^{\\intercal}\\hat{s}$ is the vector of empirical moments, which is a sufficient statistic in the logit model.\n",
142 | "\n",
143 | "As a result, in the logit case, the MLE is a convex optimization problem, and it is therefore both statistically efficient and computationally efficient."
144 | ]
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "metadata": {},
149 | "source": [
150 | "### Moment estimation\n",
151 | "\n",
152 | "The previous remark inspires an alternative procedure based on the moment statistics $\\Phi^{\\intercal}\\hat{s}$.\n",
153 | "\n",
154 | "The social welfare is given in general by $W\\left( \\beta\\right) =G\\left( \\Phi\\beta\\right) $. One has $\\partial_{\\beta^{i}}W\\left(\\beta\\right) =\\sum_{y}\\sigma_{y}\\left( \\Phi\\beta\\right) \\Phi_{yi}$, that is \n",
155 | "\n",
156 | "\\begin{align*}\n",
157 | "\\nabla W\\left( \\beta\\right) = \\Phi^{\\intercal}\\sigma\\left( \\Phi\\beta\\right) ,\n",
158 | "\\end{align*}\n",
159 | "\n",
160 | "which is the vector of predicted moments.\n",
161 | "\n",
162 | "Therefore the program\n",
163 | "\n",
164 | "\\begin{align*}\n",
165 | "\\max_{\\beta}\\left\\{ \\hat{s}^{\\intercal}\\Phi\\beta-G\\left( \\Phi\\beta\\right)\n",
166 | "\\right\\}\n",
167 | "\\end{align*}\n",
168 | "\n",
169 | "picks up the parameter $\\beta$ which matches the empirical moments $\\Phi^{\\intercal}\\hat{s}$ with the predicted ones $\\nabla W\\left(\\beta\\right) $. This procedure is not statistically efficient, but is computationally efficient because it arises from a convex optimization problem.\n",
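"\n",
"To make this concrete, here is a minimal sketch of the aggregate logit MLE as a convex program; the matrix `Phi` and the shares `s_hat` below are hypothetical placeholders, not the travel-mode data:\n",
"```python\n",
"import numpy as np\n",
"from scipy import optimize\n",
"\n",
"Phi = np.array([[1.0, 0.2], [0.0, 1.0], [0.5, 0.7], [0.0, 0.0]])  # |Y| x p (hypothetical)\n",
"s_hat = np.array([0.3, 0.4, 0.2, 0.1])                            # observed shares (hypothetical)\n",
"\n",
"def neg_loglik(beta):\n",
"    u = Phi @ beta\n",
"    return np.log(np.exp(u - u.max()).sum()) + u.max() - s_hat @ u  # = -l(beta)/N\n",
"\n",
"res = optimize.minimize(neg_loglik, np.zeros(Phi.shape[1]))\n",
"sigma = np.exp(Phi @ res.x); sigma = sigma / sigma.sum()\n",
"print(Phi.T @ sigma, Phi.T @ s_hat)  # predicted moments match empirical moments at the optimum\n",
"```"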
170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "### Fixed temperature MLE\n", 177 | "\n", 178 | "Back to the logit case. Recall we have\n", 179 | "\n", 180 | "\\begin{align*}\n", 181 | "l\\left( \\beta\\right) =N\\left\\{ \\hat{s}^{\\intercal}\\Phi\\beta-\\log\\sum_{y} \\exp\\left( \\Phi\\beta\\right) _{y}\\right\\}\n", 182 | "\\end{align*}\n", 183 | "\n", 184 | "Assume that we restrict ourselves to $\\beta^{\\top}z>0$. Then we can write $\\beta=\\theta/T$ where $T=1/\\beta^{\\top}z$ and $\\theta=\\beta T$. Call $\\Theta=\\left\\{ \\theta\\in\\mathbb{R}^{p},\\theta^{\\top}z=1\\right\\} $, so that $\\beta=\\theta/T$ where $\\theta\\in\\Theta$ and $T>0$. We have\n", 185 | "\n", 186 | "\\begin{align*}\n", 187 | "l\\left( \\theta,T\\right) =\\frac{N}{T}\\left\\{ \\hat{s}^{\\intercal}\n", 188 | "\\Phi\\theta-T\\log\\sum_{y}\\exp\\left( \\frac{\\left( \\Phi\\theta\\right) _{y}}{T}\\right) \\right\\}\n", 189 | "\\end{align*}\n", 190 | "\n", 191 | "and we define the *fixed temperature maximum likelihood estimator* by\n", 192 | "\n", 193 | "\\begin{align*}\n", 194 | "\\theta\\left( T\\right) =\\arg\\max_{\\theta}l\\left( \\theta,T\\right)\n", 195 | "\\end{align*}\n", 196 | "\n", 197 | " Note that $\\theta\\left( T\\right) =\\arg\\max_{\\theta\\in\\Theta}Tl\\left(\\theta,T\\right)$ where\n", 198 | "\n", 199 | "\\begin{align*}\n", 200 | "Tl\\left( \\theta,T\\right) =N\\left\\{ \\hat{s}^{\\intercal}\\Phi\\theta-T\\log\\sum _{y}\\exp\\left( \\frac{\\left( \\Phi\\theta\\right) _{y}}{T}\\right) \\right\\}\n", 201 | "\\end{align*}\n", 202 | "\n", 203 | "and we note that $Tl\\left( \\theta,T\\right) \\rightarrow N\\left\\{ \\hat{s}^{\\intercal}\\Phi\\theta-\\max_{y\\in\\mathcal{Y}}\\left\\{ \\left( \\Phi\\theta\\right)_{y}\\right\\} \\right\\} $ as $T\\rightarrow0$.\n", 204 | "\n", 205 | "We have\n", 206 | "\n", 207 | "\\begin{align*}\n", 208 | "\\frac{Tl\\left( \\theta,T\\right) }{N}=\\hat{s}^{\\intercal}\\Phi\\theta-T\\log\\sum_{y}\\exp\\left( \\frac{\\left( \\Phi\\theta\\right) _{y}}{T}\\right)\n", 209 | "\\end{align*}\n", 210 | "\n", 211 | "Let $\\theta\\left( 0\\right) =\\lim_{T\\rightarrow0}\\theta\\left(T\\right) $. 
Calling $m\\left( \\theta\\right) =\\max_{y\\in\\mathcal{Y}}\\left\\{\\left( \\Phi\\theta\\right) _{y}\\right\\} $, we have\n",
212 | "\n",
213 | "\\begin{align*}\n",
214 | "\\theta\\left( 0\\right) \\in\\arg\\max_{\\theta}\\left\\{ \\hat{s}^{\\intercal}\\Phi\\theta-m\\left( \\theta\\right) \\right\\},\n",
215 | "\\end{align*}\n",
216 | "\n",
217 | "or\n",
218 | "\n",
219 | "\\begin{align*}\n",
220 | "\\theta\\left( 0\\right) \\in\\arg\\min_{\\theta}\\left\\{ m\\left( \\theta\\right)-\\hat{s}^{\\intercal}\\Phi\\theta\\right\\},\n",
221 | "\\end{align*}\n",
222 | "\n",
223 | "With the same notation, one has \n",
224 | "\n",
225 | "\\begin{align*}\n",
226 | "\\theta\\left( T\\right) \\in\\arg\\max\\left\\{ \\hat{s}^{\\intercal}\\Phi\\theta-m\\left( \\theta\\right) -T\\log\\sum_{y}\\exp\\left( \\frac{\\left(\\Phi\\theta\\right) _{y}-m\\left( \\theta\\right) }{T}\\right) \\right\\}\n",
227 | "\\end{align*}\n"
228 | ]
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "### Minimax-regret estimation\n",
235 | "\n",
236 | "Note that\n",
237 | "\n",
238 | "\\begin{align*}\n",
239 | "\\theta\\left( 0\\right) \\in\\arg\\max\\left\\{ \\hat{s}^{\\intercal}\\Phi\\theta\n",
240 | "-m\\left( \\theta\\right) \\right\\} .\n",
241 | "\\end{align*}\n",
242 | "\n",
243 | "Define $R_{i}\\left( \\theta,y\\right) =\\left( \\Phi\\theta\\right)_{y}-\\left( \\Phi\\theta\\right) _{y_{i}}$ the regret associated with observation $i$ with respect to $y$. This is equal to the difference between the payoff given by $y$ and the payoff obtained under observation $i$, denoting $y_{i}$ the action taken in observation $i$. The max-regret associated with observation $i$ is therefore\n",
244 | "\n",
245 | "\\begin{align*}\n",
246 | "\\max_{y\\in\\mathcal{Y}}R_{i}\\left( \\theta,y\\right) =\\max_{y\\in\\mathcal{Y}}\\left\\{ \\left( \\Phi\\theta\\right)_{y}-\\left( \\Phi\\theta\\right)_{y_{i}}\\right\\}\n",
247 | "\\end{align*}\n",
248 | "\n",
249 | "and the max-regret associated with the sample is $\\frac{1}{N}\\sum\\max_{y\\in\\mathcal{Y}}\\left\\{ R_{i}\\left( \\theta,y\\right) \\right\\} $, that is $\\max_{y\\in\\mathcal{Y}}\\left\\{ \\left( \\Phi\\theta\\right) _{y}\\right\\} - \\hat{s}^{\\intercal}\\Phi\\theta$.\n",
250 | "\n",
251 | "The minimax regret estimator is\n",
252 | "\n",
253 | "\\begin{align*}\n",
254 | "\\hat{\\theta}^{MMR}=\\arg\\min_{\\theta}\\left\\{ m\\left( \\theta\\right) -\\hat\n",
255 | "{s}^{\\intercal}\\Phi\\theta\\right\\}\n",
256 | "\\end{align*}\n",
257 | "\n",
258 | "which has a linear programming formulation\n",
259 | "\n",
260 | "\\begin{align*}\n",
261 | "& \\min_{m,\\theta}m-\\hat{s}^{\\intercal}\\Phi\\theta\\\\\n",
262 | "s.t.~ & m-\\left( \\Phi\\theta\\right) _{y}\\geq0\\quad\\forall y\\in\\mathcal{Y}\n",
263 | "\\end{align*}"
264 | ]
265 | },
266 | {
267 | "cell_type": "markdown",
268 | "metadata": {},
269 | "source": [
270 | "### Set-identification\n",
271 | "\n",
272 | "Note that the solution $\\theta$ of the problem above need not be unique; the set of solutions is convex. Denoting by $V$ the value of the program, we can look for bounds on $\\theta^{\\intercal}d$ for a chosen direction $d$ by\n",
273 | "\n",
274 | "\\begin{align*}\n",
275 | "& \\min_{\\theta,m}/\\max_{\\theta,m} \\theta^{\\intercal}d\\\\\n",
276 | "s.t.~ & m-\\hat{s}^{\\intercal}\\Phi\\theta=V\\\\\n",
277 | "& m\\geq\\left( \\Phi\\theta\\right)_{y}, \\quad \\forall y\\in\\mathcal{Y}\n",
278 | "\\end{align*}"
279 | ]
280 | },
281 | {
282 | "cell_type": "markdown",
283 | "metadata": {},
284 | "source": [
285 | "## Link with exponential families and GLM\n",
286 | "\n",
287 | "See class notes"
288 | ]
289 | },
290 | {
291 | "cell_type": "markdown",
292 | "metadata": {},
293 | "source": [
294 | "# Estimation with observed heterogeneity\n",
295 | "\n",
296 | "We now assume that we observe individual characteristics that are relevant for individual choices, that is $U_{iy}=\\sum_k \\Phi_{iyk} \\beta_k$, or in matrix form\n",
297 | "$$U = \\Phi \\beta,$$ where $\\beta\\in\\mathbb{R}^{p}$ is a parameter, and $\\Phi$ is a $\\left( \\left\\vert \\mathcal{I}\\right\\vert \\left\\vert \\mathcal{Y}\\right\\vert \\right) \\times p$ matrix.\n",
298 | "\n",
299 | "See class notes."
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "metadata": {},
305 | "source": [
306 | "# Application\n",
307 | "\n",
308 | "Back to the dataset:"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": null,
314 | "metadata": {},
315 | "outputs": [],
316 | "source": [
317 | "Phi_iy_k = np.column_stack((np.kron(np.identity(4)[0:4,1:4],np.repeat(1, ninds).reshape(ninds,1)), - travelmode['travel'].values, - (travelmode['travel']*travelmode['income']).values, - travelmode['gcost'].values))"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": null,
323 | "metadata": {},
324 | "outputs": [],
325 | "source": [
326 | "nbK = Phi_iy_k.shape[1]\n",
327 | "phi_mean = Phi_iy_k.mean(axis = 0)\n",
328 | "phi_stdev = Phi_iy_k.std(axis = 0, ddof = 1)\n",
329 | "Phi_iy_k = ((Phi_iy_k - phi_mean).T/phi_stdev[:,None]).T"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": null,
335 | "metadata": {},
336 | "outputs": [],
337 | "source": [
338 | "def log_likelihood(theta):\n",
339 | "    nbK = np.asarray(theta).shape[0]\n",
340 | "    Xtheta = Phi_iy_k.dot(theta)/sigma\n",
341 | "    Xthetamat_iy = Xtheta.reshape(nbchoices, ninds).T\n",
342 | "    max_i = np.amax(Xthetamat_iy, axis = 1)\n",
343 | "    expPhi_iy = np.exp((Xthetamat_iy.T -max_i).T)\n",
344 | "    d_i = np.sum(expPhi_iy, axis = 1)\n",
345 | "    \n",
346 | "    val = np.sum(np.multiply(Xtheta,muhat_iy)) - np.sum(max_i) - sigma * np.sum(np.log(d_i))\n",
347 | "\n",
348 | "    return -val"
349 | ]
350 | },
351 | {
352 | "cell_type": "code",
353 | "execution_count": null,
354 | "metadata": {},
355 | "outputs": [],
356 | "source": [
357 | "def grad_log_likelihood(theta):\n",
358 | "    nbK = np.asarray(theta).shape[0]\n",
359 | "    Xtheta = Phi_iy_k.dot(theta)/sigma\n",
360 | "    Xthetamat_iy = Xtheta.reshape(nbchoices, ninds).T\n",
361 | "    max_i = np.amax(Xthetamat_iy, axis = 1)\n",
362 | "    expPhi_iy = np.exp((Xthetamat_iy.T -max_i).T)\n",
363 | "    d_i = np.sum(expPhi_iy, axis = 1)\n",
364 | "    \n",
365 | "    temp_mat = np.multiply(Phi_iy_k.T, expPhi_iy.T.flatten()).T\n",
366 | "    list_temp = []\n",
367 | "    for i in range(nbchoices):\n",
368 | "        list_temp.append(temp_mat[i*ninds:(i+1)*ninds,])\n",
369 | "    n_i_k = np.sum(list_temp,axis = 0)\n",
370 | "    \n",
371 | "    thegrad = muhat_iy.reshape(1,nbchoices*ninds).dot(Phi_iy_k).flatten() - np.sum(n_i_k.T/d_i, 
axis = 1)\n", 372 | "\n", 373 | " return -thegrad" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "theta0 = np.repeat(0,nbK)\n", 383 | "sigma = 1\n", 384 | "outcome = optimize.minimize(log_likelihood,method = 'CG',jac = grad_log_likelihood, x0 = theta0)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "outcome" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "temp_mle = 1 / outcome['x'][nbK - 1]\n", 403 | "theta_mle = outcome['x']*temp_mle\n", 404 | "print(temp_mle)\n", 405 | "print(theta_mle)" 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": null, 411 | "metadata": {}, 412 | "outputs": [], 413 | "source": [ 414 | "lenobj = nbK+ninds\n", 415 | "c = np.concatenate((muhat_iy.reshape(1,nbchoices*ninds).dot(Phi_iy_k).flatten(),np.repeat(-1,ninds)))\n", 416 | "\n", 417 | "m = grb.Model('lp')\n", 418 | "x = m.addMVar(lenobj, name='x', lb=-grb.GRB.INFINITY)\n", 419 | "m.setObjective(c @ x, grb.GRB.MAXIMIZE)\n", 420 | "cstMat = spr.hstack((spr.csr_matrix(-Phi_iy_k), spr.kron(two_d(np.repeat(1,nbchoices)),spr.identity(ninds))))\n", 421 | "rhs = np.repeat(0,ninds*nbchoices)\n", 422 | "m.addConstr(cstMat @ x >= rhs)\n", 423 | "nbCstr = cstMat.shape[0]\n", 424 | "const_2 = np.array([0]*(nbK - 1))\n", 425 | "const_2 = np.append(const_2, 1)\n", 426 | "const_2 = np.append(const_2 ,[0]*ninds)\n", 427 | "m.addConstr(const_2 @ x == 1)\n", 428 | "m.optimize()\n", 429 | "if m.status == grb.GRB.Status.OPTIMAL:\n", 430 | " print(\"Value of the problem (Gurobi) =\", m.objval)\n", 431 | " opt_x = m.getAttr('x')" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "metadata": {}, 438 | "outputs": [], 439 | "source": [ 440 | "theta_lp = np.array(opt_x[:nbK])\n", 441 | "print(theta_lp)\n", 442 | "print(theta_mle)" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": null, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "indMax=100\n", 452 | "tempMax=temp_mle\n", 453 | "outcomemat = np.zeros((indMax+1,nbK-1))" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": {}, 460 | "outputs": [], 461 | "source": [ 462 | "def log_likelihood_fixedtemp(subsetoftheta, *temp):\n", 463 | " val = log_likelihood(np.append(subsetoftheta, 1/temp[0]))\n", 464 | " \n", 465 | " return val" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": null, 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "def grad_log_likelihood_fixedtemp(subsetoftheta, *temp):\n", 475 | " val = grad_log_likelihood(np.append(subsetoftheta, 1/temp[0]))[:-1]\n", 476 | " \n", 477 | " return val" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "outcomemat[0,:] = theta_lp[:-1]\n", 487 | "iterMax = indMax+1\n", 488 | "for k in range(2,iterMax+1,1):\n", 489 | " thetemp = tempMax * (k-1)/indMax\n", 490 | " outcomeFixedTemp = optimize.minimize(log_likelihood_fixedtemp,method = 'CG',jac = grad_log_likelihood_fixedtemp, args = (thetemp,), x0 = theta0[:-1])\n", 491 | " outcomemat[k-1,:] = outcomeFixedTemp['x']*thetemp" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | 
"execution_count": null, 497 | "metadata": {}, 498 | "outputs": [], 499 | "source": [ 500 | "outcomemat" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": {}, 506 | "source": [ 507 | "The zero-temperature estimator is:" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": {}, 514 | "outputs": [], 515 | "source": [ 516 | "print(outcomemat[1,:])" 517 | ] 518 | }, 519 | { 520 | "cell_type": "markdown", 521 | "metadata": {}, 522 | "source": [ 523 | "The mle estimator is:" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": null, 529 | "metadata": {}, 530 | "outputs": [], 531 | "source": [ 532 | "print(outcomemat[indMax,])" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": null, 538 | "metadata": {}, 539 | "outputs": [], 540 | "source": [ 541 | "nbB = 100\n", 542 | "thetemp = 1\n", 543 | "epsilon_biy = special.digamma(1) -np.log(-np.log(np.random.uniform(0,1,ninds*nbchoices*nbB)))\n", 544 | "lenobj = ninds*nbB+nbK\n", 545 | "\n", 546 | "newc = np.concatenate((muhat_iy.reshape(1,nbchoices*ninds).dot(Phi_iy_k).flatten(),np.repeat(-1/nbB,ninds*nbB)))\n", 547 | "newm = grb.Model('new_lp')\n", 548 | "x = newm.addMVar(lenobj, name='x', lb=-grb.GRB.INFINITY)\n", 549 | "newm.setObjective(newc @ x, grb.GRB.MAXIMIZE)\n", 550 | "mat1 = spr.kron(-Phi_iy_k, two_d(np.repeat(1,nbB)))\n", 551 | "mat2 = spr.kron(two_d(np.repeat(1,nbchoices)),spr.identity(ninds*nbB))\n", 552 | "newcstMat = spr.hstack((mat1, mat2))\n", 553 | "rhs = epsilon_biy\n", 554 | "newm.addConstr(newcstMat @ x >= rhs)\n", 555 | "newm.optimize()" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": null, 561 | "metadata": {}, 562 | "outputs": [], 563 | "source": [ 564 | "if m.status == grb.GRB.Status.OPTIMAL:\n", 565 | " print(\"Value of the problem (Gurobi) =\", newm.objval)\n", 566 | " opt_x = np.array(newm.getAttr('x'))\n", 567 | "newtheta_lp = opt_x[:nbK] / opt_x[nbK-1]" 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": null, 573 | "metadata": {}, 574 | "outputs": [], 575 | "source": [ 576 | "print(theta_mle)\n", 577 | "print(newtheta_lp)" 578 | ] 579 | }, 580 | { 581 | "cell_type": "markdown", 582 | "metadata": {}, 583 | "source": [ 584 | "Finally probit" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": null, 590 | "metadata": {}, 591 | "outputs": [], 592 | "source": [ 593 | "nbB = 100\n", 594 | "thetemp = 1\n", 595 | "epsilon_biy = np.random.normal(nbB*ninds*nbchoices)\n", 596 | "lenobj = ninds*nbB+nbK\n", 597 | "\n", 598 | "newc = np.concatenate((muhat_iy.reshape(1,nbchoices*ninds).dot(Phi_iy_k).flatten(),np.repeat(-1/nbB,ninds*nbB)))\n", 599 | "newm = grb.Model('new_lp')\n", 600 | "x = newm.addMVar(lenobj, name='x', lb=-grb.GRB.INFINITY)\n", 601 | "newm.setObjective(newc @ x, grb.GRB.MAXIMIZE)\n", 602 | "mat1 = spr.kron(-Phi_iy_k, two_d(np.repeat(1,nbB)))\n", 603 | "mat2 = spr.kron(two_d(np.repeat(1,nbchoices)),spr.identity(ninds*nbB))\n", 604 | "newcstMat = spr.hstack((mat1, mat2))\n", 605 | "rhs = epsilon_biy\n", 606 | "newm.addConstr(newcstMat @ x >= rhs)\n", 607 | "newm.optimize()" 608 | ] 609 | } 610 | ], 611 | "metadata": { 612 | "kernelspec": { 613 | "display_name": "Python 3", 614 | "language": "python", 615 | "name": "python3" 616 | }, 617 | "language_info": { 618 | "codemirror_mode": { 619 | "name": "ipython", 620 | "version": 3 621 | }, 622 | "file_extension": ".py", 623 | "mimetype": "text/x-python", 624 | "name": 
"python", 625 | "nbconvert_exporter": "python", 626 | "pygments_lexer": "ipython3", 627 | "version": "3.8.5" 628 | } 629 | }, 630 | "nbformat": 4, 631 | "nbformat_minor": 2 632 | } 633 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/B4b_characteristics-models-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#
Block 4b: Characteristics-based models of demand
\n", 8 | "###
Alfred Galichon (NYU & Sciences Po)
\n", 9 | "##
'math+econ+code' masterclass on optimal transport and economic applications
\n", 10 | "####
With python code examples
\n", 11 | "© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam).\n", 12 | "\n", 13 | "**If you reuse material from this masterclass, please cite as:**
\n", 14 | "Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Learning objectives\n", 22 | "\n", 23 | "* Beyond GEV: the pure characteristics models, the random coefficient logit model, the probit model\n", 24 | "\n", 25 | "* Simulation methods: Accept-Reject and SARS\n", 26 | "\n", 27 | "* Demand inversion: The inversion theorem" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "### References\n", 35 | "\n", 36 | "* Galichon (2016). *Optimal Transport Methods in Economics*, Chapter 9.2, Princeton University Press\n", 37 | "\n", 38 | "* McFadden (1981). \"Econometric Models of Probabilistic Choice\", in C.F. Manski and D. McFadden (eds.), *Structural analysis of discrete data with econometric applications*, MIT Press.\n", 39 | "\n", 40 | "* Berry, Levinsohn, and Pakes (1995). \"Automobile Prices in Market Equilibrium,\" *Econometrica*.\n", 41 | "\n", 42 | "* Train. (2009). *Discrete Choice Methods with Simulation*. 2nd Edition. Cambridge University Press.\n", 43 | "\n", 44 | "* Galichon and Salanie (2020). \"Cupid's Invisible Hands\". Preprint (first version 2011).\n", 45 | "\n", 46 | "* Chiong, Galichon and Shum (2016), \"Duality in Discrete Choice Models\". *Quantitative Economics*.\n", 47 | "\n", 48 | "* Bonnet, Galichon, O'Hara and Shum (2017). \"Yogurts choose consumers? Identification of Random Utility Models via Two-Sided Matching\". Working paper." 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### Libraries\n", 56 | "\n", 57 | "Let's start by loading the libraries we shall need for this course." 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "import numpy as np\n", 67 | "import scipy.sparse as spr\n", 68 | "# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here\n", 69 | "import gurobipy as grb" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "## Choice models beyond GEV" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "## The need for further models\n", 84 | "\n", 85 | "The GEV models are convenient analytically, but not very flexible.\n", 86 | "\n", 87 | "The logit model imposes zero correlation across alternatives\n", 88 | "\n", 89 | "The nested logit allows for nonzero correlation, but in a very rigid way (needs to define nests).\n", 90 | "\n", 91 | "A good example is the probit model, where $\\varepsilon$ is a Gaussian vector. For this model, there is no close-form solution neither for $G$ nor for $G^*$.\n", 92 | "\n", 93 | "More recently, a number of modern models don't have closed-form either. These models require simulation methods in order to approximate them by discrete models." 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "## The pure characteristics model\n", 101 | "\n", 102 | "#### Motivation\n", 103 | "\n", 104 | "The pure characteristics model (Berry and Pakes, 2007) can be motivated as follows. Assume $y$ stands for the number of bedrooms. 
The logit model would assume that the random utility associated with a 2-BR is uncorrelated with a 3-BR, which is not realistic.\n", 105 | "\n", 106 | "Let $\\xi_{y}$ is the typical size of a bedroom of size $y$, one may introduce $\\epsilon$ as the valuation of size; in which case the utility shock associated with $y$ should be $\\varepsilon_{y}=\\epsilon\\xi_{y}$. More generally, the characteristics $\\xi_{y}$ is a $d$-dimensional (deterministic) vector, and $\\epsilon\\sim\\mathbf{P}_{\\epsilon}$ is a (random) vector of the same size standing for the valuations of the respective dimensions, so that\n", 107 | "\n", 108 | "\\begin{align*}\n", 109 | "\\varepsilon_{y}=\\epsilon^{\\intercal}\\xi_{y}.\n", 110 | "\\end{align*}\n", 111 | "\n", 112 | "For example, if each alternative $y$ stands for a model of car, the first component of $\\xi_{y}$ may be the price of car $y$; the other components may be other characteristics such as number of seats, fuel efficiency, size, etc. In that case, for a given dimension $y\\in\\mathcal{Y}_{0}$, $\\epsilon_{y}$ is the (random) valuation of this dimension by the consumer with taste vector $\\epsilon$." 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "#### Definition\n", 120 | "\n", 121 | "Assume without loss of generality that $\\varepsilon_{y}=0$, that is $\\xi_{0}=0$ as we can always reduce the setting to this case by replacing $\\xi_{y}$ by $\\xi_{y}-\\xi_{0}$.\n", 122 | "\n", 123 | "Letting $Z$ be the $\\left\\vert \\mathcal{Y}\\right\\vert \\times d\\,$\\ matrix of $\\left( y,k\\right) $-term $\\xi_{y}^{k}$, this rewrites as\n", 124 | "\n", 125 | "\\begin{align*}\n", 126 | "\\varepsilon = Z\\epsilon.\n", 127 | "\\end{align*}\n", 128 | "\n", 129 | "Hence, we have\n", 130 | "\n", 131 | "\\begin{align*}\n", 132 | "G\\left( U\\right) =\\mathbb{E}\\left[ \\max\\left\\{ U+Z\\epsilon,0\\right\\}\\right].\n", 133 | "\\end{align*}\n", 134 | "\n", 135 | "and\n", 136 | "\n", 137 | "\\begin{align*}\n", 138 | "\\sigma_{y}\\left( U\\right) =\\Pr\\left( U_{y}-U_{z}\\geq\\left( Z\\epsilon\\right)_{y}-\\left( Z\\epsilon\\right)_{z}, \\quad forall z\\in\\mathcal{Y}_{0}\\backslash\\left\\{ y\\right\\} \\right).\n", 139 | "\\end{align*}\n" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "#### In dimension 1\n", 147 | "\n", 148 | "When $d=1$ (scalar characteristics), one has $\\sigma_{y}\\left(U\\right) =\\Pr\\left( U_{y}-U_{z}\\geq\\left( \\xi_{y}-\\xi_{z}\\right)\\epsilon~\\forall z\\in\\mathcal{Y}_{0}\\backslash\\left\\{ y\\right\\} \\right) $, and thus\n", 149 | "\n", 150 | "\\begin{align*}\n", 151 | "\\sigma_{y}\\left( U\\right) =\\Pr\\left( \\max_{z:\\xi_{y}>\\xi_{z}}\\left\\{\\frac{U_{y}-U_{z}}{\\xi_{y}-\\xi_{z}}\\right\\} \\leq\\epsilon\\leq\\min_{z:\\xi_{y}<\\xi_{z}}\\left\\{ \\frac{U_{y}-U_{z}}{\\xi_{y}-\\xi_{z}}\\right\\} \\right)\n", 152 | "\\end{align*}\n", 153 | "\n", 154 | "with the understanding that $\\max_{z\\in\\emptyset}f_{z}=-\\infty$ and \n", 155 | "$\\min_{z\\in\\emptyset}f_{z}=+\\infty$.\n", 156 | "\n", 157 | "Therefore, letting $\\mathbf{F}_{\\epsilon}$ be the c.d.f. 
associated with the distribution of $\\epsilon$, one has a closed-form expression for $\\sigma_{y}$:\n", 158 | "\n", 159 | "\\begin{align*}\n", 160 | "\\sigma_{y}\\left( U\\right) =\\mathbf{F}_{\\epsilon}\\left( \\left[ \\max_{z:\\xi_{y}>\\xi_{z}}\\left\\{ \\frac{U_{y}-U_{z}}{\\xi_{y}-\\xi_{z}}\\right\\},\\min_{z:\\xi_{y}<\\xi_{z}}\\left\\{ \\frac{U_{y}-U_{z}}{\\xi_{y}-\\xi_{z}}\\right\\}\\right] \\right)\n", 161 | "\\end{align*}" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "### The probit model\n", 169 | "\n", 170 | "When $\\mathbf{P}_{\\epsilon}$ is the $\\mathcal{N}\\left( 0,S\\right) $\n", 171 | "distribution, then the pure characteristics model is called a Probit model; in this case,\n", 172 | "\n", 173 | "\\begin{align*}\n", 174 | "\\varepsilon\\sim\\mathcal{N}\\left( 0,\\Sigma\\right) \\text{ where }%\n", 175 | "\\Sigma=ZSZ^{\\intercal}.\n", 176 | "\\end{align*}\n", 177 | "\n", 178 | "Note the distribution $\\varepsilon\\,$will not have full support unless $d\\geq\\left\\vert \\mathcal{Y}\\right\\vert $ and $Z$ is of full rank.\n", 179 | "\n", 180 | "Computing $\\sigma$ in the Probit model thus implies computing the mass assigned by the Gaussian distribution to rectangles of the type \n", 181 | "\n", 182 | "\\begin{align*}\n", 183 | "\\left[ l_{y},u_{y}\\right] .\n", 184 | "\\end{align*}\n", 185 | "\n", 186 | "When $\\Sigma$ is diagonal (random utility terms are i.i.d. across alternatives), this is numerically easy. However, this is computationally difficult in general (more on this later)." 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "### The random coefficient logit model\n", 194 | "\n", 195 | "The random coefficient logit model (Berry, Levinsohn and Pakes, 1995) may be viewed as an interpolant between the random characteristics model and the logit model. 
In this case,\n",
196 | "\n",
197 | "\begin{align*}\n",
198 | "\varepsilon=\left( 1-\lambda\right) Z\epsilon+\lambda\eta\n",
199 | "\end{align*}\n",
200 | "\n",
201 | "where $\epsilon\sim\mathbf{P}_{\epsilon}$, $\eta$ is a vector of i.i.d. EV1 (Gumbel) shocks independent from the previous term, and $\lambda$ is an interpolation parameter ($\lambda=1$ is the logit model, and $\lambda=0$ is the pure characteristics model).\n",
202 | "\n",
203 | "One may then compute the Emax operator as \n",
204 | "\n",
205 | "\begin{align*}\n",
206 | "G\left( U\right) & =\mathbb{E}\left[ \max_{y\in\mathcal{Y}_{0}}\left\{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}+\lambda\eta_{y}\right\} \right] \\\n",
207 | "& =\mathbb{E}\left[ \mathbb{E}\left[ \max_{y\in\mathcal{Y}_{0}}\left\{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}+\lambda\eta_{y}\right\} |\epsilon\right] \right] \\\n",
208 | "& =\mathbb{E}\left[ \lambda\log\sum_{y\in\mathcal{Y}_{0}}\exp\left(\n",
209 | "\frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}}{\lambda}\right) \right]\n",
210 | "\end{align*}\n",
211 | "\n",
212 | "Recall\n",
213 | "\n",
214 | "\begin{align*}\n",
215 | "G\left( U\right) =\mathbb{E}\left[ \lambda\log\sum_{y\in\mathcal{Y}_{0}}\exp\left( \frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}}{\lambda}\right) \right] .\n",
216 | "\end{align*}\n",
217 | "\n",
218 | "The demand map in the random coefficients logit model is obtained by differentiating the expression of the Emax, i.e.\n",
219 | "\n",
220 | "\begin{align*}\n",
221 | "\sigma_{y}\left( U\right) =\mathbb{E}\left[ \frac{\exp\left( \frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}}{\lambda}\right) }{\sum_{y^{\prime}\in\mathcal{Y}_{0}}\exp\left( \frac{U_{y^{\prime}}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y^{\prime}}}{\lambda}\right) }\right] .\n",
222 | "\end{align*}"
223 | ]
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {},
228 | "source": [
229 | "## Simulation methods\n",
230 | "\n",
231 | "In a number of cases, one cannot compute the choice probabilities $\sigma\left( U\right)$ using a closed-form expression. In this case, we need to resort to simulation to compute $G$, $G^{\ast}$, $\sigma$ and $\sigma^{-1}$.\n",
232 | "\n",
233 | "The idea is that:\n",
234 | "\n",
235 | "* One is able to compute $G$ and $G^{\ast}$ for discrete distributions (more on this later)\n",
236 | "\n",
237 | "* The sampled versions of $G$, $G^{\ast}$, $\sigma$ and $\sigma^{-1}$ converge to the population objects when the sample size is large."
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "### Accept-reject simulator\n",
245 | "\n",
246 | "One simulates $N$ points $\varepsilon^{i}\sim P$. 
The Emax operator associated with the empirical sample distribution $P_{N}$ is\n",
247 | "\n",
248 | "\begin{align*}\n",
249 | "G_{N}\left( U\right) =N^{-1}\sum_{i=1}^{N}\max_{y\in\mathcal{Y}_{0}}\left\{ U_{y} + \varepsilon_{y}^{i}\right\}\n",
250 | "\end{align*}\n",
251 | "\n",
252 | "and the demand map is given by\n",
253 | "\n",
254 | "\begin{align*}\n",
255 | "\sigma_{N,y}\left( U\right) =N^{-1}\sum_{i=1}^{N}1\left\{ U_{y} + \varepsilon_{y}^{i}\geq U_{z}+\varepsilon_{z}^{i}, \quad \forall z\in\mathcal{Y}\n",
256 | "_{0}\right\}\n",
257 | "\end{align*}\n",
258 | "\n",
259 | "In the literature, $\sigma_{N}$ is called the *accept-reject simulator*."
260 | ]
261 | },
262 | {
263 | "cell_type": "markdown",
264 | "metadata": {},
265 | "source": [
266 | "**Example**. We shall code the AR simulator for the probit model and generate choice probabilities. Take a vector of systematic utilities:"
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": null,
272 | "metadata": {},
273 | "outputs": [],
274 | "source": [
275 | "seed = 777\n",
276 | "nbDraws = 1000\n",
277 | "U_y = np.array([0.4, 0.5, 0.2, 0.3, 0.1, 0])\n",
278 | "nbY = len(U_y)"
279 | ]
280 | },
281 | {
282 | "cell_type": "markdown",
283 | "metadata": {},
284 | "source": [
285 | "We shall specify a probit model where alternatives have correlation $\rho = .5$."
286 | ]
287 | },
288 | {
289 | "cell_type": "code",
290 | "execution_count": null,
291 | "metadata": {},
292 | "outputs": [],
293 | "source": [
294 | "rho = 0.5\n",
295 | "Covar = rho * np.ones((nbY, nbY)) + (1 - rho) * np.eye(nbY)\n",
296 | "\n",
297 | "E = np.linalg.eigh(Covar)  # eigendecomposition of the covariance matrix\n",
298 | "V = E[0]  # eigenvalues\n",
299 | "Q = E[1]  # orthonormal eigenvectors\n",
300 | "SqrtCovar = Q.dot(np.diag(np.sqrt(V))).dot(Q.T)  # symmetric square root of Covar"
301 | ]
302 | },
303 | {
304 | "cell_type": "markdown",
305 | "metadata": {},
306 | "source": [
307 | "Now generate the $\varepsilon_{iy}$'s:"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": null,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "epsilon_iy = np.random.default_rng(seed).normal(size=(nbDraws, nbY)).dot(SqrtCovar)  # seeded draws of the correlated shocks\n",
317 | "u_iy = epsilon_iy + U_y\n",
318 | "\n",
319 | "ui = np.max(u_iy, axis=1)  # best achievable utility for each draw\n",
320 | "s_y = np.sum((u_iy.T - ui).T == 0, axis=0) / nbDraws  # frequency with which each alternative attains the max"
321 | ]
322 | },
323 | {
324 | "cell_type": "markdown",
325 | "metadata": {},
326 | "source": [
327 | "### McFadden's SARS\n",
328 | "\n",
329 | "McFadden's smoothed accept-reject simulator (SARS) consists in sampling $\varepsilon\sim P$: $\varepsilon^{1},...,\varepsilon^{N}$, and replacing the max by the smooth-max\n",
330 | "\n",
331 | "\begin{align*}\n",
332 | "\sigma_{N,T,y}\left( U\right) =\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left((U_{y}+\varepsilon_{y}^{i})/T\right) }{\sum_{z}\exp\left( (U_{z}+\varepsilon_{z}^{i})/T\right)}\n",
333 | "\end{align*}\n",
334 | "\n",
335 | "One seeks $U$ so that the induced choice probabilities are $s$, that is\n",
336 | "\n",
337 | "\begin{align*}\n",
338 | "s_{y}=\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left( (U_{y}+\varepsilon_{y}^{i})/T\right) }{\sum_{z}\exp\left( (U_{z}+\varepsilon_{z}^{i})/T\right)}.\n",
339 | "\end{align*}\n",
340 | "\n",
341 | "The associated Emax operator is\n",
342 | "\n",
343 | "\begin{align*}\n",
344 | "G_{N,T}\left( U\right) =\mathbb{E}_{\mathbf{P}_{N}}\left[ T\,G_{\operatorname{logit}}\left( \left( U+\varepsilon\right) /T\right) \right] ,\text{ where }G_{\operatorname{logit}}\left( u\right) =\log\sum_{y\in\mathcal{Y}_{0}}\exp u_{y},\n",
345 | "\end{align*}\n",
346 | "\n",
347 | "so the underlying random utility structure is a random coefficient logit."
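To make the smoothed simulator concrete, here is a minimal sketch of ours (not part of the original notebook) that reuses `U_y`, `epsilon_iy` and `s_y` computed in the accept-reject cells above; the temperature `T` below is an illustrative choice.

```python
# McFadden's smoothed AR simulator: a sketch reusing U_y, epsilon_iy and s_y from above.
T = 0.1  # smoothing temperature (illustrative); T -> 0 recovers the accept-reject simulator
w_iy = (U_y + epsilon_iy) / T
w_iy = w_iy - w_iy.max(axis=1, keepdims=True)                      # stabilize the exponentials
probs_iy = np.exp(w_iy) / np.exp(w_iy).sum(axis=1, keepdims=True)  # logit probabilities, draw by draw
s_y_sars = probs_iy.mean(axis=0)                                   # average over the N simulated draws
print(s_y)       # accept-reject frequencies
print(s_y_sars)  # their smoothed counterpart
```

As `T` shrinks, the smoothed shares approach the accept-reject frequencies, while remaining differentiable in `U_y` for any fixed `T > 0`.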
348 | ]
349 | },
350 | {
351 | "cell_type": "markdown",
352 | "metadata": {},
353 | "source": [
354 | "# The inversion theorem"
355 | ]
356 | },
357 | {
358 | "cell_type": "markdown",
359 | "metadata": {},
360 | "source": [
361 | "The following theorem, first established in Galichon and Salanié (2011), shows that the inversion of discrete choice models is an optimal transport problem.\n",
362 | "\n",
363 | "---\n",
364 | "**Theorem** [Galichon and Salanié]\n",
365 | "\n",
366 | "Consider a solution $\left(u\left( \varepsilon\right),v_{y}\right) $ to the dual Monge-Kantorovich problem with cost $\Phi\left(\varepsilon,y\right) =\varepsilon_{y}$, that is:\n",
367 | "\n",
368 | "\n",
369 | "\begin{align*}\n",
370 | "\min_{u,v} & \int u\left( \varepsilon\right) d\mathbf{P}\left(\n",
371 | "\varepsilon\right) +\sum_{y\in\mathcal{Y}_{0}}v_{y}s_{y}\\\n",
372 | "s.t.~ & u\left( \varepsilon\right) +v_{y}\geq\Phi\left( \varepsilon\n",
373 | ",y\right)\n",
374 | "\end{align*}\n",
375 | "\n",
376 | "Then:\n",
377 | "\n",
378 | "* $U=\sigma^{-1}\left( s\right) $ is given by $U_{y}=v_{0}-v_{y}$.\n",
379 | "\n",
380 | "* The value of the [MK dual](#MKDualDC) is $-G^{\ast}\left( s\right) $.\n"
381 | ]
382 | },
383 | {
384 | "cell_type": "markdown",
385 | "metadata": {},
386 | "source": [
387 | "**Proof**\n",
388 | "\n",
389 | "$\sigma^{-1}\left( s\right) =\arg\max_{U:U_{0}=0}\left\{ \sum_{y\in\mathcal{Y}}s_{y}U_{y}-G(U)\right\} $, thus, letting $v=-U$, $v$ is the solution to\n",
390 | "\n",
391 | "\begin{align*}\n",
392 | "\min_{v:v_{0}=0}\left\{ \sum_{y\in\mathcal{Y}_{0}}s_{y}v_{y}+G(-v)\right\}\n",
393 | "\end{align*}\n",
394 | "which is exactly the [MK dual](#MKDualDC).\n"
395 | ]
396 | },
397 | {
398 | "cell_type": "markdown",
399 | "metadata": {},
400 | "source": [
401 | "### Inversion of the pure characteristics model\n",
402 | "\n",
403 | "It follows from the inversion theorem that the problem of demand inversion in the pure characteristics model is a semi-discrete transport problem, a point made in Bonnet, Galichon, O'Hara, and Shum (2017).\n",
404 | "\n",
405 | "Indeed, the correspondence is:\n",
406 | "\n",
407 | "* an alternative $y$ is a fountain\n",
408 | "\n",
409 | "* the characteristics vector of an alternative is a fountain location\n",
410 | "\n",
411 | "* the systematic utility associated with alternative $y$ is minus the price of fountain $y$\n",
412 | "\n",
413 | "* the market share of alternative $y$ coincides with the capacity of fountain $y$\n",
414 | "\n",
415 | "* the random vector $\epsilon$ is the location of an inhabitant"
416 | ]
417 | },
418 | {
419 | "cell_type": "markdown",
420 | "metadata": {},
421 | "source": [
422 | "**Example**. We invert the probit model above."
423 | ]
424 | },
425 | {
426 | "cell_type": "code",
427 | "execution_count": null,
428 | "metadata": {},
429 | "outputs": [],
430 | "source": [
431 | "A1 = spr.kron(np.ones((1, nbY)), spr.identity(nbDraws))  # rows i: mass of each simulated consumer\n",
432 | "A2 = spr.kron(spr.identity(nbY), np.ones((1, nbDraws)))  # rows y: mass of each alternative\n",
433 | "A = spr.vstack([A1, A2])\n",
434 | "obj = epsilon_iy.flatten(order='F')  # column-major stacking: index y*nbDraws + i, matching A above\n",
435 | "rhs = np.ones(nbDraws)/nbDraws\n",
436 | "rhs = np.append(rhs, s_y)\n",
437 | "m = grb.Model('optimal')\n",
438 | "x = m.addMVar(len(obj), name='couple')\n",
439 | "m.setObjective(obj @ x, grb.GRB.MAXIMIZE)\n",
440 | "m.addConstr(A @ x == rhs, name=\"Constr\")\n",
441 | "\n",
442 | "m.optimize()\n",
443 | "if m.Status == grb.GRB.Status.OPTIMAL:\n",
444 | "    pi = m.getAttr('pi')\n",
445 | "    Uhat_y = -np.subtract(pi[nbDraws:nbY+nbDraws], pi[nbY + nbDraws - 1])  # U_y = v_0 - v_y, taking the last alternative as y = 0\n",
446 | "    print('U_y (true and recovered)')\n",
447 | "    print(U_y)\n",
448 | "    print(Uhat_y)"
449 | ]
450 | },
451 | {
452 | "cell_type": "markdown",
453 | "metadata": {},
454 | "source": [
455 | "### McFadden's SARS and regularized Optimal Transport\n",
456 | "\n",
457 | "Cf. Bonnet, Galichon, O'Hara and Shum (2017). Let $u_{i}=T\log\sum_{z}\exp\left((U_{z}+\varepsilon_{z}^{i})/T\right) $. One has\n",
458 | "\n",
459 | "\begin{align*}\n",
460 | "\left\{\n",
461 | "\begin{array}\n",
462 | "[c]{l}\n",
463 | "s_{y}=\sum_{i=1}^{N}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right) \\\n",
464 | "\frac{1}{N}=\sum_{y}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right)\n",
465 | "\end{array}\n",
466 | "\right. .\n",
467 | "\end{align*}\n",
468 | "\n",
469 | "As a result, $\left( u_{i},U_{y}\right) $ are the solution of the regularized OT problem\n",
470 | "\n",
471 | "\begin{align*}\n",
472 | "\min_{u,U}\sum_{i=1}^{N}\frac{1}{N}u_{i}-\sum_{y}s_{y}U_{y}+T\sum_{i,y}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right) .\n",
473 | "\end{align*}"
474 | ]
475 | },
476 | {
477 | "cell_type": "markdown",
478 | "metadata": {},
479 | "source": [
480 | "### BLP's contraction mapping\n",
481 | "\n",
482 | "Consider the IPFP algorithm for solving the latter problem:\n",
483 | "\n",
484 | "\begin{align*}\n",
485 | "\left\{\n",
486 | "\begin{array}\n",
487 | "[c]{l}\n",
488 | "\exp\left( u_{i}^{k+1}/T\right) =\sum_{z}\exp\left( (U_{z}^{k}\n",
489 | "+\varepsilon_{z}^{i})/T\right) \\\n",
490 | "\exp\left( U_{y}^{k+1}/T\right) =\frac{Ns_{y}}{\sum_{i=1}^{N}\exp\left( (-u_{i}\n",
491 | "^{k+1}+\varepsilon_{y}^{i})/T\right) }\n",
492 | "\end{array}\n",
493 | "\right.\n",
494 | "\end{align*}\n",
495 | "\n",
496 | "This rewrites as\n",
497 | "\n",
498 | "\begin{align*}\n",
499 | "\exp\left( U_{y}^{k+1}/T\right) & =\frac{Ns_{y}}{\sum_{i=1}^{N}\frac{\exp\left(\n",
500 | "\varepsilon_{y}^{i}/T\right) }{\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon\n",
501 | "_{z}^{i})/T\right) }},\text{ i.e.}\\\n",
502 | "U_{y}^{k+1} & =T\log s_{y}-T\log\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left(\n",
503 | "\varepsilon_{y}^{i}/T\right) }{\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon\n",
504 | "_{z}^{i})/T\right) }\n",
505 | "\end{align*}\n",
506 | "\n",
507 | "which is exactly the contraction mapping algorithm of Berry, Levinsohn and Pakes (1995, appendix 1)."
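As an illustration, here is a minimal sketch of ours of this contraction, run on the simulated probit example (it reuses `U_y`, `epsilon_iy`, `s_y` and `nbY` from the cells above; `T` is an illustrative temperature, and the simulated shares `s_y` must all be strictly positive for the logs to be defined).

```python
# BLP-style contraction iteration: a sketch reusing U_y, epsilon_iy, s_y and nbY from above.
T = 0.2                  # illustrative temperature; smaller T gets closer to the pure AR inversion
U_k = np.zeros(nbY)      # starting point; the normalization U[-1] = 0 is reimposed at each step
for k in range(1000):
    # logit inclusive values per simulated draw, at the current U_k
    denom_i = np.exp((U_k + epsilon_iy) / T).sum(axis=1)
    # one step: U <- T log s - T log (1/N) sum_i exp(eps/T) / denom_i
    U_next = T * np.log(s_y) - T * np.log((np.exp(epsilon_iy / T) / denom_i[:, None]).mean(axis=0))
    U_next = U_next - U_next[-1]   # normalize the last alternative to zero
    if np.max(np.abs(U_next - U_k)) < 1e-8:
        break
    U_k = U_next
print(U_y)   # true systematic utilities
print(U_k)   # recovered by the contraction
```

For small `T` and a large number of draws, the recovered utilities should be close to those obtained by the linear-programming inversion above.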
508 | ] 509 | } 510 | ], 511 | "metadata": { 512 | "kernelspec": { 513 | "display_name": "Python 3", 514 | "language": "python", 515 | "name": "python3" 516 | }, 517 | "language_info": { 518 | "codemirror_mode": { 519 | "name": "ipython", 520 | "version": 3 521 | }, 522 | "file_extension": ".py", 523 | "mimetype": "text/x-python", 524 | "name": "python", 525 | "nbconvert_exporter": "python", 526 | "pygments_lexer": "ipython3", 527 | "version": "3.8.5" 528 | } 529 | }, 530 | "nbformat": 4, 531 | "nbformat_minor": 2 532 | } 533 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/SL3_quantiles-regression-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#
Quantile methods
\n", 8 | "###
Alfred Galichon (NYU+Sciences Po)
\n", 9 | "##
'math+econ+code' masterclass on optimal transport and economic applications
\n", 10 | "####
With python code examples
\n", 11 | "© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/team).\n", 12 | "\n", 13 | "**If you reuse material from this masterclass, please cite as:**
\n", 14 | "Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## References\n", 22 | "* Koneker and Bassett (1978). `Regression quantile'. Econometrica.\n", 23 | "* Koenker (2005). Quantile Regression. Cambridge University Press.\n", 24 | "* Koenker, Roger and Kevin F. Hallock. “Quantile Regression”. Journal of Economic Perspectives, Volume 15, Number 4, Fall 2001, Pages 143–156" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### What is the right definition of a quantile?\n", 32 | "\n", 33 | "\n", 34 | "In dimension one, the following statements equivalently define quantiles of a\n", 35 | "distribution $Y\\sim\\nu$:\n", 36 | "\n", 37 | "* The quantile map is the (generalized) inverse of the cdf of $\\nu$:\n", 38 | "$F_{\\nu}^{-1}$.\n", 39 | "\n", 40 | "* The quantile map is the nondecreasing map $T$ such that if\n", 41 | "$U\\sim\\mathcal{U}\\left( \\left[ 0,1\\right] \\right) $, then $T\\left(\n", 42 | "U\\right) \\sim\\nu$.\n", 43 | "\n", 44 | "* The quantile at $t$ $F_{\\nu}^{-1}\\left( t\\right) $ is the solution of\n", 45 | "$\\min_{q}\\mathbb{E}\\left[ \\rho_{t}\\left( Y-q\\right) \\right] $, where\n", 46 | "$\\rho_{t}\\left( z\\right) =tz^{+}+\\left( 1-t\\right) z^{-}$.\n", 47 | "\n", 48 | "* The quantile map is the solution to the Monge problem between\n", 49 | "distribution $\\mathcal{U}\\left( \\left[ 0,1\\right] \\right) $ and $\\nu$\n", 50 | "relative to cost $\\Phi\\left( u,y\\right) =uy$.\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### Quantile: properties\n", 58 | "\n", 59 | "\n", 60 | "\n", 61 | "Quantiles have a number of enjoyable properties that make them easy to work with.\n", 62 | "\n", 63 | "* They fully characterize the distribution $\\nu$.\n", 64 | "\n", 65 | "* They allow to construct a representation of $\\nu$: $F_{\\nu}^{-1}\\left(\n", 66 | "U\\right) $, $U\\sim\\mu:=\\mathcal{U}\\left( \\left[ 0,1\\right] \\right) $ has\n", 67 | "distribution $\\nu$.\n", 68 | "\n", 69 | "* They embed the median ($F_{\\nu}^{-1}\\left( 1/2\\right) $) and the\n", 70 | "extreme values ($F_{\\nu}^{-1}\\left( 0\\right) $ and $F_{\\nu}^{-1}\\left(\n", 71 | "1\\right) $).\n", 72 | "\n", 73 | "* They allow to provide a construction of distance between distributions:\n", 74 | "for $p\\geq1$,
\n", 75 | "$\n", 76 | "\\left( \\int\\left\\vert F_{\\nu}^{-1}\\left( t\\right) -F_{\\nu}^{-1}\\left(\n", 77 | "t\\right) \\right\\vert ^{p}dt\\right) ^{1/p}\n", 78 | "$
\n", 79 | "is the $p$-Wasserstein distance between $\\mu$ and $\\nu$.\n", 80 | "\n", 81 | "* They allow for a natural construction of robust statistics by trimming\n", 82 | "the interval $\\left[ 0,1\\right] $.\n", 83 | "\n", 84 | "* **They lend themselves to a natural notion of regression: quantile\n", 85 | "regression (Koenker and Bassett, 1978; Koenker 2005).**\n", 86 | "\n" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### Quantile/quantile regression: applications\n", 94 | "\n", 95 | "Quantiles are widely used in economics, finance and statistics.\n", 96 | "\n", 97 | "* Comonotonicity: $\\left( F_{\\nu_{1}}^{-1}\\left( U\\right) ,F_{\\nu_{2}%\n", 98 | "}^{-1}\\left( U\\right) \\right) $ for $U\\sim\\mu:=\\mathcal{U}\\left( \\left[\n", 99 | "0,1\\right] \\right) $ is a comonotone representation of $\\nu_{1}$ and\n", 100 | "$\\nu_{2}$.\n", 101 | "\n", 102 | "* Mesures of risk: Value-at-risk $F_{\\nu}^{-1}\\left( 1-\\alpha\\right) $;\n", 103 | "CVaR $\\int_{1-\\alpha}^{1}F_{\\nu}^{-1}\\left( t\\right) dt$.\n", 104 | "\n", 105 | "* Non-expected utility: Yaari's rank-dependent EU (Choquet integral)\n", 106 | "$\\int_{0}^{1}F_{\\nu}^{-1}\\left( t\\right) w\\left( t\\right) dt$.\n", 107 | "\n", 108 | "* Demand theory: Matzkin's identication of hedonic models.\n", 109 | "\n", 110 | "* Income and inequality: Chamberlain (1994)'s study of the effect of\n", 111 | "unionization on wages.\n", 112 | "\n", 113 | "* Biometrics: growth charts.\n" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### What is quantile regression?\n", 121 | "\n", 122 | "* Quantile regression therefore adopts a parameterization of the\n", 123 | "conditional quantile which is linear in $Z$. That is
\n", 124 | "$\n", 125 | "Q_{Y|X}\\left( \\tau|x\\right) =x^{\\intercal}\\beta_{\\tau}\n", 126 | "$
\n", 127 | "(note that one can always augment $x$ with nonlinear functions of $x$, so this\n", 128 | "parameterization is quite general).\n", 129 | "\n", 130 | "* In order to estimate $\\beta_{\\tau}$, first note that
\n", 131 | "$\n", 132 | "Q_{Y|X}\\left( \\tau|x\\right) =\\arg\\min_{q}\\mathbb{E}\\left[ \\rho_{\\tau\n", 133 | "}\\left( Y-q\\right) |X=x\\right]\n", 134 | "$
\n", 135 | "where $\\rho_{\\tau}\\left( w\\right) =\\tau w^{+}+\\left( 1-\\tau\\right) w^{-}$.\n", 136 | "\n", 137 | "* Therefore, if the conditional quantile has the specified form,\n", 138 | "$\\beta_{u}$ is the solution to
\n", 139 | "$\n", 140 | "\\min_{\\beta\\in\\mathbb{R}^{k}}\\mathbb{E}\\left[ \\rho_{\\tau}\\left(\n", 141 | "Y-X^{\\intercal}\\beta\\right) |X=x\\right]$
\n", 142 | "for each $x$, and therefore it is the solution to the quantile regression\n", 143 | "problem introduced by Koenker and Bassett (1978)
\n", 144 | "$\n", 145 | "\\min_{\\beta\\in\\mathbb{R}^{k}}\\mathbb{E}\\left[ \\rho_{\\tau}\\left(\n", 146 | "Y-X^{\\intercal}\\beta\\right) \\right] .\n", 147 | "$
" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "### Quantile regression as linear programming\n", 155 | "\n", 156 | "* Koenker and Bassett showed that this problem has a linear programming formulation. Indeed, consider its sample version
\n", 157 | "$\\min_{\\beta\\in\\mathbb{R}^{k}}\\sum_{i=1}^{n}\\rho_{\\tau}\\left( Y_{i}%\n", 158 | "-X_{i}^{\\intercal}\\beta\\right)$\n", 159 | "\n", 160 | "\n", 161 | "* Introducing $Y_{i}-X_{i}^{\\intercal}\\beta=P_{i}-N_{i}$ with $P_{i},N_{i}\\geq0$, we have
\n", 162 | "$\n", 163 | "\\begin{array}\n", 164 | "~ \\min_{\\substack{\\beta\\in\\mathbb{R}^{k}\\\\P_{i}\\geq0,N_{i}\\geq0}} & \\sum\n", 165 | "_{i=1}^{n}\\tau P_{i}+\\left( 1-\\tau\\right) N_{i}\\\\\n", 166 | "s.t.~ & P_{i}-N_{i}=Y_{i}-X_{i}^{\\intercal}\\beta\n", 167 | "\\end{array}$
\n", 168 | "therefore $\\beta$ can be obtained by simple linear programming.\n", 169 | "\n", 170 | "* The above can be simplified to
\n", 171 | "$\n", 172 | "\\begin{array}\n", 173 | "~ \\min_{\\substack{\\beta\\in\\mathbb{R}^{k}\\\\P_{i}\\geq0}} & \\sum\n", 174 | "_{i=1}^{n} P_{i}+\\left( 1-\\tau\\right) X_{i}^{\\intercal}\\beta\\\\\n", 175 | "s.t.~ & P_{i} + X_{i}^{\\intercal}\\beta\\geq Y_{i}~\\left[ V_i\\geq 0\\right]\n", 176 | "\\end{array}$
\n", 177 | "\n", 178 | "* The dual of the latter is
\n", 179 | "$\\begin{array}\n", 180 | "& \\max_{V\\geq 0} & \\sum_i Y_iV_i \\\\\n", 181 | "s.t.~ & V_i\\leq1~\\left[ P_i\\geq0\\right] \\\\\n", 182 | "& \\frac{1}{I}\\sum_i V_i X_{ik} =\\left( 1-\\tau\\right) \\bar{x}_k ~\\left[ \\beta_k\\right]\n", 183 | "\\end{array}$
\n", 184 | "where $\\bar{x}_k:=\\frac{1}{I}\\sum_i X_{ik}$." 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "Let's import the libraries we shall need." 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "!pip install statsmodels" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "import pandas as pd\n", 210 | "import numpy as np\n", 211 | "import gurobipy as grb\n", 212 | "import scipy.sparse as spr" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "## Loading the data\n", 220 | "\n", 221 | "We shall use a historical dataset by Engle on food expenditures as a function of the household's income." 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "engle_path = 'https://raw.githubusercontent.com/alfredgalichon/VQR/master/engle-data/'\n", 231 | "engle_data = pd.read_csv(engle_path+ 'engel.csv')\n", 232 | "\n", 233 | "engle_data.head()" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "income = np.array(engle_data['income'])\n", 243 | "food = np.array(engle_data['food'])\n", 244 | "housing = np.array(engle_data['housing'])\n", 245 | "nbi=len(income)\n", 246 | "X_i_k = np.array([np.ones(nbi),income]).T\n", 247 | "#Y = np.array([food,housing]).T\n", 248 | "_,nbk = X_i_k.shape" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "qr_lp=grb.Model()\n", 258 | "τ = 0.5\n", 259 | "P = qr_lp.addMVar(shape=nbi, name=\"P\")\n", 260 | "β = qr_lp.addMVar(shape=nbk, name=\"β\", lb=-grb.GRB.INFINITY )\n", 261 | "qr_lp.setObjective(np.ones(nbi) @ P + (1-τ) * (np.ones(nbi) @ X_i_k) @ β, grb.GRB.MINIMIZE)\n", 262 | "qr_lp.addConstr(P + X_i_k @ β >= food)\n", 263 | "qr_lp.optimize()\n", 264 | "\n", 265 | "βhat = qr_lp.getAttr('x')[-nbk:]\n", 266 | "βhat" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "We can recover the result using the `quantreg` package of `statsmodel` library:" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "import statsmodels.api as sm\n", 283 | "import statsmodels.formula.api as smf\n", 284 | "import matplotlib.pyplot as plt" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "Fit using:" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": null, 297 | "metadata": {}, 298 | "outputs": [], 299 | "source": [ 300 | "model = smf.quantreg('food ~ income', engle_data)\n", 301 | "print(model.fit(q=τ).summary())" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "# code taken from statsmodel documentation: \n", 311 | "# https://www.statsmodels.org/dev/examples/notebooks/generated/quantile_regression.html\n", 312 | "\n", 313 | "quantiles = np.arange(0.05, 0.96, 0.1)\n", 314 | "\n", 315 | "\n", 316 | "def fit_model(q):\n", 317 | " res = model.fit(q=q)\n", 318 | " return [q, 
res.params[\"Intercept\"], res.params[\"income\"]] + res.conf_int().loc[\n", 319 | " \"income\"\n", 320 | " ].tolist()\n", 321 | "\n", 322 | "\n", 323 | "models = [fit_model(x) for x in quantiles]\n", 324 | "models = pd.DataFrame(models, columns=[\"q\", \"a\", \"b\", \"lb\", \"ub\"])\n", 325 | "\n", 326 | "ols = smf.ols(\"food ~ income\", engle_data).fit()\n", 327 | "ols_ci = ols.conf_int().loc[\"income\"].tolist()\n", 328 | "ols = dict(\n", 329 | " a=ols.params[\"Intercept\"], b=ols.params[\"income\"], lb=ols_ci[0], ub=ols_ci[1]\n", 330 | ")\n", 331 | "\n", 332 | "print(models)\n", 333 | "print(ols)" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [ 342 | "# code taken from statsmodel documentation: \n", 343 | "# https://www.statsmodels.org/dev/examples/notebooks/generated/quantile_regression.html\n", 344 | "\n", 345 | "x = np.arange(engle_data.income.min(), engle_data.income.max(), 50)\n", 346 | "get_y = lambda a, b: a + b * x\n", 347 | "\n", 348 | "fig, ax = plt.subplots(figsize=(8, 6))\n", 349 | "\n", 350 | "for i in range(models.shape[0]):\n", 351 | " y = get_y(models.a[i], models.b[i])\n", 352 | " ax.plot(x, y, linestyle=\"dotted\", color=\"grey\")\n", 353 | "\n", 354 | "y = get_y(ols[\"a\"], ols[\"b\"])\n", 355 | "\n", 356 | "ax.plot(x, y, color=\"red\", label=\"OLS\")\n", 357 | "ax.scatter(engle_data.income, engle_data.food, alpha=0.2)\n", 358 | "ax.set_xlim((240, 3000))\n", 359 | "ax.set_ylim((240, 2000))\n", 360 | "legend = ax.legend()\n", 361 | "ax.set_xlabel(\"Income\", fontsize=16)\n", 362 | "ax.set_ylabel(\"Food expenditure\", fontsize=16)" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "# Part 3: vector quantile regression\n", 370 | "\n", 371 | "## References\n", 372 | "* Ekeland, Galichon and Henry (2012). Comonotonic measures of multivariate risks. *Mathematical Finance*.\n", 373 | "* Carlier, Galichon and Chernozhukov (2016). Vector quantile regression: an optimal transport approach. *Annals of Statistics*.\n", 374 | "* Carlier, Galichon and Chernozhukov (2017). Vector quantile regression beyond correct specification. *Journal of Multivariate Analysis*.\n", 375 | "* Chernozhukov, Galichon, Hallin and Henry (2017). Monge-Kantorovich Depth, Quantiles, Ranks and Signs. *Annals of Statistics*.\n", 376 | "* Carlier, Chernozhukov, De Bie, and G (2021). Vector quantile regression and optimal transport, from theory to numerics. Forthcoming, *Empirical Economics*." 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "### Classical quantile regression and duality\n", 384 | "\n", 385 | "\n", 386 | "* Recall
\n", 387 | "$\\begin{array}\n", 388 | "\\min_{P\\geq0,N\\geq0,\\beta} & \\mathbb{E}\\left[ \\tau P+\\left( 1-\\tau\\right)\n", 389 | "N\\right] \\\\\n", 390 | "s.t.~ & P-N=Y-X^{\\top}\\beta~\\left[ V\\right]\n", 391 | "\\end{array}$
\n", 392 | "where
\n", 393 | "$P=\\left( Y-X^{\\top}\\beta\\right) ^{+}$ and $N=\\left( Y-X^{\\top}\n", 394 | "\\beta\\right) ^{-}$,
\n", 395 | "\n", 396 | "\n", 397 | "* Eliminate $N$ and rewrite
\n", 398 | "$\\begin{array}\n", 399 | "~\\min_{P\\geq0,\\beta} & \\mathbb{E}\\left[ P+\\left( 1-\\tau\\right)\n", 400 | "X^{\\top}\n", 401 | "\\beta \\right] \\\\\n", 402 | "s.t.~ & P + X^{\\top}\\beta\\geq Y ~\\left[ V\\right]\n", 403 | "\\end{array}$
\n", 404 | "which we call the *dual* problem.\n", 405 | "\n", 406 | "* The corresponding primal problem is
\n", 407 | "$\\begin{array}\n", 408 | "& \\max_{V\\geq 0} & \\mathbb{E}\\left[ YV\\right] \\\\\n", 409 | "s.t.~ & V\\leq1~\\left[ P\\geq0\\right] \\\\\n", 410 | "& \\mathbb{E}\\left[ VX\\right] =\\left( 1-\\tau\\right) \\mathbb{E}\\left[\n", 411 | "X\\right] ~\\left[ \\beta\\right]\n", 412 | "\\end{array}$
" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "### Complementary slackness\n", 420 | "\n", 421 | "* Let $V\\left( \\tau\\right) $ and $\\beta\\left( \\tau\\right) $ be\n", 422 | "solutions to the above program. Complementary slackness yields
\n", 423 | "$\\left\\{\\begin{array}\n", 424 | "~Y-X^{\\top}\\beta\\left( \\tau\\right) & <0\\implies V\\left( \\tau\\right) =0\\\\\n", 425 | "Y-X^{\\top}\\beta\\left( \\tau\\right) & >0\\implies V\\left( \\tau\\right) =1\n", 426 | "\\end{array}\\right.$
\n", 427 | "therefore
\n", 428 | "$1\\left\\{ Y>X^{\\top}\\beta\\left( \\tau\\right) \\right\\} \\leq V\\left(\n", 429 | "\\tau\\right) \\leq1\\left\\{ Y\\geq X^{\\top}\\beta\\left( \\tau\\right) \\right\\}.$\n", 430 | "\n", 431 | "* Assume $\\left( X,Y\\right) $ has a continuous distribution,. Then for\n", 432 | "any $\\beta$, $\\Pr\\left( Y-X^{\\top}\\beta=0\\right) =0$, and therefore one has\n", 433 | "almost surely
\n", 434 | "$V\\left( \\tau\\right) =1\\left\\{ Y\\geq X^{\\top}\\beta\\left( \\tau\\right)\n", 435 | "\\right\\}.$\n" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "### Quantile curve regression\n", 443 | "\n", 444 | "* Consider now solving the problems above for all values of $\\tau$ *all at once*. We can write:
\n", 445 | "$\\max_{V\\left( .\\right) \\geq 0 }\\int_{0}^{1}\\mathbb{E}\\left[ YV\\left(\n", 446 | "\\tau\\right) \\right] d\\tau$
\n", 447 | "s.t.
\n", 448 | "$V\\left( \\tau\\right) \\leq1~\\left[ P\\left( \\tau\\right) \\geq0\\right] $
\n", 449 | "$ \\mathbb{E}\\left[ V\\left( \\tau\\right) X\\right] =\\left( 1-\\tau\\right)\n", 450 | "\\mathbb{E}\\left[ X\\right] ~\\left[ \\beta\\left( \\tau\\right) \\right]$
\n", 451 | "\n", 452 | "\n", 453 | "* The problem has dual
\n", 454 | "$\\begin{array}\n", 455 | "~\\min_{P\\geq0,\\beta} & \\int_{0}^{1}\\mathbb{E}\\left[ P\\left(\n", 456 | "\\tau\\right) +\\left( 1-\\tau\\right) X^{\\top}\\beta\\left( \\tau\\right)\n", 457 | "\\right] d\\tau\\\\\n", 458 | "s.t.~ & P\\left( \\tau\\right) \\geq Y-X^{\\top}\\beta\\left(\n", 459 | "\\tau\\right) ~\\left[ V\\left( \\tau\\right) \\right]\n", 460 | "\\end{array}$\n", 461 | "\n", 462 | "\n", 463 | "* The solution to these problems are the same as the solutions to each of the previous problem -- the contraints don't interfere." 464 | ] 465 | }, 466 | { 467 | "cell_type": "markdown", 468 | "metadata": {}, 469 | "source": [ 470 | "* Sample version:
\n", 471 | "$\\begin{array}\n", 472 | "~\\max_{V_{ti}\\geq 0} & \\frac{1}{I}\\sum_{t,i} V_{ti}Y_{i}\\\\\n", 473 | "s.t.~ & V_{ti}\\leq 1\\\\\n", 474 | "& \\frac{1}{I}\\left( VX\\right) _{tk}=\\left( 1-\\tau _{t}\\right) \\bar{x}_{k}%\n", 475 | "\\left[ \\beta \\right] \n", 476 | "\\end{array}$\n", 477 | "\n", 478 | "* In matrix terms, this is
\n", 479 | "$\\begin{array}\n", 480 | "~\\max_{V \\geq 0} & \\frac{1}{I}1^\\top_T V Y\\\\\n", 481 | "s.t.~ & V\\leq 1\\\\\n", 482 | "& \\frac{1}{I}\\left( VX\\right) =\\left( 1-\\tau \\right) \\bar{x}^\\top\n", 483 | "\\left[ \\beta \\right] \n", 484 | "\\end{array}$\n", 485 | "\n", 486 | "* After vectorization $v=vec(V)$
\n", 487 | "$\\begin{array}\n", 488 | "~\\max_{v \\geq 0} & \\frac{1}{I}\\left( 1_{T}\\otimes Y\\right) ^{\\top }v\\\\\n", 489 | "s.t.~ & V\\leq 1\\\\\n", 490 | "& \\frac{1}{I}\\left( I_{T}\\otimes X^{\\top }\\right) v=vec\\left( \\left( 1-\\tau \\right) \\bar{%\n", 491 | "x}^{\\top }\\right) \n", 492 | "\\end{array}$\n", 493 | "\n", 494 | "\n", 495 | "\n", 496 | "\n" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": {}, 502 | "source": [ 503 | "Code this as:" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": null, 509 | "metadata": {}, 510 | "outputs": [], 511 | "source": [ 512 | "Y_i_1 = food.reshape((-1,1))\n", 513 | "nbt=21\n", 514 | "τ_t_1 = np.linspace(0,1,nbt).reshape((-1,1))\n", 515 | "A = spr.kron(spr.identity(nbt),X_i_k.T) / nbi\n", 516 | "obj = np.kron(np.ones((nbt,1)),Y_i_1).T / nbi\n", 517 | "xbar_1_k = X_i_k.mean(axis = 0).reshape((1,-1))\n", 518 | "rhs = ((1-τ_t_1) * xbar_1_k).flatten()\n", 519 | "qrs_lp=grb.Model()\n", 520 | "qrs_lp.setParam( 'OutputFlag', False )\n", 521 | "v = qrs_lp.addMVar(shape=nbi*nbt, name=\"v\",lb=0,ub=1)\n", 522 | "qrs_lp.setObjective(obj @ v , grb.GRB.MAXIMIZE)\n", 523 | "qrs_lp.addConstr(A @ v == rhs)\n", 524 | "qrs_lp.optimize()\n", 525 | "\n", 526 | "βqrs_t_k = np.array(qrs_lp.getAttr('pi')).reshape((nbt,nbk))\n", 527 | "βqrs_t_k[10,:]" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "### Quantile curve regression under dual monotonicity constraint\n", 535 | "\n", 536 | "* Koenker and Ng (2005) consider imposing the monotonicity constraint of\n", 537 | "the estimated quantile curves. Thus, they impose a constraint on the dual,\n", 538 | "namely:
$X^{\\top}\\beta\\left( \\tau\\right) \\geq X^{\\top}\\beta\\left(\n", 539 | "\\tau^{\\prime}\\right) $ for $\\tau\\geq\\tau^{\\prime}$,
\n", 540 | "that is
\n", 541 | "$\\begin{array}\n", 542 | "~\\min_{P\\geq0,N\\geq0,\\beta} & \\int_{0}^{1}\\mathbb{E}\\left[ P\\left(\n", 543 | "\\tau\\right) +\\left( 1-\\tau\\right) X^{\\top}\\beta\\left( \\tau\\right)\n", 544 | "\\right] d\\tau\\\\\n", 545 | "s.t.~ & P\\left( \\tau\\right) -N\\left( \\tau\\right) =Y-X^{\\top}\\beta\\left(\n", 546 | "\\tau\\right) ~\\left[ V\\left( \\tau\\right) \\right] \\\\\n", 547 | "& X^{\\top}\\beta\\left( \\tau\\right) \\geq X^{\\top}\\beta\\left( \\tau^{\\prime\n", 548 | "}\\right) ,~\\tau\\geq\\tau^{\\prime}%\n", 549 | "\\end{array}$\n", 550 | "\n", 551 | "* This is the most natural approach to solve the non-monotonicity problem. However, it does not leads to a simple duality." 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "### Quantile curve regression under primal monotonicity constraint\n", 559 | "\n", 560 | "* By contrast, [CCG]'s vector quantile regression approach imposes the constraint that the primal variable
\n", 561 | "$\\tau\\rightarrow V\\left( \\tau\\right) $
\n", 562 | "should be nonincreasing. This is justified by the fact that
\n", 563 | "$V\\left( \\tau\\right) =1\\left\\{ Y\\geq X^{\\top}\\beta\\left( \\tau\\right) \\right\\} $,
\n", 564 | "so
\n", 565 | "$ X^{\\top} \\beta\\left( \\tau\\right) \\text{ nondecreasing in } \\tau \\implies V\\left( \\tau\\right) $ nonincreasing.\n", 566 | "\n", 567 | "* Therefore, we let consider the program\n", 568 | "\\begin{align*}\n", 569 | "& \\max_{V\\left( \\tau\\right) }\\int_{0}^{1}\\mathbb{E}\\left[ YV\\left(\n", 570 | "\\tau\\right) \\right] d\\tau\\\\\n", 571 | "s.t.~ & V\\left( \\tau\\right) \\geq0~\\left[ N\\left( \\tau\\right)\n", 572 | "\\geq0\\right] \\\\\n", 573 | "& V\\left( \\tau\\right) \\leq1~\\left[ P\\left( \\tau\\right) \\geq0\\right] \\\\\n", 574 | "& \\mathbb{E}\\left[ V\\left( \\tau\\right) X\\right] =\\left( 1-\\tau\\right)\n", 575 | "\\mathbb{E}\\left[ X\\right] ~\\left[ \\beta\\left( \\tau\\right) \\right] \\\\\n", 576 | "& V\\left( \\tau\\right) \\leq V\\left( \\tau^{\\prime}\\right) ,~\\tau\\geq\n", 577 | "\\tau^{\\prime}%\n", 578 | "\\end{align*}" 579 | ] 580 | }, 581 | { 582 | "cell_type": "markdown", 583 | "metadata": {}, 584 | "source": [ 585 | "### Primal monotonicity constraint, sample version\n", 586 | "\n", 587 | "* Consider $\\tau_{1}=0<...<\\tau_{T}\\leq1$ and let $\\bar{x}$ be the\n", 588 | "$1\\times K$ row vector whose $k$-th entry is $\\mathbb{E}\\left[ X_{k}\\right]\n", 589 | "$.
One has\n", 590 | "\\begin{align*}\n", 591 | "& \\max_{V_{ti}\\geq0}\\frac{1}{I}\\sum_{\\substack{1\\leq i\\leq I\\\\1\\leq t\\leq T}}V_{ti}%\n", 592 | "Y_{i}\\\\\n", 593 | "& V_{ti}\\leq1\\\\\n", 594 | "& \\frac{1}{I}\\sum_{1\\leq i\\leq I}V_{ti}X_{ik}=\\left( 1-\\tau_{t}\\right)\n", 595 | "\\bar{x}_{k}\\\\\n", 596 | "& V_{\\left( t+1\\right) i}\\leq V_{ti}\n", 597 | "\\end{align*}\n", 598 | "\n", 599 | "\n", 600 | "* As $\\tau_{1}=0$, one has necessarly $V_{1i}=1$ and the program becomes\n", 601 | "\\begin{align*}\n", 602 | "& \\max_{V_{ti}}\\frac{1}{I}\\sum_{\\substack{1\\leq i\\leq I\\\\1\\leq t\\leq T}}V_{ti}Y_{i}\\\\\n", 603 | "& V_{1i}=1\\\\\n", 604 | "& \\frac{1}{I}\\sum_{1\\leq i\\leq I}V_{ti}X_{ik}=\\left( 1-\\tau_{t}\\right)\n", 605 | "\\bar{x}_{k}\\\\\n", 606 | "& V_{t1}\\geq V_{t2}\\geq...\\geq V_{t\\left( m-1\\right) }\\geq V_{T,i}\\geq0.\n", 607 | "\\end{align*}" 608 | ] 609 | }, 610 | { 611 | "cell_type": "markdown", 612 | "metadata": {}, 613 | "source": [ 614 | "### Matrix notations\n", 615 | "\n", 616 | "* Let $\\tau$ be the $T\\times1$ row matrix with entries $\\tau_{k}$.\n", 617 | "\n", 618 | "* Let $D$ be a $T\\times T$ matrix defined as\n", 619 | "$$\n", 620 | "D=\n", 621 | "\\begin{pmatrix}\n", 622 | "1 & 0 & 0 & \\cdots & 0 & 0\\\\\n", 623 | "-1 & 1 & 0 & \\ddots & \\vdots & \\vdots\\\\\n", 624 | "0 & -1 & 1 & \\ddots & 0 & 0\\\\\n", 625 | "\\vdots & \\ddots & \\ddots & \\ddots & 0 & 0\\\\\n", 626 | "\\vdots & & 0 & -1 & 1 & 0\\\\\n", 627 | "0 & & 0 & 0 & -1 & 1\n", 628 | "\\end{pmatrix}\n", 629 | "$$\n", 630 | "we have $V^{\\top}D\\geq0$ if and only if $$V_{1i}\\geq V_{2i}\\geq...\\geq\n", 631 | "V_{\\left( T-1\\right) i}\\geq V_{Ti}\\geq0.$$\n", 632 | "\n", 633 | "* One can write\n", 634 | "\\begin{align*}\n", 635 | "& \\frac{1}{I}\\max_{V}1_{T}^{\\top}VY\\\\\n", 636 | "& \\frac{1}{I}VX=\\left( 1_{T}-\\tau\\right) \\bar{x}\\\\\n", 637 | "& V^{\\top}D1_{T}=1_{I}\\\\\n", 638 | "& V^{\\top}D\\geq0\n", 639 | "\\end{align*}\n" 640 | ] 641 | }, 642 | { 643 | "cell_type": "code", 644 | "execution_count": null, 645 | "metadata": {}, 646 | "outputs": [], 647 | "source": [] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": {}, 652 | "source": [ 653 | "\n", 654 | "* Thus, setting
$\\pi=D^{\\top}V/I$, and $U=D^{-1}1_{I}$, $\\mu=D^{\\top}\\left(\n", 655 | "1_{T}-\\tau\\right) $, and $p=1_{I}/I$,
one has
\n", 656 | "\\begin{align*}\n", 657 | "& \\max_{\\pi}U^{\\top}\\pi Y\\\\\n", 658 | "& \\pi X=\\mu\\bar{x}\\\\\n", 659 | "& \\pi^{\\top}1_{T}=p\\\\\n", 660 | "& \\pi\\geq0\n", 661 | "\\end{align*}\n", 662 | "\n", 663 | "\n", 664 | "* Assume that the first entry of $X$ is one. One has that if $\\pi$ satisfies the constraints, then\n", 665 | "$$\\sum_{i=1}^{I}\\pi_{ti}=\\mu_{t}\\text{ and }\\sum_{t=1}^{T}\\pi_{ti}=p_{i}$$\n", 666 | "thus $\\pi$ can be thought of as a joint probability on $\\tau$ and $X$.\n", 667 | "\n", 668 | "* One has\n", 669 | "\\begin{align*}\n", 670 | "& \\max_{\\pi\\geq0}\\sum_{\\substack{1\\leq t\\leq T\\\\1\\leq i\\leq I}}\\pi_{ti}\n", 671 | "U_{t}Y_{i}\\\\\n", 672 | "& \\sum_{1\\leq i\\leq I}\\pi_{ti}X_{ik}=\\mu_{t}\\bar{x}_{k}\\\\\n", 673 | "& \\sum_{1\\leq t\\leq T}\\pi_{ti}=p_{i}%\n", 674 | "\\end{align*}" 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "### Computation\n", 682 | "\n", 683 | "* Recall the problem to compute\n", 684 | "\\begin{align*}\n", 685 | "& \\max_{\\pi\\geq0}U^{\\top}\\pi Y\\\\\n", 686 | "& \\pi X=\\mu\\bar{x}\\\\\n", 687 | "& \\pi^{\\top}1_{T}=p\n", 688 | "\\end{align*}\n", 689 | "which we sill vectorize using $vec\\left( A\\pi B\\right) =\\left( A \\otimes B^{\\top\n", 690 | "}\\right) vec\\left( \\pi\\right) $ so that the constraint becomes\n", 691 | "$$\n", 692 | "\\begin{pmatrix}\n", 693 | " I_{T}\\otimes X^{\\top}\\\\\n", 694 | "1_{T}^{\\top} \\otimes I_{I}\n", 695 | "\\end{pmatrix}\n", 696 | "vec\\left( \\pi\\right) =\\binom{vec\\left( \\mu\\bar{x}\\right) }{p}%\n", 697 | "$$\n", 698 | "\n", 699 | "\n", 700 | "* There are $IT$ primal variables and $KT+I$ constraints.\n", 701 | "\n", 702 | "* Computation is done with a linear programming solver like Gurobi. Large-scale linear programming solvers make use of sparsity of constraint matrix. However, if dimension of $Y$ is larger, $T$ will need to be large." 703 | ] 704 | }, 705 | { 706 | "cell_type": "markdown", 707 | "metadata": {}, 708 | "source": [ 709 | "### Recovering the $\\beta$\n", 710 | "\n", 711 | "* $\\beta$ is the vector of Lagrange multipliers of the constraint $ \\frac{1}{I}V X= \\left( 1_{T}-\\tau \\right) \\bar{x} $ in the former problem.\n", 712 | "\n", 713 | "* Let $\\psi$ be the vector of Lagrange multipliers of the constraint $\\pi X-\\mu \\bar{x}$ in the latter problem.\n", 714 | "\n", 715 | "* We have $\\beta = D \\psi$. Indeed:
\n", 716 | "$\\left( \\pi X-\\mu \\bar{x}\\right) ^{\\top }\\psi =0$
\n", 717 | "thus
\n", 718 | "$\\left( \\frac{1}{I}D^{\\top }V X-D^{\\top }\\left( 1_{T}-\\tau \\right) \\bar{x}%\n", 719 | "\\right) ^{\\top }\\psi =0$
\n", 720 | "and therefore
\n", 721 | "$\\left( \\frac{1}{I}V X-\\left( 1_{T}-\\tau \\right) \\bar{x}\\right) ^{\\top }D\\psi\n", 722 | "=0$\n", 723 | "\n", 724 | "* Compute in the following manner:" 725 | ] 726 | }, 727 | { 728 | "cell_type": "code", 729 | "execution_count": null, 730 | "metadata": {}, 731 | "outputs": [], 732 | "source": [ 733 | "D_t_t = spr.diags([1, -1], [ 0, -1], shape=(nbt, nbt))\n", 734 | "\n", 735 | "U_t_1 = np.linalg.inv(D_t_t.toarray()) @ np.ones( (nbt,1)) \n", 736 | "μ_t_1 = D_t_t.T @ (np.ones((nbt,1)) - τ_t_1)\n", 737 | "\n", 738 | "A1 = spr.kron(spr.identity(nbt),X_i_k.T)\n", 739 | "A2 = spr.kron(np.array(np.repeat(1,nbt)),spr.identity(nbi))\n", 740 | "A = spr.vstack([A1, A2])\n", 741 | "rhs = np.concatenate( [(μ_t_1 * xbar_1_k).flatten(), np.ones(nbi)/nbi]) \n", 742 | "obj = np.kron(U_t_1, Y_i_1).T\n", 743 | "vqr_lp=grb.Model()\n", 744 | "pi = vqr_lp.addMVar(shape=nbi*nbt, name=\"pi\")\n", 745 | "vqr_lp.setParam( 'OutputFlag', False )\n", 746 | "vqr_lp.setObjective( obj @ pi, grb.GRB.MAXIMIZE)\n", 747 | "vqr_lp.addConstr(A @ pi == rhs)\n", 748 | "vqr_lp.optimize()\n", 749 | "\n", 750 | "ϕ_t_k = np.array(vqr_lp.getAttr('pi'))[0:(nbt*nbk)].reshape((nbt,nbk))\n", 751 | "\n", 752 | "βvqr_t_k = D_t_t.toarray() @ ϕ_t_k\n", 753 | "\n", 754 | "βvqr_t_k[10,:]" 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "metadata": {}, 760 | "source": [ 761 | "## Vector quantile regression in the continuous case\n", 762 | "\n", 763 | "* This rewrites as a *Vector quantile regression* (yet for now in the scalar case), introduced in [CCG16]\n", 764 | "\\begin{align*}\n", 765 | "\\max_{\\pi} & \\mathbb{E}_{\\pi}\\left[ UY\\right] \\\\\n", 766 | "s.t. & U\\sim\\mu\\\\\n", 767 | "& \\left( X,Y\\right) \\sim P\\\\\n", 768 | "& \\mathbb{E}\\left[ X|U\\right] =\\mathbb{E}\\left[ X\\right]\n", 769 | "\\end{align*}\n", 770 | "\n", 771 | "\n", 772 | "* This is an extension of the optimal transport problem of Monge-Kantorovich. As a matter of fact, when $X$ is restricted to the constant, this is *exactly* an optimal transport problem." 773 | ] 774 | }, 775 | { 776 | "cell_type": "markdown", 777 | "metadata": {}, 778 | "source": [ 779 | "### Vector quantile regression, primal problem\n", 780 | "\n", 781 | "* Recall our previous problem\n", 782 | "\\begin{align*}\n", 783 | "\\max_{\\pi} & \\mathbb{E}_{\\pi}\\left[ UY\\right] \\\\\n", 784 | "s.t. & U\\sim\\mu\\\\\n", 785 | "& \\left( X,Y\\right) \\sim P\\\\\n", 786 | "& \\mathbb{E}\\left[ X|U\\right] \\sim\\mathbb{E}\\left[ X\\right]\n", 787 | "\\end{align*}\n", 788 | "and replace mean-independence by independence; one has\n", 789 | "\\begin{align*}\n", 790 | "\\max_{\\pi} & \\mathbb{E}_{\\pi}\\left[ UY\\right] \\\\\n", 791 | "s.t. & U\\sim\\mu\\\\\n", 792 | "& \\left( X,Y\\right) \\sim P\\\\\n", 793 | "& X {\\perp\\!\\!\\!\\perp}U\n", 794 | "\\end{align*}\n", 795 | "\n", 796 | "\n", 797 | "* The solution to the latter problem is simply
\n", 798 | "$U=F_{Y|X}\\left( Y|X\\right)$
\n", 799 | "which yields the nonparametric conditional quantile representation
\n", 800 | "$Y=F_{Y|X}^{-1}\\left( U|X\\right).$" 801 | ] 802 | }, 803 | { 804 | "cell_type": "markdown", 805 | "metadata": {}, 806 | "source": [ 807 | "### Vector quantile regression, dual problem\n", 808 | "\n", 809 | "* The dual problem yields\n", 810 | "\\begin{align*}\n", 811 | "\\min_{\\psi,b} & \\mathbb{E}_{P}\\left[ \\psi\\left( X,Y\\right) \\right]\n", 812 | "+\\bar{x}^{\\top}\\mathbb{E}_{\\mu}\\left[ b\\left( U\\right) \\right] \\\\\n", 813 | "s.t.~ & \\psi\\left( x,y\\right) +x^{\\top}b\\left( \\tau\\right) \\geq\\tau\n", 814 | "y,~\\forall x,y,\\tau\n", 815 | "\\end{align*}\n", 816 | "\n", 817 | "\n", 818 | "* Optimality of $\\left( \\psi,b\\right) $ yields\n", 819 | "$$\n", 820 | "\\psi\\left( x,y\\right) =\\sup_{\\tau\\in\\left[ 0,1\\right] }\\left\\{ \\tau\n", 821 | "y-x^{\\top}b\\left( \\tau\\right) \\right\\}\n", 822 | "$$\n", 823 | "which yields, if $b$ is differentiable,\n", 824 | "$$\n", 825 | "Y=X^{\\top}\\beta\\left( U\\right)\n", 826 | "$$\n", 827 | "where $(U,X,Y)$ are the solutions to the primal problem and $\\beta\\left(\n", 828 | "\\tau\\right) =b^{\\prime}\\left( \\tau\\right) $.\n" 829 | ] 830 | }, 831 | { 832 | "cell_type": "markdown", 833 | "metadata": {}, 834 | "source": [ 835 | "### Multivariate case\n", 836 | "\n", 837 | "* Vector quantile regression yields a natural way to extend classical\n", 838 | "quantile regression to the case when the dependent variable is multivariate.\n", 839 | "If $Y$ is valued in $\\mathbb{R}^{d}$, one may take $\\tau$ in $\\mathbb{R}^{d}$,\n", 840 | "$\\mu=\\mathcal{U}\\left( \\left[ 0,1\\right] ^{d}\\right) $. We replace the\n", 841 | "product $\\tau$ by the scalar product $\\tau^{\\top}\\beta\\left( U\\right) $, and\n", 842 | "the analysis goes unmodified.\n", 843 | "\n", 844 | "* We get a nice tensorization of vector quantile regression that way: when\n", 845 | "the components of $Y$ are independent, the previous propal amounts to running\n", 846 | "the scalar version component by component." 847 | ] 848 | } 849 | ], 850 | "metadata": { 851 | "kernelspec": { 852 | "display_name": "Python 3", 853 | "language": "python", 854 | "name": "python3" 855 | }, 856 | "language_info": { 857 | "codemirror_mode": { 858 | "name": "ipython", 859 | "version": 3 860 | }, 861 | "file_extension": ".py", 862 | "mimetype": "text/x-python", 863 | "name": "python", 864 | "nbconvert_exporter": "python", 865 | "pygments_lexer": "ipython3", 866 | "version": "3.8.5" 867 | } 868 | }, 869 | "nbformat": 4, 870 | "nbformat_minor": 2 871 | } 872 | -------------------------------------------------------------------------------- /L03_semi-discrete-optimal-transport.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#
Block 2b: Semi-discrete optimal transport
\n", 8 | "###
Alfred Galichon (NYU & Sciences Po)
\n", 9 | "##
'math+econ+code' masterclass on optimal transport and economic applications
\n", 10 | "####
With python code examples
\n", 11 | "© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam).\n", 12 | "\n", 13 | "**If you reuse material from this masterclass, please cite as:**
\n", 14 | "Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## References\n", 22 | "\n", 23 | "* Galichon (2016). *Optimal Transport Methods in Economics*. Chapter 5. Princeton University Press.\n", 24 | "\n", 25 | "* Anderson, de Palma, and Thisse (1992). *Discrete Choice Theory of Product Differentiation*. MIT.\n", 26 | "\n", 27 | "* Aurenhammer (1987). Power Diagrams: Properties, Algorithms and Applications. *SIAM J Computing*.\n", 28 | "\n", 29 | "* Lancaster (1966). A New Approach to Consumer Theory. *JPE*.\n", 30 | "\n", 31 | "* Berry, Pakes (2007). The Pure Characteristics Demand Model. *IER*.\n", 32 | "\n", 33 | "* Feenstra, Levinsohn (1995). Estimating Markups and Market Conduct with Multidimensional Product Attributes. *ReStud*.\n", 34 | "\n", 35 | "* Bonnet, Galichon, Shum (2017). Yoghurts Choose Consumers. Identification of Random Utility Models via Two-Sided Matching. *Mimeo*.\n", 36 | "\n", 37 | "* Leclerc, Merigot. `pysdot` library. https://github.com/sd-ot/pysdot" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "# Motivation\n", 45 | "\n", 46 | "Today we'll consider a version of the transportation problem where we seek to match a continuous distribution on $\\mathbb{R}^{d}$ with a discrete distribution. This problem is called a *semi-discrete transportation* problem.\n", 47 | "\n", 48 | "Actually, we will introduce this problem not as a matching problem, but as a demand problem. We'll model the demand for facilities (such as schools, stores) in the physical space. The same approach applies to the demand for products (e.g. cars) in the characteristics space, see e.g. Lancaster (1966), Feenstra and Levinsohn (1995), and Berry and Pakes (2007).\n", 49 | "\n", 50 | "We'll simulate fountain locations on a city represented by the two dimensional square." 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## Loading the libraries\n", 58 | "\n", 59 | "We shall now load the libraries that we need. They are a bit specific, as they require combinatorial geometry routines. The `pysdot` library by Hugo Leclerc and Quentin Mérigot is still at an early stage of development, but is quite promising and easy to `pip install`, so we will adopt it for this course. " 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "# !pip install pysdot" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "from pysdot import PowerDiagram\n", 78 | "from pysdot.radial_funcs import RadialFuncUnit\n", 79 | "from pysdot import OptimalTransport\n", 80 | "from pysdot.domain_types import ConvexPolyhedraAssembly\n", 81 | "import numpy as np\n", 82 | "import random as rd" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "## Setting\n", 90 | "\n", 91 | "Consider inhabitants of a city whose geographic coordinates are $x\\in\\mathcal{X}=\\left[0,1\\right]^{2}$. More generally, $\\mathcal{X}$ will be a convex subset of $\\mathbb{R}^{d}$ ($d=2$ is only to fix ideas). The location of inhabitants is distributed with a density of mass $n(x)$ which is positive on $\\mathcal{X}$. 
$n$ is assumed to have unit total mass: $\int_\mathcal{X} n(x)dx=1$, so it is a probability density function.\n",
"\n",
"There are $J$ fountains, located at points $y_{j}\in\mathbb{R}^{d}$, $1\leq j\leq J$. Fountain $j$ is assumed to have capacity $q_{j}$, which means it can serve a mass $q_j$ of inhabitants. It is assumed that $\sum_{j}q_{j}=1$, which means that total supply equals total demand.\n",
"\n",
"An inhabitant at $x$ faces a transportation cost when using the fountain located at $y$ which is proportional to the squared distance to the fountain, so the associated surplus is\n",
"\n",
"\begin{align*}\n",
"\tilde{\Phi}\left( x,y\right) :=-\left\vert x-y\right\vert ^{2}/2.\n",
"\label{Phistar}\n",
"\end{align*}\n",
"\n",
"Let $\tilde{v}_{j}$ be the price charged by fountain $j$. The utility of the consumer at location $x$ is therefore $\tilde{\Phi}\left( x,y_{j}\right) -\tilde{v}_{j}$, and the indirect surplus of the consumer at $x$ is given by\n",
"\n",
"\begin{align*}\n",
"\tilde{u}\left( x\right) =\max_{j\in\left\{ 1,...,J\right\} }\left\{\n",
"\tilde{\Phi}\left( x,y_{j}\right) -\tilde{v}_{j}\right\} \label{ustar}\n",
"\end{align*}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A reformulation\n",
"\n",
"Without loss of generality, one can replace the quadratic surplus $\tilde{\Phi}\left( x,y\right) =-\left\vert x-y\right\vert ^{2}/2$ by the scalar product surplus\n",
"\n",
"\begin{align*}\n",
"\Phi\left( x,y\right) :=x^{\intercal}y. \label{PhiScalProd}\n",
"\end{align*}\n",
"\n",
"Indeed, note that $\tilde{\Phi}\left( x,y\right) =\Phi\left( x,y\right) - \left\vert x\right\vert ^{2}/2 - \left\vert y\right\vert ^{2}/2$, and introduce the *reduced indirect surplus* $u\left( x\right)$ and the *reduced prices* $v_{j}$ as\n",
"\n",
"\begin{align*}\n",
"u\left( x\right) =\tilde{u}\left( x\right) +\left\vert x\right\vert ^{2}/2\text{, and }v_{j}=\tilde{v}_{j}+\left\vert y_{j}\right\vert ^{2}/2. \label{uandv}\n",
"\end{align*}\n",
"\n",
"One immediately sees that $\tilde{u}\left( x\right) +\tilde{v}_{j}\geq \tilde{\Phi}\left( x,y_{j}\right) $ if and only if $u\left( x\right) +v_{j}\geq\Phi\left( x,y_{j}\right) $. It follows that the consumer at location $x$ chooses the fountain $j$ that maximizes\n",
"\n",
"\begin{align*}\n",
"u\left( x\right) =\max_{j\in\left\{ 1,...,J\right\} }\left\{ \Phi\left(x,y_{j}\right) -v_{j}\right\} . \label{PWAu}\n",
"\end{align*}\n",
"\n",
"Hence the problem can be reexpressed so that the surplus of consumer $x$ at fountain $j$ is simply $x^{\intercal}y_{j}-v_{j}$. It is clear from inspection that (unlike $\tilde{u}$), the reduced surplus $u$ is a piecewise affine and convex function from $\mathbb{R}^{d}$ to $\mathbb{R}$. The connection with convex and piecewise affine functions is the reason for\n",
"reformulating the problem as we did." 
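Here is a quick numerical sanity check of this reformulation (a sketch of ours, with made-up locations and prices): the fountain maximizing the raw surplus $-\left\vert x-y\right\vert ^{2}/2-\tilde{v}_{j}$ is also the one maximizing the reduced surplus $x^{\intercal}y_{j}-v_{j}$.

```python
import numpy as np

# Sanity check of the reformulation, with made-up values.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2)                   # a consumer location in [0,1]^2
Ys = rng.uniform(0, 1, (10, 2))            # 10 fountain locations
v_tilde = rng.uniform(0, 0.1, 10)          # arbitrary prices
v = v_tilde + (Ys ** 2).sum(axis=1) / 2    # reduced prices v = v_tilde + |y|^2 / 2

j_raw = np.argmax(-((x - Ys) ** 2).sum(axis=1) / 2 - v_tilde)
j_reduced = np.argmax(Ys @ x - v)
assert j_raw == j_reduced                  # both formulations pick the same fountain
```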
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Power Diagrams\n",
"\n",
"The demand set of fountain $j$ is\n",
"\n",
"\begin{align*}\n",
"\mathcal{X}_{j}^{v}:=\left\{ x\in\mathcal{X}:\tilde{\Phi}\left(x,y_{j}\right) -\tilde{v}_{j}\geq\tilde{\Phi}\left( x,y_{k}\right) -\tilde{v}_{k},~\forall k\right\}\n",
"\end{align*}\n",
"\n",
"which is equivalent to\n",
"\n",
"\begin{align*}\n",
"\mathcal{X}_{j}^{v}=\left\{ x\in\mathcal{X}:x^{\intercal}\left( y_{j}\n",
"-y_{k}\right) \geq v_{j}-v_{k},~\forall k\right\}.\n",
"\end{align*}\n",
"\n",
"\n",
"**Basic properties:**\n",
"\n",
"* $\mathcal{X}_{j}$ is a convex polyhedron;\n",
"\n",
"* The intersection of $\mathcal{X}_{j}$ and $\mathcal{X}_{k}$'s lies in the hyperplane of equation $\{x:x^{\intercal}\left( y_{j}-y_{k}\right) +v_{k}-v_{j}=0\}$;\n",
"\n",
"* The set $\mathcal{X}_{j}$ weakly increases when $v_{k}$ ($k\neq j$) increases, and decreases when $v_{j}$ increases.\n",
"\n",
"The system of sets $\left( \mathcal{X}_{j}^{v}\right) _{j}$ is called the *power diagram* associated to the price system $v$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Voronoi tessellations\n",
"\n",
"If fountains do not charge any fee, that is, if $\tilde{v}_{j}=0$, or equivalently if $v_{j}=\left\vert y_{j}\right\vert ^{2}/2$, then $\mathcal{X}_{j}^{0}$ is the set of consumers who are closer to fountain $j$ than to any other fountain. The cells $\mathcal{X}_{j}^{0}$ form a partition of $\mathcal{X}$ called the *Voronoi tessellation*, which is a very particular case of a power diagram. Voronoi diagrams have the property that fountain $j$ belongs to cell $\mathcal{X}_{j}^{0}$; when $\tilde{v}\neq0$, this property may no longer hold for more general power diagrams."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Example**. We will generate a Voronoi tessellation where $10$ fountains are distributed uniformly on $[0,1]^2$, and $\tilde{v} = 0$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(777)  # seed numpy's generator, which is the one actually used below\n",
"nCells = 10\n",
"\n",
"Ys = np.random.uniform(0,1,2*nCells).reshape((-1,2))\n",
"vor_dia = PowerDiagram(Ys)\n",
"vor_dia.display_jupyter()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Demand zone of a fountain\n",
"\n",
"The demand for fountain $j$ is given by $\mathbb{P}_n\left(\mathcal{X}_{j}\right) = \int_\mathcal{X} 1\{x\in\mathcal{X}_{j}\} n(x) dx $ where $\mathbb{P}_n$ is the probability distribution of consumer locations.\n",
"\n",
"Note that in general $x^{\intercal}y_{j}-u\left( x\right) \leq v_{j}$; yet if consumer $x$ chooses fountain $j$, then this inequality holds as an equality. 
Hence, the set of consumers who prefer fountain $j$ is given by\n",
 209 | "\n",
 210 | "\\begin{align*}\n",
 211 | "\\mathcal{X}_{j}=\\arg\\max_{x\\in\\mathcal{X}}\\left\\{ x^{\\intercal}y_{j}-u\\left(\n",
 212 | "x\\right) \\right\\} \\label{defXj}\n",
 213 | "\\end{align*}\n",
 214 | "\n",
 215 | "By the first-order conditions, $x\\in\\mathcal{X}_{j}$ if and only if $\\nabla u\\left(x\\right) = y_{j}$ (assuming $u$ is differentiable at $x$). Therefore\n",
 216 | "\n",
 217 | "\\begin{align*}\n",
 218 | "\\mathcal{X}_{j}:=\\nabla u^{-1}\\left( \\left\\{ y_{j}\\right\\} \\right) .\n",
 219 | "\\label{Demand}\n",
 220 | "\\end{align*}"
 221 | ]
 222 | },
 223 | {
 224 | "cell_type": "markdown",
 225 | "metadata": {},
 226 | "source": [
 227 | "\n",
 228 | "---\n",
 229 | "\n",
 230 | "**Fountain example**. We see in the picture above that cells have different areas. The areas of the cells are given by: "
 231 | ]
 232 | },
 233 | {
 234 | "cell_type": "code",
 235 | "execution_count": null,
 236 | "metadata": {},
 237 | "outputs": [],
 238 | "source": [
 239 | "vor_dia.integrals()"
 240 | ]
 241 | },
 242 | {
 243 | "cell_type": "markdown",
 244 | "metadata": {},
 245 | "source": [
 246 | "## Equilibrium prices\n",
 247 | "\n",
 248 | "Introduce the social welfare of producers and consumers as\n",
 249 | "\n",
 250 | "\n",
 251 | "\\begin{align*}\n",
 252 | "S\\left( v\\right) :=\\sum_{j}q_{j}v_{j}+\\mathbb{E}_{\\mathbb{P}_n}\\left[ \\max_{j\\in\\left\\{ 1,...,J\\right\\} }\\left\\{ X^{\\intercal}y_{j}-v_{j}\\right\\} \\right] . \\label{defSbis}\n",
 253 | "\\end{align*}\n",
 254 | "\n",
 255 | "We have\n",
 256 | "\n",
 257 | "\\begin{align*}\n",
 258 | "\\frac{\\partial S\\left( v\\right) }{\\partial v_{k}}=q_{k}-\\mathbb{E}%\n",
 259 | "_{\\mathbb{P}_n}\\left[ 1\\left\\{ \\nabla u\\left( X\\right) =y_{k}\\right\\} \\right]\n",
 260 | "=q_{k}-\\mathbb{P}_n\\left( \\mathcal{X}_{k}^{v}\\right) .\n",
 261 | "\\end{align*}\n",
 262 | "\n",
 263 | "Thus, the excess supply for fountain $k$ is given by\n",
 264 | "\n",
 265 | "\\begin{align*}\n",
 266 | "q_{k}-\\mathbb{P}_n\\left( \\mathcal{X}_{k}^{v}\\right) =\\frac{\\partial S\\left( v\\right)\n",
 267 | "}{\\partial v_{k}} \\label{exprDemand}%\n",
 268 | "\\end{align*}\n",
 269 | "\n",
 270 | "where $S$ is defined by [the social welfare](#defSbis) above.\n",
 271 | "\n",
 272 | "Hence, market-clearing (equilibrium) prices are prices $v$ such that demand and supply clear, that is, such that $q_{k}=\\mathbb{P}_n\\left(\\mathcal{X}_{k}^{v}\\right)$ for each $k$; in other words \n",
 273 | "\n",
 274 | "\\begin{align*}\n",
 275 | "\\frac{\\partial S\\left( v\\right) }{\\partial v_{k}}=0.\n",
 276 | "\\end{align*}"
 277 | ]
 278 | },
 279 | {
 280 | "cell_type": "markdown",
 281 | "metadata": {},
 282 | "source": [
 283 | "## Central planner's problem\n",
 284 | "\n",
 285 | "The central planner may arbitrarily assign to each inhabitant $x$ a fountain $T\\left( x\\right) \\in\\left\\{ y_{1},...,y_{J} \\right\\} $, in such a way that each fountain $j$ is used to its full capacity, that is\n",
 286 | "\n",
 287 | "\\begin{align*}\n",
 288 | "\\mathbb{P}_n\\left( T\\left( X\\right) =y_{j}\\right) =q_{j},~\\forall j\\in\\left\\{\n",
 289 | "1,...,J\\right\\} . \\label{massBalance}%\n",
 290 | "\\end{align*}\n",
 291 | "\n",
 292 | "The planner seeks to maximize the total surplus subject to capacity constraints; hence\n",
 293 | "\n",
 294 | "\\begin{align*}\n",
 295 | "& \\max\\mathbb{E}_{\\mathbb{P}_n}\\left[ X^{\\intercal}T\\left( X\\right) \\right]\n",
 296 | "\\label{welfare}\\\\\n",
 297 | "& s.t. 
P\\left( T\\left( X\\right) =y_{j}\\right) =q_{j},~\\forall j\\in\\left\\{\n", 298 | "1,...,J\\right\\}\n", 299 | "\\end{align*}\n", 300 | "\n", 301 | "This is a Monge problem, whose Kantorovich relaxation is\n", 302 | "\n", 303 | "\\begin{align*}\n", 304 | "\\max_{\\mu\\in\\mathcal{M}\\left( \\mathbb{P}_n,q\\right) }\\mathbb{E}_{\\mu}\\left[X^{\\intercal}Y\\right] .\n", 305 | "\\end{align*}" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "## Duality\n", 313 | "\n", 314 | "By the Monge-Kantorovich theorem, the dual problem is\n", 315 | "\n", 316 | "\n", 317 | "\\begin{align*}\n", 318 | "& \\min_{u,v}\\mathbb{E}_{\\mathbb{P}_n}\\left[ u\\left( X\\right) \\right] +\\mathbb{E} _{q}\\left[ v\\left( Y\\right) \\right] \\label{dualKantoContDiscr}\\\\\n", 319 | "& s.t. u\\left( x\\right) +v\\left( y\\right) \\geq x^{\\intercal}y,\n", 320 | "\\end{align*}\n", 321 | "\n", 322 | "where the constraint should hold almost surely with respect to $P$ and $Q$.\n", 323 | "\n", 324 | "The constraint should be verified for $y\\in\\left\\{ y_{1},...,y_{J}\\right\\} $, and the constraint+optimality implies $u\\left( x\\right) =\\max_{j\\in\\left\\{ 1,...,J\\right\\} }\\left\\{ \\Phi\\left( x,y_{j}\\right) -v_{j}\\right\\} $. Thus, the [dual problem](#dualKantoContDiscr) rewrites as\n", 325 | "\n", 326 | "\\begin{align*}\n", 327 | "\\min_{v\\in\\mathbb{R}^{J}}\\mathbb{E}_{\\mathbb{P}_n}\\left[ \\max_{j\\in\\left\\{1,...,J\\right\\} }\\left\\{ X^{\\intercal}y_{j}-v_{j}\\right\\} \\right] +\\sum_{j=1}^{J}q_{j}v_{j} \\label{MKfiniteDim}\n", 328 | "\\end{align*}\n", 329 | "\n", 330 | "which is the minimum of $S$ over $v\\in\\mathbb{R}^{J}.$\n", 331 | "\n", 332 | "As a result:\n", 333 | "\n", 334 | "1. There exist equilibrium prices, which are the minimizers of $S$.\n", 335 | "\n", 336 | "2. The total welfare at equilibrium coincides with the optimal welfare." 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "### Splitting the mass\n", 344 | "\n", 345 | "Note that\n", 346 | "\n", 347 | "\\begin{align*}\n", 348 | "\\arg\\max_{j\\in\\left\\{ 1,...,J\\right\\} }\\left\\{ \\Phi\\left( x,y_{j}\\right)\n", 349 | "-v_{j}\\right\\}\n", 350 | "\\end{align*}\n", 351 | "\n", 352 | "is a singleton for almost every $x$ (it is not a singleton when $x$ is at the boundary between two cells). The assumption that $P$ is absolutely continuous is crucial here.\n", 353 | "\n", 354 | "Hence the map\n", 355 | "\n", 356 | "\\begin{align*}\n", 357 | "T\\left( x\\right) =\\nabla u\\left( x\\right)\n", 358 | "\\end{align*}\n", 359 | " \n", 360 | "is defined almost everywhere and coincides with $\\arg\\max$ whenever it is defined. Thus the solution does not involve to split mass." 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "### Determination of the equilibrium prices: Aurenhammer's method\n", 368 | "\n", 369 | "We turn to a discussion on the numerical determination of the prices (we discuss the determination of the $v$'s, as the expression for the $w$'s immediately follows). 
The function $S$ to minimize being convex, we can use a standard gradient descent algorithm in which the price adjustment is given by\n",
 370 | "\n",
 371 | "\\begin{align*}\n",
 372 | "v_{j}^{t+1}-v_{j}^{t}=\\varepsilon\\left( \\mathbb{P}_n\\left( \\nabla u\\left( X\\right)\n",
 373 | "=y_{j}\\right) -q_{j}\\right) , \\label{tatonnement}%\n",
 374 | "\\end{align*}\n",
 375 | "\n",
 376 | "which immediately has an economic interpretation: the fountains that are over-demanded *raise* their prices, while the fountains that are under-demanded *lower* their prices. This is a *tâtonnement process*.\n",
 377 | "\n",
 378 | "---\n",
 379 | "**Algorithm**\n",
 380 | "Take an initial guess of $v^{0}$. At step $t$, define $v^{t+1}$ by\n",
 381 | "\n",
 382 | "\\begin{align*}\n",
 383 | "v_{j}^{t+1}=v_{j}^{t}-\\varepsilon_{t}\\frac{\\partial S}{\\partial v_{j}}\\left(v^{t}\\right),\n",
 384 | "\\end{align*}\n",
 385 | "\n",
 386 | "for $\\varepsilon_{t}$ small enough. Stop when $\\frac{\\partial S}{\\partial v_{j}}\\left( v^{t+1}\\right) $ is sufficiently close to zero."
 387 | ]
 388 | },
 389 | {
 390 | "cell_type": "markdown",
 391 | "metadata": {},
 392 | "source": [
 393 | "### Implementation\n",
 394 | "\n",
 395 | "The gradient descent method is implemented as follows. "
 396 | ]
 397 | },
 398 | {
 399 | "cell_type": "code",
 400 | "execution_count": null,
 401 | "metadata": {},
 402 | "outputs": [],
 403 | "source": [
 404 | "rel_tol = 1e-4\n",
 405 | "q_j = np.ones(nCells)/nCells\n",
 406 | "vtilde_j = np.zeros(nCells)\n",
 407 | "cont = True\n",
 408 | "pow_dia = PowerDiagram(Ys,vtilde_j)\n",
 409 | "while cont:\n",
 410 | "    demand_j = pow_dia.integrals()\n",
 411 | "    if np.abs((demand_j - q_j)/q_j).max() < rel_tol:\n",
 412 | "        cont = False\n",
 413 | "    else:\n",
 414 | "        vtilde_j = vtilde_j + 0.1 * (demand_j - q_j)  # tatonnement step; the step size 0.1 is an assumption\n",
 415 | "        pow_dia.set_weights(-2 * vtilde_j)  # assumed pysdot convention: cell j is {x: |x-y_j|^2 - w_j <= |x-y_k|^2 - w_k}, hence w_j = -2*vtilde_j\n",
 416 | "print(vtilde_j)"
 417 | ]
 418 | }
 419 | ],
 420 | "metadata": {
 421 | "kernelspec": {
 422 | "display_name": "Python 3",
 423 | "language": "python",
 424 | "name": "python3"
 425 | },
 426 | "language_info": {
 427 | "codemirror_mode": {
 428 | "name": "ipython",
 429 | "version": 3
 430 | },
 431 | "file_extension": ".py",
 432 | "mimetype": "text/x-python",
 433 | "name": "python",
 434 | "nbconvert_exporter": "python",
 435 | "pygments_lexer": "ipython3",
 436 | "version": "3.8.5"
 437 | }
 438 | },
 439 | "nbformat": 4,
 440 | "nbformat_minor": 2
 441 | }
 442 | --------------------------------------------------------------------------------
/L04_regularized-optimal-transport.ipynb:
--------------------------------------------------------------------------------
 1 | {
 2 | "cells": [
 3 | {
 4 | "cell_type": "markdown",
 5 | "metadata": {},
 6 | "source": [
 7 | "#
Block 3a: Optimal transport with entropic regularization\n",
 8 | "###
Alfred Galichon (NYU & Sciences Po)
\n", 9 | "##
'math+econ+code' masterclass on optimal transport and economic applications
\n", 10 | "####
With python code examples
\n", 11 | "© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam).\n", 12 | "\n", 13 | "**If you reuse material from this masterclass, please cite as:**
\n", 14 | "Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Learning objectives\n", 22 | "\n", 23 | "* Entropic regularization\n", 24 | "\n", 25 | "* The log-sum-exp trick\n", 26 | "\n", 27 | "* Gradient descent, coordinate descent\n", 28 | "\n", 29 | "* The Iterated Proportional Fitting Procedure (IPFP)" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## References\n", 37 | "\n", 38 | "* Galichon, *Optimal Transport Methods in Economics*, Ch. 7.3\n", 39 | "\n", 40 | "* Peyré, Cuturi, *Computational Optimal Transport*, Ch. 4.\n" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "### Entropic regularization of the optimal transport problem\n", 48 | "\n", 49 | "Consider the problem\n", 50 | "\n", 51 | "\\begin{align*}\n", 52 | "\\max_{\\mu\\in\\mathcal{M}\\left( n,m\\right) }\\sum_{xy}\\mu_{xy}\\Phi_{xy}-\\sigma\\sum_{xy}\\mu_{xy}\\ln\\mu_{xy}\n", 53 | "\\end{align*}\n", 54 | "\n", 55 | "where $\\sigma>0$. The problem coincides with the optimal assignment problem when $\\sigma=0$. When $\\sigma\\rightarrow+\\infty$, the solution to this problem approaches the independent coupling, $\\mu_{xy}=n_{x}m_{y}$.\n", 56 | "\n", 57 | "Later on, we will provide microfoundations for this problem, and connect it with a number of important methods in economics (BLP, gravity model, Choo-Siow...). For now, let's just view this as an extension of the optimal transport problem." 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "We shall compute this problem using Python libraries that we have already met with. Let us start loading them." 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "import numpy as np\n", 83 | "import os\n", 84 | "import pandas as pd\n", 85 | "import scipy.sparse as spr\n", 86 | "import gurobipy as grb\n", 87 | "from sklearn import linear_model\n", 88 | "from time import time\n", 89 | "from scipy.stats import entropy" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "We will also import objects created in the previous lecture, which are stored in the `objects_D1`module." 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "from objects_D1 import OTProblem" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "Note that in the above, Gurobi is for benchmark purposes with the case $\\sigma=0$, but is not suited to compute the nonlinear optimization problem above.\n", 113 | "\n", 114 | "---\n", 115 | "\n", 116 | "Now, let's load up the `affinitymatrix.csv`, `Xvals.csv` and `Yvals.csv` that you will recall from the previous block. We will work on a smaller population, with `nbx` types of men and `nby` types of women." 
117 | ]
 118 | },
 119 | {
 120 | "cell_type": "code",
 121 | "execution_count": null,
 122 | "metadata": {},
 123 | "outputs": [],
 124 | "source": [
 125 | "thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/marriage_personality-traits/'\n",
 126 | "data_X,data_Y = pd.read_csv(thepath + \"Xvals.csv\"), pd.read_csv(thepath + \"Yvals.csv\")\n",
 127 | "sdX,sdY = data_X.std().values, data_Y.std().values\n",
 128 | "mX,mY = data_X.mean().values, data_Y.mean().values\n",
 129 | "feats_x_k, feats_y_l = ((data_X-mX)/sdX).values, ((data_Y-mY)/sdY).values\n",
 130 | "nbx,nbk = feats_x_k.shape\n",
 131 | "nby,nbl = feats_y_l.shape\n",
 132 | "A_k_l = pd.read_csv(thepath + \"affinitymatrix.csv\").iloc[0:nbk,1:nbl+1].values\n",
 133 | "Φ_x_y = feats_x_k @ A_k_l @ feats_y_l.T"
 134 | ]
 135 | },
 136 | {
 137 | "cell_type": "markdown",
 138 | "metadata": {},
 139 | "source": [
 140 | "We extract a smaller matrix of size 50x30 to speed up our numerical explorations."
 141 | ]
 142 | },
 143 | {
 144 | "cell_type": "code",
 145 | "execution_count": null,
 146 | "metadata": {},
 147 | "outputs": [],
 148 | "source": [
 149 | "nbx,nby = 50,30\n",
 150 | "marriage_ex = OTProblem(Φ_x_y[:nbx,:nby],np.ones(nbx) / nbx, np.ones(nby) / nby)\n",
 151 | "nrow , ncol = min(8, nbx) , min(8, nby) # number of rows / cols to display"
 152 | ]
 153 | },
 154 | {
 155 | "cell_type": "markdown",
 156 | "metadata": {},
 157 | "source": [
 158 | "As a warm-up, let us compute, as in the previous lecture, the solution to the problem for $\\sigma=0$, which we can obtain with Gurobi. "
 159 | ]
 160 | },
 161 | {
 162 | "cell_type": "code",
 163 | "execution_count": null,
 164 | "metadata": {},
 165 | "outputs": [],
 166 | "source": [
 167 | "def LPsolve(self):\n",
 168 | "    ptm = time()\n",
 169 | "    μ_x_y,u_x,v_y = self.solve_full_lp(OutputFlag = False)\n",
 170 | "    taken = time() - ptm \n",
 171 | "    valobs = self.Φ_a.dot(μ_x_y.flatten())\n",
 172 | "    valtot = valobs\n",
 173 | "    ite = None\n",
 174 | "    return μ_x_y,u_x,v_y,valobs,valtot,ite, taken,'LP via Gurobi'\n",
 175 | "\n",
 176 | "OTProblem.LPsolve = LPsolve"
 177 | ]
 178 | },
 179 | {
 180 | "cell_type": "markdown",
 181 | "metadata": {},
 182 | "source": [
 183 | "The following function will display the output and allow for a benchmark:"
 184 | ]
 185 | },
 186 | {
 187 | "cell_type": "code",
 188 | "execution_count": null,
 189 | "metadata": {},
 190 | "outputs": [],
 191 | "source": [
 192 | "def display( args ):\n",
 193 | "    μ_x_y,u_x,v_y,valobs,valtot,iterations, taken,name = args\n",
 194 | "    print('*'*60)\n",
 195 | "    print('*'* (30 -len(name) // 2) + ' '+name + ' ' + '*'*(26 - (1+len(name)) // 2 ) )\n",
 196 | "    print('*'*60)\n",
 197 | "    print('Converged in ', iterations, ' steps and ', taken, 's.')\n",
 198 | "    print('Sum(mu*Phi)+σ*Sum(mu*log(mu))= ', valtot)\n",
 199 | "    print('Sum(mu*Phi) = ', valobs)\n",
 200 | "    print('*'*60)\n",
 201 | "    return "
 202 | ]
 203 | },
 204 | {
 205 | "cell_type": "code",
 206 | "execution_count": null,
 207 | "metadata": {},
 208 | "outputs": [],
 209 | "source": [
 210 | "display(marriage_ex.LPsolve())"
 211 | ]
 212 | },
 213 | {
 214 | "cell_type": "markdown",
 215 | "metadata": {},
 216 | "source": [
 217 | "### Dual of the regularized problem\n",
 218 | "\n",
 219 | "Let's compute the dual by the minimax approach. 
We have\n",
 220 | "\n",
 221 | "\\begin{align*}\n",
 222 | "\\max_{\\mu\\geq0}\\min_{u,v}\\sum_{xy}\\mu_{xy}\\left( \\Phi_{xy}-u_{x}-v_{y}%\n",
 223 | "-\\sigma\\ln\\mu_{xy}\\right) +\\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}%\n",
 224 | "\\end{align*}\n",
 225 | "\n",
 226 | "thus\n",
 227 | "\n",
 228 | "\\begin{align*}\n",
 229 | "\\min_{u,v}\\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}+\\max_{\\mu\\geq0}\\sum_{xy}%\n",
 230 | "\\mu_{xy}\\left( \\Phi_{xy}-u_{x}-v_{y}-\\sigma\\ln\\mu_{xy}\\right)\n",
 231 | "\\end{align*}\n",
 232 | "\n",
 233 | "By the first-order conditions in the inner problem, one has $\\Phi_{xy}-u_{x}-v_{y}-\\sigma\\ln \\mu_{xy}-\\sigma=0$, thus\n",
 234 | "\n",
 235 | "\\begin{align*}\n",
 236 | "\\mu_{xy}=\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}-\\sigma}{\\sigma}\\right)\n",
 237 | "\\end{align*}\n",
 238 | "\n",
 239 | "and $\\mu_{xy}\\left( \\Phi_{xy}-u_{x}-v_{y}-\\sigma\\ln\\mu_{xy}\\right) =\\sigma\\mu_{xy}$, thus the dual problem is\n",
 240 | "\n",
 241 | "\\begin{align*}\n",
 242 | "\\min_{u,v}\\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}+\\sigma\\sum_{xy}\\exp\\left(\n",
 243 | "\\frac{\\Phi_{xy}-u_{x}-v_{y}-\\sigma}{\\sigma}\\right) .\n",
 244 | "\\end{align*}"
 245 | ]
 246 | },
 247 | {
 248 | "cell_type": "markdown",
 249 | "metadata": {},
 250 | "source": [
 251 | "After replacing $v_{y}$ by $v_{y}+\\sigma$, the dual is\n",
 252 | "\n",
 253 | "\\begin{align*}\n",
 254 | "\\min_{u,v}\\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}+\\sigma\\sum_{xy}\\exp\\left(\n",
 255 | "\\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right) -\\sigma. \\tag{V1}\n",
 256 | "\\end{align*}"
 257 | ]
 258 | },
 259 | {
 260 | "cell_type": "markdown",
 261 | "metadata": {},
 262 | "source": [
 263 | "### Another expression of the dual\n",
 264 | "\n",
 265 | "**Claim:** the problem is equivalent to\n",
 266 | "\n",
 267 | "\n",
 268 | "\\begin{align*}\n",
 269 | "\\min_{u,v}\\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}+\\sigma\\log\\sum_{xy}\n",
 270 | "\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right) \\tag{V2}\n",
 271 | "\\end{align*}\n",
 272 | "\n",
 273 | "Indeed, let us go back to the minimax expression\n",
 274 | "\n",
 275 | "\\begin{align*}\n",
 276 | "\\min_{u,v}\\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}+\\max_{\\mu\\geq0}\\sum_{xy}\\mu_{xy}\\left( \\Phi_{xy}-u_{x}-v_{y}-\\sigma\\ln\\mu_{xy}\\right)\n",
 277 | "\\end{align*}\n",
 278 | "\n",
 279 | "we see that the solution $\\mu$ automatically satisfies $\\sum_{xy}\\mu_{xy}=1$; thus we can incorporate this constraint into the inner problem:\n",
 280 | "\n",
 281 | "\\begin{align*}\n",
 282 | "\\min_{u,v}\\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}+\\max_{\\mu\\geq0:\\sum_{xy}\\mu_{xy}=1}\\sum_{xy}\\mu_{xy}\\left( \\Phi_{xy}-u_{x}-v_{y}-\\sigma\\ln\\mu_{xy}\\right)\n",
 283 | "\\end{align*}\n",
 284 | "\n",
 285 | "which yields [our desired result](#V2)."
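As a quick numerical sanity check (a sketch using SciPy on a tiny random instance; the names below are ours, not part of the lecture's codebase), one can verify that (V1) and (V2) attain the same optimal value:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
nx, ny, sig = 3, 4, 0.5
Phi = rng.normal(size=(nx, ny))
n, m = np.ones(nx) / nx, np.ones(ny) / ny

def V1(p):  # version (V1), after the v -> v + sigma substitution
    u, v = p[:nx], p[nx:]
    return n @ u + m @ v + sig * np.exp((Phi - u[:, None] - v[None, :]) / sig).sum() - sig

def V2(p):  # version (V2)
    u, v = p[:nx], p[nx:]
    return n @ u + m @ v + sig * np.log(np.exp((Phi - u[:, None] - v[None, :]) / sig).sum())

# both minimizations return the same value, up to solver tolerance
print(minimize(V1, np.zeros(nx + ny)).fun, minimize(V2, np.zeros(nx + ny)).fun)
```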
286 | ]
 287 | },
 288 | {
 289 | "cell_type": "markdown",
 290 | "metadata": {},
 291 | "source": [
 292 | "[This expression](#V2) is interesting because, taking *any* $\\hat{\\mu}\\in\n",
 293 | "M\\left( n,m\\right)$, it reexpresses as\n",
 294 | "\n",
 295 | "\\begin{align*}\n",
 296 | "\\max_{u,v}\\sum_{xy}\\hat{\\mu}_{xy}\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right) -\\log\\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right)\n",
 297 | "\\end{align*}\n",
 298 | "\n",
 299 | "therefore, if the parameter is $\\theta=\\left( u,v\\right)$ and observations are\n",
 300 | "$xy$ pairs, the likelihood of $xy$ is\n",
 301 | "\n",
 302 | "\\begin{align*}\n",
 303 | "\\mu_{xy}^{\\theta}=\\frac{\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma\n",
 304 | "}\\right) }{\\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right)\n",
 305 | "}\n",
 306 | "\\end{align*}\n",
 307 | "\n",
 308 | "Hence, minimizing [our expression](#V2) coincides with maximum likelihood estimation in this model."
 309 | ]
 310 | },
 311 | {
 312 | "cell_type": "markdown",
 313 | "metadata": {},
 314 | "source": [
 315 | "### A third expression of the dual problem\n",
 316 | "\n",
 317 | "Consider\n",
 318 | "\n",
 319 | "\n",
 320 | "\\begin{align*}\n",
 321 | "\\min_{u,v} & \\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y} \\\\\n",
 322 | "s.t. \\quad & \\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right)\n",
 323 | "=1\n",
 324 | "\\end{align*}\n",
 325 | "\n",
 326 | "It is easy to see that the solutions of this problem coincide with those of [version 2](#V2). Indeed, the Lagrange multiplier of the constraint is forced to equal one. In other words,\n",
 327 | "\n",
 328 | "\\begin{align*}\n",
 329 | "\\min_{u,v} & \\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}\\\\\n",
 330 | "s.t. \\quad & \\sigma\\log\\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma\n",
 331 | "}\\right) =0\n",
 332 | "\\end{align*}"
 333 | ]
 334 | },
 335 | {
 336 | "cell_type": "markdown",
 337 | "metadata": {},
 338 | "source": [
 339 | "### Small-temperature limit and the log-sum-exp trick\n",
 340 | "\n",
 341 | "Recall that when $\\sigma\\rightarrow0$, one has\n",
 342 | "\n",
 343 | "\\begin{align*}\n",
 344 | "\\sigma\\log\\left( e^{a/\\sigma}+e^{b/\\sigma}\\right) \\rightarrow\\max\\left(\n",
 345 | "a,b\\right)\n",
 346 | "\\end{align*}\n",
 347 | "\n",
 348 | "Indeed, letting $m=\\max\\left( a,b\\right)$,\n",
 349 | "\n",
 350 | "\n",
 351 | "\\begin{align*}\n",
 352 | "\\sigma\\log\\left( e^{a/\\sigma}+e^{b/\\sigma}\\right) =m+\\sigma\\log\\left(\\exp\\left( \\frac{a-m}{\\sigma}\\right) +\\exp\\left( \\frac{b-m}{\\sigma}\\right)\\right),\n",
 353 | "\\end{align*}\n",
 354 | "and the argument of the logarithm lies between $1$ and $2$.\n",
 355 | "\n",
 356 | "This simple remark is actually a useful numerical recipe called the *log-sum-exp trick*: when $\\sigma$ is small, using [the formula above](#lse) to compute $\\sigma\\log\\left( e^{a/\\sigma}+e^{b/\\sigma}\\right)$ ensures the exponentials won't blow up.\n"
 357 | ]
 358 | },
 359 | {
 360 | "cell_type": "markdown",
 361 | "metadata": {},
 362 | "source": [
 363 | "### The log-sum-exp trick for regularized OT\n",
 364 | "\n",
 365 | "Back to the third expression, with $\\sigma\\rightarrow0$, one has\n",
 366 | "\n",
 367 | "\\begin{align*}\n",
 368 | "\\min_{u,v} & \\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}\\tag{V3}\\\\\n",
 369 | "s.t. 
& \\max_{xy}\\left( \\Phi_{xy}-u_{x}-v_{y}\\right) =0\\nonumber\n", 370 | "\\end{align*}\n", 371 | "\n", 372 | "This is exactly equivalent with the classical Monge-Kantorovich expression\n", 373 | "\n", 374 | "\\begin{align*}\n", 375 | "\\min_{u,v} & \\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}\\tag{V3}\\\\\n", 376 | "s.t. & \\Phi_{xy}-u_{x}-v_{y}\\leq0\\nonumber\n", 377 | "\\end{align*}" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "Back to the third expression of the dual, with $\\sigma\\rightarrow0$, one has\n", 385 | "\n", 386 | "\\begin{align*}\n", 387 | "\\min_{u,v} & \\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}\\tag{V3}\\\\\n", 388 | "s.t. & \\max_{xy}\\left( \\Phi_{xy}-u_{x}-v_{y}\\right) =0\\nonumber\n", 389 | "\\end{align*}\n", 390 | "\n", 391 | "This is exactly equivalent with the classical Monge-Kantorovich expression\n", 392 | "\n", 393 | "\\begin{align*}\n", 394 | "\\min_{u,v} & \\sum_{x}u_{x}n_{x}+\\sum_{y}v_{y}m_{y}\\tag{V3}\\\\\n", 395 | "s.t. & \\Phi_{xy}-u_{x}-v_{y}\\leq0\\nonumber\n", 396 | "\\end{align*}" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "### Computation\n", 404 | "\n", 405 | "We can compute $\\min F\\left( x\\right)$ by two methods:\n", 406 | "\n", 407 | "Either by gradient descent: $x\\left( t+1\\right) =x_{t}-\\epsilon _{t}\\nabla F\\left( x_{t}\\right) $. (Steepest descent has $\\epsilon _{t}=1/\\left\\vert \\nabla F\\left( x_{t}\\right) \\right\\vert $.)\n", 408 | "\n", 409 | "Or by coordinate descent: $x_{k}\\left( t+1\\right) =\\arg\\min_{x_{k}}F\\left( x_{k},x_{-k}\\left( t\\right) \\right)$.\n", 410 | "\n", 411 | "Why do these methods converge? Let's provide some justification. We will decrease $x_{t}$ by $\\epsilon d_{t}$, were $d_{t}$ is normalized by $\\left\\vert d_{t}\\right\\vert _{p}:=\\left( \\sum_{i=1}^{n}d_{t}^{i}\\right) ^{1/p}=1$. 
At first order, we have \n",
 393 | "\n",
 394 | "\\begin{align*}\n",
 395 | "F\\left( x(t)-\\epsilon d_{t}\\right) =F\\left( x(t)\\right) -\\epsilon d_{t}^{\\intercal}\\nabla F\\left( x(t)\\right) +O\\left( \\epsilon^{2}\\right).\n",
 396 | "\\end{align*}\n",
 397 | "\n",
 398 | "We need to maximize $d_{t}^{\\intercal}\\nabla F\\left( x(t)\\right)$ over $\\left\\vert d_{t}\\right\\vert _{p}=1$.\n",
 399 | "\n",
 400 | "* For $p=2$, we get $d_{t}=\\nabla F\\left( x(t)\\right) /\\left\\vert \\nabla F\\left( x(t)\\right) \\right\\vert $\n",
 401 | "\n",
 402 | "* For $p=1$, we get $d_{t}^{k}=\\mathrm{sign}\\left( \\partial F\\left( x(t)\\right)/\\partial x_{k}\\right) $ if $\\left\\vert \\partial F\\left( x(t)\\right) /\\partial x_{k}\\right\\vert =\\max_{l}\\left\\vert \\partial F\\left( x(t)\\right) /\\partial x_{l}\\right\\vert $, and $d_{t}^{k}=0$ otherwise.\n"
 403 | ]
 404 | },
 405 | {
 406 | "cell_type": "markdown",
 407 | "metadata": {},
 408 | "source": [
 409 | "In our context, gradient descent is\n",
 410 | "\n",
 411 | "\\begin{align*}\n",
 412 | "u_{x}\\left( t+1\\right) & =u_{x}\\left( t\\right) -\\epsilon\\frac{\\partial\n",
 413 | "F}{\\partial u_{x}}\\left( u\\left( t\\right) ,v\\left( t\\right) \\right)\n",
 414 | ",\\text{ and }\\\\\n",
 415 | "v_{y}\\left( t+1\\right) & =v_{y}\\left( t\\right) -\\epsilon\\frac{\\partial\n",
 416 | "F}{\\partial v_{y}}\\left( u\\left( t\\right) ,v\\left( t\\right) \\right)\n",
 417 | "\\end{align*}\n",
 418 | "\n",
 419 | "while coordinate descent is\n",
 420 | "\n",
 421 | "\\begin{align*}\n",
 422 | "\\frac{\\partial F}{\\partial u_{x}}\\left( u_{x}\\left( t+1\\right)\n",
 423 | ",u_{-x}\\left( t\\right) ,v\\left( t\\right) \\right) =0,\\text{ and }\n",
 424 | "\\frac{\\partial F}{\\partial v_{y}}\\left( u\\left( t\\right) ,v_{y}\\left(\n",
 425 | "t+1\\right) ,v_{-y}\\left( t\\right) \\right) =0.\n",
 426 | "\\end{align*}"
 427 | ]
 428 | },
 429 | {
 430 | "cell_type": "markdown",
 431 | "metadata": {},
 432 | "source": [
 433 | "### Gradient descent\n",
 434 | "\n",
 435 | "Gradient of objective function in version 1 of our problem:\n",
 436 | "\n",
 437 | "\\begin{align*}\n",
 438 | "\\left( n_{x}-\\sum_{y}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right)\n",
 439 | ",m_{y}-\\sum_{x}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right)\n",
 440 | "\\right)\n",
 441 | "\\end{align*}\n",
 442 | "\n",
 443 | "Gradient of objective function in version 2:\n",
 444 | "\n",
 445 | "\\begin{align*}\n",
 446 | "\\left( n_{x}-\\frac{\\sum_{y}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma\n",
 447 | "}\\right) }{\\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right)\n",
 448 | "},m_{y}-\\frac{\\sum_{x}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right)\n",
 449 | "}{\\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}}{\\sigma}\\right) }\\right)\n",
 450 | "\\end{align*}"
 451 | ]
 452 | },
 453 | {
 454 | "cell_type": "markdown",
 455 | "metadata": {},
 456 | "source": [
 457 | "### Coordinate descent\n",
 458 | "\n",
 459 | "Coordinate descent on objective function in version 1:\n",
 460 | "\n",
 461 | "\\begin{align*}\n",
 462 | "n_{x} & =\\sum_{y}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}\\left( t+1\\right)\n",
 463 | "-v_{y}\\left( t\\right) }{\\sigma}\\right) ,\\\\\n",
 464 | "m_{y} & =\\sum_{x}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}\\left( t\\right)\n",
 465 | "-v_{y}\\left( t+1\\right) }{\\sigma}\\right)\n",
 466 | "\\end{align*}\n",
 467 | "\n",
 468 | "that is\n",
 469 | "\n",
 470 | "\\begin{align*}\n",
 471 | "\\left\\{\n",
 472 | "\\begin{array}\n",
 473 | "[c]{c}\n",
 474 | "u_{x}\\left( t+1\\right) =\\sigma\\log\\left( \\frac{1}{n_{x}}\\sum_{y}\\exp\\left(\n",
 475 | "\\frac{\\Phi_{xy}-v_{y}\\left( t\\right) }{\\sigma}\\right) \\right) \\\\\n",
 476 | "v_{y}\\left( t+1\\right) =\\sigma\\log\\left( \\frac{1}{m_{y}}\\sum_{x}\\exp\\left(\n",
 477 | "\\frac{\\Phi_{xy}-u_{x}\\left( t\\right) }{\\sigma}\\right) \\right)\n",
 478 | "\\end{array}\n",
 479 | "\\right.\n",
 480 | "\\end{align*}\n",
 481 | "\n",
 482 | "this is called the Iterated Proportional Fitting Procedure (IPFP), or Sinkhorn's algorithm.\n",
 483 | "\n",
 484 | "Coordinate descent on objective function in version 2 does not yield a closed-form expression."
 485 | ]
 486 | },
 487 | {
 488 | "cell_type": "markdown",
 489 | "metadata": {},
 490 | "source": [
 491 | "### IPFP, matrix version\n",
 492 | "\n",
 493 | "Letting $a_{x}=\\exp\\left( -u_{x}/\\sigma\\right) $ and $b_{y}=\\exp\\left( -v_{y}/\\sigma\\right) $ and $K_{xy}=\\exp\\left( \\Phi_{xy}/\\sigma\\right) $, one has $\\mu_{xy}=a_{x}b_{y}K_{xy}$, and the procedure reexpresses as\n",
 494 | "\n",
 495 | "\\begin{align*}\n",
 496 | "\\left\\{\n",
 497 | "\\begin{array}\n",
 498 | "[c]{l}%\n",
 499 | "a_{x}\\left( t+1\\right) =n_{x}/\\left( Kb\\left( t\\right) \\right)\n",
 500 | "_{x}\\text{ and }\\\\\n",
 501 | "b_{y}\\left( t+1\\right) =m_{y}/\\left( K^{\\intercal}a\\left( t\\right)\n",
 502 | "\\right) _{y}.\n",
 503 | "\\end{array}\n",
 504 | "\\right.\n",
 505 | "\\end{align*}\n",
 506 | "\n",
 507 | "Because this algorithm involves matrix operations only, and is naturally suited for parallel computation, GPUs are a tool of choice for implementing it. See chap. 4 of Peyré and Cuturi."
 508 | ]
 509 | },
 510 | {
 511 | "cell_type": "markdown",
 512 | "metadata": {},
 513 | "source": [
 514 | "## Implementation \n",
 515 | "First, as a convenience, we would like to have a function that computes $\\sum_k \\mu_k \\log \\mu_k$ without failing when some of the entries of $\\mu_k$ are zero. We code this into:\n",
 516 | "\n"
 517 | ]
 518 | },
 519 | {
 520 | "cell_type": "code",
 521 | "execution_count": null,
 522 | "metadata": {},
 523 | "outputs": [],
 524 | "source": [
 525 | "def sum_xlogx(a):\n",
 526 | "    s=a.sum()\n",
 527 | "    # scipy's entropy normalizes a into a probability vector p and returns -sum(p*log(p)),\n",
 528 | "    # so s*log(s) - s*entropy(a) recovers sum(a*log(a)), with the convention 0*log(0)=0\n",
 529 | "    return s*np.log(s) - s * entropy(a.flatten(),axis=None)"
 530 | ]
 531 | },
 532 | {
 533 | "cell_type": "markdown",
 534 | "metadata": {},
 535 | "source": [
 536 | "Let's implement this algorithm. 
Return to the matrix-IPFP algorithm:" 
 537 | ]
 538 | },
 539 | {
 540 | "cell_type": "code",
 541 | "execution_count": null,
 542 | "metadata": {},
 543 | "outputs": [],
 544 | "source": [
 545 | "def matrixIPFP(self,σ , tol = 1e-9, maxite = 1e+06 ):\n",
 546 | "    ptm = time()\n",
 547 | "    ite = 0\n",
 548 | "    K_x_y = np.exp(self.Φ_a / σ).reshape(self.nbx,-1)\n",
 549 | "    B_y = np.ones(self.nby)\n",
 550 | "    error = tol + 1\n",
 551 | "    while error > tol and ite < maxite:\n",
 552 | "        A_x = self.n_x / (K_x_y @ B_y)\n",
 553 | "        KA_y = (A_x.T @ K_x_y)\n",
 554 | "        error = np.abs(KA_y * B_y / self.m_y - 1).max()\n",
 555 | "        B_y = self.m_y / KA_y\n",
 556 | "        ite = ite + 1\n",
 557 | "        \n",
 558 | "    u_x,v_y = - σ * np.log(A_x),- σ * np.log(B_y)\n",
 559 | "    μ_x_y = K_x_y * A_x.reshape((-1,1)) * B_y.reshape((1,-1))\n",
 560 | "    valobs = self.Φ_a.dot(μ_x_y.flatten())\n",
 561 | "    valtot = valobs - σ * sum_xlogx(μ_x_y)\n",
 562 | "    taken = time() - ptm\n",
 563 | "    if ite >= maxite:\n",
 564 | "        print('Maximum number of iterations reached in matrix IPFP.') \n",
 565 | "    return μ_x_y,u_x,v_y,valobs,valtot,ite, taken, 'matrix IPFP'\n",
 566 | "\n",
 567 | "OTProblem.matrixIPFP = matrixIPFP"
 568 | ]
 569 | },
 570 | {
 571 | "cell_type": "code",
 572 | "execution_count": null,
 573 | "metadata": {},
 574 | "outputs": [],
 575 | "source": [
 576 | "display(marriage_ex.matrixIPFP(0.1))\n"
 577 | ]
 578 | },
 579 | {
 580 | "cell_type": "markdown",
 581 | "metadata": {},
 582 | "source": [
 583 | "To see the benefit of the matrix version, let us recode the same algorithm as above, but in the log-domain, namely iterate over the values of $u$ and $v$."
 584 | ]
 585 | },
 586 | {
 587 | "cell_type": "code",
 588 | "execution_count": null,
 589 | "metadata": {},
 590 | "outputs": [],
 591 | "source": [
 592 | "def logdomainIPFP(self,σ = 0.1, tol = 1e-9, maxite = 1e+06 ):\n",
 593 | "    ptm = time()\n",
 594 | "    ite = 0\n",
 595 | "    Φ_x_y = self.Φ_a.reshape(self.nbx,-1)\n",
 596 | "    v_y = np.zeros(self.nby)\n",
 597 | "    λ_x,ζ_y = - σ * np.log(self.n_x), - σ * np.log(self.m_y)\n",
 598 | "    error = tol + 1\n",
 599 | "    while error > tol and ite < maxite:\n",
 600 | "        u_x = λ_x + σ * np.log( (np.exp((Φ_x_y - v_y.reshape((1,-1)))/σ)).sum( axis=1) )\n",
 601 | "        KA_y = (np.exp((Φ_x_y -u_x.reshape((-1,1))) / σ)).sum(axis=0)\n",
 602 | "        error = np.max(np.abs(KA_y * np.exp(-v_y / σ) / self.m_y - 1))\n",
 603 | "        v_y = ζ_y + σ * np.log(KA_y)\n",
 604 | "        ite = ite + 1\n",
 605 | "        \n",
 606 | "    μ_x_y =np.exp((Φ_x_y -u_x.reshape((-1,1)) - v_y.reshape((1,-1)))/σ )\n",
 607 | "    valobs = self.Φ_a.dot(μ_x_y.flatten())\n",
 608 | "    valtot = valobs - σ * sum_xlogx(μ_x_y)\n",
 609 | "    taken = time() - ptm\n",
 610 | "    if ite >= maxite:\n",
 611 | "        print('Maximum number of iterations reached in log-domain IPFP.')\n",
 612 | "    return μ_x_y,u_x,v_y,valobs,valtot,ite, taken, 'log-domain IPFP'\n",
 613 | "\n",
 614 | "OTProblem.logdomainIPFP = logdomainIPFP"
 615 | ]
 616 | },
 617 | {
 618 | "cell_type": "code",
 619 | "execution_count": null,
 620 | "metadata": {},
 621 | "outputs": [],
 622 | "source": [
 623 | "display(marriage_ex.logdomainIPFP(0.1))\n",
 624 | "display(marriage_ex.matrixIPFP(0.1))"
 625 | ]
 626 | },
 627 | {
 628 | "cell_type": "markdown",
 629 | "metadata": {},
 630 | "source": [
 631 | "We see that the log-domain IPFP, while mathematically equivalent to the matrix IPFP, is noticeably slower. 
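As a standalone two-line illustration of the scalar log-sum-exp trick introduced earlier (a sketch; the values are arbitrary), note how the naive evaluation overflows while the stabilized one does not:

```python
import numpy as np

a, b, sig = 1.0, 0.5, 1e-3
print(sig * np.log(np.exp(a / sig) + np.exp(b / sig)))  # naive evaluation overflows: inf (with a warning)
m = max(a, b)
print(m + sig * np.log(np.exp((a - m) / sig) + np.exp((b - m) / sig)))  # stable: 1.0
```

This is exactly the failure mode, and the cure, that the next section applies inside the IPFP iterations.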
" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "### IPFP with the log-sum-exp trick\n", 656 | "\n", 657 | "The matrix IPFPis very fast, partly due to the fact that it involves linear algebra operations. However, it breaks down when $\\sigma$ is small; this is best seen taking a log transform and returning to $u^{k}=-\\sigma\\log a^{k}$ and $v^{k}=-\\sigma\\log b^{k}$, that is\n", 658 | "\n", 659 | "\\begin{align*}\n", 660 | "\\left\\{\n", 661 | "\\begin{array}\n", 662 | "[c]{l}%\n", 663 | "u_{x}^{k}=\\mu_{x}+\\sigma\\log\\sum_{y}\\exp\\left( \\frac{\\Phi_{xy}-v_{y}^{k-1}%\n", 664 | "}{\\sigma}\\right) \\\\\n", 665 | "v_{y}^{k}=\\zeta_{y}+\\sigma\\log\\sum_{x}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}^{k}%\n", 666 | "}{\\sigma}\\right)\n", 667 | "\\end{array}\n", 668 | "\\right.\n", 669 | "\\end{align*}\n", 670 | "\n", 671 | "where $\\mu_{x}=-\\sigma\\log n_{x}$ and $\\zeta_{y}=-\\sigma\\log m_{y}$.\n", 672 | "\n", 673 | "One sees what may go wrong: if $\\Phi_{xy}-v_{y}^{k-1}$ is positive in the exponential in the first sum, then the exponential blows up due to the small $\\sigma$ at the denominator. However, the log-sum-exp trick can be used in order to avoid this issue.\n", 674 | "\n" 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "Consider\n", 682 | "\n", 683 | "\\begin{align*}\n", 684 | "\\left\\{\n", 685 | "\\begin{array}\n", 686 | "[c]{l}%\n", 687 | "\\tilde{v}_{x}^{k}=\\max_{y}\\left\\{ \\Phi_{xy}-v_{y}^{k}\\right\\} \\\\\n", 688 | "\\tilde{u}_{y}^{k}=\\max_{x}\\left\\{ \\Phi_{xy}-u_{x}^{k}\\right\\}\n", 689 | "\\end{array}\n", 690 | "\\right.\n", 691 | "\\end{align*}\n", 692 | "\n", 693 | "(the indexing is not a typo: $\\tilde{v}$ is indexed by $i$ and $\\tilde{u}$ by $j$).\n", 694 | "\n", 695 | "One has\n", 696 | "\n", 697 | "\\begin{align*}\n", 698 | "\\left\\{\n", 699 | "\\begin{array}\n", 700 | "[c]{l}%\n", 701 | "u_{x}^{k}=\\mu_{x}+\\tilde{v}_{x}^{k-1}+\\sigma\\log\\sum_{y}\\exp\\left( \\frac\n", 702 | "{\\Phi_{xy}-v_{y}^{k-1}-\\tilde{v}_{x}^{k}}{\\sigma}\\right) \\\\\n", 703 | "v_{y}^{k}=\\zeta_{y}+\\tilde{u}_{y}^{k}+\\sigma\\log\\sum_{x}\\exp\\left( \\frac\n", 704 | "{\\Phi_{xy}-u_{x}^{k}-\\tilde{u}_{y}^{k}}{\\sigma}\\right)\n", 705 | "\\end{array}\n", 706 | "\\right.\n", 707 | "\\end{align*}\n", 708 | "\n", 709 | "and now the arguments of the exponentials are always nonpositive, ensuring the exponentials don't blow up." 710 | ] 711 | }, 712 | { 713 | "cell_type": "markdown", 714 | "metadata": {}, 715 | "source": [ 716 | "Both the matrix version and the log-domain version of the IPFP will break down when $\\sigma$ is small, e.g. $\\sigma=0.001$ (Try!). 
However, if we modify the second procedure using the log-sum-exp trick, things work again:"
 700 | ]
 701 | },
 702 | {
 703 | "cell_type": "code",
 704 | "execution_count": null,
 705 | "metadata": {},
 706 | "outputs": [],
 707 | "source": [
 708 | "def logdomainIPFP_with_LSE_trick(self,σ , tol = 1e-9, maxite = 1e+06 ):\n",
 709 | "    ptm = time()\n",
 710 | "    ite = 0\n",
 711 | "    Φ_x_y = self.Φ_a.reshape(self.nbx,-1)\n",
 712 | "    v_y = np.zeros(self.nby)\n",
 713 | "    λ_x,ζ_y = - σ * np.log(self.n_x), - σ * np.log(self.m_y)\n",
 714 | "    error = tol + 1\n",
 715 | "    while error > tol and ite < maxite:\n",
 716 | "        vstar_x = (Φ_x_y - v_y.reshape((1,-1))).max( axis = 1)\n",
 717 | "        u_x = λ_x + vstar_x + σ * np.log( (np.exp((Φ_x_y - vstar_x.reshape((-1,1)) - v_y.reshape((1,-1)))/σ)).sum( axis=1) )\n",
 718 | "        ustar_y = (Φ_x_y - u_x.reshape((-1,1)) ).max( axis = 0)\n",
 719 | "        KA_y = (np.exp((Φ_x_y -u_x.reshape((-1,1)) - ustar_y.reshape((1,-1)) ) / σ)).sum(axis=0)\n",
 720 | "        error = np.max(np.abs(KA_y * np.exp( (ustar_y-v_y) / σ) / self.m_y - 1))\n",
 721 | "        v_y = ζ_y + ustar_y+ σ * np.log(KA_y)\n",
 722 | "        ite = ite + 1\n",
 723 | "    μ_x_y =np.exp((Φ_x_y -u_x.reshape((-1,1)) - v_y.reshape((1,-1)))/σ )\n",
 724 | "    valobs = self.Φ_a.dot(μ_x_y.flatten())\n",
 725 | "    valtot = valobs - σ * sum_xlogx(μ_x_y)\n",
 726 | "    taken = time() - ptm\n",
 727 | "    if ite >= maxite:\n",
 728 | "        print('Maximum number of iterations reached in log-domain IPFP with LSE trick.')\n",
 729 | "    return μ_x_y,u_x,v_y,valobs,valtot,ite, taken, 'log-domain IPFP with LSE trick'\n",
 730 | "\n",
 731 | "OTProblem.logdomainIPFP_with_LSE_trick = logdomainIPFP_with_LSE_trick"
 732 | ]
 733 | },
 734 | {
 735 | "cell_type": "code",
 736 | "execution_count": null,
 737 | "metadata": {},
 738 | "outputs": [],
 739 | "source": [
 740 | "display(marriage_ex.logdomainIPFP_with_LSE_trick(0.1))\n",
 741 | "display(marriage_ex.logdomainIPFP(0.1))"
 742 | ]
 743 | },
 744 | {
 745 | "cell_type": "markdown",
 746 | "metadata": {},
 747 | "source": [
 748 | "In contrast, when $\\sigma = 0.001$ we see that the algorithm works with the log-sum-exp trick but fails without:"
 749 | ]
 750 | },
 751 | {
 752 | "cell_type": "code",
 753 | "execution_count": null,
 754 | "metadata": {},
 755 | "outputs": [],
 756 | "source": [
 757 | "# display(marriage_ex.logdomainIPFP_with_LSE_trick(0.001))\n",
 758 | "# display(marriage_ex.logdomainIPFP(0.001))"
 759 | ]
 760 | },
 761 | {
 762 | "cell_type": "markdown",
 763 | "metadata": {},
 764 | "source": [
 765 | "## Computations using GLM\n",
 766 | "\n",
 767 | "Recall that the *margining-out matrix* of shape $\\left( nbx+nby\\right) \\times \\left(\n",
 768 | "nbx\\cdot nby\\right) $ is given by\n",
 769 | "\n",
 770 | "$M=\\binom{I_{X}\\otimes 1_{Y}^{\\top }}{1_{X}^{\\top }\\otimes I_{Y}}$\n",
 771 | "\n",
 772 | "Introduce $\\hat{\\mu}=nm^{\\top } / (\\sum_x n_x)$\n",
 773 | "\n",
 774 | "We have $M\\hat{\\mu}=\\binom{n}{m}$"
 775 | ]
 776 | },
 777 | {
 778 | "cell_type": "markdown",
 779 | "metadata": {},
 780 | "source": [
 781 | "The problem reads\n",
 782 | "\n",
 783 | "$\\min_{p}1_{\\mathcal{A}}^{\\top }\\exp \\left( \\frac{\\Phi -M^{\\top }p}{\\sigma }%\n",
 784 | "\\right) -\\hat{\\mu}^{\\top }\\left( \\frac{\\Phi -M^{\\top }p}{\\sigma }\\right) $\n",
 785 | "\n",
 786 | "And setting $\\tilde{p}=p/\\sigma $ and $\\tilde{\\Phi}=\\Phi /\\sigma $ yields\n",
 787 | "that $\\tilde{p}$ is obtained by\n",
 788 | "\n",
 789 | "$\\min_{\\tilde{p}}1^{\\top }\\exp \\left( \\tilde{\\Phi}-M^{\\top }\\tilde{p}\\right)\n",
 790 | "-\\hat{\\mu}^{\\top }\\left( \\tilde{\\Phi}-M^{\\top 
}\\tilde{p}\\right) $\n", 808 | "\n", 809 | "which we can rewrite as a weighted Poisson regression \n", 810 | "\n", 811 | "$\\min_{\\tilde{p}}\\sum_{a\\in \\mathcal{A}}w_{a}\\exp \\left( -\\left( M^{\\top }%\n", 812 | "\\tilde{p}\\right) _{a}\\right) -\\sum_{a\\in \\mathcal{A}}w_{a}\\hat{\\mu}_{a}e^{-%\n", 813 | "\\tilde{\\Phi}_{a}}\\left( \\tilde{\\Phi}-M^{\\top }\\tilde{p}\\right) _{a}$\n", 814 | "\n", 815 | "where $w_{a}=\\exp \\tilde{\\Phi}_{a}$\n", 816 | "\n", 817 | "Dropping the constant term, this implements \n", 818 | "\n", 819 | "$\\min_{\\tilde{p}}\\sum_{a\\in \\mathcal{A}}w_{a}\\exp \\left( -M^{\\top }\\tilde{p}%\n", 820 | "\\right) _{a}+\\sum_{a\\in \\mathcal{A}}w_{a}\\left( \\hat{\\mu}e^{-\\tilde{\\Phi}%\n", 821 | "}\\right) _{a}\\left( M^{\\top }\\tilde{p}\\right) _{a}$\n", 822 | "\n" 823 | ] 824 | }, 825 | { 826 | "cell_type": "code", 827 | "execution_count": null, 828 | "metadata": {}, 829 | "outputs": [], 830 | "source": [ 831 | "def solveGLM(self,σ , tol = 1e-9):\n", 832 | " ptm = time()\n", 833 | " muhat_a = (self.n_x.reshape((self.nbx,-1)) @ self.m_y.reshape((-1,self.nby))).flatten() / self.n_x.sum()\n", 834 | " ot_as_glm = linear_model.PoissonRegressor(fit_intercept=False,tol=tol ,verbose=3,alpha=0)\n", 835 | " ot_as_glm.fit( - self.M_z_a().T, muhat_a * np.exp(-self.Φ_a / σ) , sample_weight = np.exp(self.Φ_a / σ))\n", 836 | "\n", 837 | " p = σ * ot_as_glm.coef_\n", 838 | " u_x,v_y = p[:self.nbx] - p[0], p[self.nbx:]+p[0]\n", 839 | " μ_x_y =np.exp((self.Φ_a.reshape((self.nbx,-1)) -u_x.reshape((-1,1)) - v_y.reshape((1,-1)))/σ )\n", 840 | " valobs = self.Φ_a.dot(μ_x_y.flatten())\n", 841 | " valtot = valobs - σ * sum_xlogx(μ_x_y)\n", 842 | " taken = time() - ptm\n", 843 | " return μ_x_y,u_x,v_y,valobs,valtot,None, taken, 'GLM'\n", 844 | "\n", 845 | "OTProblem.solveGLM = solveGLM" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": null, 851 | "metadata": {}, 852 | "outputs": [], 853 | "source": [ 854 | "σ = 0.5\n", 855 | "display(marriage_ex.solveGLM(σ ))\n", 856 | "display(marriage_ex.matrixIPFP(σ ))" 857 | ] 858 | }, 859 | { 860 | "cell_type": "markdown", 861 | "metadata": {}, 862 | "source": [ 863 | "However, when $\\sigma = 0.1$ the GLM approach fails:" 864 | ] 865 | }, 866 | { 867 | "cell_type": "code", 868 | "execution_count": null, 869 | "metadata": {}, 870 | "outputs": [], 871 | "source": [ 872 | "σ = 0.1\n", 873 | "display(marriage_ex.solveGLM(σ ))\n", 874 | "display(marriage_ex.matrixIPFP(σ ))" 875 | ] 876 | } 877 | ], 878 | "metadata": { 879 | "kernelspec": { 880 | "display_name": "Python 3", 881 | "language": "python", 882 | "name": "python3" 883 | }, 884 | "language_info": { 885 | "codemirror_mode": { 886 | "name": "ipython", 887 | "version": 3 888 | }, 889 | "file_extension": ".py", 890 | "mimetype": "text/x-python", 891 | "name": "python", 892 | "nbconvert_exporter": "python", 893 | "pygments_lexer": "ipython3", 894 | "version": "3.8.5" 895 | } 896 | }, 897 | "nbformat": 4, 898 | "nbformat_minor": 2 899 | } 900 | -------------------------------------------------------------------------------- /L06_gravity_equation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#
Block 5b: The gravity equation
\n", 8 | "###
Alfred Galichon (NYU & Sciences Po)
\n", 9 | "##
'math+econ+code' masterclass on optimal transport and economic applications
\n", 10 | "####
With python code examples
\n", 11 | "© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam), in particular Giovanni Montanari.\n", 12 | "\n", 13 | "**If you reuse material from this masterclass, please cite as:**
\n", 14 | "Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## Learning objectives\n", 22 | "\n", 23 | "* Regularized optimal transport\n", 24 | "\n", 25 | "* The gravity equation\n", 26 | "\n", 27 | "* Generalized linear models\n", 28 | "\n", 29 | "* Pseudo-Poisson maximum likelihood estimation" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## References\n", 37 | "\n", 38 | "* Anderson and van Wincoop (2003). \"Gravity with Gravitas: A Solution to the Border Puzzle\". *American Economic Review*.\n", 39 | "\n", 40 | "* Head and Mayer (2014). \"Gravity Equations: Workhorse, Toolkit and Cookbook\". *Handbook of International Economics*.\n", 41 | "\n", 42 | "* Choo and Siow (2005). \"Who marries whom and why\". *Journal of Political Economy*.\n", 43 | "\n", 44 | "* Gourieroux, Trognon, Monfort (1984). \"Pseudo Maximum Likelihood Methods: Theory\". *Econometrica*.\n", 45 | "\n", 46 | "* McCullagh and Nelder (1989). *Generalized Linear Models*. Chapman and Hall/CRC.\n", 47 | "\n", 48 | "* Santos Silva and Tenreyro (2006). \"The Log of Gravity\". *Review of Economics and Statistics*.\n", 49 | "\n", 50 | "* Yotov et al. (2011). *An advanced guide to trade policy analysis*. WTO.\n", 51 | "\n", 52 | "* Guimares and Portugal (2012). \"Real Wages and the Business Cycle: Accounting for Worker, Firm, and Job Title Heterogeneity\". *AEJ: Macro*.\n", 53 | "\n", 54 | "* Dupuy and G (2014), \"Personality traits and the marriage market\". *Journal of Political Economy*.\n", 55 | "\n", 56 | "* Dupuy, G and Sun (2019), \"Estimating matching affinity matrix under low-rank constraints\". *Information and Inference*.\n", 57 | "\n", 58 | "* Carlier, Dupuy, Galichon and Sun \"SISTA: learning optimal transport costs under sparsity constraints.\" *Communications on Pure and Applied Mathematics* (forthcoming)." 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "# Motivation\n", 66 | "\n", 67 | "The gravity equation is a very useful tool for explaining trade flows by various measures of proximity between countries.\n", 68 | "\n", 69 | "A number of regressors have been proposed. They include: geographic distance, common official languague, common colonial past, share of common religions, etc.\n", 70 | "\n", 71 | "The dependent variable is the volume of exports from country $x$ to country $y$, for each pair of country $\\left( x, y\\right)$.\n", 72 | "\n", 73 | "Today, we shall see a close connection between gravity models of international trade and separable matching models." 
74 | ]
 75 | },
 76 | {
 77 | "cell_type": "markdown",
 78 | "metadata": {},
 79 | "source": [
 80 | "# The gravity equation\n",
 81 | "\n",
 82 | "\"Structural gravity equation\" (Anderson and van Wincoop, 2003) as exposited in Head and Mayer (2014)\n",
 83 | "handbook chapter:\n",
 84 | "\n",
 85 | "\\begin{align*}\n",
 86 | "\\mu_{xy}=\\frac{n_x}{\\Psi_{x}} \\frac{m_y}{\\Omega_{y}} \\Phi_{xy}%\n",
 87 | "\\end{align*}\n",
 88 | "\n",
 89 | "where $x$=exporter, $y$=importer, $\\mu_{xy}$=trade flow from $x$ to $y$, $n_x=\\sum_{y}\\mu_{xy}$ is the value of production, $m_y=\\sum_{x}\\mu_{xy}$ is the importer's expenditure, and $\\Phi_{xy}$=bilateral accessibility of $x$ to $y$.\n",
 90 | "\n",
 91 | "$\\Omega_{y}$ and $\\Psi_{x}$ are *multilateral resistances*, satisfying the set of implicit equations\n",
 92 | "\n",
 93 | "\\begin{align*}\n",
 94 | "\\Psi_{x}=\\sum_{y}\\frac{\\Phi_{xy}m_y}{\\Omega_{y}}\\text{ and }\\Omega_{y}%\n",
 95 | "=\\sum_{x}\\frac{\\Phi_{xy}n_x}{\\Psi_{x}}%\n",
 96 | "\\end{align*}\n",
 97 | "\n",
 98 | "We will see that these are exactly the same equations as those of the regularized OT."
 99 | ]
 100 | },
 101 | {
 102 | "cell_type": "markdown",
 103 | "metadata": {},
 104 | "source": [
 105 | "## Explaining trade\n",
 106 | "\n",
 107 | "Parameterize $\\Phi_{xy}=\\exp\\left( \\sum_{k=1}^{K}\\beta_{k}D_{xy}^{k}\\right) $, where the $D_{xy}^{k}$ are $K$ pairwise measures of distance between $x$ and $y$. We have\n",
 108 | "\n",
 109 | "\\begin{align*}\n",
 110 | "\\mu_{xy}=\\exp\\left( \\sum_{k=1}^{K}\\beta_{k}D_{xy}^{k}-a_{x}-b_{y}\\right)\n",
 111 | "\\end{align*}\n",
 112 | "\n",
 113 | "where fixed effects $b_{y}=-\\ln \\frac{m_y}{\\Omega_{y}}$ and $a_{x}=-\\ln \\frac{n_x}{\\Psi_{x}}$ are adjusted by\n",
 114 | "\n",
 115 | "\\begin{align*}\n",
 116 | "\\sum_{y}\\mu_{xy}=n_x\\text{ and }\\sum_{x}\\mu_{xy}=m_y.\n",
 117 | "\\end{align*}\n",
 118 | "\n",
 119 | "Standard choices of $D_{xy}^{k}$'s:\n",
 120 | "\n",
 121 | "* Logarithm of bilateral distance between $x$ and $y$\n",
 122 | "\n",
 123 | "* Indicator of contiguous borders; of common official language; of\n",
 124 | "colonial ties\n",
 125 | "\n",
 126 | "* Trade policy variables: presence of a regional trade agreement; tariffs\n",
 127 | "\n",
 128 | "* Could include many other measures of proximity, e.g. measure of genetic/cultural distance, intensity of communications, etc."
 129 | ]
 130 | },
 131 | {
 132 | "cell_type": "markdown",
 133 | "metadata": {},
 134 | "source": [
 135 | "### Regularized optimal transport\n",
 136 | "\n",
 137 | "Consider the optimal transport duality\n",
 138 | "\n",
 139 | "\\begin{align*}\n",
 140 | "\\max_{\\mu\\in\\mathcal{M}\\left( P,Q\\right) }\\sum_{xy}\\mu_{xy}\\Phi_{xy}=\\min_{u_{x}+v_{y}\\geq\\Phi_{xy}}\\sum_{x\\in\\mathcal{X}}n_xu_{x}+\\sum_{y\\in\\mathcal{Y}}m_yv_{y}\n",
 141 | "\\end{align*}\n",
 142 | "\n",
 143 | "Now let's assume that we are adding an entropy term to the primal objective function. For any $\\sigma>0$, we get\n",
 144 | "\n",
 145 | "\\begin{align*}\n",
 146 | "& \\max_{\\mu\\in\\mathcal{M}\\left( P,Q\\right) }\\sum_{xy}\\mu_{xy}\\Phi_{xy}-\\sigma\\sum_{xy}\\mu_{xy}\\ln\\mu_{xy}\\\\\n",
 147 | "& =\\min_{u,v}\\sum_{x\\in\\mathcal{X}}n_xu_{x}+\\sum_{y\\in\\mathcal{Y}}m_y v_{y}+\\sigma\\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-v_{y}-\\sigma}{\\sigma}\\right)\n",
 148 | "\\end{align*}\n",
 149 | "\n",
 150 | "The latter problem is an unconstrained convex optimization problem. But the most efficient numerical computation technique is often coordinate descent, i.e. 
alternate between minimization in $u$ and minimization in $v$." 
 151 | ]
 152 | },
 153 | {
 154 | "cell_type": "markdown",
 155 | "metadata": {},
 156 | "source": [
 157 | "### Iterated fitting\n",
 158 | "\n",
 159 | "Minimizing with respect to $u$ yields\n",
 160 | "\n",
 161 | "\\begin{align*}\n",
 162 | "e^{-u_{x}/\\sigma}=\\frac{n_x}{\\sum_{y}\\exp\\left( \\frac{\\Phi_{xy}-v_{y}-\\sigma}{\\sigma}\\right) }\n",
 163 | "\\end{align*}\n",
 164 | "\n",
 165 | "and with respect to $v$ yields\n",
 166 | "\n",
 167 | "\\begin{align*}\n",
 168 | "e^{-v_{y}/\\sigma}=\\frac{m_y}{\\sum_{x}\\exp\\left( \\frac{\\Phi_{xy}-u_{x}-\\sigma}{\\sigma}\\right) }\n",
 169 | "\\end{align*}\n",
 170 | "\n",
 171 | "This is called the \"iterated proportional fitting procedure\" (IPFP), aka \"matrix scaling\", \"RAS algorithm\", \"Sinkhorn-Knopp algorithm\", \"Kruithof's method\", \"Furness procedure\", \"biproportional fitting procedure\", \"Bregman's procedure\". See survey in Idel (2016).\n",
 172 | "\n",
 173 | "Maybe the most often reinvented algorithm in applied mathematics. Recently rediscovered in a machine learning context."
 174 | ]
 175 | },
 176 | {
 177 | "cell_type": "markdown",
 178 | "metadata": {},
 179 | "source": [
 180 | "### Econometrics of matching\n",
 181 | "\n",
 182 | "The goal is to estimate the matching surplus $\\Phi_{xy}$. For this, take a linear parameterization\n",
 183 | "\n",
 184 | "\\begin{align*}\n",
 185 | "\\Phi_{xy}^{\\beta}=\\sum_{k=1}^{K}\\beta_{k}\\phi_{xy}^{k}.\n",
 186 | "\\end{align*}\n",
 187 | "\n",
 188 | "Following Choo and Siow (2006), Galichon and Salanie (2011) introduce logit heterogeneity in individual preferences and show that the equilibrium now maximizes the *regularized Monge-Kantorovich problem*\n",
 189 | "\n",
 190 | "\\begin{align*}\n",
 191 | "W\\left( \\beta\\right) =\\max_{\\mu\\in\\mathcal{M}\\left( P,Q\\right) }\\sum_{xy}\\mu_{xy}\\Phi_{xy}^{\\beta}-\\sigma\\sum_{xy}\\mu_{xy}\\ln\\mu_{xy}\n",
 192 | "\\end{align*}\n",
 193 | "\n",
 194 | "By duality, $W\\left( \\beta\\right) $ can be expressed as\n",
 195 | "\n",
 196 | "\\begin{align*}\n",
 197 | "W\\left( \\beta\\right) =\\min_{u,v}\\sum_{x}n_xu_{x}+\\sum_{y}m_yv_{y}+\\sigma\\sum_{xy}\\exp\\left( \\frac{\\Phi_{xy}^{\\beta}-u_{x}-v_{y}-\\sigma}{\\sigma}\\right)\n",
 198 | "\\end{align*}\n",
 199 | "\n",
 200 | "and w.l.o.g. we can set $\\sigma=1$ and drop the additive constant $-\\sigma$ in the $\\exp$."
 201 | ]
 202 | },
 203 | {
 204 | "cell_type": "markdown",
 205 | "metadata": {},
 206 | "source": [
 207 | "### Estimation\n",
 208 | "\n",
 209 | "We observe the actual matching $\\hat{\\mu}_{xy}$. 
Note that $\\partial W/ \\partial\\beta^{k}=\\sum_{xy}\\mu_{xy}\\phi_{xy}^{k},$ hence $\\beta$ is estimated by running\n", 210 | "\n", 211 | "\n", 212 | "\\begin{align*}\n", 213 | "\\min_{u,v,\\beta}\\sum_{x}n_xu_{x}+\\sum_{y}m_yv_{y}+\\sum_{xy}\\exp\\left(\\Phi_{xy}^{\\beta}-u_{x}-v_{y}\\right) -\\sum_{xy,k}\\hat{\\mu}_{xy}\\beta_{k}\\phi_{xy}^{k}\n", 214 | "\\end{align*}\n", 215 | "\n", 216 | "which is still a convex optimization problem.\n", 217 | "\n", 218 | "As we will show later, this is actually the objective function of the log-likelihood in a Poisson regression with $x$ and $y$ fixed effects, where we assume\n", 219 | "\n", 220 | "\\begin{align*}\n", 221 | "\\mu_{xy}|xy\\sim Poisson\\left( \\exp\\left( \\sum_{k=1}^{K}\\beta_{k}\\phi\n", 222 | "_{xy}^{k}-u_{x}-v_{y}\\right) \\right) .\n", 223 | "\\end{align*}" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "---\n", 231 | "To start working with our application, let's load some of the libraries we shall need." 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 1, 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "import numpy as np # used to work with arrays\n", 241 | "import pandas as pd # used to load and work with dataframes\n", 242 | "#import math # used to work with logs and infinities\n", 243 | "import time # used to time the execution of the code\n", 244 | "\n", 245 | "import scipy.sparse as spr # used to work with sparse matrices (when working with the Poisson matrix representation)\n", 246 | "from sklearn import linear_model # used to implement the Poisson regression" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "And let's load our data, which comes from the book *An Advanced Guide to Trade Policy Analysis: The Structural Gravity Mode*, by Yotov et al. We will estimate the gravity model using optimal transport as well as using Poisson regression.\n", 254 | "\n", 255 | "While the table of trade data includes several possible regressors, we focus on four types of regressors: the logarithm of the distance between countries and dummy variables for whether any two countries are contiguous, share a common official language, or share colonial ties. These are regressors known in the literature to have explanatory power, and they are the same as the ones used in Yotov et al.\n", 256 | "\n", 257 | "Our data look as follows:" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 2, 263 | "metadata": {}, 264 | "outputs": [ 265 | { 266 | "data": { 267 | "text/html": [ 268 | "
\n", 269 | "\n", 282 | "\n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | "
exporterimporteryeartradeDISTln_DISTCNTGLANGCLNY
0ARGARG198661288.590263533.9082406.280224000
1ARGAUS198627.76487412044.5741349.396370000
2ARGAUT19863.55984311751.1465219.371706000
3ARGBEL198696.10256711305.2857649.333026000
4ARGBGR19863.12923112115.5720469.402246000
\n", 360 | "
" 361 | ], 362 | "text/plain": [ 363 | " exporter importer year trade DIST ln_DIST CNTG LANG \\\n", 364 | "0 ARG ARG 1986 61288.590263 533.908240 6.280224 0 0 \n", 365 | "1 ARG AUS 1986 27.764874 12044.574134 9.396370 0 0 \n", 366 | "2 ARG AUT 1986 3.559843 11751.146521 9.371706 0 0 \n", 367 | "3 ARG BEL 1986 96.102567 11305.285764 9.333026 0 0 \n", 368 | "4 ARG BGR 1986 3.129231 12115.572046 9.402246 0 0 \n", 369 | "\n", 370 | " CLNY \n", 371 | "0 0 \n", 372 | "1 0 \n", 373 | "2 0 \n", 374 | "3 0 \n", 375 | "4 0 " 376 | ] 377 | }, 378 | "execution_count": 2, 379 | "metadata": {}, 380 | "output_type": "execute_result" 381 | } 382 | ], 383 | "source": [ 384 | "thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/gravity_wtodata/'\n", 385 | "\n", 386 | "tradedata = pd.read_csv(thepath + '1_TraditionalGravity_from_WTO_book.csv') # load full table\n", 387 | "tradedata = tradedata[['exporter', 'importer','year', 'trade', 'DIST','ln_DIST', 'CNTG', 'LANG', 'CLNY']] # focus on a subset of regressors\n", 388 | "\n", 389 | "tradedata.sort_values(['year','exporter','importer'], inplace = True)\n", 390 | "tradedata.reset_index(inplace = True, drop = True)\n", 391 | "\n", 392 | "nbt = len(tradedata['year'].unique()) # number of periods\n", 393 | "nbi = len(tradedata['importer'].unique()) # number of countries (we have the same number of importers and exporters)\n", 394 | "nbk = 4 # number of regressors we are interested in using \n", 395 | "\n", 396 | "tradedata.head()" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "Let's extract the data from the table and format it using multidimensional tensors. Note that we are storing both absolute and normalized trade flows, as we need the former (properly cleaned of the flows within a country) to normalize the latter. 
" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 3, 409 | "metadata": {}, 410 | "outputs": [], 411 | "source": [ 412 | "years = tradedata['year'].unique() # array of calendar years reported in the data\n", 413 | "distances = np.array(['ln_DIST', 'CNTG', 'LANG', 'CLNY']) # array of trade \"distances\" we are interested in using as regressors\n", 414 | "\n", 415 | "D_x_y_t_k = np.zeros((nbi,nbi,nbt,nbk)) # Initialize an empty tensor of dimensions nbi x nbi x nbt x nbk to store distances\n", 416 | "tradevol_x_y_t = np.zeros((nbi,nbi,nbt)) # Initialize empty tensor nbi x nbi x nbt to store trade volume\n", 417 | "muhat_x_y_t = np.zeros((nbi,nbi,nbt)) # Initialize empty tensor nbi x nbi x nbt to store normalized trade flows\n", 418 | "\n", 419 | "# fill tensors with distance, contiguity, language, and colony variables, as well as the trade flow\n", 420 | "for t, year in enumerate(years):\n", 421 | " tradevol_x_y_t[:, :, t] = np.array(tradedata.loc[tradedata['year'] == year, 'trade']).reshape((nbi, nbi)) # store trade flows\n", 422 | " np.fill_diagonal(tradevol_x_y_t[:, :, t], 0) # set to zero the trade within a country; we will repeat this operation within the estimation functions\n", 423 | " for k, distance in enumerate(distances):\n", 424 | " D_x_y_t_k[:, :, t, k] = np.array(tradedata.loc[tradedata['year'] == year, distance]).reshape((nbi, nbi)) # store distances\n", 425 | "\n", 426 | "# normalize and store trade flows\n", 427 | "muhat_x_y_t = tradevol_x_y_t / (tradevol_x_y_t.sum() / len(years))" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "We define a class `GravityModel` to store the data relevant for estimation. Later on we will populate it with our estimation methods." 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 4, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "class GravityModel():\n", 444 | " def __init__(self, muhat_x_y_t, D_x_y_t_k):\n", 445 | " self.nbi, _, self.nbt, self.nbk = D_x_y_t_k.shape # number of countries, periods, and regressors\n", 446 | " self.muhat_x_y_t = muhat_x_y_t # tensor of trade flows over time\n", 447 | " self.D_x_y_t_k = D_x_y_t_k # tensor of bilateral resistances in each time period " 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": {}, 453 | "source": [ 454 | "We will solve this model by fixing a $\\beta$ and solving the matching problem using IPFP. 
Then in an outer loop we will solve for the $\\beta$ that minimizes the distance between model and empirical moments, using gradient descent: the gradient of the objective with respect to $\\beta_{k}$ is the moment gap $\\sum_{xy}\\left( \\mu_{xy}-\\hat{\\mu}_{xy}\\right) \\phi_{xy}^{k}$, accumulated over periods.\n",
 455 |     "\n",
 456 |     "For a fixed $\\beta$, writing $K_{xy}=\\exp\\left( \\Phi_{xy}^{\\beta}/\\sigma\\right) $, each pass of the IPFP inner loop alternates the updates\n",
 457 |     "\n",
 458 |     "\\begin{align*}\n",
 459 |     "u_{x}=\\sigma\\log\\frac{\\sum_{y}K_{xy}e^{-v_{y}/\\sigma}}{n_{x}},\\qquad v_{y}=\\sigma\\log\\frac{\\sum_{x}K_{xy}e^{-u_{x}/\\sigma}}{m_{y}},\n",
 460 |     "\\end{align*}\n",
 461 |     "\n",
 462 |     "so that the $v$-update enforces the column margins of $\\mu_{xy}=K_{xy}e^{-\\left( u_{x}+v_{y}\\right) /\\sigma}$ exactly, and the inner loop stops once the row margins match $n_{x}$ to tolerance:"
 463 |    ]
 464 |   },
 465 |   {
 466 |    "cell_type": "code",
 467 |    "execution_count": 5,
 468 |    "metadata": {},
 469 |    "outputs": [],
 470 |    "source": [
 471 |     "def fit_ipfp(self, sigma = 1, maxiterIpfp = 1000, maxiter = 500, tolIpfp = 1e-12, tolDescent = 1e-6, t_s = 0.03):\n",
 472 |     "\n",
 473 |     "    iterCount = 0\n",
 474 |     "    contIter = True\n",
 475 |     "    meanD_k = self.D_x_y_t_k.mean(axis=(0,1,2))\n",
 476 |     "    sdD_k = self.D_x_y_t_k.std(axis=(0, 1), ddof = 1).mean(axis = 0)\n",
 477 |     "    D_x_y_t_k = (self.D_x_y_t_k-meanD_k[None,None,None,:]) / sdD_k[None,None,None,:] # standardize the regressors\n",
 478 |     "    n_x_t = self.muhat_x_y_t.sum(axis=1)\n",
 479 |     "    m_y_t = self.muhat_x_y_t.sum(axis=0)\n",
 480 |     "\n",
 481 |     "    beta_k = np.zeros(self.nbk)\n",
 482 |     "\n",
 483 |     "    ptm = time.time()\n",
 484 |     "    while(contIter):\n",
 485 |     "        iterCount += 1\n",
 486 |     "        thegrad = np.zeros(self.nbk) # reset the moment-gap gradient at each outer iteration\n",
 487 |     "        for t in range(self.nbt):\n",
 488 |     "            v_y = np.zeros(self.nbi) # initialize the dual variable for period t\n",
 489 |     "            D_xy_k = D_x_y_t_k[:,:,t,:].reshape((-1,self.nbk))\n",
 490 |     "            K_x_y = np.exp(D_xy_k @ beta_k / sigma).reshape((self.nbi,self.nbi))\n",
 491 |     "            np.fill_diagonal(K_x_y, 0) # no self-flow: having already exponentiated Phi, we set the diagonal to zero\n",
 492 |     "            contIpfp = True\n",
 493 |     "            iterIpfp = 0\n",
 494 |     "\n",
 495 |     "            while(contIpfp):\n",
 496 |     "                iterIpfp += 1\n",
 497 |     "                u_x = sigma * np.log( ( K_x_y @ np.exp(-v_y / sigma) ) / n_x_t[:,t] ).flatten()\n",
 498 |     "                v_y = sigma * np.log( ( np.exp(-u_x / sigma) @ K_x_y ) / m_y_t[:,t] ).flatten()\n",
 499 |     "                mu_x_y = (K_x_y * np.exp(-(u_x[:,None] +v_y[None,:] ) / sigma))\n",
 500 |     "                if (np.max(np.abs( mu_x_y.sum(axis=1) / n_x_t[:,t] - 1)) < tolIpfp or iterIpfp >= maxiterIpfp):\n",
 501 |     "                    contIpfp = False\n",
 502 |     "            thegrad = thegrad + ((mu_x_y - self.muhat_x_y_t[:, :, t]).flatten().dot(D_xy_k)).flatten()\n",
 503 |     "        beta_k = beta_k - t_s * thegrad # gradient-descent step on beta\n",
 504 |     "\n",
 505 |     "        if (iterCount > maxiter or np.sum(np.abs(thegrad)) < tolDescent): # stop when the moment gap is small or the iteration budget is spent\n",
 506 |     "            contIter = False\n",
 507 |     "\n",
 508 |     "    diff = time.time() - ptm\n",
 509 |     "    print('Time elapsed = ', diff, 's.')\n",
 510 |     "\n",
 511 |     "    return np.asarray(beta_k / sdD_k).round(3) # rescale to undo the standardization of the regressors\n",
 512 |     "\n",
 513 |     "GravityModel.fit_ipfp = fit_ipfp"
 514 |    ]
 515 |   },
 516 |   {
 517 |    "cell_type": "markdown",
 518 |    "metadata": {},
 519 |    "source": [
 520 |     "Let's test this solution method by initializing an instance of the `GravityModel` class with the data from Yotov et al.
:"
 521 |    ]
 522 |   },
 523 |   {
 524 |    "cell_type": "code",
 525 |    "execution_count": 6,
 526 |    "metadata": {},
 527 |    "outputs": [],
 528 |    "source": [
 529 |     "trade_yotov = GravityModel(muhat_x_y_t, D_x_y_t_k)"
 530 |    ]
 531 |   },
 532 |   {
 533 |    "cell_type": "markdown",
 534 |    "metadata": {},
 535 |    "source": [
 536 |     "Let's run our estimation method:"
 537 |    ]
 538 |   },
 539 |   {
 540 |    "cell_type": "code",
 541 |    "execution_count": 7,
 542 |    "metadata": {},
 543 |    "outputs": [
 544 |     {
 545 |      "name": "stdout",
 546 |      "output_type": "stream",
 547 |      "text": [
 548 |       "Time elapsed =  2.7788071632385254 s.\n"
 549 |      ]
 550 |     },
 551 |     {
 552 |      "data": {
 553 |       "text/plain": [
 554 |        "array([-0.841,  0.437,  0.247, -0.222])"
 555 |       ]
 556 |      },
 557 |      "execution_count": 7,
 558 |      "metadata": {},
 559 |      "output_type": "execute_result"
 560 |     }
 561 |    ],
 562 |    "source": [
 563 |     "trade_yotov.fit_ipfp()"
 564 |    ]
 565 |   },
 566 |   {
 567 |    "cell_type": "markdown",
 568 |    "metadata": {},
 569 |    "source": [
 570 |     "We recover the PPML estimates in Table 1, p. 42, of [Yotov et al.'s book](https://www.wto.org/english/res_e/booksp_e/advancedwtounctad2016_e.pdf).\n",
 571 |     "\n",
 572 |     "We now proceed to show how this problem can be recast as an instance of Poisson regression with fixed effects.\n",
 573 |     "\n",
 574 |     "---"
 575 |    ]
 576 |   },
 577 |   {
 578 |    "cell_type": "markdown",
 579 |    "metadata": {},
 580 |    "source": [
 581 |     "### Poisson regression with fixed effects\n",
 582 |     "\n",
 583 |     "Let $\\theta=\\left( \\beta,u,v\\right) $ and $Z=\\left( \\phi,D^{x},D^{y}\\right) $ where $D_{x^{\\prime}y^{\\prime}}^{x}=1\\left\\{ x=x^{\\prime}\\right\\} $ and $D_{x^{\\prime}y^{\\prime}}^{y}=1\\left\\{ y=y^{\\prime}\\right\\}$ are $x$- and $y$-dummies. Let $\\lambda_{xy}\\left( Z;\\theta\\right) =\\exp\\left(\\theta^{\\intercal}Z_{xy}\\right) $ be the parameter of the Poisson distribution.\n",
 584 |     "\n",
 585 |     "The conditional log-likelihood of $\\hat{\\mu}_{xy}$ given $Z_{xy}$ is, up to a term that does not depend on $\\theta$,\n",
 586 |     "\n",
 587 |     "\\begin{align*}\n",
 588 |     "l_{xy}\\left( \\hat{\\mu}_{xy};\\theta\\right) & =\\hat{\\mu}_{xy}\\log \\lambda_{xy}\\left( Z;\\theta\\right) -\\lambda_{xy}\\left( Z;\\theta\\right) \\\\\n",
 589 |     "& =\\hat{\\mu}_{xy}\\left( \\theta^{\\intercal}Z_{xy}\\right) -\\exp\\left(\\theta^{\\intercal}Z_{xy}\\right) \\\\\n",
 590 |     "& =\\hat{\\mu}_{xy}\\left( \\sum_{k=1}^{K}\\beta_{k}\\phi_{xy}^{k}-u_{x}-v_{y}\\right) -\\exp\\left( \\sum_{k=1}^{K}\\beta_{k}\\phi_{xy}^{k}-u_{x}-v_{y}\\right)\n",
 591 |     "\\end{align*}\n",
 592 |     "\n",
 593 |     "Summing over $x$ and $y$, the sample log-likelihood is\n",
 594 |     "\n",
 595 |     "\\begin{align*}\n",
 596 |     "\\sum_{xy}\\hat{\\mu}_{xy}\\sum_{k=1}^{K}\\beta_{k}\\phi_{xy}^{k}-\\sum_{x}n_xu_{x}-\\sum_{y}m_yv_{y}-\\sum_{xy}\\exp\\left( \\sum_{k=1}^{K}\\beta_{k}\\phi_{xy}^{k}-u_{x}-v_{y}\\right)\n",
 597 |     "\\end{align*}\n",
 598 |     "\n",
 599 |     "hence maximizing it amounts to minimizing the [objective function](#objFun) above."
 600 |    ]
 601 |   },
 602 |   {
 603 |    "cell_type": "markdown",
 604 |    "metadata": {},
 605 |    "source": [
 606 |     "### From Poisson to pseudo-Poisson\n",
 607 |     "\n",
 608 |     "If $\\mu_{xy}|xy$ is Poisson, then $\\mathbb{E}\\left[\\mu_{xy}\\right]=\\lambda_{xy}\\left( Z_{xy};\\theta\\right) =\\mathbb{V}ar\\left( \\mu_{xy}\\right) $. 
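In other words, the Poisson specification imposes two distinct restrictions: $\\mathbb{E}\\left[ \\mu_{xy}\\right] =\\lambda_{xy}\\left( Z_{xy};\\theta\\right) $, a correctly specified conditional mean, and $\\mathbb{V}ar\\left( \\mu_{xy}\\right) =\\lambda_{xy}\\left( Z_{xy};\\theta\\right) $, equidispersion.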
While it makes sense to assume the former equality, the latter is a rather strong assumption.\n",
 609 |     "\n",
 610 |     "For estimation purposes, $\\hat{\\theta}$ is obtained by\n",
 611 |     "\n",
 612 |     "\\begin{align*}\n",
 613 |     "\\max_{\\theta}\\sum_{xy}l\\left( \\hat{\\mu}_{xy};\\theta\\right) =\\sum_{xy}\\left(\\hat{\\mu}_{xy}\\left( \\theta^{\\intercal}Z_{xy}\\right) -\\exp\\left(\\theta^{\\intercal}Z_{xy}\\right) \\right) .\n",
 614 |     "\\end{align*}\n",
 615 |     "\n",
 616 |     "However, for inference purposes, one should not impose the Poisson distributional assumption. Instead, the sandwich formula applies:\n",
 617 |     "\n",
 618 |     "\\begin{align*}\n",
 619 |     "\\sqrt{N}\\left( \\hat{\\theta}-\\theta\\right) \\Longrightarrow\\mathcal{N}\\left( 0,\\left( A_{0}\\right) ^{-1}B_{0}\\left( A_{0}\\right) ^{-1}\\right)\n",
 620 |     "\\end{align*}\n",
 621 |     "\n",
 622 |     "where $N=\\left\\vert \\mathcal{X}\\right\\vert \\times\\left\\vert \\mathcal{Y}\\right\\vert $ and $A_{0}$ and $B_{0}$ are estimated by\n",
 623 |     "\n",
 624 |     "\\begin{align*}\n",
 625 |     "\\hat{A}_{0} & =-N^{-1}\\sum_{xy}D_{\\theta\\theta}^{2}l\\left( \\hat{\\mu}_{xy};\\hat{\\theta}\\right) =N^{-1}\\sum_{xy}\\exp\\left( \\hat{\\theta}^{\\intercal}Z_{xy}\\right) Z_{xy}Z_{xy}^{\\intercal}\\\\\n",
 626 |     "\\hat{B}_{0} & =N^{-1}\\sum_{xy}\\left( \\hat{\\mu}_{xy}-\\exp\\left( \\hat{\\theta}^{\\intercal}Z_{xy}\\right) \\right) ^{2}Z_{xy}Z_{xy}^{\\intercal}.\n",
 627 |     "\\end{align*}\n",
 628 |     "\n",
 629 |     "Under the Poisson assumption $B_{0}=A_{0}$, and the sandwich collapses to $A_{0}^{-1}$; with over- or under-dispersed flows the two matrices differ, which is exactly what the robust formula accounts for."
 630 |    ]
 631 |   },
 632 |   {
 633 |    "cell_type": "markdown",
 634 |    "metadata": {},
 635 |    "source": [
 636 |     "## Implementation: Poisson regression with fixed effects"
 637 |    ]
 638 |   },
 639 |   {
 640 |    "cell_type": "markdown",
 641 |    "metadata": {},
 642 |    "source": [
 643 |     "We now introduce an additional method to recover the coefficients of interest through Poisson regression. Notice that this approach naturally recovers the fixed effects too, although in this application we are not directly interested in them.\n",
 644 |     "\n",
 645 |     "Consistent with common practice and as previously described, we do not want to consider the trade flow between a country and itself. To incorporate this restriction in the Poisson regression, we pass sample weights that are zero on within-country flows and one elsewhere. We can then recover the Poisson objective function in matrix form by appropriately stacking the matrix of bilateral resistances and the matrices of fixed-effect dummies; the two sketches below make the stacking and the associated inference concrete.\n",
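    "\n",
    "First, the design matrix. The following minimal sketch uses toy dimensions (`nbi`, `nbt` and the arrays below are made up for illustration), but builds `M1` and `M2` exactly as the `fit_glm` method below does. Flattening $(x,y,t)$ in C order, row $a=(x,y,t)$ of `M1` picks out the exporter-period effect $u_{x,t}$, and row $a$ of `M2` the importer-period effect $v_{y,t}$:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import scipy.sparse as spr\n",
    "\n",
    "nbi, nbt = 3, 2  # toy numbers of countries and periods (illustration only)\n",
    "u = np.arange(nbi * nbt, dtype=float).reshape(nbi, nbt)   # exporter-period effects u_{x,t}\n",
    "v = -np.arange(nbi * nbt, dtype=float).reshape(nbi, nbt)  # importer-period effects v_{y,t}\n",
    "\n",
    "M1 = spr.kron(spr.identity(nbi), spr.kron(np.ones((nbi, 1)), spr.identity(nbt)))\n",
    "M2 = spr.kron(np.ones((nbi, 1)), spr.kron(spr.identity(nbi), spr.identity(nbt)))\n",
    "\n",
    "# M1 broadcasts u_{x,t} over importers y; M2 broadcasts v_{y,t} over exporters x\n",
    "assert np.allclose((M1 @ u.flatten()).reshape(nbi, nbi, nbt), u[:, None, :])\n",
    "assert np.allclose((M2 @ v.flatten()).reshape(nbi, nbi, nbt), v[None, :, :])\n",
    "```\n",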
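    "\n",
    "Second, inference. Here is an equally minimal sketch of the sandwich formula $\\left( A_{0}\\right) ^{-1}B_{0}\\left( A_{0}\\right) ^{-1}$ from the previous subsection. It is not part of our estimation routine, and `Z`, `theta_hat` and `muhat` are hypothetical placeholder names for a dense design matrix, fitted coefficients and observed flows:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sandwich_se(Z, theta_hat, muhat):\n",
    "    # hypothetical inputs: Z is (N, p), theta_hat is (p,), muhat is (N,)\n",
    "    N = Z.shape[0]\n",
    "    lam = np.exp(Z @ theta_hat)                          # fitted Poisson means\n",
    "    A0 = (Z * lam[:, None]).T @ Z / N                    # minus the average Hessian of l\n",
    "    B0 = (Z * ((muhat - lam) ** 2)[:, None]).T @ Z / N   # average outer product of the score\n",
    "    Ainv = np.linalg.inv(A0)\n",
    "    V = Ainv @ B0 @ Ainv / N                             # asymptotic variance of theta_hat\n",
    "    return np.sqrt(np.diag(V))                           # robust standard errors\n",
    "```"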
646 |    ]
 647 |   },
 648 |   {
 649 |    "cell_type": "code",
 650 |    "execution_count": 8,
 651 |    "metadata": {},
 652 |    "outputs": [],
 653 |    "source": [
 654 |     "def fit_glm(self, verbosity=0, max_iter = 8000, tol=1e-12):\n",
 655 |     "    \"\"\"\n",
 656 |     "    fit_glm(args) estimates the gravity equation via weighted Poisson regression.\n",
 657 |     "    \"\"\"\n",
 658 |     "\n",
 659 |     "    kr = spr.kron # shorthand for the Kronecker product\n",
 660 |     "\n",
 661 |     "    M1 = kr(spr.identity(self.nbi), kr(np.ones((self.nbi, 1)), spr.identity(self.nbt))) # exporter-period dummies\n",
 662 |     "    M2 = kr(np.ones((self.nbi, 1)), kr(spr.identity(self.nbi), spr.identity(self.nbt))) # importer-period dummies\n",
 663 |     "    C_a_k = spr.hstack([self.D_x_y_t_k.reshape((-1, self.nbk)), -M1, -M2]) # stacked design matrix; dummies enter with a minus sign\n",
 664 |     "    muhat_a = self.muhat_x_y_t.flatten()\n",
 665 |     "\n",
 666 |     "    weights_a = kr(np.eye(self.nbi**2), np.ones((self.nbt, 1))) @ (np.ones((self.nbi, self.nbi)) - np.eye(self.nbi)).flatten() # zero weight on within-country flows\n",
 667 |     "\n",
 668 |     "    clf = linear_model.PoissonRegressor(fit_intercept=False, tol=tol, max_iter=max_iter, verbose=verbosity, alpha=0) # alpha=0: no regularization\n",
 669 |     "    clf.fit(C_a_k, muhat_a, sample_weight=weights_a)\n",
 670 |     "\n",
 671 |     "    return clf.coef_[:self.nbk].round(3)\n",
 672 |     "\n",
 673 |     "GravityModel.fit_glm = fit_glm"
 674 |    ]
 675 |   },
 676 |   {
 677 |    "cell_type": "markdown",
 678 |    "metadata": {},
 679 |    "source": [
 680 |     "Solving the model using the newly added method:"
 681 |    ]
 682 |   },
 683 |   {
 684 |    "cell_type": "code",
 685 |    "execution_count": 9,
 686 |    "metadata": {},
 687 |    "outputs": [
 688 |     {
 689 |      "data": {
 690 |       "text/plain": [
 691 |        "array([-0.841,  0.438,  0.248, -0.223])"
 692 |       ]
 693 |      },
 694 |      "execution_count": 9,
 695 |      "metadata": {},
 696 |      "output_type": "execute_result"
 697 |     }
 698 |    ],
 699 |    "source": [
 700 |     "estimates_glm = trade_yotov.fit_glm()\n",
 701 |     "estimates_glm"
 702 |    ]
 703 |   },
 704 |   {
 705 |    "cell_type": "markdown",
 706 |    "metadata": {},
 707 |    "source": [
 708 |     "Again, we recover the estimates reported in the book by Yotov et al.; the Poisson and IPFP results agree up to the third decimal. "
 709 |    ]
 710 |   }
 711 |  ],
 712 |  "metadata": {
 713 |   "kernelspec": {
 714 |    "display_name": "Python 3 (ipykernel)",
 715 |    "language": "python",
 716 |    "name": "python3"
 717 |   },
 718 |   "language_info": {
 719 |    "codemirror_mode": {
 720 |     "name": "ipython",
 721 |     "version": 3
 722 |    },
 723 |    "file_extension": ".py",
 724 |    "mimetype": "text/x-python",
 725 |    "name": "python",
 726 |    "nbconvert_exporter": "python",
 727 |    "pygments_lexer": "ipython3",
 728 |    "version": "3.9.12"
 729 |   },
 730 |   "vscode": {
 731 |    "interpreter": {
 732 |     "hash": "beda123ca6d46414026d3c59f732de1f5fb19d6ba2f32753cc4223591eed0a9d"
 733 |    }
 734 |   }
 735 |  },
 736 |  "nbformat": 4,
 737 |  "nbformat_minor": 2
 738 | }
 739 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # mec_optim, Jan'22
 2 | ‘math+econ+code’ masterclass on optimal transport and economic applications, NYU, January 2022.
 3 | 
 4 | © 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274, and contributions by Jules Baudet, Pauline Corblet, Gregory Dannay, Giovanni Montanari, and James Nesbit are acknowledged.
 5 | 
 6 | # Getting set up
 7 | 
 8 | In this course, we will be using Python as our primary programming language. We will also have code available for R, but it will be unsupported, so use it at your own risk. The code will be presented using Jupyter notebooks, or you can run the R code in RStudio. 
In addition, we will be using Gurobi, a commercial linear programming solver.
 9 | 
10 | The course material will be hosted on GitHub, in [this repository](https://github.com/math-econ-code/mec_optim).
11 | 
12 | All the software will be installed in a Docker container. The image can be downloaded [here](https://hub.docker.com/repository/docker/alfredgalichon/mec_optim).
13 | 
--------------------------------------------------------------------------------
/docker/mec_optim.Dockerfile:
--------------------------------------------------------------------------------
  1 | FROM fedora:31
  2 | # (c) Alfred Galichon (math+econ+code) with contributions from Keith O'Hara and Jules Baudet
  3 | 
  4 | RUN dnf install -y \
  5 |     which \
  6 |     file \
  7 |     tar \
  8 |     gzip \
  9 |     unzip \
 10 |     make \
 11 |     cmake \
 12 |     ninja-build \
 13 |     git \
 14 |     gcc \
 15 |     gcc-c++ \
 16 |     gfortran \
 17 |     gmp-devel \
 18 |     libtool \
 19 |     libcurl-devel \
 20 |     wget \
 21 |     libicu-devel \
 22 |     openssl-devel \
 23 |     zlib-devel \
 24 |     libxml2-devel \
 25 |     expat-devel \
 26 |     python3-devel \
 27 |     python3-pip
 28 | 
 29 | ############################
 30 | # set timezone
 31 | ENV TZ=America/Los_Angeles
 32 | RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
 33 | 
 34 | ############################
 35 | # install python packages
 36 | # see: https://stackoverflow.com/questions/2720014/how-to-upgrade-all-python-packages-with-pip
 37 | RUN pip3 list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip3 install -U
 38 | 
 39 | 
 40 | RUN mkdir /src
 41 | WORKDIR /src/
 42 | COPY ./content .
 43 | RUN pip3 install -r requirements.txt
 44 | RUN rm requirements.txt
 45 | ############################
 46 | 
 47 | 
 48 | # RUN dnf install -y \
 49 | #     spatialindex-devel \
 50 | #     boost-devel \
 51 | #     swig \
 52 | #     suitesparse-devel \
 53 | #     eigen3-devel \
 54 | #     CGAL-devel \
 55 | #     CImg-devel \
 56 | #     openblas-devel
 57 | 
 58 | 
 59 | # cleanup
 60 | # RUN dnf clean all
 61 | ############################
 62 | # R installation
 63 | 
 64 | ####
 65 | # libraries needed to build R (libXt-devel for X11)
 66 | 
 67 | RUN dnf install -y \
 68 |     xz-devel \
 69 |     bzip2-devel \
 70 |     libjpeg-turbo-devel \
 71 |     libpng-devel \
 72 |     cairo-devel \
 73 |     pcre-devel \
 74 |     java-latest-openjdk-devel \
 75 |     perl \
 76 |     libX11-devel \
 77 |     libXt-devel
 78 | 
 79 | ####
 80 | # download and install R (but do not link with OpenBLAS)
 81 | 
 82 | ENV R_VERSION=4.1.1
 83 | 
 84 | RUN cd ~ && \
 85 |     curl -O --progress-bar https://cran.r-project.org/src/base/R-4/R-${R_VERSION}.tar.gz && \
 86 |     tar zxvf R-${R_VERSION}.tar.gz && \
 87 |     cd R-${R_VERSION} && \
 88 |     ./configure --with-readline=no --with-x --with-cairo && \
 89 |     make && \
 90 |     make install
 91 | 
 92 | ####
 93 | # install IRKernel
 94 | 
 95 | RUN dnf install -y czmq-devel
 96 | 
 97 | RUN echo -e "options(bitmapType = 'cairo', repos = c(CRAN = 'https://cran.rstudio.com/'))" > ~/.Rprofile
 98 | RUN R -e "install.packages(c('repr', 'IRdisplay', 'IRkernel'), type = 'source')"
 99 | RUN R -e "IRkernel::installspec(user = FALSE)"
100 | 
101 | ####
102 | # cleanup
103 | 
104 | RUN cd ~ && \
105 |     rm -rf R-${R_VERSION} && \
106 |     rm -f R-${R_VERSION}.tar.gz && \
107 |     dnf clean all
108 | 
109 | ############################
110 | # Gurobi installation
111 | 
112 | ENV GUROBI_VERSION=9.5.0
113 | 
114 | RUN mkdir -p /home/gurobi/ && \
115 |     cd /home/gurobi/ && \
116 |     wget -P /home/gurobi/ http://packages.gurobi.com/${GUROBI_VERSION::-2}/gurobi${GUROBI_VERSION}_linux64.tar.gz && \
117 |     tar xvfz /home/gurobi/gurobi${GUROBI_VERSION}_linux64.tar.gz 
&& \ 118 | mkdir -p /opt/gurobi && \ 119 | mv /home/gurobi/gurobi${GUROBI_VERSION:0:1}${GUROBI_VERSION:2:1}${GUROBI_VERSION:4:1}/linux64/ /opt/gurobi && \ 120 | rm -rf /home/gurobi 121 | 122 | ENV GUROBI_HOME="/opt/gurobi/linux64" 123 | ENV PATH="${PATH}:${GUROBI_HOME}/bin" 124 | ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib" 125 | 126 | 127 | # install Gurobi R package 128 | 129 | RUN R -e "install.packages(c('slam'), type = 'source')" 130 | 131 | RUN cd /opt/gurobi/linux64/R && \ 132 | tar xvfz gurobi_${GUROBI_VERSION:0:3}-${GUROBI_VERSION:4:1}_R_${R_VERSION}.tar.gz && \ 133 | R -e "install.packages('gurobi', repos = NULL)" 134 | 135 | 136 | # install gurobipy package 137 | RUN python -m pip install -i https://pypi.gurobi.com gurobipy 138 | 139 | ############################ 140 | # clone/pull latest github repository 141 | 142 | RUN mkdir -p /src/notebooks && \ 143 | cd /src/notebooks && \ 144 | git clone https://github.com/math-econ-code/mec_optim.git 145 | 146 | 147 | CMD cd /src/notebooks/mec_optim && \ 148 | git pull origin master && \ 149 | cd .. && \ 150 | jupyter notebook --port=8888 --no-browser --ip=0.0.0.0 --allow-root 151 | # ["jupyter", "notebook", "--port=8888", "--no-browser", "--ip=0.0.0.0", "--allow-root"] 152 | 153 | 154 | 155 | 156 | ############################################ 157 | ############################################ 158 | # RUN cd ~ && mkdir ot_libs 159 | # 160 | # PyMongeAmpere 161 | # RUN cd ~/ot_libs && \ 162 | # git clone https://github.com/mrgt/MongeAmpere.git && \ 163 | # git clone https://github.com/mrgt/PyMongeAmpere.git && \ 164 | # cd PyMongeAmpere && git submodule update --init --recursive && \ 165 | # mkdir build && cd build && \ 166 | # cmake -DCGAL_DIR=/usr/lib64/cmake/CGAL .. && \ 167 | # make -j1 168 | ######### 169 | # Siconos 170 | #RUN cd ~/ot_libs && \ 171 | # git clone https://github.com/siconos/siconos.git && \ 172 | # cd siconos && \ 173 | # mkdir build && cd build && \ 174 | # cmake .. && \ 175 | # make -j1 && \ 176 | # make install 177 | # 178 | # Siconos examples 179 | #RUN cd ~/ot_libs && \ 180 | # git clone https://github.com/siconos/siconos-tutorials.git 181 | ######### 182 | # Install NYU's floating license 183 | # RUN cd /opt/gurobi && \ 184 | # echo "TOKENSERVER=10.130.0.234" > gurobi.lic 185 | # # cd .. && cd .. && \ 186 | ######### -------------------------------------------------------------------------------- /docker/setup_mec_optim.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/math-econ-code/mec_optim/5a0746387bf622bd8a73f3418388a3b1e8c37175/docker/setup_mec_optim.pdf --------------------------------------------------------------------------------