├── .github └── workflows │ └── python-publish.yml ├── .gitignore ├── LICENSE ├── README.md ├── data └── msci.csv ├── docs └── images │ ├── logo_pycop.svg │ ├── plot │ ├── 2c_mixture_contour_mpdf.svg │ ├── 2c_mixture_contour_pdf.svg │ ├── 3c_mixture_contour_mpdf.svg │ ├── 3c_mixture_contour_pdf.svg │ ├── clayton_3d_mpdf.svg │ ├── clayton_contour_mpdf.svg │ ├── gumbel_3d_cdf.svg │ ├── gumbel_3d_pdf.svg │ ├── plackett_contour_cdf.svg │ └── plackett_contour_pdf.svg │ └── simu │ ├── 2c_mixture_simu.svg │ ├── 3c_mixture_simu.svg │ ├── clayton_simu_n3.svg │ ├── gaussian_simu.svg │ ├── gaussian_simu_n3.svg │ ├── gumbel_simu.svg │ ├── rgumbel_simu.svg │ └── student_simu.svg ├── examples ├── example_estim.ipynb ├── example_plot.ipynb └── example_simu.ipynb ├── pycop ├── __init__.py ├── __init__.pyc ├── bivariate │ ├── __init__.py │ ├── archimedean.py │ ├── copula.py │ ├── empirical.py │ ├── estimation.py │ ├── gaussian.py │ ├── mixture.py │ └── student.py ├── multivariate │ ├── gaussian.py │ └── student.py ├── simulation.py └── utils.py ├── pyproject.toml └── setup.py /.github/workflows/python-publish.yml: -------------------------------------------------------------------------------- 1 | # This workflow will upload a Python Package using Twine when a release is created 2 | # For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries 3 | 4 | # This workflow uses actions that are not certified by GitHub. 5 | # They are provided by a third-party and are governed by 6 | # separate terms of service, privacy policy, and support 7 | # documentation. 
8 | 9 | name: Upload Python Package 10 | 11 | on: 12 | release: 13 | types: [published] 14 | 15 | permissions: 16 | contents: read 17 | 18 | jobs: 19 | deploy: 20 | 21 | runs-on: ubuntu-latest 22 | 23 | steps: 24 | - uses: actions/checkout@v3 25 | - name: Set up Python 26 | uses: actions/setup-python@v3 27 | with: 28 | python-version: '3.x' 29 | - name: Install dependencies 30 | run: | 31 | python -m pip install --upgrade pip 32 | pip install build 33 | - name: Build package 34 | run: python -m build 35 | - name: Publish package 36 | uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29 37 | with: 38 | user: __token__ 39 | password: ${{ secrets.PYPI_API_TOKEN }} 40 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | pycop/bivariate/__pycache__/ 3 | pycop/multivariate/__pycache__/ 4 | pycop/__pycache__/ 5 | pycop.egg-info 6 | build/lib/pycop 7 | setup.cfg 8 | dist/ 9 | tests/ 10 | build/ 11 | pycop/simulation.pyc 12 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 The Python Packaging Authority 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

2 | 3 |

4 | 5 | 6 | [![PyPi version](https://badgen.net/pypi/v/pycop/)](https://pypi.org/project/pycop) 7 | [![Downloads](https://pepy.tech/badge/pycop)](https://pepy.tech/project/pycop) 8 | [![License](https://img.shields.io/pypi/l/pycop)](https://img.shields.io/pypi/l/pycop) 9 | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7030034.svg)](https://doi.org/10.5281/zenodo.7030034) 10 | 11 | # How to cite 12 | 13 | If you use pycop in a scientific publication, please cite it as: 14 | ``` 15 | @article{nicolas2022pycop, 16 | title={pycop: a Python package for dependence modeling with copulas}, 17 | author={Nicolas, Maxime LD}, 18 | journal={Zenodo Software Package}, 19 | volume={70}, 20 | pages={7030034}, 21 | year={2022} 22 | } 23 | ``` 24 | 25 | 26 | # Overview 27 | 28 | Pycop is a comprehensive Python package for modeling multivariate dependence with copulas. It provides parameter estimation, random sample generation, and graphical representation for commonly used copula functions, and supports mixture models defined as convex combinations of copulas. Methods based on the empirical copula, such as the non-parametric Tail Dependence Coefficient, are also included. 29 | 30 | Some of the features covered: 31 | * Elliptical copulas (Gaussian & Student) and common Archimedean copulas 32 | * Mixture models of up to 3 copula functions 33 | * Multivariate random sample generation 34 | * Empirical copula methods 35 | * Parametric and non-parametric Tail Dependence Coefficient (TDC) 36 | 37 | 38 | ### Available copula functions 39 |

40 | 41 | | Copula | Bivariate
Graph & Estimation | Multivariate
Simulation | 42 | |--- | :-: | :-: | 43 | | Mixture | ✓ | ✓ | 44 | | Gaussian | ✓ | ✓ | 45 | | Student | ✓ | ✓ | 46 | | Clayton | ✓ | ✓ | 47 | | Rotated Clayton | ✓ | ✓ | 48 | | Gumbel | ✓ | ✓ | 49 | | Rotated Gumbel | ✓ | ✓ | 50 | | Frank | ✓ | ✓ | 51 | | Joe | ✓ | ✓ | 52 | | Rotated Joe | ✓ | ✓ | 53 | | Galambos | ✓ | ✗ | 54 | | Rotated Galambos | ✓ | ✗ | 55 | | BB1 | ✓ | ✗ | 56 | | BB2 | ✓ | ✗ | 57 | | FGM | ✓ | ✗ | 58 | | Plackett | ✓ | ✗ | 59 | | AMH | ✗ | ✓ | 60 |

61 | 62 | # Usage 63 | 64 | Install pycop using pip: 65 | ``` 66 | pip install pycop 67 | ``` 68 | 69 | # Examples 70 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/maximenc/pycop/blob/master/examples/example_estim.ipynb) 71 | [Estimations on MSCI returns](https://github.com/maximenc/pycop/blob/master/examples/example_estim.ipynb) 72 | 73 | 74 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/maximenc/pycop/blob/master/examples/example_plot.ipynb) 75 | [Graphical Representations](https://github.com/maximenc/pycop/blob/master/examples/example_plot.ipynb) 76 | 77 | 78 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/maximenc/pycop/blob/master/examples/example_simu.ipynb) 79 | [Simulations](https://github.com/maximenc/pycop/blob/master/examples/example_simu.ipynb) 80 | 81 | 82 | 83 | # Table of Contents 84 | 85 | - [Graphical Representation](#Graphical-Representation) 86 | - [3d plot](#3d-plot) 87 | - [Contour plot](#Contour-plot) 88 | - [Mixture plot](#Mixture-plot) 89 | - [Simulation](#Simulation) 90 | - [Gaussian](#Gaussian) 91 | - [Student](#Student) 92 | - [Archimedean](#Archimedean) 93 | - [High dimension](#High-dimension) 94 | - [Mixture simulation](#Mixture-simulation) 95 | - [Estimation](#Estimation) 96 | - [Canonical Maximum Likelihood Estimation](#Canonical-Maximum-Likelihood-Estimation) 97 | - [Tail Dependence Coefficient](#Tail-Dependence-Coefficient) 98 | - [Theoretical TDC](#Theoretical-TDC) 99 | - [Non-parametric TDC](#Non-parametric-TDC) 100 | - [Optimal Empirical TDC](#Optimal-Empirical-TDC) 101 | 102 | 103 | # Graphical Representation 104 | 105 | We first create a copula object by specifying the copula family: 106 | 107 | ```python 108 | from pycop import archimedean 109 | cop = archimedean(family="clayton") 110 | ``` 111 | 112 | We can then plot the CDF and PDF of the copula.
113 | 114 | 115 | ## 3d plot 116 | 117 | ```python 118 | cop = archimedean(family="gumbel") 119 | 120 | cop.plot_cdf([2], plot_type="3d", Nsplit=100) 121 | cop.plot_pdf([2], plot_type="3d", Nsplit=100, cmap="cividis") 122 | ``` 123 | 124 | 125 |

126 | 127 | 128 |

129 | 130 | 131 | ## Contour plot 132 | 133 | Plot the contour: 134 | 135 | ```python 136 | cop = archimedean(family="plackett") 137 | 138 | cop.plot_cdf([2], plot_type="contour", Nsplit=100) 139 | cop.plot_pdf([2], plot_type="contour", Nsplit=100) 140 | ``` 141 | 142 | 143 |

144 | 145 | 146 |

147 | 148 | 149 | It is also possible to add specific marginal distributions: 150 | 151 | ```python 152 | from scipy.stats import norm 153 | 154 | cop = archimedean(family="clayton") 155 | 156 | marginals = [ 157 | { 158 | "distribution": norm, "loc": 0, "scale": 0.8, 159 | }, 160 | { 161 | "distribution": norm, "loc": 0, "scale": 0.6, 162 | }] 163 | 164 | cop.plot_mpdf([2], marginals, plot_type="3d", Nsplit=100, 165 | rstride=1, cstride=1, 166 | antialiased=True, 167 | cmap="cividis", 168 | edgecolor='black', 169 | linewidth=0.1, 170 | zorder=1, 171 | alpha=1) 172 | 173 | lvls = [0.02, 0.05, 0.1, 0.2, 0.3] 174 | 175 | cop.plot_mpdf([2], marginals, plot_type="contour", Nsplit=100, levels=lvls) 176 | ``` 177 | 178 | 179 | 180 |
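For intuition, the density plotted by `plot_mpdf` combines the copula density with the marginal densities: the joint density is c(F1(x), F2(y)) * f1(x) * f2(y). Below is a minimal sketch of that relationship, using the standard Clayton copula density formula and the normal marginals from above; it illustrates the math, not pycop's internal implementation.

```python
import numpy as np
from scipy.stats import norm

def clayton_pdf(u, v, theta):
    # Clayton copula density c(u, v), valid for theta > 0
    return ((1 + theta) * (u * v) ** (-1 - theta)
            * (u ** (-theta) + v ** (-theta) - 1) ** (-2 - 1 / theta))

def joint_pdf(x, y, theta, scale_x=0.8, scale_y=0.6):
    # Joint density with normal marginals: c(F1(x), F2(y)) * f1(x) * f2(y)
    u, v = norm.cdf(x, scale=scale_x), norm.cdf(y, scale=scale_y)
    return clayton_pdf(u, v, theta) * norm.pdf(x, scale=scale_x) * norm.pdf(y, scale=scale_y)
```

Evaluating `joint_pdf` on a meshgrid reproduces the kind of surface shown in the plots above.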

181 | 182 | 183 |

184 | 185 | ## Mixture plot 186 | 187 | Mixture of 2 copulas. The parameter list gives the weight of the first copula followed by the parameter of each component: 188 | 189 | ```python 190 | from pycop import mixture 191 | 192 | cop = mixture(["clayton", "gumbel"]) 193 | cop.plot_pdf([0.2, 2, 2], plot_type="contour", Nsplit=40, levels=[0.1, 0.4, 0.8, 1.3, 1.6]) 194 | # plot with the marginals defined above 195 | cop.plot_mpdf([0.2, 2, 2], marginals, plot_type="contour", Nsplit=50) 196 | ``` 197 |

198 | 199 | 200 |

201 | 202 | 203 | ```python 204 | cop = mixture(["clayton", "gaussian", "gumbel"]) 205 | cop.plot_pdf([1/3, 1/3, 1/3, 2, 0.5, 4], plot_type="contour", Nsplit=40, levels=[0.1, 0.4, 0.8, 1.3, 1.6]) 206 | cop.plot_mpdf([1/3, 1/3, 1/3, 2, 0.5, 2], marginals, plot_type="contour", Nsplit=50) 207 | ``` 208 | 209 |

210 | 211 | 212 |

213 | 214 | 215 | # Simulation 216 | 217 | ## Gaussian 218 | 219 | 220 | ```python 221 | import numpy as np 222 | from scipy.stats import norm 223 | from pycop import simulation 224 | 225 | n = 2 # dimension 226 | m = 1000 # sample size 227 | 228 | corrMatrix = np.array([[1, 0.8], [0.8, 1]]) 229 | u1, u2 = simulation.simu_gaussian(n, m, corrMatrix) 230 | ``` 231 | Add Gaussian marginals, using the distribution's ppf from scipy.stats to transform the uniform margins into the desired distribution: 232 | 233 | ```python 234 | u1 = norm.ppf(u1) 235 | u2 = norm.ppf(u2) 236 | ``` 237 | 238 | ## Student 239 | ```python 240 | u1, u2 = simulation.simu_tstudent(n, m, corrMatrix, nu=1) 241 | ``` 242 | 243 | 244 |
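For intuition, Gaussian copula simulation amounts to drawing correlated standard normals through a Cholesky factor of the correlation matrix, then mapping each margin through the normal CDF. A minimal numpy-only sketch of that idea (for illustration; pycop's `simu_gaussian` is the function to use in practice):

```python
import numpy as np
from math import erf, sqrt

def std_norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

rng = np.random.default_rng(0)
m = 2000
corr = np.array([[1.0, 0.8], [0.8, 1.0]])

# Correlated normals: standard normals times the Cholesky factor
z = rng.standard_normal((m, 2)) @ np.linalg.cholesky(corr).T
u1, u2 = std_norm_cdf(z[:, 0]), std_norm_cdf(z[:, 1])
# u1 and u2 are each uniform on (0, 1); the dependence lives in the copula
```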

245 | 246 | 247 |

249 | 250 | 251 | ## Archimedean 252 | 253 | Simulate from one of the available Archimedean copulas: 254 | 255 | ```python 256 | u1, u2 = simulation.simu_archimedean("gumbel", n, m, theta=2) 257 | ``` 258 | 259 | For the rotated version, rotate the sample: 260 | 261 | ```python 262 | u1, u2 = 1 - u1, 1 - u2 263 | ``` 264 | 265 | 266 | 267 |
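For intuition, Archimedean samples are typically generated with the Marshall-Olkin frailty method. A numpy-only sketch for the Clayton copula (an illustrative algorithm, not necessarily pycop's exact implementation):

```python
import numpy as np

def simu_clayton_sketch(m, theta, seed=0):
    # Marshall-Olkin frailty method for Clayton (theta > 0):
    # draw V ~ Gamma(1/theta, 1) and E_i ~ Exp(1), then set
    # U_i = (1 + E_i / V) ** (-1 / theta)
    rng = np.random.default_rng(seed)
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=m)
    e = rng.exponential(size=(m, 2))
    return (1.0 + e / v[:, None]) ** (-1.0 / theta)

u = simu_clayton_sketch(5000, theta=2)
u1, u2 = u[:, 0], u[:, 1]
```

The resulting sample has uniform margins and lower tail dependence; rotating it with `1 - u1, 1 - u2` moves the dependence to the upper tail.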

268 | 269 | 270 |

271 | 272 | 273 | ## High dimension 274 | 275 | The simulation functions also work in higher dimensions, for example a 3-dimensional Gaussian copula: 276 | 277 | ```python 278 | n = 3 # Dimension 279 | m = 1000 # Sample size 280 | 281 | corrMatrix = np.array([[1, 0.9, 0], [0.9, 1, 0], [0, 0, 1]]) 282 | u = simulation.simu_gaussian(n, m, corrMatrix) 283 | u = norm.ppf(u) 284 | ``` 285 |

286 | 287 |

288 | 289 | The same works for Archimedean copulas: 290 | ```python 291 | u = simulation.simu_archimedean("clayton", n, m, theta=2) 292 | u = norm.ppf(u) 293 | ``` 294 | 295 |

296 | 297 |

298 | 299 | ## Mixture simulation 300 | 301 | Simulation from a mixture of 2 copulas (the weights must sum to one): 302 | 303 | ```python 304 | n = 3 305 | m = 2000 306 | 307 | combination = [ 308 | {"type": "clayton", "weight": 1/2, "theta": 2}, 309 | {"type": "gumbel", "weight": 1/2, "theta": 3} 310 | ] 311 | 312 | u = simulation.simu_mixture(n, m, combination) 313 | u = norm.ppf(u) 314 | ``` 315 |
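Conceptually, mixture simulation draws, for each observation, which component copula it comes from, with probability equal to the component's weight. A toy numpy sketch with two extreme components (independence and perfect positive dependence) to make the mechanism visible; pycop's `simu_mixture` does the same with real copula samplers:

```python
import numpy as np

rng = np.random.default_rng(0)
m, w = 4000, 0.5  # sample size; weight of the first component

# For each draw, pick component 1 with probability w, else component 2
from_first = rng.random(m) < w

u1 = rng.random(m)
u2 = np.where(from_first,
              rng.random(m),  # component 1: independence copula
              u1)             # component 2: comonotonic (perfect dependence)
```

The mixed sample keeps uniform margins, while its dependence is a weighted blend of the components' dependence.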

316 | 317 |

318 | 319 | Simulation from a mixture of 3 copulas: 320 | ```python 321 | corrMatrix = np.array([[1, 0.8, 0], [0.8, 1, 0], [0, 0, 1]]) 322 | 323 | combination = [ 324 | {"type": "clayton", "weight": 1/3, "theta": 2}, 325 | {"type": "student", "weight": 1/3, "corrMatrix": corrMatrix, "nu": 2}, 326 | {"type": "gumbel", "weight": 1/3, "theta": 3} 327 | ] 328 | 329 | u = simulation.simu_mixture(n, m, combination) 330 | u = norm.ppf(u) 331 | ``` 332 | 333 | 334 |

335 | 336 |

338 | 339 | # Estimation 340 | 341 | The available estimation method is Canonical Maximum Likelihood Estimation (CMLE). 342 | 343 | 344 | ## Canonical Maximum Likelihood Estimation (CMLE) 345 | 346 | Import a sample with pandas: 347 | 348 | ```python 349 | import pandas as pd 350 | import numpy as np 351 | 352 | df = pd.read_csv("data/msci.csv") 353 | df.index = pd.to_datetime(df["Date"], format="%m/%d/%Y") 354 | df = df.drop(["Date"], axis=1) 355 | 356 | for col in df.columns.values: 357 | df[col] = np.log(df[col]) - np.log(df[col].shift(1)) 358 | 359 | df = df.dropna() 360 | ``` 361 | 362 | 363 | ```python 364 | from pycop import estimation, archimedean 365 | 366 | cop = archimedean("clayton") 367 | data = df[["US","UK"]].T.values 368 | param, cmle = estimation.fit_cmle(cop, data) 369 | ``` 370 | clayton estim: 0.8025977727691012 371 | 372 | 373 | 374 | # Tail Dependence Coefficient 375 | 376 | ## Theoretical TDC 377 | 378 | ```python 379 | from pycop import archimedean 380 | 381 | cop = archimedean("clayton") 382 | 383 | cop.LTDC(theta=0.5) 384 | cop.UTDC(theta=0.5) 385 | ``` 386 | 387 | 388 | For a mixture copula, the copula with lower tail dependence comes first, and the one with upper tail dependence comes last.
390 | 391 | ```python 392 | from pycop import mixture 393 | 394 | cop = mixture(["clayton", "gaussian", "gumbel"]) 395 | 396 | LTDC = cop.LTDC(weight = 0.2, theta = 0.5) 397 | UTDC = cop.UTDC(weight = 0.2, theta = 1.5) 398 | ``` 399 | 400 | ## Non-parametric TDC 401 | Create an empirical copula object 402 | ```python 403 | from pycop import empirical 404 | 405 | cop = empirical(df[["US","UK"]].T.values) 406 | ``` 407 | Compute the non-parametric Upper TDC (UTDC) or the Lower TDC (LTDC) for a given threshold: 408 | 409 | ```python 410 | cop.LTDC(0.01) # i/n = 1% 411 | cop.UTDC(0.99) # i/n = 99% 412 | ``` 413 | 414 | ## Optimal Empirical TDC 415 | Returns the optimal non-parametric TDC based on the heuristic plateau-finding algorithm from Frahm et al (2005) "Estimating the tail-dependence coefficient: properties and pitfalls" 416 | 417 | ```python 418 | cop.optimal_tdc("upper") 419 | cop.optimal_tdc("lower") 420 | ``` 421 | -------------------------------------------------------------------------------- /docs/images/logo_pycop.svg: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /docs/images/plot/clayton_contour_mpdf.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 2022-04-25T14:06:52.492538 10 | image/svg+xml 11 | 12 | 13 | Matplotlib v3.5.1, https://matplotlib.org/ 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 30 | 31 | 32 | 33 | 39 | 40 | 41 | 42 | 43 | 44 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 63 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | 181 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | 209 | 210 | 211 | 212 | 
213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 | 221 | 222 | 223 | 224 | 225 | 226 | 227 | 228 | 229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | 239 | 240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | 249 | 250 | 251 | 254 | 255 | 256 | 257 | 258 | 259 | 260 | 261 | 262 | 263 | 264 | 265 | 266 | 267 | 268 | 269 | 270 | 271 | 272 | 273 | 274 | 275 | 276 | 277 | 278 | 279 | 280 | 281 | 282 | 283 | 284 | 285 | 286 | 287 | 288 | 289 | 290 | 291 | 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | 305 | 306 | 307 | 308 | 309 | 310 | 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | 342 | 343 | 344 | 345 | 346 | 347 | 348 | 349 | 465 | 466 | 467 | 564 | 565 | 566 | 645 | 646 | 647 | 702 | 703 | 704 | 805 | 806 | 807 | 810 | 811 | 812 | 815 | 816 | 817 | 820 | 821 | 822 | 825 | 826 | 827 | 828 | 829 | 830 | 851 | 858 | 891 | 908 | 929 | 950 | 969 | 970 | 996 | 1018 | 1039 | 1058 | 1071 | 1072 | 1073 | 1074 | 1075 | 1076 | 1077 | 1078 | 1079 | 1080 | 1081 | 1082 | 1083 | 1084 | 1085 | 1086 | 1087 | 1088 | 1089 | 1090 | 1091 | 1092 | 1093 | 1094 | 1095 | 1096 | 1097 | 1104 | 1105 | 1106 | 1107 | 1108 | 1109 | 1110 | 1111 | 1112 | 1113 | 1114 | 1115 | 1116 | 1117 | 1142 | 1143 | 1144 | 1145 | 1146 | 1147 | 1148 | 1149 | 1150 | 1151 | 1152 | 1153 | 1154 | 1155 | 1156 | 1157 | 1158 | 1159 | 1160 | 1161 | 1162 | 1163 | 1164 | 1165 | 1166 | 1167 | 1168 | 1169 | 1170 | 1171 | 1172 | 1173 | 1174 | 1175 | 1176 | 1177 | 1178 | 1179 | 1180 | 1181 | 1182 | 1183 | 1184 | 1185 | 1186 | 1187 | 1188 | 1189 | 1190 | 1191 | -------------------------------------------------------------------------------- /docs/images/plot/plackett_contour_cdf.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 2022-04-25T14:05:43.626792 10 | image/svg+xml 11 | 12 | 13 
| Matplotlib v3.5.1, https://matplotlib.org/ 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 30 | 31 | 32 | 33 | 39 | 40 | 41 | 42 | 43 | 44 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 77 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 | 221 | 222 | 223 | 224 | 225 | 264 | 265 | 266 | 267 | 268 | 269 | 270 | 271 | 272 | 273 | 274 | 275 | 276 | 277 | 278 | 279 | 280 | 281 | 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | 305 | 306 | 307 | 310 | 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | 342 | 343 | 344 | 345 | 346 | 347 | 348 | 349 | 350 | 351 | 352 | 353 | 354 | 355 | 356 | 357 | 358 | 359 | 360 | 361 | 362 | 363 | 364 | 365 | 366 | 367 | 368 | 369 | 370 | 371 | 372 | 373 | 374 | 375 | 376 | 377 | 378 | 379 | 380 | 381 | 382 | 383 | 384 | 385 | 386 | 387 | 388 | 389 | 390 | 391 | 392 | 393 | 394 | 395 | 396 | 397 | 398 | 399 | 400 | 401 | 402 | 450 | 462 | 463 | 464 | 473 | 597 | 598 | 599 | 608 | 701 | 702 | 703 | 760 | 775 | 776 | 777 | 810 | 818 | 819 | 820 | 826 | 831 | 832 | 833 | 834 | 837 | 838 | 839 | 842 | 843 | 844 | 847 | 848 | 849 | 852 | 853 | 854 | 855 | 856 | 857 | 878 | 885 | 918 | 939 | 953 | 978 | 999 | 1000 | 1021 | 1042 | 1068 | 1090 | 1109 | 1122 | 1123 | 1124 | 1125 | 1126 | 1127 | 1128 | 1129 | 1130 | 1131 | 1132 | 1133 | 1134 | 1135 | 1136 | 1137 | 1138 | 1139 | 1140 | 1141 | 1142 | 1143 | 1144 | 1145 | 1146 | 1147 | 1148 | 1149 | 1174 | 1175 | 1176 | 1177 | 1178 | 1179 | 1180 | 1181 | 1182 | 1183 | 1184 | 1185 | 1186 | 1187 | 1219 | 1220 | 
1221 | 1222 | 1223 | 1224 | 1225 | 1226 | 1227 | 1228 | 1229 | 1230 | 1231 | 1232 | 1233 | 1234 | 1235 | 1236 | 1237 | 1238 | 1239 | 1240 | 1241 | 1242 | 1243 | 1244 | 1245 | 1246 | 1247 | 1248 | 1249 | 1250 | 1251 | 1252 | 1253 | 1254 | 1264 | 1265 | 1266 | 1267 | 1268 | 1269 | 1270 | 1271 | 1272 | 1273 | 1274 | 1275 | 1276 | 1277 | 1307 | 1308 | 1309 | 1310 | 1311 | 1312 | 1313 | 1314 | 1315 | 1316 | 1317 | 1318 | 1319 | 1320 | 1321 | 1322 | 1323 | -------------------------------------------------------------------------------- /examples/example_estim.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "TB34GcWrBjef" 7 | }, 8 | "source": [ 9 | "Estimation" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "Xwhmxt7tAEjG" 16 | }, 17 | "source": [ 18 | "Data import" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": { 25 | "colab": { 26 | "base_uri": "https://localhost:8080/", 27 | "height": 388 28 | }, 29 | "id": "tMcqcgSo_7Q6", 30 | "outputId": "9e1ea4dc-6d74-49fa-99d3-c3eb2819e5c0" 31 | }, 32 | "outputs": [ 33 | { 34 | "data": { 35 | "text/html": [ 36 | "\n", 37 | "
\n", 38 | "
\n", 39 | "
\n", 40 | "\n", 53 | "\n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | "
USUKSwitzerlandSwedenSpainSingaporeNorwayNetherlandsJapanItalyHongKongGermanyFranceDenmarkCanadaBelgiumAustriaAustralia
Date
2000-01-03-0.0063640.0074180.0119090.0284050.0079950.0336860.0269810.0157210.008980-0.0201800.014310-0.0159290.0075710.0273450.0049730.0014080.0161720.008597
2000-01-04-0.040896-0.029006-0.023266-0.013283-0.014809-0.014969-0.029136-0.033486-0.016367-0.010596-0.013452-0.010644-0.029656-0.010718-0.030336-0.0290990.004299-0.013132
2000-01-050.002601-0.018383-0.009263-0.047861-0.022228-0.056299-0.025306-0.010066-0.044737-0.016052-0.070700-0.009102-0.030744-0.020555-0.016362-0.0327840.004861-0.023347
2000-01-06-0.008583-0.0068080.0129650.0001260.001160-0.0227110.011887-0.007967-0.038755-0.016775-0.044639-0.006322-0.0042720.009259-0.0109370.0024150.001160-0.007932
2000-01-070.0339730.0003710.0162610.0046220.0177360.0289700.0101700.027949-0.0030750.0254740.0189070.0405920.011951-0.0007750.0506760.0187960.0182100.010410
\n", 206 | "
\n", 207 | " \n", 217 | " \n", 218 | " \n", 255 | "\n", 256 | " \n", 280 | "
\n", 281 | "
\n", 282 | " " 283 | ], 284 | "text/plain": [ 285 | " US UK Switzerland Sweden Spain Singapore \\\n", 286 | "Date \n", 287 | "2000-01-03 -0.006364 0.007418 0.011909 0.028405 0.007995 0.033686 \n", 288 | "2000-01-04 -0.040896 -0.029006 -0.023266 -0.013283 -0.014809 -0.014969 \n", 289 | "2000-01-05 0.002601 -0.018383 -0.009263 -0.047861 -0.022228 -0.056299 \n", 290 | "2000-01-06 -0.008583 -0.006808 0.012965 0.000126 0.001160 -0.022711 \n", 291 | "2000-01-07 0.033973 0.000371 0.016261 0.004622 0.017736 0.028970 \n", 292 | "\n", 293 | " Norway Netherlands Japan Italy HongKong Germany \\\n", 294 | "Date \n", 295 | "2000-01-03 0.026981 0.015721 0.008980 -0.020180 0.014310 -0.015929 \n", 296 | "2000-01-04 -0.029136 -0.033486 -0.016367 -0.010596 -0.013452 -0.010644 \n", 297 | "2000-01-05 -0.025306 -0.010066 -0.044737 -0.016052 -0.070700 -0.009102 \n", 298 | "2000-01-06 0.011887 -0.007967 -0.038755 -0.016775 -0.044639 -0.006322 \n", 299 | "2000-01-07 0.010170 0.027949 -0.003075 0.025474 0.018907 0.040592 \n", 300 | "\n", 301 | " France Denmark Canada Belgium Austria Australia \n", 302 | "Date \n", 303 | "2000-01-03 0.007571 0.027345 0.004973 0.001408 0.016172 0.008597 \n", 304 | "2000-01-04 -0.029656 -0.010718 -0.030336 -0.029099 0.004299 -0.013132 \n", 305 | "2000-01-05 -0.030744 -0.020555 -0.016362 -0.032784 0.004861 -0.023347 \n", 306 | "2000-01-06 -0.004272 0.009259 -0.010937 0.002415 0.001160 -0.007932 \n", 307 | "2000-01-07 0.011951 -0.000775 0.050676 0.018796 0.018210 0.010410 " 308 | ] 309 | }, 310 | "execution_count": 15, 311 | "metadata": {}, 312 | "output_type": "execute_result" 313 | } 314 | ], 315 | "source": [ 316 | "import pandas as pd\n", 317 | "import numpy as np\n", 318 | "\n", 319 | "df = pd.read_csv(\"https://raw.githubusercontent.com/maximenc/pycop/master/data/msci.csv\")\n", 320 | "df.index = pd.to_datetime(df[\"Date\"], format=\"%m/%d/%Y\")\n", 321 | "df = df.drop([\"Date\"], axis=1)\n", 322 | "\n", 323 | "for col in df.columns.values:\n", 324 | " 
df[col] = np.log(df[col]) - np.log(df[col].shift(1))\n", 325 | "\n", 326 | "df = df.dropna()\n", 327 | "df.head()" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": { 333 | "id": "_CVvuX-EAbfp" 334 | }, 335 | "source": [ 336 | "Import the pycop library that's not in Colab" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": null, 342 | "metadata": { 343 | "colab": { 344 | "base_uri": "https://localhost:8080/" 345 | }, 346 | "id": "jdJ-mnShApAZ", 347 | "outputId": "3219fbe2-8cb1-4635-c2e6-9b9404544128" 348 | }, 349 | "outputs": [ 350 | { 351 | "name": "stdout", 352 | "output_type": "stream", 353 | "text": [ 354 | "Requirement already satisfied: pycop in /usr/local/lib/python3.7/dist-packages (0.0.6)\n" 355 | ] 356 | } 357 | ], 358 | "source": [ 359 | "!pip install pycop" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": { 365 | "id": "moCHVzNiEo2Y" 366 | }, 367 | "source": [ 368 | "Estimation of a single Copula parameter" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": null, 374 | "metadata": { 375 | "colab": { 376 | "base_uri": "https://localhost:8080/" 377 | }, 378 | "id": "Gm4gwwAFBGJw", 379 | "outputId": "3884fe54-e4f1-40c2-db2f-a178b215a0ab" 380 | }, 381 | "outputs": [ 382 | { 383 | "name": "stdout", 384 | "output_type": "stream", 385 | "text": [ 386 | "method = SLSQP - termination = True - message: Optimization terminated successfully.\n", 387 | "Estimated parameter: 0.8025977727691012\n" 388 | ] 389 | } 390 | ], 391 | "source": [ 392 | "from pycop import archimedean, estimation\n", 393 | "cop = archimedean(family=\"clayton\")\n", 394 | "\n", 395 | "data = df[[\"US\",\"UK\"]].T.values\n", 396 | "param, cmle = estimation.fit_cmle(cop, data)\n", 397 | "print(\"Estimated parameter: \", param[0])" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": { 403 | "id": "N5nAZWSlBlvD" 404 | }, 405 | "source": [ 406 | "Mixture\n" 407 | ] 408 | }, 409 
| { 410 | "cell_type": "markdown", 411 | "metadata": { 412 | "id": "aNLJyyUaChBr" 413 | }, 414 | "source": [ 415 | "Mixture of 2 Copulas\n", 416 | "\n", 417 | "\n" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": null, 423 | "metadata": { 424 | "colab": { 425 | "base_uri": "https://localhost:8080/" 426 | }, 427 | "id": "6QQySVjLBuSd", 428 | "outputId": "51b34629-e701-4e4a-d374-4d4415bd0967" 429 | }, 430 | "outputs": [ 431 | { 432 | "name": "stdout", 433 | "output_type": "stream", 434 | "text": [ 435 | "method = SLSQP - termination = True - message: Optimization terminated successfully.\n", 436 | "Estimated parameters: \n", 437 | "weight in Clayton copula: 0.5515374306606079\n", 438 | "weight in Gumbel copula: 0.4484625693393921\n", 439 | "Clayton parameter: 0.42308968740122027\n", 440 | "Gumbel parameter: 2.265138697501126\n" 441 | ] 442 | } 443 | ], 444 | "source": [ 445 | "from pycop import mixture\n", 446 | "\n", 447 | "cop = mixture([\"clayton\", \"gumbel\"])\n", 448 | "\n", 449 | "param, mle = estimation.fit_cmle_mixt(cop,data )\n", 450 | "print(\"Estimated parameters: \")\n", 451 | "print(\"weight in Clayton copula: \", param[0])\n", 452 | "print(\"weight in Gumbel copula: \", 1-param[0])\n", 453 | "print(\"Clayton parameter: \", param[1])\n", 454 | "print(\"Gumbel parameter: \", param[2])" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": { 460 | "id": "5Kg9_BpUDVpP" 461 | }, 462 | "source": [ 463 | "Mixture of 3 Copulas" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": null, 469 | "metadata": { 470 | "colab": { 471 | "base_uri": "https://localhost:8080/" 472 | }, 473 | "id": "L-5ACn1TDWSp", 474 | "outputId": "ece145a2-5aa0-44ae-e8a7-3c2acace8008" 475 | }, 476 | "outputs": [ 477 | { 478 | "name": "stdout", 479 | "output_type": "stream", 480 | "text": [ 481 | "method = SLSQP - termination = True - message: Optimization terminated successfully.\n", 482 | "Estimated parameters: \n", 
483 | "weight in Clayton copula: 0.35613959154707575\n", 484 | "weight in Frank copula: 0.12637150155636992\n", 485 | "weight in Gumbel copula: 0.5174889068965544\n", 486 | "Clayton parameter: 1.5089246645156034\n", 487 | "Frank parameter: -5.899959588867644\n", 488 | "Gumbel parameter: 1.8415684650315947\n" 489 | ] 490 | } 491 | ], 492 | "source": [ 493 | "cop = mixture([\"clayton\", \"frank\", \"gumbel\"])\n", 494 | "\n", 495 | "param, mle = estimation.fit_cmle_mixt(cop, data)\n", 496 | "print(\"Estimated parameters: \")\n", 497 | "print(\"weight in Clayton copula: \", param[0])\n", 498 | "print(\"weight in Frank copula: \", param[1])\n", 499 | "print(\"weight in Gumbel copula: \", param[2])\n", 500 | "print(\"Clayton parameter: \", param[3])\n", 501 | "print(\"Frank parameter: \", param[4])\n", 502 | "print(\"Gumbel parameter: \", param[5])" 503 | ] 504 | } 505 | ], 506 | "metadata": { 507 | "colab": { 508 | "name": "example_estim.ipynb", 509 | "provenance": [] 510 | }, 511 | "kernelspec": { 512 | "display_name": "Python 3.10.6 ('test-env': venv)", 513 | "language": "python", 514 | "name": "python3" 515 | }, 516 | "language_info": { 517 | "name": "python", 518 | "version": "3.10.6" 519 | }, 520 | "vscode": { 521 | "interpreter": { 522 | "hash": "684c597edd994fb8a573e32bcd1af30dbfcaa9a74f1316b10fafe91077b46267" 523 | } 524 | } 525 | }, 526 | "nbformat": 4, 527 | "nbformat_minor": 0 528 | } 529 | -------------------------------------------------------------------------------- /pycop/__init__.py: -------------------------------------------------------------------------------- 1 | from pycop import simulation 2 | from pycop import utils 3 | from pycop.bivariate.archimedean import archimedean 4 | from pycop.bivariate.empirical import empirical 5 | from pycop.bivariate.gaussian import gaussian 6 | from pycop.bivariate.student import student 7 | from pycop.bivariate.mixture import mixture 8 | from pycop.bivariate import estimation 9 | 
-------------------------------------------------------------------------------- /pycop/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maximenc/pycop/dffc1833693e15d7ab1cff1fb0d78fe3d024f995/pycop/__init__.pyc -------------------------------------------------------------------------------- /pycop/bivariate/__init__.py: -------------------------------------------------------------------------------- 1 | from pycop.bivariate import copula 2 | -------------------------------------------------------------------------------- /pycop/bivariate/archimedean.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pycop.bivariate.copula import copula 3 | 4 | class archimedean(copula): 5 | """ 6 | # Creates an Archimedean copula object 7 | Source for the CDF and PDF functions: 8 | Joe, H. (2014). Dependence modeling with copulas. CRC press. 9 | Chapter 4: Parametric copula families and properties (p.159) 10 | 11 | ... 12 | 13 | Attributes 14 | ---------- 15 | family : str 16 | The name of the Archimedean copula function. 17 | type : str 18 | The type of copula = "archimedean". 19 | bounds_param : list 20 | A list that contains the domain of the parameter(s) in a tuple. 21 | Example : [(lower, upper)] 22 | parameters_start : array 23 | Value(s) of the initial guess when estimating the copula parameter(s). 24 | It represents the parameter `x0` in the `scipy.optimize.minimize` function. 25 | 26 | Methods 27 | ------- 28 | get_cdf(u, v, param) 29 | Computes the Cumulative Distribution Function (CDF). 30 | get_pdf(u, v, param) 31 | Computes the Probability Density Function (PDF). 32 | LTDC(theta) 33 | Computes the Lower Tail Dependence Coefficient (TDC). 34 | UTDC(theta) 35 | Computes the upper TDC.
36 | """ 37 | 38 | Archimedean_families = [ 39 | 'clayton', 'gumbel', 'frank', 'joe', 'galambos','fgm', 'plackett', 40 | 'rgumbel', 'rclayton', 'rjoe','rgalambos', 'BB1', 'BB2'] 41 | 42 | 43 | def __init__(self, family): 44 | """ 45 | Parameters 46 | ---------- 47 | family : str 48 | The name of the Archimedean copula function. 49 | 50 | Raises 51 | ------ 52 | ValueError 53 | If the given `family` is not supported. 54 | """ 55 | 56 | # the `archimedean` copula class inherits the `copula` class 57 | super().__init__() 58 | self.family = family 59 | self.type = "archimedean" 60 | 61 | if family in ['clayton', 'galambos', 'plackett', 'rclayton', 'rgalambos'] : 62 | self.bounds_param = [(1e-6, None)] 63 | self.parameters_start = np.array(0.5) 64 | 65 | elif family in ['gumbel', 'joe', 'rgumbel', 'rjoe'] : 66 | self.bounds_param = [(1, None)] 67 | self.parameters_start = np.array(1.5) 68 | 69 | elif family == 'frank': 70 | self.bounds_param = [(None, None)] 71 | self.parameters_start = np.array(2) 72 | 73 | elif family == 'fgm': 74 | self.bounds_param = [(-1, 1-1e-6)] 75 | self.parameters_start = np.array(0) 76 | 77 | elif family in ['BB1'] : 78 | self.bounds_param = [(1e-6, None), (1, None)] 79 | self.parameters_start = (np.array(.5), np.array(1.5)) 80 | 81 | elif family in ['BB2'] : 82 | self.bounds_param = [(1e-6, None), (1e-6, None)] 83 | self.parameters_start = (np.array(1), np.array(1)) 84 | else: 85 | print("family \"%s\" not in list: %s" % (family, archimedean.Archimedean_families) ) 86 | raise ValueError 87 | 88 | def get_cdf(self, u, v, param): 89 | """ 90 | # Computes the CDF 91 | 92 | Parameters 93 | ---------- 94 | u, v : float 95 | Values of the marginal CDFs 96 | param : list 97 | A list that contains the copula parameter(s) (float) 98 | """ 99 | 100 | if self.family == 'clayton': 101 | return (u ** (-param[0]) + v ** (-param[0]) - 1) ** (-1 / param[0]) 102 | 103 | elif self.family == 'rclayton': 104 | return (u + v - 1 +
archimedean(family='clayton').get_cdf((1 - u),(1 - v), param) ) 105 | 106 | elif self.family == 'gumbel': 107 | return np.exp(-((-np.log(u)) ** param[0] + (-np.log(v)) ** param[0] ) ** (1 / param[0])) 108 | 109 | elif self.family == 'rgumbel': 110 | return (u + v - 1 + archimedean(family='gumbel').get_cdf((1-u),(1-v), param) ) 111 | 112 | elif self.family == 'frank': 113 | a = (np.exp(-param[0] * u) - 1) * (np.exp(-param[0] * v) - 1) 114 | return (-1 / param[0]) * np.log(1 + a / (np.exp(-param[0]) - 1)) 115 | 116 | elif self.family == 'joe': 117 | u_ = (1 - u) ** param[0] 118 | v_ = (1 - v) ** param[0] 119 | return 1 - (u_ + v_ - u_ * v_) ** (1 / param[0]) 120 | 121 | elif self.family == 'rjoe': 122 | return (u + v - 1 + archimedean(family='joe').get_cdf((1 - u),(1 - v), param) ) 123 | 124 | elif self.family == 'galambos': 125 | return u * v * np.exp(((-np.log(u)) ** (-param[0]) + (-np.log(v)) ** (-param[0])) ** (-1 / param[0]) ) 126 | 127 | elif self.family == 'rgalambos': 128 | return (u + v - 1 + archimedean(family='galambos').get_cdf((1 - u),(1 - v), param) ) 129 | 130 | elif self.family == 'fgm': 131 | return u * v * (1 + param[0] * (1 - u) * (1 - v)) 132 | 133 | elif self.family == 'plackett': 134 | eta = param[0] - 1 135 | term1 = 0.5 * eta ** -1 136 | term2 = 1 + eta * (u + v) 137 | term3 = (1 + eta * (u + v)) ** 2 138 | term4 = 4 * param[0] * eta * u * v 139 | return term1 * (term2 - (term3 - term4) ** 0.5) 140 | 141 | elif self.family == 'BB1': 142 | term1 = (u ** (-param[0]) - 1) ** param[1] 143 | term2 = (v ** (-param[0]) - 1) ** param[1] 144 | term3 = 1 + (term1 + term2) ** (1 / param[1]) 145 | return (term3) ** (-1 / param[0]) 146 | 147 | elif self.family == 'BB2': 148 | u_ = np.exp(param[1] * (u ** (-param[0]) - 1)) 149 | v_ = np.exp(param[1] * (v ** (-param[0]) - 1)) 150 | return (1 + (1 / param[1]) * np.log(u_ + v_ - 1)) ** (-1 / param[0]) 151 | 152 | 153 | def get_pdf(self, u, v, param): 154 | """ 155 | # Computes the PDF 156 | 157 | Parameters
158 | ---------- 159 | u, v : float 160 | Values of the marginal CDFs 161 | param : list 162 | A list that contains the copula parameter(s) (float) 163 | """ 164 | 165 | if self.family == 'clayton': 166 | term1 = (param[0] + 1) * (u * v) ** (-param[0] - 1) 167 | term2 = (u ** (-param[0]) + v ** (-param[0]) - 1) ** (-2 - 1 / param[0]) 168 | return term1 * term2 169 | 170 | if self.family == 'rclayton': 171 | return archimedean(family='clayton').get_pdf((1 - u),(1 - v), param) 172 | 173 | elif self.family == 'gumbel': 174 | term1 = np.power(np.multiply(u, v), -1) 175 | tmp = np.power(-np.log(u), param[0]) + np.power(-np.log(v), param[0]) 176 | term2 = np.power(tmp, -2 + 2.0 / param[0]) 177 | term3 = np.power(np.multiply(np.log(u), np.log(v)), param[0] - 1) 178 | term4 = 1 + (param[0] - 1) * np.power(tmp, -1 / param[0]) 179 | return archimedean(family='gumbel').get_cdf(u,v, param) * term1 * term2 * term3 * term4 180 | 181 | if self.family == 'rgumbel': 182 | return archimedean(family='gumbel').get_pdf((1 - u), (1 - v), param) 183 | 184 | elif self.family == 'frank': 185 | term1 = param[0] * (1 - np.exp(-param[0])) * np.exp(-param[0] * (u + v)) 186 | term2 = (1 - np.exp(-param[0]) - (1 - np.exp(-param[0] * u)) \ 187 | * (1 - np.exp(-param[0] * v))) ** 2 188 | return term1 / term2 189 | 190 | elif self.family == 'joe': 191 | u_ = (1 - u) ** param[0] 192 | v_ = (1 - v) ** param[0] 193 | term1 = (u_ + v_ - u_ * v_) ** (-2 + 1 / param[0]) 194 | term2 = ((1 - u) ** (param[0] - 1)) * ((1 - v) ** (param[0] - 1)) 195 | term3 = param[0] - 1 + u_ + v_ - u_ * v_ 196 | return term1 * term2 * term3 197 | 198 | if self.family == 'rjoe': 199 | return archimedean(family='joe').get_pdf((1 - u),(1 - v), param) 200 | 201 | elif self.family == 'galambos': 202 | x = -np.log(u) 203 | y = -np.log(v) 204 | term1 = self.get_cdf(u, v, param) / (v * u) 205 | term2 = 1 - ((x ** (-param[0]) + y ** (-param[0])) ** (-1 - 1 / param[0])) \ 206 | * (x ** (-param[0] - 1) + y ** (-param[0] - 1)) 207 |
term3 = ((x ** (-param[0]) + y ** (-param[0])) ** (-2 - 1 / param[0])) \ 208 | * ((x * y) ** (-param[0] - 1)) 209 | term4 = 1 + param[0] + ((x ** (-param[0]) + y ** (-param[0])) ** (-1 / param[0])) 210 | return term1 * (term2 + term3 * term4) 211 | 212 | if self.family == 'rgalambos': 213 | return archimedean(family='galambos').get_pdf((1 - u),(1 - v), param) 214 | 215 | elif self.family == 'fgm': 216 | return 1 + param[0] * (1 - 2 * u) * (1 - 2 * v) 217 | 218 | elif self.family == 'plackett': 219 | eta = (param[0] - 1) 220 | term1 = param[0] * (1 + eta * (u + v - 2 * u * v)) 221 | term2 = (1 + eta * (u + v)) ** 2 222 | term3 = 4 * param[0] * eta * u * v 223 | return term1 / (term2 - term3) ** (3 / 2) 224 | 225 | elif self.family == 'BB1': 226 | theta, delta = param[0], param[1] 227 | x = (u ** (-theta) - 1) ** (delta) 228 | y = (v ** (-theta) - 1) ** (delta) 229 | term1 = (1 + (x + y) ** (1 / delta)) ** (-1 / theta - 2) 230 | term2 = (x + y) ** (1 / delta - 2) 231 | term3 = theta * (delta - 1) + (theta * delta + 1) * (x + y) ** (1 / delta) 232 | term4 = (x * y) ** (1 - 1 / delta) * (u * v) ** (-theta - 1) 233 | return term1 * term2 * term3 * term4 234 | 235 | elif self.family == 'BB2': 236 | theta, delta = param[0], param[1] 237 | x = np.exp(delta * (u ** (-theta) - 1)) 238 | y = np.exp(delta * (v ** (-theta) - 1)) 239 | term1 = (1 + (delta ** (-1)) * np.log(x + y - 1)) ** (-2 -1 / theta) 240 | term2 = (x + y - 1) ** (-2) 241 | term3 = 1 + theta + theta * delta + theta * np.log(x + y - 1) 242 | term4 = x * y * (u * v) ** (-theta - 1) 243 | return term1 * term2 * term3 * term4 244 | 245 | def LTDC(self, theta): 246 | """ 247 | # Computes the lower TDC for a given theta 248 | 249 | Parameters 250 | ---------- 251 | theta : float 252 | The copula parameter 253 | """ 254 | 255 | if self.family in ['gumbel', 'joe', 'frank', 'galambos', 'fgm', 'plackett', 'rclayton']: 256 | return 0 257 | 258 | elif self.family in ['rgalambos', 'clayton'] : 259 | return 2 ** (-1 /
theta) 260 | 261 | elif self.family in ['rgumbel', 'rjoe'] : 262 | return 2 - 2 ** (1 / theta) 263 | 264 | def UTDC(self, theta): 265 | """ 266 | # Computes the upper TDC for a given theta 267 | 268 | Parameters 269 | ---------- 270 | theta : float 271 | The copula parameter 272 | """ 273 | 274 | if self.family in ['clayton', 'frank', 'fgm', 'plackett', 'rgumbel', 'rjoe', 'rgalambos']: 275 | return 0 276 | 277 | elif self.family in ['galambos', 'rclayton'] : 278 | return 2 ** (-1 / theta) 279 | 280 | elif self.family in ['gumbel', 'joe'] : 281 | return 2 - 2 ** (1 / theta) 282 | -------------------------------------------------------------------------------- /pycop/bivariate/copula.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | 4 | 5 | def plot_bivariate_3d(X, Y, Z, bounds, title, **kwargs): 6 | """ 7 | # Plot the 3D surface 8 | 9 | Parameters 10 | ---------- 11 | X, Y, Z : array 12 | Positions of data points. 13 | bounds : list 14 | A list that contains the `xlim` and `ylim` of the graph. 15 | title : str 16 | A string for the title of the plot. 17 | 18 | **kwargs 19 | Additional keyword arguments passed to `ax.plot_surface`. 20 | """ 21 | 22 | fig = plt.figure() 23 | ax = fig.add_subplot(111, projection='3d') 24 | ax.set_xticks(np.linspace(bounds[0],bounds[1],6)) 25 | ax.set_yticks(np.linspace(bounds[0],bounds[1],6)) 26 | ax.set_xlim(bounds) 27 | ax.set_ylim(bounds) 28 | ax.plot_surface(X,Y,Z, **kwargs) 29 | plt.title(title) 30 | plt.show() 31 | 32 | def plot_bivariate_contour(X, Y, Z, bounds, title, **kwargs): 33 | """ 34 | # Plot the contour surface 35 | 36 | Parameters 37 | ---------- 38 | X, Y, Z : array 39 | Positions of data points. 40 | bounds : list 41 | A list that contains the `xlim` and `ylim` of the graph. 42 | title : str 43 | A string for the title of the plot. 44 | 45 | **kwargs 46 | Additional keyword arguments passed to `plt.contour`. 
47 | """ 48 | plt.figure() 49 | CS = plt.contour(X, Y, Z, colors='k', linewidths=1., linestyles=None, **kwargs) 50 | plt.clabel(CS, fontsize=8, inline=1) 51 | plt.xlim(bounds) 52 | plt.ylim(bounds) 53 | plt.title(title) 54 | plt.show() 55 | 56 | class copula(): 57 | """ 58 | # A class used to create a Copula object 59 | 60 | Set attributes and methods common to all copula objects (elliptical and Archimedean). 61 | 62 | ... 63 | 64 | Attributes 65 | ---------- 66 | 67 | Methods 68 | ------- 69 | plot_cdf(param, plot_type, Nsplit=50, **kwargs) 70 | Plot the bivariate Cumulative Distribution Function (CDF) 71 | plot_pdf(param, plot_type, Nsplit=50, **kwargs) 72 | Plot the Probability Density Function (PDF) 73 | plot_mpdf(param, margin, plot_type, Nsplit=50, **kwargs) 74 | Plot the PDF with given marginal distributions 75 | """ 76 | 77 | def __init__(self): 78 | pass 79 | 80 | def plot_cdf(self, param, plot_type, Nsplit=50, **kwargs): 81 | """ 82 | # Plot the bivariate CDF 83 | 84 | Parameters 85 | ---------- 86 | param : list 87 | A list of the copula parameter(s) 88 | plot_type : str 89 | The type of the plot either "3d" or "contour" 90 | Nsplit : int, optional 91 | The number of points plotted (Nsplit*Nsplit) (default is 50) 92 | 93 | **kwargs 94 | Additional keyword arguments passed to either `plot_bivariate_3d` or 95 | `plot_bivariate_contour`. 96 | Examples : 97 | - `colormap` can be passed in to change the default color 98 | of the 3d plot. 99 | - `levels` can be passed to determine the positions of the contour lines. 
100 | 101 | """ 102 | title = self.family.capitalize() + " Copula CDF" 103 | 104 | bounds = [0+1e-2, 1-1e-2] 105 | U_grid, V_grid = np.meshgrid( 106 | np.linspace(bounds[0], bounds[1], Nsplit), 107 | np.linspace(bounds[0], bounds[1], Nsplit)) 108 | 109 | Z = np.array( 110 | [self.get_cdf(uu, vv, param) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] ) 111 | 112 | Z = Z.reshape(U_grid.shape) 113 | 114 | if plot_type == "3d": 115 | plot_bivariate_3d(U_grid,V_grid,Z, [0,1], title, **kwargs) 116 | elif plot_type == "contour": 117 | plot_bivariate_contour(U_grid,V_grid,Z, [0,1], title, **kwargs) 118 | else: 119 | print("only \"contour\" or \"3d\" arguments supported for type") 120 | raise ValueError 121 | 122 | def plot_pdf(self, param, plot_type, Nsplit=50, **kwargs): 123 | """ 124 | # Plot the bivariate PDF 125 | 126 | Parameters 127 | ---------- 128 | param : list 129 | A list of the copula parameter(s) 130 | plot_type : str 131 | The type of the plot either "3d" or "contour" 132 | Nsplit : int, optional 133 | The number of points plotted (Nsplit*Nsplit) (default is 50) 134 | 135 | **kwargs 136 | Additional keyword arguments passed to either `plot_bivariate_3d` or 137 | `plot_bivariate_contour`. 138 | Examples : 139 | - `colormap` can be passed in to change the default color 140 | of the 3d plot. 141 | - `levels` can be passed to determine the positions of the contour lines. 
142 | """ 143 | 144 | title = self.family.capitalize() + " Copula PDF" 145 | 146 | if plot_type == "3d": 147 | bounds = [0+1e-1/2, 1-1e-1/2] 148 | 149 | elif plot_type == "contour": 150 | bounds = [0+1e-2, 1-1e-2] 151 | 152 | U_grid, V_grid = np.meshgrid( 153 | np.linspace(bounds[0], bounds[1], Nsplit), 154 | np.linspace(bounds[0], bounds[1], Nsplit)) 155 | 156 | Z = np.array( 157 | [self.get_pdf(uu, vv, param) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] ) 158 | 159 | Z = Z.reshape(U_grid.shape) 160 | 161 | if plot_type == "3d": 162 | 163 | plot_bivariate_3d(U_grid,V_grid,Z, [0,1], title, **kwargs) 164 | elif plot_type == "contour": 165 | plot_bivariate_contour(U_grid,V_grid,Z, [0,1], title, **kwargs) 166 | else: 167 | print("only \"contour\" or \"3d\" arguments supported for type") 168 | raise ValueError 169 | 170 | 171 | def plot_mpdf(self, param, margin, plot_type, Nsplit=50, **kwargs): 172 | """ 173 | # Plot the bivariate PDF with given marginal distributions. 174 | 175 | The method supports only scipy distributions with `loc` and `scale` 176 | parameters as marginals. 177 | 178 | Parameters 179 | ---------- 180 | - param : list 181 | A list of the copula parameter(s) 182 | - margin : list 183 | A list of dictionaries that contains the scipy distribution and 184 | the location and scale parameters. 185 | 186 | Examples : 187 | marginals = [ 188 | { 189 | "distribution": norm, "loc" : 0, "scale" : 1, 190 | }, 191 | { 192 | "distribution": norm, "loc" : 0, "scale": 1, 193 | }] 194 | - plot_type : str 195 | The type of the plot either "3d" or "contour" 196 | - Nsplit : int, optional 197 | The number of points plotted (Nsplit*Nsplit) (default is 50) 198 | 199 | **kwargs 200 | Additional keyword arguments passed to either `plot_bivariate_3d` or 201 | `plot_bivariate_contour`. 202 | Examples : 203 | - `colormap` can be passed in to change the default color 204 | of the 3d plot. 205 | - `levels` can be passed to determine the positions of the contour lines.
206 | 207 | """ 208 | 209 | title = self.family.capitalize() + " Copula PDF" 210 | 211 | # We retrieve the univariate marginal distribution from the list 212 | univariate1 = margin[0]["distribution"] 213 | univariate2 = margin[1]["distribution"] 214 | 215 | bounds = [-3, 3] 216 | 217 | U_grid, V_grid = np.meshgrid( 218 | np.linspace(bounds[0], bounds[1], Nsplit), 219 | np.linspace(bounds[0], bounds[1], Nsplit)) 220 | 221 | mpdf = lambda uu, vv : self.get_pdf( 222 | univariate1.cdf(uu, margin[0]["loc"], margin[0]["scale"]), \ 223 | univariate2.cdf(vv, margin[1]["loc"], margin[1]["scale"]), param) \ 224 | * univariate1.pdf(uu, margin[0]["loc"], margin[0]["scale"]) \ 225 | * univariate2.pdf(vv, margin[1]["loc"], margin[1]["scale"]) 226 | 227 | Z = np.array( 228 | [mpdf(uu, vv) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] ) 229 | Z = Z.reshape(U_grid.shape) 230 | 231 | if plot_type == "3d": 232 | plot_bivariate_3d(U_grid,V_grid,Z, bounds, title, **kwargs) 233 | elif plot_type == "contour": 234 | plot_bivariate_contour(U_grid,V_grid,Z, bounds, title, **kwargs) 235 | else: 236 | print("only \"contour\" or \"3d\" arguments supported for type") 237 | raise ValueError -------------------------------------------------------------------------------- /pycop/bivariate/empirical.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | 4 | def plot_bivariate(U,V,Z): 5 | fig = plt.figure() 6 | ax = fig.add_subplot(111, projection='3d') 7 | ax.plot_surface(U,V,Z) 8 | plt.show() 9 | 10 | class empirical(): 11 | """ 12 | # A class used to create an Empirical copula object 13 | 14 | ... 
15 | 16 | Attributes 17 | ---------- 18 | data : numpy array 19 | A numpy array with two vectors that correspond to the observations 20 | n : int 21 | The number of observations 22 | 23 | Methods 24 | ------- 25 | get_cdf(i_n, j_n) 26 | Compute the empirical Cumulative Distribution Function (CDF) 27 | pdf(i_n, j_n) 28 | Compute the empirical Probability Density Function (PDF) 29 | plot_cdf(Nsplit) 30 | Plot the empirical CDF 31 | plot_pdf(Nsplit) 32 | Plot the empirical PDF 33 | LTDC(i_n) 34 | Compute the lower Tail Dependence Coefficient (TDC) for a given threshold i/n 35 | UTDC(i_n) 36 | Compute the upper Tail Dependence Coefficient (TDC) for a given threshold i/n 37 | optimal_tdc(case) 38 | Compute the lower or upper TDC according to the Frahm et al. (2005) algorithm. 39 | """ 40 | 41 | def __init__(self, data): 42 | """ 43 | Parameters 44 | ---------- 45 | data : numpy array 46 | """ 47 | self.data = data 48 | self.n = len(data[0]) 49 | 50 | def get_cdf(self, i_n, j_n): 51 | """ 52 | # Compute the CDF 53 | 54 | Parameters 55 | ---------- 56 | i_n : float 57 | The threshold to compute the univariate distribution for the first vector. 58 | j_n : float 59 | The threshold to compute the univariate distribution for the second vector. 60 | """ 61 | 62 | # Calculate rank indices for both vectors 63 | i = int(round(self.n * i_n)) 64 | j = int(round(self.n * j_n)) 65 | 66 | ith_order_u = sorted(self.data[0])[i-1] 67 | ith_order_v = sorted(self.data[1])[j-1] 68 | 69 | # Find indices where both vectors are less than the corresponding rank indices 70 | mask_x = self.data[0] <= ith_order_u 71 | mask_y = self.data[1] <= ith_order_v 72 | 73 | return np.sum(np.logical_and(mask_x, mask_y)) / self.n 74 | 75 | def LTDC(self, i_n): 76 | """ 77 | # Compute the empirical lower TDC for a given threshold i/n 78 | 79 | Parameters 80 | ---------- 81 | i_n : float 82 | The threshold to compute the lower TDC.
83 | """ 84 | 85 | if int(round(self.n * i_n)) == 0: 86 | return 0 87 | 88 | return self.get_cdf(i_n, i_n) / i_n 89 | 90 | def UTDC(self, i_n): 91 | """ 92 | # Compute the empirical upper TDC for a given threshold i/n 93 | 94 | Parameters 95 | ---------- 96 | i_n : float 97 | The threshold to compute the upper TDC. 98 | """ 99 | 100 | return (1 - 2 * i_n + self.get_cdf(i_n, i_n) ) / (1-i_n) 101 | 102 | 103 | def plot_cdf(self, Nsplit): 104 | """ 105 | # Plot the empirical CDF 106 | 107 | Parameters 108 | ---------- 109 | Nsplit : The number of splits used to compute the grid 110 | """ 111 | U_grid = np.linspace(0, 1, Nsplit)[:-1] 112 | V_grid = np.linspace(0, 1, Nsplit)[:-1] 113 | U_grid, V_grid = np.meshgrid(U_grid, V_grid) 114 | Z = np.array( 115 | [self.get_cdf(uu, vv) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] ) 116 | Z = Z.reshape(U_grid.shape) 117 | plot_bivariate(U_grid,V_grid,Z) 118 | 119 | def plot_pdf(self, Nsplit): 120 | """ 121 | # Plot the empirical PDF 122 | 123 | Parameters 124 | ---------- 125 | Nsplit : The number of splits used to compute the grid 126 | """ 127 | 128 | U_grid = np.linspace(self.data[0].min(), self.data[0].max(), Nsplit) 129 | V_grid = np.linspace(self.data[1].min(), self.data[1].max(), Nsplit) 130 | 131 | # Initialize a matrix to hold the counts 132 | counts = np.zeros((Nsplit, Nsplit)) 133 | 134 | for i in range(Nsplit-1): 135 | for j in range(Nsplit-1): 136 | # Define the edges of the bin 137 | Xa, Xb = U_grid[i], U_grid[i + 1] 138 | Ya, Yb = V_grid[j], V_grid[j + 1] 139 | 140 | # Use boolean indexing to count points within the bin 141 | mask = (Xa <= self.data[0]) & (self.data[0] < Xb) & (Ya <= self.data[1]) & (self.data[1] < Yb) 142 | counts[i, j] = np.sum(mask) 143 | # Adjust the grid centers for plotting 144 | U_grid_centered = U_grid + (U_grid[1] - U_grid[0]) / 2 145 | V_grid_centered = V_grid + (V_grid[1] - V_grid[0]) / 2 146 | 147 | U, V = np.meshgrid(U_grid_centered, V_grid_centered) # Create coordinate 
points of X and Y 148 | Z = counts / np.sum(counts) 149 | 150 | plot_bivariate(U,V,Z) 151 | 152 | def optimal_tdc(self, case): 153 | """ 154 | # Compute the optimal Empirical Tail Dependence coefficient (TDC) 155 | 156 | The algorithm is based on the heuristic plateau-finding algorithm 157 | from Frahm et al (2005) "Estimating the tail-dependence coefficient: 158 | properties and pitfalls" 159 | 160 | Parameters 161 | ---------- 162 | case: str 163 | takes "upper" or "lower" for upper TDC or lower TDC 164 | """ 165 | 166 | #### 1) The series of TDC is smoothed using a box kernel with bandwidth b ∈ N 167 | # This consists of applying a moving average on 2b + 1 consecutive points 168 | tdc_array = np.zeros((self.n,)) 169 | 170 | # b is chosen such that ~1% of the data fall into the moving average 171 | b = int(np.ceil(self.n/200)) 172 | 173 | if case == "upper": 174 | # Compute the Upper TDC for every possible threshold i/n 175 | for i in range(1, self.n-1): 176 | tdc_array[i] = self.UTDC(i_n=i/self.n) 177 | # We reverse the order, the plateau-finding algorithm starts with lower values 178 | tdc_array = tdc_array[::-1] 179 | 180 | elif case == "lower": 181 | # Compute the Lower TDC for every possible threshold i/n 182 | for i in range(1, self.n-1): 183 | tdc_array[i] = self.LTDC(i_n=i/self.n) 184 | else: 185 | print("Takes \"upper\" or \"lower\" argument only") 186 | return None 187 | 188 | # Smooth the TDC with a moving average of length 2b+1 189 | # The total length = n-2b-1 because i ∈ [1, n−2b] 190 | tdc_smooth_array = np.zeros((self.n-2*b-1,)) 191 | 192 | for i, j in zip(range(b+1, self.n-b), range(0, self.n-2*b-1)): 193 | tdc_smooth_array[j] = sum(tdc_array[i-b:i+b+1]) / (2*b+1) 194 | 195 | #### 2) We select a vector of m consecutive estimates that satisfies a plateau condition 196 | 197 | # m = length of the plateau = number of consecutive smoothed TDC estimates 198 | m = int(np.floor(np.sqrt(self.n-2*b))) 199 | # The standard deviation of the smoothed TDC series
200 | std_tdc_smooth = tdc_smooth_array.std() 201 | 202 | # We iterate k from 0 to n-2b-m because k ∈ [1, n−2b−m+1] 203 | for k in range(0,self.n-2*b-m): 204 | plateau = 0 205 | for i in range(1,m-1): 206 | plateau = plateau + np.abs(tdc_smooth_array[k+i] - tdc_smooth_array[k]) 207 | # if the plateau satisfies the following condition: 208 | if plateau <= 2*std_tdc_smooth: 209 | #### 3) Then, the TDC estimator is defined as the average of the estimators in the corresponding plateau 210 | avg_tdc_plateau = np.mean(tdc_smooth_array[k:k+m-1]) 211 | print("Optimal threshold: ", k/self.n) 212 | return avg_tdc_plateau 213 | 214 | # In case the condition is not satisfied the TDC estimate is set to zero 215 | return 0 -------------------------------------------------------------------------------- /pycop/bivariate/estimation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.optimize import minimize 3 | from scipy.stats import norm, t 4 | 5 | import warnings 6 | # suppress warnings 7 | warnings.filterwarnings('ignore') 8 | 9 | def pseudo_obs(data): 10 | """ 11 | # Transform the dataset to uniform margins. 12 | 13 | The pseudo-observations are the scaled ranks. 14 | 15 | Parameters 16 | ---------- 17 | data : array like 18 | The dataset to transform. 19 | 20 | Returns 21 | ------- 22 | scaled_ranks : array like 23 | 24 | """ 25 | 26 | n = len(data[0]) # Assuming data[0] and data[1] are of the same length 27 | 28 | scaled_rank = lambda values: (np.argsort(np.argsort(values)) + 1) / (n + 1) 29 | 30 | scaled_ranks = np.array([scaled_rank(data[0]), scaled_rank(data[1])]) 31 | 32 | return scaled_ranks 33 | 34 | 35 | def fit_cmle(copula, data, opti_method='SLSQP', options={}): 36 | """ 37 | # Compute the Canonical Maximum likelihood Estimator (CMLE) using the pseudo-observations 38 | 39 | Parameters 40 | ---------- 41 | data : array like 42 | The dataset. 
43 | 44 | copula : class 45 | The copula object 46 | opti_method : str, optional 47 | The optimization method to pass to `scipy.optimize.minimize`. 48 | The default algorithm is set to `SLSQP` 49 | options : dict, optional 50 | The dictionary that contains the options to pass to the `scipy.optimize.minimize` function 51 | options={'maxiter': 100000} 52 | 53 | Returns 54 | ------- 55 | Returns the estimated parameter(s) in a list 56 | 57 | """ 58 | 59 | psd_obs = pseudo_obs(data) 60 | 61 | def log_likelihood(parameters): 62 | """ 63 | The number of parameters depends on the type of copula function 64 | """ 65 | if len(copula.bounds_param) == 1: 66 | params = [parameters] 67 | else: 68 | param1, param2 = parameters 69 | params = [param1, param2] 70 | logl = -np.sum(np.log(copula.get_pdf(psd_obs[0], psd_obs[1], params))) 71 | return logl 72 | 73 | if (copula.bounds_param[0] == (None, None)): 74 | results = minimize(log_likelihood, copula.parameters_start, method='Nelder-Mead', options=options) 75 | # print("method: Nelder-Mead - success:", results.success, ":", results.message) 76 | return (results.x, -results.fun) 77 | 78 | else: 79 | results = minimize(log_likelihood, copula.parameters_start, method=opti_method, bounds=copula.bounds_param, options=options) 80 | # print("method:", opti_method, "- success:", results.success, ":", results.message) 81 | if results.success == True: 82 | return (results.x, -results.fun) 83 | else: 84 | print(results) 85 | print("optimization failed") 86 | return None 87 | 88 | 89 | def fit_cmle_mixt(copula, data, opti_method='SLSQP', options={}): 90 | """ 91 | # Compute the CMLE for a mixture copula using the pseudo-observations 92 | 93 | Parameters 94 | ---------- 95 | data : array like 96 | The dataset. 97 | 98 | copula : class 99 | The mixture copula object 100 | opti_method : str, optional 101 | The optimization method to pass to `scipy.optimize.minimize`.
102 | The default algorithm is set to `SLSQP` 103 | 104 | Returns 105 | ------- 106 | Returns the estimated parameter(s) in a list 107 | 108 | """ 109 | psd_obs = pseudo_obs(data) 110 | 111 | def log_likelihood(parameters): 112 | if copula.dim == 2: 113 | w1, param1, param2 = parameters 114 | params = [w1, param1, param2] 115 | else: # dim = 3 116 | w1, w2, w3, param1, param2, param3 = parameters 117 | params = [w1, w2, w3, param1, param2, param3] 118 | logl = -np.sum(np.log(copula.get_pdf(psd_obs[0], psd_obs[1], params))) 119 | return logl 120 | 121 | # copula.dim gives the number of weights to consider 122 | cons = [{'type': 'eq', 'fun': lambda parameters: np.sum(parameters[:copula.dim]) - 1}] 123 | 124 | results = minimize(log_likelihood, 125 | copula.parameters_start, 126 | method=opti_method, 127 | bounds=copula.bounds_param, 128 | constraints=cons, 129 | options=options) 130 | 131 | #print("method:", opti_method, "- success:", results.success, ":", results.message) 132 | if results.success == True: 133 | return (results.x, -results.fun) 134 | 135 | print("optimization failed") 136 | return None 137 | 138 | 139 | def fit_mle(data, copula, marginals, opti_method='SLSQP', known_parameters=False): 140 | """ 141 | # Compute the Maximum likelihood Estimator (MLE) 142 | 143 | Parameters 144 | ---------- 145 | data : array like 146 | The dataset. 147 | 148 | copula : class 149 | The copula object 150 | marginals : list 151 | A list of dictionaries that contains the marginal distributions and their 152 | `loc` and `scale` parameters when the parameters are known. 153 | Example: 154 | marginals = [ 155 | { 156 | "distribution": norm, "loc" : 0, "scale" : 1, 157 | }, 158 | { 159 | "distribution": norm, "loc" : 0, "scale": 1, 160 | }] 161 | opti_method : str, optional 162 | The optimization method to pass to `scipy.optimize.minimize`.
163 | The default algorithm is set to `SLSQP` 164 | known_parameters : bool 165 | If set to `True`, the `loc` and `scale` parameters of the marginal 166 | distributions are treated as known and are not estimated. 167 | 168 | Returns 169 | ------- 170 | Returns the estimated parameter(s) in a list 171 | 172 | """ 173 | 174 | if copula.type == "mixture": 175 | print("estimation of a mixture copula is only available with CMLE, use fit_cmle_mixt") 176 | raise ValueError 177 | 178 | if known_parameters == True: 179 | 180 | marg_cdf1 = lambda i : marginals[0]["distribution"].cdf(data[0][i], marginals[0]["loc"], marginals[0]["scale"]) 181 | marg_pdf1 = lambda i : marginals[0]["distribution"].pdf(data[0][i], marginals[0]["loc"], marginals[0]["scale"]) 182 | 183 | marg_cdf2 = lambda i : marginals[1]["distribution"].cdf(data[1][i], marginals[1]["loc"], marginals[1]["scale"]) 184 | marg_pdf2 = lambda i : marginals[1]["distribution"].pdf(data[1][i], marginals[1]["loc"], marginals[1]["scale"]) 185 | 186 | logi = lambda i, theta: np.log(copula.get_pdf(marg_cdf1(i),marg_cdf2(i),[theta]))+np.log(marg_pdf1(i)) +np.log(marg_pdf2(i)) 187 | log_likelihood = lambda theta: -sum([logi(i, theta) for i in range(0,len(data[0]))]) 188 | 189 | results = minimize(log_likelihood, copula.parameters_start, method=opti_method, )# options={'maxiter': 300})#.x[0] 190 | 191 | else: 192 | marg_cdf1 = lambda i, loc, scale : marginals[0]["distribution"].cdf(data[0][i], loc, scale) 193 | marg_pdf1 = lambda i, loc, scale : marginals[0]["distribution"].pdf(data[0][i], loc, scale) 194 | 195 | marg_cdf2 = lambda i, loc, scale : marginals[1]["distribution"].cdf(data[1][i], loc, scale) 196 | marg_pdf2 = lambda i, loc, scale : marginals[1]["distribution"].pdf(data[1][i], loc, scale) 197 | 198 | logi = lambda i, theta, loc1, scale1, loc2, scale2: \ 199 | np.log(copula.get_pdf(marg_cdf1(i, loc1, scale1),marg_cdf2(i, loc2, scale2),[theta])) \ 200 | + np.log(marg_pdf1(i, loc1, scale1)) +np.log(marg_pdf2(i, loc2, scale2)) 201 | 202 | def
log_likelihood(params): 203 | theta, loc1, scale1, loc2, scale2 = params 204 | return -sum([logi(i, theta, loc1, scale1, loc2, scale2) for i in range(0,len(data[0]))]) 205 | 206 | results = minimize(log_likelihood, (copula.parameters_start, np.array(0), np.array(1), np.array(0), np.array(1)), method=opti_method, )# options={'maxiter': 300})#.x[0] 207 | 208 | print("method:", opti_method, "- success:", results.success, ":", results.message) 209 | if results.success == True: 210 | return results.x 211 | 212 | print("Optimization failed") 213 | return None 214 | 215 | def IAD_dist(copula, data, param): 216 | """ 217 | Compute the Integrated Anderson-Darling (IAD) distance between 218 | the parametric copula and the empirical copula with vectorization. 219 | 220 | Info: 221 | This function first computes the empirical copula. It then computes the 222 | theoretical (parametric) copula values using the provided copula function 223 | and parameters. The IAD distance is calculated based on the differences 224 | between these two copulas. 225 | Based on equation 9 in "Crash Sensitivity and the Cross Section of Expected 226 | Stock Returns" (2018) Journal of Financial and Quantitative Analysis 227 | 228 | Args: 229 | copula (function): The copula object, providing a method `get_cdf` to compute the CDF. 230 | data (array-like): The underlying data as a 2D array, where each row is a dimension. 231 | param (array-like): The parameters of the copula. 232 | 233 | Returns: 234 | float: The IAD distance between the empirical and the parametric copulas. 
235 | """
236 | 
237 | n = len(data[0])
238 | 
239 | # Get the order statistics for each dimension
240 | sorted_u = np.sort(data[0])
241 | sorted_v = np.sort(data[1])
242 | 
243 | # Create a grid of comparisons for each pair (u, v)
244 | u_grid, v_grid = np.meshgrid(sorted_u, sorted_v, indexing='ij')
245 | 
246 | # Count the number of points below the threshold in both dimensions
247 | # Use broadcasting to compare all pairs and count
248 | counts = np.sum((data[0][:, None, None] <= u_grid) & (data[1][:, None, None] <= v_grid), axis=0)
249 | # Compute the empirical copula
250 | C_empirical = counts / n
251 | 
252 | # Prepare the grid for computing the parametric copula
253 | x_values, y_values = np.linspace(1/n, 1-1/n, n), np.linspace(1/n, 1-1/n, n)
254 | 
255 | # Compute the parametric (theoretical) copula values
256 | u_flat = np.array([[x for x in x_values] for y in y_values]).flatten()
257 | v_flat = np.array([[y for x in x_values] for y in y_values]).flatten()
258 | 
259 | C_copula = copula.get_cdf(np.array(u_flat), np.array(v_flat), param)
260 | C_copula = C_copula.reshape((n,n))
261 | 
262 | # Calculate the Integrated Anderson-Darling distance
263 | IAD = np.sum(((C_empirical - C_copula) ** 2) / (C_copula - C_copula**2))
264 | 
265 | return IAD
266 | 
267 | 
268 | def AD_dist(copula, data, param):
269 | """
270 | Compute the Anderson-Darling (AD) distance between the parametric
271 | copula and the empirical copula with vectorization.
272 | 
273 | Same principle as IAD_dist()
274 | """
275 | 
276 | n = len(data[0])
277 | 
278 | sorted_u = np.sort(data[0])
279 | sorted_v = np.sort(data[1])
280 | 
281 | u_grid, v_grid = np.meshgrid(sorted_u, sorted_v, indexing='ij')
282 | 
283 | counts = np.sum((data[0][:, None, None] <= u_grid) & (data[1][:, None, None] <= v_grid), axis=0)
284 | C_empirical = counts / n
285 | 
286 | x_values, y_values = np.linspace(1/n, 1-1/n, n), np.linspace(1/n, 1-1/n, n)
287 | 
288 | u_flat = np.array([[x for x in x_values] for y in y_values]).flatten()
289 | v_flat = np.array([[y for x in x_values] for y in y_values]).flatten()
290 | 
291 | C_copula = copula.get_cdf(np.array(u_flat), np.array(v_flat), param)
292 | C_copula = C_copula.reshape((n,n))
293 | 
294 | AD = np.max(((C_empirical - C_copula) ** 2) / (C_copula - C_copula**2))
295 | return AD
--------------------------------------------------------------------------------
/pycop/bivariate/gaussian.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.stats import norm, multivariate_normal
3 | from scipy.special import erfinv
4 | from pycop.bivariate.copula import copula
5 | 
6 | class gaussian(copula):
7 | """
8 | # Creates a gaussian copula object
9 | 
10 | ...
11 | 
12 | Attributes
13 | ----------
14 | family : str
15 | = "gaussian"
16 | bounds_param : list
17 | A list that contains the domain of the parameter(s) in a tuple.
18 | Example : [(lower, upper)]
19 | parameters_start : array
20 | Value(s) of the initial guess when estimating the copula parameter(s).
21 | It represents the parameter `x0` in the `scipy.optimize.minimize` function.
22 | 
23 | Methods
24 | -------
25 | get_cdf(u, v, param)
26 | Computes the Cumulative Distribution Function (CDF).
27 | get_pdf(u, v, param)
28 | Computes the Probability Density Function (PDF).
29 | """
30 | 
31 | def __init__(self):
32 | # the `gaussian` copula class inherits the `copula` class
33 | super().__init__()
34 | self.family = "gaussian"
35 | self.bounds_param = [(-1, 1)]
36 | self.parameters_start = np.array(0)
37 | 
38 | def get_cdf(self, u, v, param):
39 | """
40 | # Computes the CDF
41 | 
42 | Parameters
43 | ----------
44 | u, v : float
45 | Values of the marginal CDFs
46 | param : list
47 | The correlation coefficient param[0] ∈ [-1,1].
48 | Used to define the correlation matrix (square, symmetric and positive definite)
49 | """
50 | 
51 | y1 = norm.ppf(u, 0, 1)
52 | y2 = norm.ppf(v, 0, 1)
53 | rho = param[0]
54 | 
55 | return multivariate_normal.cdf(np.array([y1, y2]).T, mean=None, cov=[[1, rho], [rho, 1]])
56 | 
57 | def get_pdf(self, u, v, param):
58 | """
59 | # Computes the PDF
60 | 
61 | Parameters
62 | ----------
63 | u, v : float
64 | Values of the marginal CDFs
65 | param : list
66 | The correlation coefficient param[0] ∈ [-1,1].
67 | Used to define the correlation matrix (square, symmetric and positive definite)
68 | """
69 | 
70 | rho = param[0]
71 | a = np.sqrt(2) * erfinv(2 * u - 1)
72 | b = np.sqrt(2) * erfinv(2 * v - 1)
73 | det_rho = 1 - rho**2
74 | 
75 | return det_rho**-0.5 * np.exp(-((a**2 + b**2) * rho**2 -2 * a * b * rho) / (2 * det_rho))
76 | 
77 | 
--------------------------------------------------------------------------------
/pycop/bivariate/mixture.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from pycop.bivariate.copula import copula
3 | from pycop.bivariate.archimedean import archimedean
4 | from pycop.bivariate.gaussian import gaussian
5 | from pycop.bivariate.student import student
6 | 
7 | 
8 | class mixture(copula):
9 | """
10 | # Creates a mixture copula object
11 | 
12 | ...
13 | 
14 | Attributes
15 | ----------
16 | dim : int
17 | The number of copulas combined; only 2 or 3 are supported
18 | mixture_type : str
19 | The type of mixture, a label built from the names of the combined copulas
20 | cop : list
21 | A list that contains the copula objects to combine
22 | bounds_param : list
23 | A list that contains the domain of the parameter(s) in a tuple.
24 | Example : [(lower, upper), (lower, upper)]
25 | parameters_start : array
26 | Value(s) of the initial guess when estimating the copula parameter(s).
27 | It represents the parameter `x0` in the `scipy.optimize.minimize` function.
28 | 
29 | Methods
30 | -------
31 | get_cdf(u, v, param)
32 | Computes the Cumulative Distribution Function (CDF).
33 | get_pdf(u, v, param)
34 | Computes the Probability Density Function (PDF).
35 | LTDC(theta, weight)
36 | Computes the Lower Tail Dependence Coefficient (TDC).
37 | UTDC(theta, weight)
38 | Computes the upper TDC.
39 | """
40 | 
41 | def __init__(self, copula_list):
42 | """
43 | Parameters
44 | ----------
45 | copula_list : list
46 | A list of strings giving the types of copulas to combine
47 | Example : ["clayton", "gumbel"]
48 | 
49 | Raises
50 | ------
51 | ValueError
52 | dim : the length of `copula_list` must be equal to 2 or 3.
53 | Mixtures are only available for a combination of 2 or 3 copulas
54 | 
55 | copula_list : elements must be supported copula families.
56 | Mixtures are only available for archimedean and gaussian copulas.
57 | 
58 | """
59 | # the `mixture` copula class inherits the `copula` class
60 | super().__init__()
61 | 
62 | Archimedean_families = [
63 | 'clayton', 'gumbel', 'frank', 'joe', 'galambos','fgm', 'plackett',
64 | 'rgumbel', 'rclayton', 'rjoe','rgalambos']
65 | 
66 | self.dim = len(copula_list)
67 | 
68 | if self.dim != 2 and self.dim != 3:
69 | print("Mixture is only supported for combinations of 2 or 3 copulas")
70 | raise ValueError
71 | 
72 | self.cop = []
73 | mixture_type = copula_list[0].capitalize()
74 | 
75 | for cop in copula_list[1:]:
76 | mixture_type+= "-"+cop.capitalize()
77 | 
78 | self.family = mixture_type + " mixture"
79 | if self.dim ==2:
80 | self.bounds_param = [(0,1)]
81 | self.parameters_start = [np.array(1/self.dim)]
82 | else:
83 | self.bounds_param = [(0,1) for i in range(0, 3)]
84 | self.parameters_start = [np.array(1/self.dim) for i in range(0, 3)]
85 | 
86 | for i in range(0,self.dim):
87 | 
88 | if copula_list[i] == "gaussian":
89 | self.cop.append(gaussian())
90 | self.bounds_param.append((-1,1))
91 | self.parameters_start.append(np.array(0))
92 | 
93 | elif copula_list[i] in Archimedean_families:
94 | cop_mixt = archimedean(family=copula_list[i])
95 | self.cop.append(cop_mixt)
96 | self.bounds_param.append(cop_mixt.bounds_param[0])
97 | self.parameters_start.append(cop_mixt.parameters_start)
98 | else:
99 | print("Mixtures are only supported for archimedean and gaussian copulas")
100 | print("Archimedean copulas available are: ", Archimedean_families)
101 | raise ValueError
102 | self.parameters_start = tuple(self.parameters_start)
103 | 
104 | def get_cdf(self, u, v, param):
105 | """
106 | # Computes the CDF
107 | 
108 | Parameters
109 | ----------
110 | u, v : float
111 | Values of the marginal CDFs
112 | param : list
113 | A list that contains the parameters of the mixture and the copula.
114 | The elements of the list must be ordered as follows, for a 2-dimensional mixture :
115 | [
116 | weight1 : float, weight1 ∈ [0,1]
117 | The weight given to the first copula.
118 | theta1 : float
119 | The theta parameter of the first copula.
120 | theta2 : float
121 | " second.
122 | ]
123 | For a 3-dimensional mixture :
124 | [
125 | weight1 : float
126 | the weight given to the first copula.
127 | weight2 : float
128 | " second.
129 | weight3 : float
130 | " third.
131 | theta1 : float
132 | The theta parameter of the first copula.
133 | theta2 : float
134 | " second.
135 | theta3 : float
136 | " third.
137 | ]
138 | The sum of the weights must be equal to 1.
139 | """
140 | if self.dim == 2:
141 | cdf = param[0]*(self.cop[0].get_cdf(u,v,[param[1]])) \
142 | +(1-param[0])*(self.cop[1].get_cdf(u,v,[param[2]]))
143 | else:
144 | cdf= param[0]*(self.cop[0].get_cdf(u,v,[param[3]])) \
145 | + param[1]*(self.cop[1].get_cdf(u,v,[param[4]])) \
146 | + param[2]*(self.cop[2].get_cdf(u,v,[param[5]]))
147 | return cdf
148 | 
149 | def get_pdf(self, u, v, param):
150 | """
151 | # Computes the PDF
152 | 
153 | Parameters
154 | ----------
155 | u, v : float
156 | Values of the marginal CDFs
157 | param : list
158 | A list that contains the parameters of the mixture and the copula.
159 | See how to order the list in the above method `get_cdf`
160 | """
161 | if self.dim == 2:
162 | pdf = param[0]*(self.cop[0].get_pdf(u,v,[param[1]])) \
163 | +(1-param[0])*(self.cop[1].get_pdf(u,v,[param[2]]))
164 | else:
165 | pdf = param[0]*(self.cop[0].get_pdf(u,v,[param[3]])) \
166 | + param[1]*(self.cop[1].get_pdf(u,v,[param[4]])) \
167 | + param[2]*(self.cop[2].get_pdf(u,v,[param[5]]))
168 | 
169 | return pdf
170 | 
171 | def LTDC(self, theta, weight):
172 | """
173 | # Computes the lower TDC
174 | 
175 | Parameters
176 | ----------
177 | weight : float
178 | The weight associated with the copula with Lower Tail Dependence
179 | theta : float
180 | The parameter of the copula with Lower Tail Dependence
181 | """
182 | return self.cop[0].LTDC(theta)*weight
183 | 
184 | def UTDC(self, theta, weight):
185 | """
186 | # Computes the upper TDC
187 | 
188 | Parameters
189 | ----------
190 | weight : float
191 | The weight associated with the copula with Upper Tail Dependence
192 | theta : float
193 | The parameter of the copula with Upper Tail Dependence
194 | """
195 | return self.cop[-1].UTDC(theta)*weight
196 | 
197 | 
198 | 
199 | 
--------------------------------------------------------------------------------
/pycop/bivariate/student.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.stats import t
3 | from scipy.special import gamma
4 | from pycop.bivariate.copula import copula
5 | 
6 | class student(copula):
7 | """
8 | # Creates a student copula object
9 | 
10 | The multivariate student CDF has no analytic expression but it can be
11 | approximated numerically
12 | 
13 | ...
14 | 
15 | Attributes
16 | ----------
17 | family : str
18 | = "student"
19 | bounds_param : list
20 | A list that contains the domain of the parameter(s) in a tuple.
21 | Example : [(lower, upper)]
22 | parameters_start : array
23 | Value(s) of the initial guess when estimating the copula parameter(s).
24 | It represents the parameter `x0` in the `scipy.optimize.minimize` function. 25 | 26 | Methods 27 | ------- 28 | 29 | get_pdf(u, v, rho) 30 | Computes the Probability Density Function (PDF). 31 | """ 32 | 33 | def __init__(self): 34 | # the `student` copula class inherit the `copula` class 35 | super().__init__() 36 | self.family = "student" 37 | self.bounds_param = [(-1+1e-6, 1-1e-6), (1e-6, None)] 38 | self.parameters_start = (np.array(0), np.array(1)) 39 | 40 | def get_pdf(self, u, v, param): 41 | """ 42 | # Computes the PDF 43 | 44 | # Source: 45 | Joe, H. (2014). Dependence modeling with copulas. CRC press. 46 | 4.13 Multivariate t - Student's Copula p.181 Equation (4.32) 47 | 48 | Parameters 49 | ---------- 50 | u, v : float 51 | Values of the marginal CDFs 52 | param : list 53 | A list that contains the correlation coefficient rho ∈ [-1,1] and 54 | nu > 0, the degrees of freedom. 55 | """ 56 | 57 | rho = param[0] 58 | nu = param[1] 59 | 60 | term1 = gamma((nu + 2) / 2) * gamma(nu / 2) 61 | term2 = gamma((nu + 1) / 2) ** 2 62 | 63 | u_ = t.ppf(u, df=nu) 64 | v_ = t.ppf(v, df=nu) 65 | 66 | det_rho = 1-rho**2 67 | multid = (-2 * u_ * v_ * rho + (u_ ** 2) + (v_ ** 2) ) / det_rho 68 | term3 = (1 + multid / nu) ** ((nu + 2) / 2) 69 | 70 | prod1 = (1 + (u_ ** 2) / nu) ** ((nu + 1) / 2) 71 | prod2 = (1 + (v_ ** 2) / nu) ** ((nu + 1) / 2) 72 | prod = prod1 * prod2 73 | 74 | return (1/np.sqrt(det_rho)) * (term1 * prod) / (term2 * term3) 75 | 76 | -------------------------------------------------------------------------------- /pycop/multivariate/gaussian.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | from scipy.stats import t 4 | from scipy.special import gamma 5 | from scipy.stats import norm, multivariate_normal 6 | import numpy as np 7 | 8 | 9 | class gaussian(): 10 | 11 | def __init__(self, corrMatrix): 12 | """ 13 | Creates a gaussian copula 14 | 15 | corrMatrix: length determines the dimension of random 
variable
16 | 
17 | the correlation matrix must be square, symmetric
18 | and positive definite
19 | 
20 | """
21 | self.corrMatrix = np.asarray(corrMatrix)
22 | self.n = len(corrMatrix)
23 | 
24 | def cdf(self, d):
25 | """
26 | returns the cumulative distribution
27 | d = (U1, ..., Un)
28 | 
29 | """
30 | y = norm.ppf(d, 0, 1)
31 | 
32 | return multivariate_normal.cdf(y, mean=None, cov=self.corrMatrix)
33 | 
34 | def pdf(self, d):
35 | """
36 | returns the density
37 | """
38 | y = norm.ppf(d, 0, 1)
39 | 
40 | rho_det = np.linalg.det(self.corrMatrix)
41 | rho_inv = np.linalg.inv(self.corrMatrix)
42 | 
43 | return rho_det**(-0.5) * np.exp(-0.5 * np.dot(y, np.dot(rho_inv - np.identity(self.n), y)))
44 | 
--------------------------------------------------------------------------------
/pycop/multivariate/student.py:
--------------------------------------------------------------------------------
1 | from scipy.stats import t
2 | from scipy.special import gamma
3 | from scipy.stats import norm, multivariate_normal
4 | import numpy as np
5 | 
6 | 
7 | class student():
8 | 
9 | def __init__(self, corrMatrix, nu):
10 | self.corrMatrix = np.asarray(corrMatrix)
11 | self.nu = nu
12 | self.n = len(corrMatrix)
13 | 
14 | """
15 | The multivariate student distribution CDF has no analytic expression but it can be approximated numerically
16 | """
17 | 
18 | def pdf(self, d):
19 | """
20 | returns the density
21 | """
22 | y = t.ppf(d, df=self.nu)
23 | 
24 | rho_det = np.linalg.det(self.corrMatrix)
25 | rho_inv = np.linalg.inv(self.corrMatrix)
26 | 
27 | A = gamma((self.nu+self.n)/2)*( gamma(self.nu/2)**(self.n-1) )
28 | B = gamma((self.nu+1)/2)**self.n
29 | C = (1 + (np.dot(y, np.dot(rho_inv, y)))/self.nu)**((self.nu+self.n)/2)
30 | 
31 | # product of the univariate t terms, one for each margin
32 | 
33 | prod = 1
34 | for comp in [ (1 + yi**2/self.nu)**((self.nu+1)/2) for yi in y]:
35 | prod *=comp
36 | 
37 | return rho_det**(-0.5) *(A*prod)/(B*C)
38 | 
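The Gaussian copula density used in `pycop/multivariate/gaussian.py` above, |R|^(-1/2) exp(-0.5 y'(R^-1 - I)y) with y the vector of normal quantiles, can be sanity-checked without installing the package: with an identity correlation matrix the copula collapses to independence and the density is identically 1. A minimal standalone sketch (the function name is illustrative, not part of pycop):

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_pdf(d, corr):
    # Same formula as gaussian.pdf above: |R|^(-1/2) * exp(-0.5 * y'(R^-1 - I)y)
    y = norm.ppf(d)
    n = len(corr)
    rho_det = np.linalg.det(corr)
    rho_inv = np.linalg.inv(corr)
    return rho_det**-0.5 * np.exp(-0.5 * y @ (rho_inv - np.eye(n)) @ y)

# With an identity correlation matrix the density is 1 at any point of the unit cube
val = gaussian_copula_pdf(np.array([0.3, 0.7, 0.9]), np.eye(3))  # → 1.0
```

A correlated case behaves as expected too: at the center point (0.5, 0.5) with a positive rho, the density exceeds 1 because probability mass is pulled toward the diagonal.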
-------------------------------------------------------------------------------- /pycop/simulation.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import numpy as np 3 | from scipy import linalg 4 | from scipy.stats import norm, t, levy_stable, logser 5 | from scipy.special import gamma, comb 6 | from typing import List 7 | 8 | 9 | def simu_gaussian(n: int, m: int, corr_matrix: np.array): 10 | """ 11 | # Gaussian Copula simulations with a given correlation matrix 12 | 13 | Parameters 14 | ---------- 15 | n : int 16 | the dimension number of simulated variables 17 | m : int 18 | the sample size 19 | corr_matrix : array 20 | the correlation matrix 21 | 22 | Returns 23 | ------- 24 | u : array 25 | the simulated sample 26 | 27 | """ 28 | if not all(isinstance(v, int) for v in [n, m]): 29 | raise TypeError("The 'n' and 'm' arguments must both be integer types.") 30 | if not isinstance(corr_matrix, np.ndarray): 31 | raise TypeError("The 'corr_matrix' argument must be a numpy array.") 32 | # Generate n independent standard Gaussian random variables V = (v1 ,..., vn): 33 | v = [np.random.normal(0, 1, m) for i in range(0, n)] 34 | 35 | # Compute the lower triangular Cholesky factorization of the correlation matrix: 36 | l = linalg.cholesky(corr_matrix, lower=True) 37 | y = np.dot(l, v) 38 | u = norm.cdf(y, 0, 1) 39 | 40 | return u 41 | 42 | 43 | def simu_tstudent(n: int, m: int, corr_matrix: np.array, nu: float): 44 | """ 45 | # Student Copula with k degrees of freedom and a given correlation matrix 46 | 47 | Parameters 48 | ---------- 49 | n : int 50 | the dimension number of simulated variables 51 | m : int 52 | the sample size 53 | corr_matrix : array 54 | the correlation matrix 55 | nu : float 56 | the degree of freedom 57 | 58 | Returns 59 | ------- 60 | u : array 61 | the simulated sample 62 | 63 | """ 64 | if not all(isinstance(v, int) for v in [n, m]): 65 | raise TypeError("The 'n' and 'm' 
arguments must both be integer types.") 66 | if not isinstance(corr_matrix, np.ndarray): 67 | raise TypeError("The 'corr_matrix' argument must be a numpy array.") 68 | if not isinstance(nu, (int, float)): 69 | raise TypeError("The 'nu' argument must be a float type.") 70 | 71 | # Generate n independent standard Gaussian random variables V = (v1 ,..., vn): 72 | v = [np.random.normal(0, 1, m) for i in range(0, n)] 73 | 74 | # Compute the lower triangular Cholesky factorization of rho: 75 | l = linalg.cholesky(corr_matrix, lower=True) 76 | z = np.dot(l, v) 77 | 78 | # generate a random variable r, following a chi2-distribution with nu degrees of freedom 79 | r = np.random.chisquare(df=nu,size=m) 80 | 81 | y = np.sqrt(nu/ r)*z 82 | u = t.cdf(y, df=nu, loc=0, scale=1) 83 | 84 | return u 85 | 86 | 87 | def SimuSibuya(alpha: float, m: int): 88 | """ 89 | # Sibuya distribution Sibuya(α) 90 | Used for sampling F=Sibuya(α) for Joe copula 91 | The algorithm is given in Proposition 3.2 in Hofert (2011) "Efficiently sampling nested Archimedean copulas" 92 | 93 | Parameters 94 | ---------- 95 | alpha : float 96 | the alpha parameter, α = 1/θ 97 | m : int 98 | the sample size 99 | 100 | Returns 101 | ------- 102 | X : array 103 | the simulated sample 104 | 105 | """ 106 | if not isinstance(alpha, (int, float)): 107 | raise TypeError("The 'alpha' argument must be a float type.") 108 | if not isinstance(m, int): 109 | raise TypeError("The 'm' argument must be an integer type.") 110 | 111 | G_1 = lambda y: ((1-y)*gamma(1-alpha))**(-1/alpha) 112 | F = lambda n: 1 - ((-1)**n)*comb(n, alpha-1) 113 | 114 | X = np.random.uniform(0, 1, m) 115 | 116 | for i in range(0, len(X)): 117 | if X[i] <= alpha: 118 | X[i] = 1 119 | elif F(np.floor(G_1(X[i]))) < X[i]: 120 | X[i] = np.ceil(G_1(X[i])) 121 | else: 122 | X[i] = np.floor(G_1(X[i])) 123 | 124 | return X 125 | 126 | 127 | def simu_archimedean(family: str, n: int, m: int, theta: float): 128 | """ 129 | Archimedean copula simulation 130 | 131 
| Parameters 132 | ---------- 133 | 134 | family : str 135 | type of the distribution 136 | n : int 137 | the dimension number of simulated variables 138 | m : int 139 | the sample size 140 | theta : float 141 | copula parameter 142 | Clayton: θ ∈ [0, inf) 143 | Gumbel: θ ∈ [1, +inf) 144 | Frank: θ ∈ (-inf, +inf) 145 | Joe: θ ∈ [1, +inf) 146 | AMH: θ ∈ [0, 1) 147 | 148 | Returns 149 | ------- 150 | u : array 151 | the simulated sample, array matrix with dim (m, n) 152 | 153 | """ 154 | if family not in ["clayton", "gumbel", "frank", "joe", "amh"]: 155 | raise ValueError("The family argument must be one of 'clayton', 'gumbel', 'frank', 'joe' or 'amh'.") 156 | if not all(isinstance(v, int) for v in [n, m]): 157 | raise TypeError("The 'n' and 'm' arguments must both be integer types.") 158 | if not isinstance(theta, (int, float)): 159 | raise TypeError("The 'theta' argument must be a float type.") 160 | 161 | if family == "clayton": 162 | # Generate n independent standard uniform random variables V = (v1 ,..., vn): 163 | v = [np.random.uniform(0, 1, m) for i in range(0, n)] 164 | # generate a random variable x following the gamma distribution gamma(theta**(-1), 1) 165 | X = np.array([np.random.gamma(theta**(-1), scale=1.0) for i in range(0, m)]) 166 | phi_t = lambda t: (1+t)**(-1/theta) 167 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)] 168 | 169 | elif family == "gumbel": 170 | v = [np.random.uniform(0, 1, m) for i in range(0, n)] 171 | X = levy_stable.rvs(alpha=1/theta, beta=1, scale=(np.cos(np.pi/(2*theta)))**theta, loc=0, size=m) 172 | phi_t = lambda t: np.exp(-t**(1/theta)) 173 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)] 174 | 175 | elif family == "frank": 176 | v = [np.random.uniform(0, 1, m) for i in range(0, n)] 177 | p = 1-np.exp(-theta) 178 | X = logser.rvs(p, loc=0, size=m, random_state=None) 179 | phi_t = lambda t: -np.log(1-np.exp(-t)*(1-np.exp(-theta)))/theta 180 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)] 181 | 182 | elif family == 
"joe": 183 | alpha = 1/theta 184 | X = SimuSibuya(alpha, m) 185 | v = [np.random.uniform(0, 1, m) for i in range(0, n)] 186 | phi_t = lambda t: (1-(1-np.exp(-t))**(1/theta)) 187 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)] 188 | 189 | elif family == "amh": 190 | v = [np.random.uniform(0, 1, m) for i in range(0, n)] 191 | X = np.random.geometric(p=1-theta, size=m) 192 | phi_t = lambda t: (1-theta)/(np.exp(t)-theta) 193 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)] 194 | return u 195 | 196 | 197 | def simu_mixture(n: int, m: int, combination: List[dict]): 198 | """ 199 | Mixture copula simulation 200 | 201 | Parameters 202 | ---------- 203 | 204 | n : int 205 | the dimension number of simulated variables 206 | m : int 207 | the sample size 208 | 209 | combination : list 210 | A list of dictionaries that contains information on the copula to combine. 211 | 212 | example: 213 | combination =[ 214 | { 215 | "type": "clayton", 216 | "weight": 0.5, 217 | "theta": 4 218 | }, 219 | { 220 | "type": "student", 221 | "weight": 0.5, 222 | "corrMatrix": corrMatrix, 223 | "nu":2 224 | } 225 | ] 226 | 227 | Returns 228 | ------- 229 | u : array 230 | the simulated sample, array matrix with dim (m, n) 231 | 232 | """ 233 | if not all(isinstance(v, int) for v in [n, m]): 234 | raise TypeError("The 'n' and 'm' arguments must both be integer types.") 235 | if not isinstance(combination, list): 236 | raise TypeError("The 'combination' argument must be a list type") 237 | if not all(isinstance(v, dict) for v in combination): 238 | raise TypeError("Each element of the 'combination' argument must be a dict type.") 239 | 240 | v = [np.random.uniform(0, 1, m) for i in range(0, n)] 241 | weights = [comb["weight"] for comb in combination] 242 | #Generate a random sample of indexes of combination types 243 | y = np.array([np.where(ls == 1)[0][0] for ls in np.random.multinomial(n=1, pvals=weights, size=m)]) 244 | 245 | for i in range(0, len(combination)): 246 | combinationsize = 
len(v[0][y == i])
247 | 
248 | if combination[i]["type"] == "gaussian":
249 | corr_matrix = combination[i]["corrMatrix"]
250 | 
251 | vi = simu_gaussian(n, combinationsize, corr_matrix)
252 | for j in range(0, len(vi)):
253 | v[j][y == i] = vi[j]
254 | elif combination[i]["type"] == "student":
255 | corr_matrix = combination[i]["corrMatrix"]
256 | nu = combination[i]["nu"]
257 | vi = simu_tstudent(n, combinationsize, corr_matrix, nu)
258 | for j in range(0, len(vi)):
259 | v[j][y == i] = vi[j]
260 | elif combination[i]["type"] in ["clayton", "gumbel", "frank", "joe", "amh"]:
261 | vi = simu_archimedean(combination[i]["type"], n, combinationsize, combination[i]["theta"])
262 | 
263 | for j in range(0, len(vi)):
264 | v[j][y == i] = vi[j]
265 | else:
266 | raise ValueError("Unsupported copula type: " + str(combination[i]["type"]))
267 | 
268 | return v
269 | 
--------------------------------------------------------------------------------
/pycop/utils.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | import numpy as np
3 | 
4 | 
5 | def empirical_density_contourplot(u, v, lims):
6 | 
7 | res = 10
8 | 
9 | pts = np.array([u,v])
10 | 
11 | pts = (pts - lims[0]) * res / (lims[1] - lims[0])
12 | pts = np.round(pts).astype(int)
13 | pts[pts<0] = 0
14 | pts[pts>(res-1)] = res - 1
15 | 
16 | Z = np.zeros((res,res))
17 | for i in range(0, len(u)):
18 | Z[pts[0,i],pts[1,i]] += 1.
19 | 
20 | #Z /= len(u) * (lims[1]-lims[0])**2 / res**2
21 | 
22 | lvls = np.percentile(Z.flatten(), (50, 80, 90,))
23 | x = np.linspace(lims[0],lims[1],res)
24 | y = np.linspace(lims[0],lims[1],res)
25 | X, Y = np.meshgrid(x,y)
26 | CS2 = plt.contour(X,Y,Z, levels=lvls, colors="k", linewidths=0.8)
27 | fmt = {}
28 | 
29 | for l,s in zip( CS2.levels, [ "90%", "80%", "50%" ] ):
30 | fmt[l] = s
31 | 
32 | plt.clabel(CS2, inline=1, inline_spacing=0, fmt=fmt, fontsize=8)
33 | 
34 | 
35 | def empiricalplot(u, contour=True):
36 | 
37 | minu = min([min(ui) for ui in u])
38 | maxu = max([max(ui) for ui in u])
39 | setlength = maxu-minu
40 | # tick positions spanning the observed range
41 | lowerticks = [minu,minu+0.2*setlength,minu+0.4*setlength]
42 | upperticks = [minu+0.6*setlength,minu+0.8*setlength,maxu]
43 | limticks = [minu-0.1*setlength, maxu+0.1*setlength]
44 | 
45 | n=len(u)
46 | for i in range(0,n):
47 | for j in range(0,n):
48 | if i == j:
49 | ax = plt.subplot(n, n, 1+(n+1)*i)
50 | #plt.text(0.5, 0.5, r"$u_%s$" % str(i+1))
51 | plt.xlim(limticks)
52 | plt.ylim(limticks)
53 | plt.xticks(lowerticks, [str("{:.1f}".format(tcks)) for tcks in lowerticks], ha='center')
54 | plt.yticks(upperticks, [str("{:.1f}".format(tcks)) for tcks in upperticks], va='center', ha='left')
55 | ax.tick_params(axis="y",direction="in", pad=-10)
56 | ax.tick_params(axis="x",direction="in", pad=-15)
57 | 
58 | ax2 = ax.twinx()
59 | plt.xlim(limticks)
60 | plt.ylim(limticks)
61 | plt.yticks(lowerticks, [str("{:.1f}".format(tcks)) for tcks in lowerticks], va='center', ha='right')
62 | ax2.tick_params(axis="y",direction="in", pad=-10)
63 | 
64 | ax3 = ax.twiny()
65 | plt.xlim(limticks)
66 | plt.ylim(limticks)
67 | plt.xticks(upperticks, [str("{:.1f}".format(tcks)) for tcks in upperticks], va='center', ha='center')
68 | ax3.tick_params(axis="x",direction="in", pad=-12)
69 | elif i < j:
70 | if contour == True:
71 | ax = plt.subplot(n, n, n*i+j+1)
72 | empirical_density_contourplot(u[i], u[j], [minu, maxu])
73 | 
else: 74 | pass 75 | 76 | plt.xlim(limticks) 77 | plt.ylim(limticks) 78 | plt.xticks([]) 79 | plt.yticks([]) 80 | else: 81 | ax = plt.subplot(n, n, n*i+j+1) 82 | plt.scatter(u[i], u[j], alpha=0.8, facecolors='none', edgecolors='b', s=2) 83 | plt.xlim(limticks) 84 | plt.ylim(limticks) 85 | plt.xticks([]) 86 | plt.yticks([]) 87 | 88 | plt.subplots_adjust(bottom=0.02, right=0.98, top=0.98, left=0.02, wspace=0.0, hspace=0.0) 89 | plt.show() 90 | 91 | 92 | 93 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["setuptools>=61.0"] 3 | build-backend = "setuptools.build_meta" 4 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | setup( 4 | name = 'pycop', 5 | version = '0.0.13', 6 | description = 'Copula for multivariate dependence modeling', 7 | long_description=open('README.md', 'r').read(), 8 | long_description_content_type="text/markdown", 9 | author = 'Maxime N', 10 | author_email = 'maxime.nlc@proton.me', 11 | url = 'https://github.com/maximenc/pycop/', 12 | download_url = 'https://github.com/maximenc/pycop/', 13 | classifiers = [], 14 | include_package_data=True, 15 | packages=find_packages(".") 16 | ) 17 | --------------------------------------------------------------------------------
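The Cholesky-based sampling scheme of `simu_gaussian` in `pycop/simulation.py` (draw independent normals, correlate them with the lower Cholesky factor, then map back to uniform margins with the normal CDF) can be illustrated with a standalone sketch that uses only NumPy/SciPy, so it runs without installing the package; the function name here is hypothetical, not the package's API:

```python
import numpy as np
from scipy import linalg
from scipy.stats import norm

def simu_gaussian_sketch(n, m, corr_matrix):
    # Step 1: draw an (n, m) matrix of independent standard normals
    v = np.random.normal(0, 1, size=(n, m))
    # Step 2: correlate them with the lower Cholesky factor of the correlation matrix
    l = linalg.cholesky(corr_matrix, lower=True)
    y = l @ v
    # Step 3: map back to (0, 1) margins via the normal CDF (probability integral transform)
    return norm.cdf(y)

np.random.seed(0)  # fixed seed so the sketch is reproducible
rho = np.array([[1.0, 0.8], [0.8, 1.0]])
u = simu_gaussian_sketch(2, 1000, rho)
```

The two rows of `u` are uniform on (0, 1) with the dependence induced by `rho`; their sample correlation lands close to, though systematically slightly below, 0.8, since the transform to uniform margins shrinks linear correlation a little.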