├── .github
│   └── workflows
│       └── python-publish.yml
├── .gitignore
├── LICENSE
├── README.md
├── data
│   └── msci.csv
├── docs
│   └── images
│       ├── logo_pycop.svg
│       ├── plot
│       │   ├── 2c_mixture_contour_mpdf.svg
│       │   ├── 2c_mixture_contour_pdf.svg
│       │   ├── 3c_mixture_contour_mpdf.svg
│       │   ├── 3c_mixture_contour_pdf.svg
│       │   ├── clayton_3d_mpdf.svg
│       │   ├── clayton_contour_mpdf.svg
│       │   ├── gumbel_3d_cdf.svg
│       │   ├── gumbel_3d_pdf.svg
│       │   ├── plackett_contour_cdf.svg
│       │   └── plackett_contour_pdf.svg
│       └── simu
│           ├── 2c_mixture_simu.svg
│           ├── 3c_mixture_simu.svg
│           ├── clayton_simu_n3.svg
│           ├── gaussian_simu.svg
│           ├── gaussian_simu_n3.svg
│           ├── gumbel_simu.svg
│           ├── rgumbel_simu.svg
│           └── student_simu.svg
├── examples
│   ├── example_estim.ipynb
│   ├── example_plot.ipynb
│   └── example_simu.ipynb
├── pycop
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── bivariate
│   │   ├── __init__.py
│   │   ├── archimedean.py
│   │   ├── copula.py
│   │   ├── empirical.py
│   │   ├── estimation.py
│   │   ├── gaussian.py
│   │   ├── mixture.py
│   │   └── student.py
│   ├── multivariate
│   │   ├── gaussian.py
│   │   └── student.py
│   ├── simulation.py
│   └── utils.py
├── pyproject.toml
└── setup.py
/.github/workflows/python-publish.yml:
--------------------------------------------------------------------------------
1 | # This workflow will upload a Python Package using Twine when a release is created
2 | # For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
3 |
4 | # This workflow uses actions that are not certified by GitHub.
5 | # They are provided by a third-party and are governed by
6 | # separate terms of service, privacy policy, and support
7 | # documentation.
8 |
9 | name: Upload Python Package
10 |
11 | on:
12 | release:
13 | types: [published]
14 |
15 | permissions:
16 | contents: read
17 |
18 | jobs:
19 | deploy:
20 |
21 | runs-on: ubuntu-latest
22 |
23 | steps:
24 | - uses: actions/checkout@v3
25 | - name: Set up Python
26 | uses: actions/setup-python@v3
27 | with:
28 | python-version: '3.x'
29 | - name: Install dependencies
30 | run: |
31 | python -m pip install --upgrade pip
32 | pip install build
33 | - name: Build package
34 | run: python -m build
35 | - name: Publish package
36 | uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
37 | with:
38 | user: __token__
39 | password: ${{ secrets.PYPI_API_TOKEN }}
40 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 |
2 | pycop/bivariate/__pycache__/
3 | pycop/multivariate/__pycache__/
4 | pycop/__pycache__/
5 | pycop.egg-info
6 | build/lib/pycop
7 | setup.cfg
8 | dist/
9 | tests/
10 | build/
11 | pycop/simulation.pyc
12 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 The Python Packaging Authority
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | [](https://pypi.org/project/pycop)
7 | [](https://pepy.tech/project/pycop)
8 | [](https://img.shields.io/pypi/l/pycop)
9 | [](https://doi.org/10.5281/zenodo.7030034)
10 |
11 | # How to cite
12 |
If you use pycop in a scientific publication, please cite:
14 | ```
15 | @article{nicolas2022pycop,
16 | title={pycop: a Python package for dependence modeling with copulas},
17 | author={Nicolas, Maxime LD},
18 | journal={Zenodo Software Package},
19 | volume={70},
20 | pages={7030034},
21 | year={2022}
22 | }
23 | ```
24 |
25 |
26 | # Overview
27 |
Pycop is a Python package for modeling multivariate dependence with copulas. It provides estimation, random sample generation, and graphical representation for commonly used copula functions, and it supports mixture models defined as convex combinations of copulas. It also includes methods based on the empirical copula, such as the non-parametric Tail Dependence Coefficient.
29 |
30 | Some of the features covered:
31 | * Elliptical copulas (Gaussian & Student) and common Archimedean Copulas functions
32 | * Mixture model of multiple copula functions (up to 3 copula functions)
33 | * Multivariate random sample generation
34 | * Empirical copula method
35 | * Parametric and Non-parametric Tail Dependence Coefficient (TDC)
36 |
37 |
### Available copula functions
39 |
40 |
41 | | Copula | Bivariate Graph & Estimation | Multivariate Simulation |
42 | |--- | :-: | :-: |
43 | | Mixture | ✓ | ✓ |
44 | | Gaussian | ✓ | ✓ |
45 | | Student | ✓ | ✓ |
46 | | Clayton | ✓ | ✓ |
47 | | Rotated Clayton | ✓ | ✓ |
48 | | Gumbel | ✓ | ✓ |
49 | | Rotated Gumbel | ✓ | ✓ |
50 | | Frank | ✓ | ✓ |
51 | | Joe | ✓ | ✓ |
52 | | Rotated Joe | ✓ | ✓ |
53 | | Galambos | ✓ | ✗ |
54 | | Rotated Galambos | ✓ | ✗ |
55 | | BB1 | ✓ | ✗ |
56 | | BB2 | ✓ | ✗ |
57 | | FGM | ✓ | ✗ |
58 | | Plackett | ✓ | ✗ |
59 | | AMH | ✗ | ✓ |
60 |
61 |
62 | # Usage
63 |
64 | Install pycop using pip
65 | ```
66 | pip install pycop
67 | ```
68 |
69 | # Examples
70 | [](https://githubtocolab.com/maximenc/pycop/blob/master/examples/example_estim.ipynb)
71 | [Estimations on msci returns](https://github.com/maximenc/pycop/blob/master/examples/example_estim.ipynb)
72 |
73 |
74 | [](https://githubtocolab.com/maximenc/pycop/blob/master/examples/example_plot.ipynb)
75 | [Graphical Representations](https://github.com/maximenc/pycop/blob/master/examples/example_plot.ipynb)
76 |
77 |
78 | [](https://githubtocolab.com/maximenc/pycop/blob/master/examples/example_simu.ipynb)
79 | [Simulations](https://github.com/maximenc/pycop/blob/master/examples/example_simu.ipynb)
80 |
81 |
82 |
83 | # Table of Contents
84 |
85 | - [Graphical Representation](#Graphical-Representation)
86 | - [3d plot](#3d-plot)
87 | - [Contour plot](#Contour-plot)
88 | - [Mixture plot](#Mixture-plot)
89 | - [Simulation](#Simulation)
90 | - [Gaussian](#Gaussian)
91 | - [Student](#Student)
92 | - [Archimedean](#Archimedean)
93 | - [High dimension](#High-dimension)
94 | - [Mixture simulation](#Mixture-simulation)
95 | - [Estimation](#Estimation)
96 | - [Canonical Maximum Likelihood Estimation](#Canonical-Maximum-Likelihood-Estimation)
97 | - [Tail Dependence Coefficient](#Tail-Dependence-Coefficient)
98 | - [Theoretical TDC](#Theoretical-TDC)
99 | - [Non-parametric TDC](#Non-parametric-TDC)
100 | - [Optimal Empirical TDC](#Optimal-Empirical-TDC)
101 |
102 |
103 | # Graphical Representation
104 |
We first create a copula object by specifying the copula family:
106 |
107 | ```python
108 | from pycop import archimedean
109 | cop = archimedean(family="clayton")
110 | ```
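Each family implements a closed-form CDF; for Clayton it is C(u, v) = (u^(-theta) + v^(-theta) - 1)^(-1/theta). A minimal standalone sketch of that standard formula (illustrative, independent of pycop's internals):

```python
def clayton_cdf(u: float, v: float, theta: float) -> float:
    """Closed-form Clayton copula CDF, valid for theta > 0."""
    return (u ** -theta + v ** -theta - 1) ** (-1 / theta)

# As theta -> 0+ the Clayton copula approaches independence, C(u, v) = u * v.
print(clayton_cdf(0.5, 0.5, 2))  # (4 + 4 - 1)**(-0.5) ≈ 0.3780
```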
111 |
112 | Plot the cdf and pdf of the copula.
113 |
114 |
115 | ## 3d plot
116 |
117 | ```python
118 | cop = archimedean(family="gumbel")
119 |
120 | cop.plot_cdf([2], plot_type="3d", Nsplit=100 )
121 | cop.plot_pdf([2], plot_type="3d", Nsplit=100, cmap="cividis" )
122 | ```
123 |
124 |
125 |
126 |
127 |
128 |
129 |
130 |
131 | ## Contour plot
132 |
Plot the contour of the CDF and PDF:
134 |
135 | ```python
136 | cop = archimedean(family="plackett")
137 |
138 | cop.plot_cdf([2], plot_type="contour", Nsplit=100 )
cop.plot_pdf([2], plot_type="contour", Nsplit=100)
140 | ```
141 |
142 |
143 |
144 |
145 |
146 |
147 |
148 |
It is also possible to specify the marginal distributions:
150 |
151 | ```python
cop = archimedean(family="clayton")
153 |
154 | from scipy.stats import norm
155 |
156 |
157 | marginals = [
158 | {
159 | "distribution": norm, "loc" : 0, "scale" : 0.8,
160 | },
161 | {
162 | "distribution": norm, "loc" : 0, "scale": 0.6,
163 | }]
164 |
cop.plot_mpdf([2], marginals, plot_type="3d", Nsplit=100,
166 | rstride=1, cstride=1,
167 | antialiased=True,
168 | cmap="cividis",
169 | edgecolor='black',
170 | linewidth=0.1,
171 | zorder=1,
172 | alpha=1)
173 |
174 | lvls = [0.02, 0.05, 0.1, 0.2, 0.3]
175 |
176 | cop.plot_mpdf([2], marginals, plot_type="contour", Nsplit=100, levels=lvls)
177 | ```
178 |
179 |
180 |
181 |
182 |
183 |
184 |
185 | ## Mixture plot
186 |
Mixture of two copulas:
188 |
189 | ```python
190 | from pycop import mixture
191 |
192 | cop = mixture(["clayton", "gumbel"])
193 | cop.plot_pdf([0.2, 2, 2], plot_type="contour", Nsplit=40, levels=[0.1,0.4,0.8,1.3,1.6] )
194 | # plot with defined marginals
195 | cop.plot_mpdf([0.2, 2, 2], marginals, plot_type="contour", Nsplit=50)
196 | ```
197 |
198 |
199 |
200 |
201 |
202 |
Mixture of three copulas:

```python
205 | cop = mixture(["clayton", "gaussian", "gumbel"])
206 | cop.plot_pdf([1/3, 1/3, 1/3, 2, 0.5, 4], plot_type="contour", Nsplit=40, levels=[0.1,0.4,0.8,1.3,1.6] )
207 | cop.plot_mpdf([1/3, 1/3, 1/3, 2, 0.5, 2], marginals, plot_type="contour", Nsplit=50)
208 | ```
209 |
210 |
211 |
212 |
213 |
214 |
215 | # Simulation
216 |
217 | ## Gaussian
218 |
219 |
220 | ```python
import numpy as np
from scipy.stats import norm
222 | from pycop import simulation
223 |
224 | n = 2 # dimension
225 | m = 1000 # sample size
226 |
227 | corrMatrix = np.array([[1, 0.8], [0.8, 1]])
228 | u1, u2 = simulation.simu_gaussian(n, m, corrMatrix)
229 | ```
Add Gaussian marginals by using `norm.ppf` from `scipy.stats` to transform the uniform margins into the desired distribution:
231 |
232 | ```python
233 | u1 = norm.ppf(u1)
234 | u2 = norm.ppf(u2)
235 | ```
236 |
237 | ## Student
238 | ```python
u1, u2 = simulation.simu_tstudent(n, m, corrMatrix, nu=1)
```
242 |
243 |
244 |
245 |
246 |
247 |
248 |
249 |
250 |
251 | ## Archimedean
252 |
Simulate from one of the available Archimedean copulas:
254 |
255 | ```python
256 | u1, u2 = simulation.simu_archimedean("gumbel", n, m, theta=2)
258 | ```
259 |
To obtain the rotated (survival) copula, rotate the sample:
261 |
262 | ```python
263 | u1, u2 = 1 - u1, 1 - u2
264 | ```
265 |
266 |
267 |
268 |
269 |
270 |
271 |
272 |
273 | ## High dimension
274 |
275 |
276 | ```python
277 |
278 | n = 3 # Dimension
279 | m = 1000 # Sample size
280 |
281 | corrMatrix = np.array([[1, 0.9, 0], [0.9, 1, 0], [0, 0, 1]])
282 | u = simulation.simu_gaussian(n, m, corrMatrix)
283 | u = norm.ppf(u)
284 | ```
285 |
286 |
287 |
288 |
289 |
290 | ```python
291 | u = simulation.simu_archimedean("clayton", n, m, theta=2)
292 | u = norm.ppf(u)
293 | ```
294 |
295 |
296 |
297 |
298 |
299 | ## Mixture simulation
300 |
301 | Simulation from a mixture of 2 copulas
302 |
303 | ```python
304 | n = 3
305 | m = 2000
306 |
307 | combination = [
    {"type": "clayton", "weight": 1/2, "theta": 2},  # weights must sum to 1
    {"type": "gumbel", "weight": 1/2, "theta": 3}
310 | ]
311 |
312 | u = simulation.simu_mixture(n, m, combination)
313 | u = norm.ppf(u)
314 | ```
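Conceptually, mixture sampling draws each observation from one component copula chosen with probability equal to its weight. A minimal standalone sketch with two Gaussian components (an illustration of the idea, not pycop's actual implementation):

```python
import numpy as np
from scipy.stats import norm

def simu_gaussian_mixture(m, weights, rhos, seed=0):
    """Sample m bivariate uniform observations from a mixture of
    Gaussian copulas: observation i comes from component k with
    probability weights[k]."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(weights), size=m, p=weights)  # component labels
    u = np.empty((m, 2))
    for k, rho in enumerate(rhos):
        idx = comp == k
        cov = np.array([[1.0, rho], [rho, 1.0]])
        z = rng.multivariate_normal([0.0, 0.0], cov, size=int(idx.sum()))
        u[idx] = norm.cdf(z)  # probability transform -> uniform margins
    return u

u = simu_gaussian_mixture(1000, weights=[0.5, 0.5], rhos=[0.9, -0.3])
```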
315 |
316 |
317 |
318 |
319 | Simulation from a mixture of 3 copulas
320 | ```python
321 | corrMatrix = np.array([[1, 0.8, 0], [0.8, 1, 0], [0, 0, 1]])
322 |
323 |
324 | combination = [
325 | {"type": "clayton", "weight": 1/3, "theta": 2},
326 | {"type": "student", "weight": 1/3, "corrMatrix": corrMatrix, "nu":2},
327 | {"type": "gumbel", "weight": 1/3, "theta":3}
328 | ]
329 |
330 | u = simulation.simu_mixture(n, m, combination)
331 | u = norm.ppf(u)
332 | ```
333 |
334 |
335 |
336 |
337 |
338 |
339 | # Estimation
340 |
Available estimation method: Canonical Maximum Likelihood Estimation (CMLE).
343 |
344 |
345 | ## Canonical Maximum Likelihood Estimation (CMLE)
346 |
Import a sample with pandas and compute the log-returns:
348 |
349 | ```python
350 | import pandas as pd
351 | import numpy as np
352 |
353 | df = pd.read_csv("data/msci.csv")
354 | df.index = pd.to_datetime(df["Date"], format="%m/%d/%Y")
355 | df = df.drop(["Date"], axis=1)
356 |
357 | for col in df.columns.values:
358 | df[col] = np.log(df[col]) - np.log(df[col].shift(1))
359 |
360 | df = df.dropna()
361 | ```
362 |
363 |
364 | ```python
365 | from pycop import estimation, archimedean
366 |
367 | cop = archimedean("clayton")
368 | data = df[["US","UK"]].T.values
369 | param, cmle = estimation.fit_cmle(cop, data)
```

Output: `clayton estim: 0.8025977727691012`
373 |
374 |
375 |
# Tail Dependence Coefficient
377 |
378 | ## Theoretical TDC
379 |
380 | ```python
381 | from pycop import archimedean
382 |
383 | cop = archimedean("clayton")
384 |
385 | cop.LTDC(theta=0.5)
386 | cop.UTDC(theta=0.5)
387 | ```
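For the Clayton copula these coefficients have well-known closed forms: lambda_L = 2^(-1/theta) and lambda_U = 0. A quick standalone check (not using pycop):

```python
def clayton_ltdc(theta: float) -> float:
    """Lower tail dependence coefficient of a Clayton copula (theta > 0)."""
    return 2 ** (-1 / theta)

def clayton_utdc(theta: float) -> float:
    """The Clayton copula has no upper tail dependence."""
    return 0.0

print(clayton_ltdc(0.5))  # 2**(-2) = 0.25
```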
388 |
For a mixture copula, list the copula with lower tail dependence first and the one with upper tail dependence last; each tail coefficient then depends on the corresponding weight and copula parameter.
390 |
391 | ```python
392 | from pycop import mixture
393 |
394 | cop = mixture(["clayton", "gaussian", "gumbel"])
395 |
396 | LTDC = cop.LTDC(weight = 0.2, theta = 0.5)
397 | UTDC = cop.UTDC(weight = 0.2, theta = 1.5)
398 | ```
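Because the tail coefficients are limits, they combine linearly in a convex mixture: only the Clayton component contributes to the lower tail and only the Gumbel component to the upper tail. A standalone sketch of this weighting, using the standard closed forms 2^(-1/theta) for Clayton and 2 - 2^(1/theta) for Gumbel (illustrative, not pycop's code):

```python
def mixture_ltdc(weight: float, theta: float) -> float:
    """Lower TDC of a mixture whose lower-tail component is Clayton
    with the given weight; Gaussian and Gumbel contribute 0."""
    return weight * 2 ** (-1 / theta)

def mixture_utdc(weight: float, theta: float) -> float:
    """Upper TDC of a mixture whose upper-tail component is Gumbel
    with the given weight; Clayton and Gaussian contribute 0."""
    return weight * (2 - 2 ** (1 / theta))

print(mixture_ltdc(0.2, 0.5))  # 0.2 * 0.25 = 0.05
```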
399 |
400 | ## Non-parametric TDC
Create an empirical copula object:
402 | ```python
403 | from pycop import empirical
404 |
405 | cop = empirical(df[["US","UK"]].T.values)
406 | ```
407 | Compute the non-parametric Upper TDC (UTDC) or the Lower TDC (LTDC) for a given threshold:
408 |
409 | ```python
410 | cop.LTDC(0.01) # i/n = 1%
411 | cop.UTDC(0.99) # i/n = 99%
412 | ```
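The standard non-parametric estimators behind these calls evaluate the empirical copula at the threshold: lambda_L(u) = C_n(u, u) / u and lambda_U(u) = P(U > u, V > u) / (1 - u), with pseudo-observations built from ranks. A self-contained sketch (pycop's exact implementation may differ in details such as rank normalization):

```python
import numpy as np

def pseudo_obs(x):
    """Normalized ranks in (0, 1]."""
    return (np.argsort(np.argsort(x)) + 1) / len(x)

def empirical_ltdc(x, y, u):
    """Empirical lower TDC at threshold u: C_n(u, u) / u."""
    ux, uy = pseudo_obs(x), pseudo_obs(y)
    return np.mean((ux <= u) & (uy <= u)) / u

def empirical_utdc(x, y, u):
    """Empirical upper TDC at threshold u: P(U > u, V > u) / (1 - u)."""
    ux, uy = pseudo_obs(x), pseudo_obs(y)
    return np.mean((ux > u) & (uy > u)) / (1 - u)

# Perfectly comonotone data has full tail dependence at every threshold.
x = np.arange(1, 101, dtype=float)
print(empirical_ltdc(x, x, 0.1))  # 1.0
```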
413 |
414 | ## Optimal Empirical TDC
Returns the optimal non-parametric TDC based on the heuristic plateau-finding algorithm from Frahm et al. (2005), "Estimating the tail-dependence coefficient: properties and pitfalls".
416 |
417 | ```python
418 | cop.optimal_tdc("upper")
419 | cop.optimal_tdc("lower")
420 | ```
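A simplified reading of that plateau heuristic: compute the empirical TDC over a grid of thresholds, smooth it with a small moving average, and return the first window in which the smoothed curve is flat. The sketch below uses a fixed flatness tolerance instead of the data-driven bound in Frahm et al. (2005), so it is an illustration of the idea rather than pycop's algorithm:

```python
import numpy as np

def plateau_ltdc(x, y, tol=0.02):
    """Heuristic plateau estimate of the lower TDC (simplified)."""
    n = len(x)
    ux = (np.argsort(np.argsort(x)) + 1) / n
    uy = (np.argsort(np.argsort(y)) + 1) / n
    # empirical lower TDC over thresholds i/n, i = 1 .. n/2
    grid = np.arange(1, n // 2) / n
    tdc = np.array([np.mean((ux <= u) & (uy <= u)) / u for u in grid])
    b = max(n // 200, 1)                      # moving-average half-width
    kernel = np.ones(2 * b + 1) / (2 * b + 1)
    smooth = np.convolve(tdc, kernel, mode="valid")
    m = int(np.sqrt(len(smooth)))             # plateau length
    for k in range(len(smooth) - m + 1):
        window = smooth[k:k + m]
        if np.sum(np.abs(window - window[0])) <= tol:
            return float(window.mean())       # first flat window
    return 0.0  # no plateau found -> treat as tail independence

x = np.arange(1, 501, dtype=float)
print(plateau_ltdc(x, x))  # comonotone data -> ~1.0
```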
421 |
--------------------------------------------------------------------------------
/docs/images/logo_pycop.svg:
--------------------------------------------------------------------------------
(SVG content omitted: pycop logo)
--------------------------------------------------------------------------------
/docs/images/plot/clayton_contour_mpdf.svg:
--------------------------------------------------------------------------------
2022-04-25T14:06:52.492538
image/svg+xml
Matplotlib v3.5.1, https://matplotlib.org/
(SVG path data omitted: Matplotlib contour plot of the Clayton copula PDF with marginals)
--------------------------------------------------------------------------------
/docs/images/plot/plackett_contour_cdf.svg:
--------------------------------------------------------------------------------
2022-04-25T14:05:43.626792
image/svg+xml
Matplotlib v3.5.1, https://matplotlib.org/
(SVG path data omitted: Matplotlib contour plot of the Plackett copula CDF)
--------------------------------------------------------------------------------
/examples/example_estim.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "TB34GcWrBjef"
7 | },
8 | "source": [
9 | "Estimation"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {
15 | "id": "Xwhmxt7tAEjG"
16 | },
17 | "source": [
18 | "Data import"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {
25 | "colab": {
26 | "base_uri": "https://localhost:8080/",
27 | "height": 388
28 | },
29 | "id": "tMcqcgSo_7Q6",
30 | "outputId": "9e1ea4dc-6d74-49fa-99d3-c3eb2819e5c0"
31 | },
32 | "outputs": [
33 | {
34 | "data": {
"text/html": [
 "(HTML table output omitted; the text/plain rendering below is equivalent)"
],
284 | "text/plain": [
285 | " US UK Switzerland Sweden Spain Singapore \\\n",
286 | "Date \n",
287 | "2000-01-03 -0.006364 0.007418 0.011909 0.028405 0.007995 0.033686 \n",
288 | "2000-01-04 -0.040896 -0.029006 -0.023266 -0.013283 -0.014809 -0.014969 \n",
289 | "2000-01-05 0.002601 -0.018383 -0.009263 -0.047861 -0.022228 -0.056299 \n",
290 | "2000-01-06 -0.008583 -0.006808 0.012965 0.000126 0.001160 -0.022711 \n",
291 | "2000-01-07 0.033973 0.000371 0.016261 0.004622 0.017736 0.028970 \n",
292 | "\n",
293 | " Norway Netherlands Japan Italy HongKong Germany \\\n",
294 | "Date \n",
295 | "2000-01-03 0.026981 0.015721 0.008980 -0.020180 0.014310 -0.015929 \n",
296 | "2000-01-04 -0.029136 -0.033486 -0.016367 -0.010596 -0.013452 -0.010644 \n",
297 | "2000-01-05 -0.025306 -0.010066 -0.044737 -0.016052 -0.070700 -0.009102 \n",
298 | "2000-01-06 0.011887 -0.007967 -0.038755 -0.016775 -0.044639 -0.006322 \n",
299 | "2000-01-07 0.010170 0.027949 -0.003075 0.025474 0.018907 0.040592 \n",
300 | "\n",
301 | " France Denmark Canada Belgium Austria Australia \n",
302 | "Date \n",
303 | "2000-01-03 0.007571 0.027345 0.004973 0.001408 0.016172 0.008597 \n",
304 | "2000-01-04 -0.029656 -0.010718 -0.030336 -0.029099 0.004299 -0.013132 \n",
305 | "2000-01-05 -0.030744 -0.020555 -0.016362 -0.032784 0.004861 -0.023347 \n",
306 | "2000-01-06 -0.004272 0.009259 -0.010937 0.002415 0.001160 -0.007932 \n",
307 | "2000-01-07 0.011951 -0.000775 0.050676 0.018796 0.018210 0.010410 "
308 | ]
309 | },
310 | "execution_count": 15,
311 | "metadata": {},
312 | "output_type": "execute_result"
313 | }
314 | ],
315 | "source": [
316 | "import pandas as pd\n",
317 | "import numpy as np\n",
318 | "\n",
319 | "df = pd.read_csv(\"https://raw.githubusercontent.com/maximenc/pycop/master/data/msci.csv\")\n",
320 | "df.index = pd.to_datetime(df[\"Date\"], format=\"%m/%d/%Y\")\n",
321 | "df = df.drop([\"Date\"], axis=1)\n",
322 | "\n",
323 | "for col in df.columns.values:\n",
324 | " df[col] = np.log(df[col]) - np.log(df[col].shift(1))\n",
325 | "\n",
326 | "df = df.dropna()\n",
327 | "df.head()"
328 | ]
329 | },
330 | {
331 | "cell_type": "markdown",
332 | "metadata": {
333 | "id": "_CVvuX-EAbfp"
334 | },
335 | "source": [
"Install the pycop library, which is not preinstalled on Colab"
337 | ]
338 | },
339 | {
340 | "cell_type": "code",
341 | "execution_count": null,
342 | "metadata": {
343 | "colab": {
344 | "base_uri": "https://localhost:8080/"
345 | },
346 | "id": "jdJ-mnShApAZ",
347 | "outputId": "3219fbe2-8cb1-4635-c2e6-9b9404544128"
348 | },
349 | "outputs": [
350 | {
351 | "name": "stdout",
352 | "output_type": "stream",
353 | "text": [
354 | "Requirement already satisfied: pycop in /usr/local/lib/python3.7/dist-packages (0.0.6)\n"
355 | ]
356 | }
357 | ],
358 | "source": [
359 | "!pip install pycop"
360 | ]
361 | },
362 | {
363 | "cell_type": "markdown",
364 | "metadata": {
365 | "id": "moCHVzNiEo2Y"
366 | },
367 | "source": [
368 | "Estimation of a single Copula parameter"
369 | ]
370 | },
371 | {
372 | "cell_type": "code",
373 | "execution_count": null,
374 | "metadata": {
375 | "colab": {
376 | "base_uri": "https://localhost:8080/"
377 | },
378 | "id": "Gm4gwwAFBGJw",
379 | "outputId": "3884fe54-e4f1-40c2-db2f-a178b215a0ab"
380 | },
381 | "outputs": [
382 | {
383 | "name": "stdout",
384 | "output_type": "stream",
385 | "text": [
386 | "method = SLSQP - termination = True - message: Optimization terminated successfully.\n",
387 | "Estimated parameter: 0.8025977727691012\n"
388 | ]
389 | }
390 | ],
391 | "source": [
392 | "from pycop import archimedean, estimation\n",
393 | "cop = archimedean(family=\"clayton\")\n",
394 | "\n",
395 | "data = df[[\"US\",\"UK\"]].T.values\n",
396 | "param, cmle = estimation.fit_cmle(cop, data)\n",
397 | "print(\"Estimated parameter: \", param[0])"
398 | ]
399 | },
400 | {
401 | "cell_type": "markdown",
402 | "metadata": {
403 | "id": "N5nAZWSlBlvD"
404 | },
405 | "source": [
406 | "Mixture\n"
407 | ]
408 | },
409 | {
410 | "cell_type": "markdown",
411 | "metadata": {
412 | "id": "aNLJyyUaChBr"
413 | },
414 | "source": [
415 | "Mixture of 2 Copulas\n",
416 | "\n",
417 | "\n"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": null,
423 | "metadata": {
424 | "colab": {
425 | "base_uri": "https://localhost:8080/"
426 | },
427 | "id": "6QQySVjLBuSd",
428 | "outputId": "51b34629-e701-4e4a-d374-4d4415bd0967"
429 | },
430 | "outputs": [
431 | {
432 | "name": "stdout",
433 | "output_type": "stream",
434 | "text": [
435 | "method = SLSQP - termination = True - message: Optimization terminated successfully.\n",
436 | "Estimated parameters: \n",
437 | "weight in Clayton copula: 0.5515374306606079\n",
438 | "weight in Gumbel copula: 0.4484625693393921\n",
439 | "Clayton parameter: 0.42308968740122027\n",
440 | "Gumbel parameter: 2.265138697501126\n"
441 | ]
442 | }
443 | ],
444 | "source": [
445 | "from pycop import mixture\n",
446 | "\n",
447 | "cop = mixture([\"clayton\", \"gumbel\"])\n",
448 | "\n",
"param, mle = estimation.fit_cmle_mixt(cop, data)\n",
450 | "print(\"Estimated parameters: \")\n",
451 | "print(\"weight in Clayton copula: \", param[0])\n",
452 | "print(\"weight in Gumbel copula: \", 1-param[0])\n",
453 | "print(\"Clayton parameter: \", param[1])\n",
454 | "print(\"Gumbel parameter: \", param[2])"
455 | ]
456 | },
457 | {
458 | "cell_type": "markdown",
459 | "metadata": {
460 | "id": "5Kg9_BpUDVpP"
461 | },
462 | "source": [
463 | "Mixture of 3 Copulas"
464 | ]
465 | },
466 | {
467 | "cell_type": "code",
468 | "execution_count": null,
469 | "metadata": {
470 | "colab": {
471 | "base_uri": "https://localhost:8080/"
472 | },
473 | "id": "L-5ACn1TDWSp",
474 | "outputId": "ece145a2-5aa0-44ae-e8a7-3c2acace8008"
475 | },
476 | "outputs": [
477 | {
478 | "name": "stdout",
479 | "output_type": "stream",
480 | "text": [
481 | "method = SLSQP - termination = True - message: Optimization terminated successfully.\n",
482 | "Estimated parameters: \n",
483 | "weight in Clayton copula: 0.35613959154707575\n",
484 | "weight in Frank copula: 0.12637150155636992\n",
485 | "weight in Gumbel copula: 0.5174889068965544\n",
486 | "Clayton parameter: 1.5089246645156034\n",
487 | "Frank parameter: -5.899959588867644\n",
488 | "Gumbel parameter: 1.8415684650315947\n"
489 | ]
490 | }
491 | ],
492 | "source": [
493 | "cop = mixture([\"clayton\", \"frank\", \"gumbel\"])\n",
494 | "\n",
495 | "param, mle = estimation.fit_cmle_mixt(cop, data)\n",
496 | "print(\"Estimated parameters: \")\n",
497 | "print(\"weight in Clayton copula: \", param[0])\n",
498 | "print(\"weight in Frank copula: \", param[1])\n",
499 | "print(\"weight in Gumbel copula: \", param[2])\n",
500 | "print(\"Clayton parameter: \", param[3])\n",
501 | "print(\"Frank parameter: \", param[4])\n",
502 | "print(\"Gumbel parameter: \", param[5])"
503 | ]
504 | }
505 | ],
506 | "metadata": {
507 | "colab": {
508 | "name": "example_estim.ipynb",
509 | "provenance": []
510 | },
511 | "kernelspec": {
512 | "display_name": "Python 3.10.6 ('test-env': venv)",
513 | "language": "python",
514 | "name": "python3"
515 | },
516 | "language_info": {
517 | "name": "python",
518 | "version": "3.10.6"
519 | },
520 | "vscode": {
521 | "interpreter": {
522 | "hash": "684c597edd994fb8a573e32bcd1af30dbfcaa9a74f1316b10fafe91077b46267"
523 | }
524 | }
525 | },
526 | "nbformat": 4,
527 | "nbformat_minor": 0
528 | }
529 |
--------------------------------------------------------------------------------
/pycop/__init__.py:
--------------------------------------------------------------------------------
1 | from pycop import simulation
2 | from pycop import utils
3 | from pycop.bivariate.archimedean import archimedean
4 | from pycop.bivariate.empirical import empirical
5 | from pycop.bivariate.gaussian import gaussian
6 | from pycop.bivariate.student import student
7 | from pycop.bivariate.mixture import mixture
8 | from pycop.bivariate import estimation
9 |
--------------------------------------------------------------------------------
/pycop/__init__.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maximenc/pycop/dffc1833693e15d7ab1cff1fb0d78fe3d024f995/pycop/__init__.pyc
--------------------------------------------------------------------------------
/pycop/bivariate/__init__.py:
--------------------------------------------------------------------------------
1 | from pycop.bivariate import copula
2 |
--------------------------------------------------------------------------------
/pycop/bivariate/archimedean.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from pycop.bivariate.copula import copula
3 |
4 | class archimedean(copula):
5 | """
  6 |     # Creates an Archimedean copula object
7 | Source for the CDF and PDF functions:
8 | Joe, H. (2014). Dependence modeling with copulas. CRC press.
9 | Chapter 4: Parametric copula families and properties (p.159)
10 |
11 | ...
12 |
13 | Attributes
14 | ----------
15 | family : str
16 | The name of the Archimedean copula function.
17 | type : str
18 | The type of copula = "archimedean".
19 | bounds_param : list
20 | A list that contains the domain of the parameter(s) in a tuple.
 21 |         Example : [(lower, upper)]
22 | parameters_start : array
23 | Value(s) of the initial guess when estimating the copula parameter(s).
24 | It represents the parameter `x0` in the `scipy.optimize.minimize` function.
25 |
26 | Methods
27 | -------
28 | get_cdf(u, v, param)
29 | Computes the Cumulative Distribution Function (CDF).
30 | get_pdf(u, v, param)
31 | Computes the Probability Density Function (PDF).
32 | LTDC(theta)
33 | Computes the Lower Tail Dependence Coefficient (TDC).
34 | UTDC(theta)
35 | Computes the upper TDC.
36 | """
37 |
38 | Archimedean_families = [
39 | 'clayton', 'gumbel', 'frank', 'joe', 'galambos','fgm', 'plackett',
40 | 'rgumbel', 'rclayton', 'rjoe','rgalambos', 'BB1', 'BB2']
41 |
42 |
43 | def __init__(self, family):
44 | """
45 | Parameters
46 | ----------
47 | family : str
48 | The name of the Archimedean copula function.
49 |
50 | Raises
51 | ------
52 | ValueError
53 | If the given `family` is not supported.
54 | """
55 |
 56 |         # the `archimedean` copula class inherits from the `copula` class
57 | super().__init__()
58 | self.family = family
59 | self.type = "archimedean"
60 |
61 | if family in ['clayton', 'galambos', 'plackett', 'rclayton', 'rgalambos'] :
62 | self.bounds_param = [(1e-6, None)]
63 | self.parameters_start = np.array(0.5)
64 |
65 | elif family in ['gumbel', 'joe', 'rgumbel', 'rjoe'] :
66 | self.bounds_param = [(1, None)]
67 | self.parameters_start = np.array(1.5)
68 |
69 | elif family == 'frank':
70 | self.bounds_param = [(None, None)]
71 | self.parameters_start = np.array(2)
72 |
73 | elif family == 'fgm':
74 | self.bounds_param = [(-1, 1-1e-6)]
75 | self.parameters_start = np.array(0)
76 |
77 | elif family in ['BB1'] :
78 | self.bounds_param = [(1e-6, None), (1, None)]
79 | self.parameters_start = (np.array(.5), np.array(1.5))
80 |
81 | elif family in ['BB2'] :
82 | self.bounds_param = [(1e-6, None), (1e-6, None)]
83 | self.parameters_start = (np.array(1), np.array(1))
84 | else:
85 | print("family \"%s\" not in list: %s" % (family, archimedean.Archimedean_families) )
86 | raise ValueError
87 |
88 | def get_cdf(self, u, v, param):
89 | """
90 | # Computes the CDF
91 |
92 | Parameters
93 | ----------
94 | u, v : float
95 | Values of the marginal CDFs
96 | param : list
97 | A list that contains the copula parameter(s) (float)
98 | """
99 |
100 | if self.family == 'clayton':
101 | return (u ** (-param[0]) + v ** (-param[0]) - 1) ** (-1 / param[0])
102 |
103 | elif self.family == 'rclayton':
104 | return (u + v - 1 + archimedean(family='clayton').get_cdf((1 - u),(1 - v), param) )
105 |
106 | elif self.family == 'gumbel':
107 | return np.exp(-((-np.log(u)) ** param[0] + (-np.log(v)) ** param[0] ) ** (1 / param[0]))
108 |
109 | elif self.family == 'rgumbel':
110 | return (u + v - 1 + archimedean(family='gumbel').get_cdf((1-u),(1-v), param) )
111 |
112 | elif self.family == 'frank':
113 | a = (np.exp(-param[0] * u) - 1) * (np.exp(-param[0] * v) - 1)
114 | return (-1 / param[0]) * np.log(1 + a / (np.exp(-param[0]) - 1))
115 |
116 | elif self.family == 'joe':
117 | u_ = (1 - u) ** param[0]
118 | v_ = (1 - v) ** param[0]
119 | return 1 - (u_ + v_ - u_ * v_) ** (1 / param[0])
120 |
121 | elif self.family == 'rjoe':
122 | return (u + v - 1 + archimedean(family='joe').get_cdf((1 - u),(1 - v), param) )
123 |
124 | elif self.family == 'galambos':
125 | return u * v * np.exp(((-np.log(u)) ** (-param[0]) + (-np.log(v)) ** (-param[0])) ** (-1 / param[0]) )
126 |
127 | elif self.family == 'rgalambos':
128 | return (u + v - 1 + archimedean(family='galambos').get_cdf((1 - u),(1 - v), param) )
129 |
130 | elif self.family == 'fgm':
131 | return u * v * (1 + param[0] * (1 - u) * (1 - v))
132 |
133 | elif self.family == 'plackett':
134 | eta = param[0] - 1
135 | term1 = 0.5 * eta ** -1
136 | term2 = 1 + eta * (u + v)
137 | term3 = (1 + eta * (u + v)) ** 2
138 | term4 = 4 * param[0] * eta * u * v
139 | return term1 * (term2 - (term3 - term4) ** 0.5)
140 |
141 |         elif self.family == 'BB1':
142 |             # theta = param[0], delta = param[1] (same order as get_pdf and bounds_param)
143 |             term1 = (u ** (-param[0]) - 1) ** param[1]
144 |             term2 = (v ** (-param[0]) - 1) ** param[1]
145 |             return (1 + (term1 + term2) ** (1 / param[1])) ** (-1 / param[0])
146 |
147 | elif self.family == 'BB2':
148 | u_ = np.exp(param[0] * (u ** (-param[1]) - 1))
149 | v_ = np.exp(param[0] * (v ** (-param[1]) - 1))
150 | return (1 + (1 / param[0]) * np.log(u_ + v_ - 1)) ** (-1 / param[1])
151 |
152 |
153 | def get_pdf(self, u, v, param):
154 | """
155 | # Computes the PDF
156 |
157 | Parameters
158 | ----------
159 | u, v : float
160 | Values of the marginal CDFs
161 | param : list
162 | A list that contains the copula parameter(s) (float)
163 | """
164 |
165 | if self.family == 'clayton':
166 | term1 = (param[0] + 1) * (u * v) ** (-param[0] - 1)
167 | term2 = (u ** (-param[0]) + v ** (-param[0]) - 1) ** (-2 - 1 / param[0])
168 | return term1 * term2
169 |
170 | if self.family == 'rclayton':
171 | return archimedean(family='clayton').get_pdf((1 - u),(1 - v), param)
172 |
173 | elif self.family == 'gumbel':
174 | term1 = np.power(np.multiply(u, v), -1)
175 | tmp = np.power(-np.log(u), param[0]) + np.power(-np.log(v), param[0])
176 | term2 = np.power(tmp, -2 + 2.0 / param[0])
177 | term3 = np.power(np.multiply(np.log(u), np.log(v)), param[0] - 1)
178 | term4 = 1 + (param[0] - 1) * np.power(tmp, -1 / param[0])
179 | return archimedean(family='gumbel').get_cdf(u,v, param) * term1 * term2 * term3 * term4
180 |
181 | if self.family == 'rgumbel':
182 | return archimedean(family='gumbel').get_pdf((1 - u), (1 - v), param)
183 |
184 | elif self.family == 'frank':
185 | term1 = param[0] * (1 - np.exp(-param[0])) * np.exp(-param[0] * (u + v))
186 | term2 = (1 - np.exp(-param[0]) - (1 - np.exp(-param[0] * u)) \
187 | * (1 - np.exp(-param[0] * v))) ** 2
188 | return term1 / term2
189 |
190 | elif self.family == 'joe':
191 | u_ = (1 - u) ** param[0]
192 | v_ = (1 - v) ** param[0]
193 | term1 = (u_ + v_ - u_ * v_) ** (-2 + 1 / param[0])
194 | term2 = ((1 - u) ** (param[0] - 1)) * ((1 - v) ** (param[0] - 1))
195 |             term3 = param[0] - 1 + u_ + v_ - u_ * v_
196 | return term1 * term2 * term3
197 |
198 | if self.family == 'rjoe':
199 | return archimedean(family='joe').get_pdf((1 - u),(1 - v), param)
200 |
201 | elif self.family == 'galambos':
202 | x = -np.log(u)
203 | y = -np.log(v)
204 | term1 = self.get_cdf(u, v, param) / (v * u)
205 | term2 = 1 - ((x ** (-param[0]) + y ** (-param[0])) ** (-1 - 1 / param[0])) \
206 | * (x ** (-param[0] - 1) + y ** (-param[0] - 1))
207 | term3 = ((x ** (-param[0]) + y ** (-param[0])) ** (-2 - 1 / param[0])) \
208 | * ((x * y) ** (-param[0] - 1))
209 | term4 = 1 + param[0] + ((x ** (-param[0]) + y ** (-param[0])) ** (-1 / param[0]))
210 |             return term1 * (term2 + term3 * term4)
211 |
212 | if self.family == 'rgalambos':
213 | return archimedean(family='galambos').get_pdf((1 - u),(1 - v), param)
214 |
215 | elif self.family == 'fgm':
216 | return 1 + param[0] * (1 - 2 * u) * (1 - 2 * v)
217 |
218 | elif self.family == 'plackett':
219 | eta = (param[0] - 1)
220 | term1 = param[0] * (1 + eta * (u + v - 2 * u * v))
221 | term2 = (1 + eta * (u + v)) ** 2
222 | term3 = 4 * param[0] * eta * u * v
223 | return term1 / (term2 - term3) ** (3 / 2)
224 |
225 | elif self.family == 'BB1':
226 | theta, delta = param[0], param[1]
227 | x = (u ** (-theta) - 1) ** (delta)
228 | y = (v ** (-theta) - 1) ** (delta)
229 | term1 = (1 + (x + y) ** (1 / delta)) ** (-1 / theta - 2)
230 | term2 = (x + y) ** (1 / delta - 2)
231 | term3 = theta * (delta - 1) + (theta * delta + 1) * (x + y) ** (1 / delta)
232 | term4 = (x * y) ** (1 - 1 / delta) * (u * v) ** (-theta - 1)
233 | return term1 * term2 * term3 * term4
234 |
235 |         elif self.family == 'BB2':
236 |             delta, theta = param[0], param[1]  # same parameter order as in get_cdf
237 |             x = np.exp(delta * (u ** (-theta) - 1))
238 |             y = np.exp(delta * (v ** (-theta) - 1))
239 |             term1 = (1 + (delta ** (-1)) * np.log(x + y - 1)) ** (-2 - 1 / theta)
240 |             term2 = (x + y - 1) ** (-2)
241 |             term3 = 1 + theta + theta * delta + theta * np.log(x + y - 1)
242 |             term4 = x * y * (u * v) ** (-theta - 1)
243 |             return term1 * term2 * term3 * term4
244 |
245 | def LTDC(self, theta):
246 | """
247 | # Computes the lower TDC for a given theta
248 |
249 | Parameters
250 | ----------
251 | theta : float
252 | The copula parameter
253 | """
254 |
255 | if self.family in ['gumbel', 'joe', 'frank', 'galambos', 'fgm', 'plackett', 'rclayton']:
256 | return 0
257 |
258 | elif self.family in ['rgalambos', 'clayton'] :
259 | return 2 ** (-1 / theta)
260 |
261 | elif self.family in ['rgumbel', 'rjoe'] :
262 | return 2 - 2 ** (1 / theta)
263 |
264 | def UTDC(self, theta):
265 | """
266 | # Computes the upper TDC for a given theta
267 |
268 | Parameters
269 | ----------
270 | theta : float
271 | The copula parameter
272 | """
273 |
274 | if self.family in ['clayton', 'frank', 'fgm', 'plackett', 'rgumbel', 'rjoe', 'rgalambos']:
275 | return 0
276 |
277 | elif self.family in ['galambos', 'rclayton'] :
278 | return 2 ** (-1 / theta)
279 |
280 | elif self.family in ['gumbel', 'joe'] :
281 | return 2 - 2 ** (1 / theta)
282 |
--------------------------------------------------------------------------------
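The closed-form tail dependence coefficients returned by `LTDC`/`UTDC` can be sanity-checked against their limit definition, λ_L = lim_{t→0} C(t, t)/t. A minimal NumPy sketch for the Clayton case (the function names here are illustrative, not pycop's API):

```python
import numpy as np

def clayton_cdf(u, v, theta):
    # C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0
    return (u ** -theta + v ** -theta - 1) ** (-1 / theta)

def clayton_ltdc(theta):
    # closed form used in the class above: 2^(-1/theta)
    return 2 ** (-1 / theta)

theta = 2.0
t = 1e-6                                   # small threshold approximating the limit
numeric = clayton_cdf(t, t, theta) / t     # lambda_L ~ C(t, t) / t
closed = clayton_ltdc(theta)
```

For small t the two values agree to high precision, which is exactly the consistency the empirical `LTDC` estimator in `empirical.py` relies on.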
/pycop/bivariate/copula.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 |
4 |
5 | def plot_bivariate_3d(X, Y, Z, bounds, title, **kwargs):
6 | """
7 | # Plot the 3D surface
8 |
9 | Parameters
10 | ----------
11 | X, Y, Z : array
12 | Positions of data points.
13 | bounds : list
14 | A list that contains the `xlim` and `ylim` of the graph.
15 | title : str
16 | A string for the title of the plot.
17 |
18 | **kwargs
19 | Additional keyword arguments passed to `ax.plot_surface`.
20 | """
21 |
22 | fig = plt.figure()
23 | ax = fig.add_subplot(111, projection='3d')
24 | ax.set_xticks(np.linspace(bounds[0],bounds[1],6))
25 | ax.set_yticks(np.linspace(bounds[0],bounds[1],6))
26 | ax.set_xlim(bounds)
27 | ax.set_ylim(bounds)
28 | ax.plot_surface(X,Y,Z, **kwargs)
29 | plt.title(title)
30 | plt.show()
31 |
32 | def plot_bivariate_contour(X, Y, Z, bounds, title, **kwargs):
33 | """
34 | # Plot the contour surface
35 |
36 | Parameters
37 | ----------
38 | X, Y, Z : array
39 | Positions of data points.
40 | bounds : list
41 | A list that contains the `xlim` and `ylim` of the graph.
42 | title : str
43 | A string for the title of the plot.
44 |
45 | **kwargs
46 | Additional keyword arguments passed to `plt.contour`.
47 | """
48 | plt.figure()
49 | CS = plt.contour(X, Y, Z, colors='k', linewidths=1., linestyles=None, **kwargs)
50 | plt.clabel(CS, fontsize=8, inline=1)
51 | plt.xlim(bounds)
52 | plt.ylim(bounds)
53 | plt.title(title)
54 | plt.show()
55 |
56 | class copula():
57 | """
58 | # A class used to create a Copula object
59 |
60 | Set attributes and methods common to all copula objects (elliptical and Archimedean).
61 |
62 | ...
63 |
64 | Attributes
65 | ----------
66 |
67 | Methods
68 | -------
69 | plot_cdf(param, plot_type, Nsplit=50, **kwargs)
70 | Plot the bivariate Cumulative Distribution Function (CDF)
71 | plot_pdf(param, plot_type, Nsplit=50, **kwargs)
72 | Plot the Probability Density Function (PDF)
73 | plot_mpdf(param, margin, plot_type, Nsplit=50, **kwargs)
74 | Plot the PDF with given marginal distributions
75 | """
76 |
77 | def __init__(self):
78 | pass
79 |
80 | def plot_cdf(self, param, plot_type, Nsplit=50, **kwargs):
81 | """
82 | # Plot the bivariate CDF
83 |
84 | Parameters
85 | ----------
86 | param : list
87 | A list of the copula parameter(s)
88 | plot_type : str
89 | The type of the plot either "3d" or "contour"
90 | Nsplit : int, optional
91 | The number of points plotted (Nsplit*Nsplit) (default is 50)
92 |
93 | **kwargs
94 | Additional keyword arguments passed to either `plot_bivariate_3d` or
95 | `plot_bivariate_contour`.
96 | Examples :
97 | - `colormap` can be passed in to change the default color
98 | of the 3d plot.
99 | - `levels` can be passed to determine the positions of the contour lines.
100 |
101 | """
102 | title = self.family.capitalize() + " Copula CDF"
103 |
104 | bounds = [0+1e-2, 1-1e-2]
105 | U_grid, V_grid = np.meshgrid(
106 | np.linspace(bounds[0], bounds[1], Nsplit),
107 | np.linspace(bounds[0], bounds[1], Nsplit))
108 |
109 | Z = np.array(
110 | [self.get_cdf(uu, vv, param) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] )
111 |
112 | Z = Z.reshape(U_grid.shape)
113 |
114 | if plot_type == "3d":
115 | plot_bivariate_3d(U_grid,V_grid,Z, [0,1], title, **kwargs)
116 | elif plot_type == "contour":
117 | plot_bivariate_contour(U_grid,V_grid,Z, [0,1], title, **kwargs)
118 | else:
119 | print("only \"contour\" or \"3d\" arguments supported for type")
120 | raise ValueError
121 |
122 | def plot_pdf(self, param, plot_type, Nsplit=50, **kwargs):
123 | """
124 | # Plot the bivariate PDF
125 |
126 | Parameters
127 | ----------
128 | param : list
129 | A list of the copula parameter(s)
130 | plot_type : str
131 | The type of the plot either "3d" or "contour"
132 | Nsplit : int, optional
133 | The number of points plotted (Nsplit*Nsplit) (default is 50)
134 |
135 | **kwargs
136 | Additional keyword arguments passed to either `plot_bivariate_3d` or
137 | `plot_bivariate_contour`.
138 | Examples :
139 | - `colormap` can be passed in to change the default color
140 | of the 3d plot.
141 | - `levels` can be passed to determine the positions of the contour lines.
142 | """
143 |
144 | title = self.family.capitalize() + " Copula PDF"
145 |
146 |         if plot_type == "3d":
147 |             bounds = [0+1e-1/2, 1-1e-1/2]
148 |         elif plot_type == "contour":
149 |             bounds = [0+1e-2, 1-1e-2]
150 |         else:  # fail early, otherwise `bounds` is undefined below
151 |             raise ValueError("only \"contour\" or \"3d\" arguments supported for plot_type")
152 | U_grid, V_grid = np.meshgrid(
153 | np.linspace(bounds[0], bounds[1], Nsplit),
154 | np.linspace(bounds[0], bounds[1], Nsplit))
155 |
156 | Z = np.array(
157 | [self.get_pdf(uu, vv, param) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] )
158 |
159 | Z = Z.reshape(U_grid.shape)
160 |
161 | if plot_type == "3d":
162 |
163 | plot_bivariate_3d(U_grid,V_grid,Z, [0,1], title, **kwargs)
164 | elif plot_type == "contour":
165 | plot_bivariate_contour(U_grid,V_grid,Z, [0,1], title, **kwargs)
166 | else:
167 | print("only \"contour\" or \"3d\" arguments supported for type")
168 | raise ValueError
169 |
170 |
171 | def plot_mpdf(self, param, margin, plot_type, Nsplit=50, **kwargs):
172 | """
173 | # Plot the bivariate PDF with given marginal distributions.
174 |
175 |         The method only supports scipy distributions with `loc` and `scale`
176 |         parameters as marginals.
177 |
178 | Parameters
179 | ----------
180 | - param : list
181 | A list of the copula parameter(s)
182 | - margin : list
183 |             A list of dictionaries that contain the scipy distribution and
184 | the location and scale parameters.
185 |
186 | Examples :
187 | marginals = [
188 | {
189 | "distribution": norm, "loc" : 0, "scale" : 1,
190 | },
191 | {
192 | "distribution": norm, "loc" : 0, "scale": 1,
193 | }]
194 | - plot_type : str
195 | The type of the plot either "3d" or "contour"
196 | - Nsplit : int, optional
197 | The number of points plotted (Nsplit*Nsplit) (default is 50)
198 |
199 | **kwargs
200 | Additional keyword arguments passed to either `plot_bivariate_3d` or
201 | `plot_bivariate_contour`.
202 |             Examples :
203 | - `colormap` can be passed in to change the default color
204 | of the 3d plot.
205 | - `levels` can be passed to determine the positions of the contour lines.
206 |
207 | """
208 |
209 | title = self.family.capitalize() + " Copula PDF"
210 |
211 | # We retrieve the univariate marginal distribution from the list
212 | univariate1 = margin[0]["distribution"]
213 | univariate2 = margin[1]["distribution"]
214 |
215 | bounds = [-3, 3]
216 |
217 | U_grid, V_grid = np.meshgrid(
218 | np.linspace(bounds[0], bounds[1], Nsplit),
219 | np.linspace(bounds[0], bounds[1], Nsplit))
220 |
221 | mpdf = lambda uu, vv : self.get_pdf(
222 | univariate1.cdf(uu, margin[0]["loc"], margin[0]["scale"]), \
223 | univariate2.cdf(vv, margin[1]["loc"], margin[1]["scale"]), param) \
224 | * univariate1.pdf(uu, margin[0]["loc"], margin[0]["scale"]) \
225 | * univariate2.pdf(vv, margin[1]["loc"], margin[1]["scale"])
226 |
227 | Z = np.array(
228 | [mpdf(uu, vv) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] )
229 | Z = Z.reshape(U_grid.shape)
230 |
231 | if plot_type == "3d":
232 | plot_bivariate_3d(U_grid,V_grid,Z, bounds, title, **kwargs)
233 | elif plot_type == "contour":
234 | plot_bivariate_contour(U_grid,V_grid,Z, bounds, title, **kwargs)
235 | else:
236 | print("only \"contour\" or \"3d\" arguments supported for type")
237 | raise ValueError
--------------------------------------------------------------------------------
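The plotting methods above all reduce to the same step: evaluate the copula on a meshgrid and hand the resulting Z matrix to matplotlib. A standalone sketch of that grid step (Frank CDF, plain NumPy, no plotting; names are illustrative):

```python
import numpy as np

def frank_cdf(u, v, theta):
    # C(u, v) = -(1/theta) * log(1 + (e^{-theta u}-1)(e^{-theta v}-1) / (e^{-theta}-1))
    a = (np.exp(-theta * u) - 1) * (np.exp(-theta * v) - 1)
    return (-1 / theta) * np.log(1 + a / (np.exp(-theta) - 1))

# same grid construction as copula.plot_cdf: Nsplit points per axis,
# clipped slightly inside (0, 1) to avoid boundary issues
Nsplit, bounds = 50, [1e-2, 1 - 1e-2]
U, V = np.meshgrid(np.linspace(bounds[0], bounds[1], Nsplit),
                   np.linspace(bounds[0], bounds[1], Nsplit))
Z = frank_cdf(U, V, theta=5.0)  # vectorized over the whole grid at once
```

Since `frank_cdf` is written with NumPy ufuncs, it broadcasts over the grid directly; the list comprehension in `plot_cdf` computes the same values element by element.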
/pycop/bivariate/empirical.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 |
4 | def plot_bivariate(U,V,Z):
5 | fig = plt.figure()
6 | ax = fig.add_subplot(111, projection='3d')
7 | ax.plot_surface(U,V,Z)
8 | plt.show()
9 |
10 | class empirical():
11 | """
12 | # A class used to create an Empirical copula object
13 |
14 | ...
15 |
16 | Attributes
17 | ----------
18 | data : numpy array
 19 |         A numpy array with two vectors that correspond to the observations
20 | n : int
21 | The number of observations
22 |
23 | Methods
24 | -------
 25 |     get_cdf(i_n, j_n)
 26 |         Compute the empirical Cumulative Distribution Function (CDF)
 27 |         at the given thresholds i/n and j/n for the first and
 28 |         second vector respectively
29 | plot_cdf(Nsplit)
30 | Plot the empirical CDF
31 | plot_pdf(Nsplit)
32 | Plot the empirical PDF
33 | LTDC(i_n)
34 | Compute the lower Tail Dependence Coefficient (TDC) for a given threshold i/n
35 | UTDC(i_n)
36 | Compute the upper Tail Dependence Coefficient (TDC) for a given threshold i/n
37 | optimal_tdc(case)
 38 |         Compute the lower or upper TDC according to the Frahm et al. (2005) algorithm.
39 | """
40 |
41 | def __init__(self, data):
42 | """
43 | Parameters
44 | ----------
45 | data : numpy array
46 | """
47 | self.data = data
48 | self.n = len(data[0])
49 |
50 | def get_cdf(self, i_n, j_n):
51 | """
52 | # Compute the CDF
53 |
54 | Parameters
55 | ----------
56 | i_n : float
57 | The threshold to compute the univariate distribution for the first vector.
58 | j_n : float
59 | The threshold to compute the univariate distribution for the second vector.
60 | """
61 |
62 | # Calculate rank indices for both vectors
63 | i = int(round(self.n * i_n))
64 | j = int(round(self.n * j_n))
65 |
66 | ith_order_u = sorted(self.data[0])[i-1]
67 | ith_order_v = sorted(self.data[1])[j-1]
68 |
 69 |         # Find observations where both vectors are at or below the corresponding order statistics
70 | mask_x = self.data[0] <= ith_order_u
71 | mask_y = self.data[1] <= ith_order_v
72 |
73 | return np.sum(np.logical_and(mask_x, mask_y)) / self.n
74 |
75 | def LTDC(self, i_n):
76 | """
77 | # Compute the empirical lower TDC for a given threshold i/n
78 |
79 | Parameters
80 | ----------
81 | i_n : float
82 | The threshold to compute the lower TDC.
83 | """
84 |
85 | if int(round(self.n * i_n)) == 0:
86 | return 0
87 |
88 | return self.get_cdf(i_n, i_n) / i_n
89 |
90 | def UTDC(self, i_n):
91 | """
92 | # Compute the empirical upper TDC for a given threshold i/n
93 |
94 | Parameters
95 | ----------
96 | i_n : float
97 | The threshold to compute the upper TDC.
98 | """
99 |
100 | return (1 - 2 * i_n + self.get_cdf(i_n, i_n) ) / (1-i_n)
101 |
102 |
103 | def plot_cdf(self, Nsplit):
104 | """
105 | # Plot the empirical CDF
106 |
107 | Parameters
108 | ----------
109 | Nsplit : The number of splits used to compute the grid
110 | """
111 | U_grid = np.linspace(0, 1, Nsplit)[:-1]
112 | V_grid = np.linspace(0, 1, Nsplit)[:-1]
113 | U_grid, V_grid = np.meshgrid(U_grid, V_grid)
114 | Z = np.array(
115 | [self.get_cdf(uu, vv) for uu, vv in zip(np.ravel(U_grid), np.ravel(V_grid)) ] )
116 | Z = Z.reshape(U_grid.shape)
117 | plot_bivariate(U_grid,V_grid,Z)
118 |
119 | def plot_pdf(self, Nsplit):
120 | """
121 | # Plot the empirical PDF
122 |
123 | Parameters
124 | ----------
125 | Nsplit : The number of splits used to compute the grid
126 | """
127 |
128 | U_grid = np.linspace(self.data[0].min(), self.data[0].max(), Nsplit)
129 | V_grid = np.linspace(self.data[1].min(), self.data[1].max(), Nsplit)
130 |
131 | # Initialize a matrix to hold the counts
132 | counts = np.zeros((Nsplit, Nsplit))
133 |
134 | for i in range(Nsplit-1):
135 | for j in range(Nsplit-1):
136 | # Define the edges of the bin
137 | Xa, Xb = U_grid[i], U_grid[i + 1]
138 | Ya, Yb = V_grid[j], V_grid[j + 1]
139 |
140 | # Use boolean indexing to count points within the bin
141 | mask = (Xa <= self.data[0]) & (self.data[0] < Xb) & (Ya <= self.data[1]) & (self.data[1] < Yb)
142 | counts[i, j] = np.sum(mask)
143 | # Adjust the grid centers for plotting
144 | U_grid_centered = U_grid + (U_grid[1] - U_grid[0]) / 2
145 | V_grid_centered = V_grid + (V_grid[1] - V_grid[0]) / 2
146 |
147 | U, V = np.meshgrid(U_grid_centered, V_grid_centered) # Create coordinate points of X and Y
148 | Z = counts / np.sum(counts)
149 |
150 | plot_bivariate(U,V,Z)
151 |
152 | def optimal_tdc(self, case):
153 | """
154 | # Compute the optimal Empirical Tail Dependence coefficient (TDC)
155 |
156 | The algorithm is based on the heuristic plateau-finding algorithm
157 | from Frahm et al (2005) "Estimating the tail-dependence coefficient:
158 | properties and pitfalls"
159 |
160 | Parameters
161 | ----------
162 | case: str
163 | takes "upper" or "lower" for upper TDC or lower TDC
164 | """
165 |
166 | #### 1) The series of TDC is smoothed using a box kernel with bandwidth b ∈ N
167 | # Consists in applying a moving average on 2b + 1 consecutive points
168 | tdc_array = np.zeros((self.n,))
169 |
170 |     # b is chosen such that ~1% of the data fall into the moving average
171 | b = int(np.ceil(self.n/200))
172 |
173 | if case == "upper":
174 | # Compute the Upper TDC for every possible threshold i/n
175 | for i in range(1, self.n-1):
176 | tdc_array[i] = self.UTDC(i_n=i/self.n)
177 | # We reverse the order, the plateau finding algorithm starts with lower values
178 | tdc_array = tdc_array[::-1]
179 |
180 | elif case =="lower":
181 | # Compute the Lower TDC for every possible threshold i/n
182 | for i in range(1, self.n-1):
183 | tdc_array[i] = self.LTDC(i_n=i/self.n)
184 | else:
185 | print("Takes \"upper\" or \"lower\" argument only")
186 | return None
187 |
188 |         # Smooth the TDC with a moving average of length 2b+1
189 |         # The total length = n-2b-1 because i ∈ [1, n−2b]
190 | tdc_smooth_array = np.zeros((self.n-2*b-1,))
191 |
192 | for i, j in zip(range(b+1, self.n-b), range(0, self.n-2*b-1)):
193 | tdc_smooth_array[j] = sum(tdc_array[i-b:i+b+1]) / (2*b+1)
194 |
195 | #### 2) We select a vector of m consecutive estimates that satisfies a plateau condition
196 |
197 |         # m = length of the plateau = number of consecutive smoothed TDC estimates
198 | m = int(np.floor(np.sqrt(self.n-2*b)))
199 | # The standard deviation of the smoothed TDC series
200 | std_tdc_smooth = tdc_smooth_array.std()
201 |
202 | # We iterate k from 0 to n-2b-m because k ∈ [1, n−2b−m+1]
203 | for k in range(0,self.n-2*b-m):
204 | plateau = 0
205 | for i in range(1,m-1):
206 | plateau = plateau + np.abs(tdc_smooth_array[k+i] - tdc_smooth_array[k])
207 | # if the plateau satisfies the following condition:
208 | if plateau <= 2*std_tdc_smooth:
209 | #### 3) Then, the TDC estimator is defined as the average of the estimators in the corresponding plateau
210 | avg_tdc_plateau = np.mean(tdc_smooth_array[k:k+m-1])
211 | print("Optimal threshold: ", k/self.n)
212 | return avg_tdc_plateau
213 |
214 | # In case the condition is not satisfied the TDC estimate is set to zero
215 | return 0
--------------------------------------------------------------------------------
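The empirical `get_cdf` above counts how many observations fall jointly below the i-th and j-th order statistics. A standalone sketch of the same computation (NumPy only, names illustrative), checked on a comonotone sample, where the empirical copula should equal the threshold exactly:

```python
import numpy as np

def empirical_cdf(x, y, i_n, j_n):
    # C_emp(i/n, j/n): share of points with both coordinates at or below
    # the i-th / j-th order statistics of their respective vector
    n = len(x)
    i = int(round(n * i_n))
    j = int(round(n * j_n))
    xi = np.sort(x)[i - 1]
    yj = np.sort(y)[j - 1]
    return np.mean((x <= xi) & (y <= yj))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x.copy()  # comonotone sample: perfect positive dependence
# for comonotone data, C_emp(t, t) equals t (the Frechet upper bound min(u, v))
val = empirical_cdf(x, y, 0.3, 0.3)
```

On real data the interesting quantity is the behaviour of `empirical_cdf(t, t)/t` as t shrinks, which is exactly what `LTDC` and the plateau algorithm in `optimal_tdc` exploit.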
/pycop/bivariate/estimation.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.optimize import minimize
3 | from scipy.stats import norm, t
4 |
5 | import warnings
6 | # suppress warnings
7 | warnings.filterwarnings('ignore')
8 |
9 | def pseudo_obs(data):
10 | """
11 | # Transform the dataset to uniform margins.
12 |
13 | The pseudo-observations are the scaled ranks.
14 |
15 | Parameters
16 | ----------
17 | data : array like
18 | The dataset to transform.
19 |
20 | Returns
21 | -------
22 | scaled_ranks : array like
23 |
24 | """
25 |
26 | n = len(data[0]) # Assuming data[0] and data[1] are of the same length
27 |
28 | scaled_rank = lambda values: (np.argsort(np.argsort(values)) + 1) / (n + 1)
29 |
30 | scaled_ranks = np.array([scaled_rank(data[0]), scaled_rank(data[1])])
31 |
32 | return scaled_ranks
33 |
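`pseudo_obs` maps each series to scaled ranks strictly inside (0, 1). A quick standalone check of the double-argsort trick on a small vector:

```python
import numpy as np

def scaled_rank(values):
    # argsort of argsort yields 0-based ranks; +1 and /(n+1) keeps the
    # pseudo-observations strictly inside (0, 1)
    n = len(values)
    return (np.argsort(np.argsort(values)) + 1) / (n + 1)

r = scaled_rank(np.array([3.0, 1.0, 2.0]))
# ranks are 3, 1, 2 -> divided by n+1 = 4 -> [0.75, 0.25, 0.5]
```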
34 |
35 | def fit_cmle(copula, data, opti_method='SLSQP', options={}):
36 | """
37 | # Compute the Canonical Maximum likelihood Estimator (CMLE) using the pseudo-observations
38 |
39 | Parameters
40 | ----------
 41 |     copula : class
 42 |         The copula object
 43 | 
 44 |     data : array like
 45 |         The dataset.
 46 |     opti_method : str, optional
 47 |         The optimization method to pass to `scipy.optimize.minimize`.
 48 |         The default algorithm is set to `SLSQP`
 49 |     options : dict, optional
 50 |         A dictionary of options to pass to `scipy.optimize.minimize`,
 51 |         e.g. options={'maxiter': 100000}
52 |
53 | Returns
54 | -------
55 | Return the estimated parameter(s) in a list
56 |
57 | """
58 |
59 | psd_obs = pseudo_obs(data)
60 |
61 | def log_likelihood(parameters):
62 | """
 63 |         The number of parameters depends on the type of copula function
64 | """
65 | if len(copula.bounds_param) == 1:
66 | params = [parameters]
67 | else:
68 | param1, param2 = parameters
69 | params = [param1, param2]
70 | logl = -np.sum(np.log(copula.get_pdf(psd_obs[0], psd_obs[1], params)))
71 | return logl
72 |
73 | if (copula.bounds_param[0] == (None, None)):
74 | results = minimize(log_likelihood, copula.parameters_start, method='Nelder-Mead', options=options)
75 | # print("method: Nelder-Mead - success:", results.success, ":", results.message)
76 | return (results.x, -results.fun)
77 |
78 | else:
79 | results = minimize(log_likelihood, copula.parameters_start, method=opti_method, bounds=copula.bounds_param, options=options)
80 | # print("method:", opti_method, "- success:", results.success, ":", results.message)
81 | if results.success == True:
82 | return (results.x, -results.fun)
83 | else:
84 | print(results)
85 | print("optimization failed")
86 | return None
87 |
88 |
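`fit_cmle` above is, in essence: transform to pseudo-observations, then minimize the negative copula log-likelihood under the parameter bounds. A self-contained sketch of that pipeline (names and the conditional-inversion sampler are illustrative, not pycop's API), fitting a Clayton parameter to synthetic Clayton data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
theta_true, n = 2.0, 2000

# sample from a Clayton copula by inverting the conditional CDF:
# v = (1 + u^-theta (w^{-theta/(1+theta)} - 1))^{-1/theta}
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = (1 + u ** -theta_true * (w ** (-theta_true / (1 + theta_true)) - 1)) ** (-1 / theta_true)

def clayton_logpdf(u, v, theta):
    # log of the Clayton density used in archimedean.get_pdf
    return (np.log(theta + 1) - (theta + 1) * np.log(u * v)
            + (-2 - 1 / theta) * np.log(u ** -theta + v ** -theta - 1))

def scaled_rank(x):
    # pseudo-observations, as in pseudo_obs
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)

pu, pv = scaled_rank(u), scaled_rank(v)

res = minimize(lambda p: -np.sum(clayton_logpdf(pu, pv, p[0])),
               x0=[0.5], method='SLSQP', bounds=[(1e-6, None)])
theta_hat = res.x[0]
# theta_hat should land near theta_true = 2.0
```

The bounds and starting value mirror the Clayton branch of `archimedean.__init__` (`bounds_param = [(1e-6, None)]`, start 0.5).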
89 | def fit_cmle_mixt(copula, data, opti_method='SLSQP', options={}):
90 | """
91 | # Compute the CMLE for a mixture copula using the pseudo-observations
92 |
93 | Parameters
94 | ----------
 95 |     copula : class
 96 |         The mixture copula object
 97 | 
 98 |     data : array like
 99 |         The dataset.
100 |     opti_method : str, optional
101 |         The optimization method to pass to `scipy.optimize.minimize`.
102 |         The default algorithm is set to `SLSQP`
103 |
104 | Returns
105 | -------
106 | Return the estimated parameter(s) in a list
107 |
108 | """
109 | psd_obs = pseudo_obs(data)
110 |
111 | def log_likelihood(parameters):
112 | if copula.dim == 2:
113 | w1, param1, param2 = parameters
114 | params = [w1, param1, param2]
115 | else: # dim = 3
116 | w1, w2, w3, param1, param2, param3 = parameters
117 | params = [w1, w2, w3, param1, param2, param3]
118 | logl = -np.sum(np.log(copula.get_pdf(psd_obs[0], psd_obs[1], params)))
119 | return logl
120 |
121 | # copula.dim gives the number of weights to consider
122 | cons = [{'type': 'eq', 'fun': lambda parameters: np.sum(parameters[:copula.dim]) - 1}]
123 |
124 | results = minimize(log_likelihood,
125 | copula.parameters_start,
126 | method=opti_method,
127 | bounds=copula.bounds_param,
128 | constraints=cons,
129 | options=options)
130 |
131 | #print("method:", opti_method, "- success:", results.success, ":", results.message)
132 | if results.success == True:
133 | return (results.x, -results.fun)
134 |
135 | print("optimization failed")
136 | return None
137 |
138 |
139 | def fit_mle(data, copula, marginals, opti_method='SLSQP', known_parameters=False):
140 | """
141 | # Compute the Maximum likelihood Estimator (MLE)
142 |
143 | Parameters
144 | ----------
145 | data : array like
146 | The dataset.
147 |
148 | copula : class
149 | The copula object
150 | marginals : list
151 |         A list of dictionaries that contain the marginal distributions and their
152 | `loc` and `scale` parameters when the parameters are known.
153 | Example:
154 | marginals = [
155 | {
156 | "distribution": norm, "loc" : 0, "scale" : 1,
157 | },
158 | {
159 | "distribution": norm, "loc" : 0, "scale": 1,
160 | }]
161 | opti_method : str, optional
162 |         The optimization method to pass to `scipy.optimize.minimize`.
163 |         The default algorithm is set to `SLSQP`
164 |     known_parameters : bool
165 |         If set to `True`, the `loc` and `scale` parameters of the marginal
166 |         distributions are treated as known and are not estimated.
167 |
168 | Returns
169 | -------
170 | Return the estimated parameter(s) in a list
171 |
172 | """
173 |
174 | if copula.type == "mixture":
175 |         print("estimation of a mixture is only available with CMLE, use fit_cmle_mixt instead")
176 | raise ValueError
177 |
178 | if known_parameters == True:
179 |
180 | marg_cdf1 = lambda i : marginals[0]["distribution"].cdf(data[0][i], marginals[0]["loc"], marginals[0]["scale"])
181 | marg_pdf1 = lambda i : marginals[0]["distribution"].pdf(data[0][i], marginals[0]["loc"], marginals[0]["scale"])
182 |
183 | marg_cdf2 = lambda i : marginals[1]["distribution"].cdf(data[1][i], marginals[1]["loc"], marginals[1]["scale"])
184 | marg_pdf2 = lambda i : marginals[1]["distribution"].pdf(data[1][i], marginals[1]["loc"], marginals[1]["scale"])
185 |
186 | logi = lambda i, theta: np.log(copula.get_pdf(marg_cdf1(i),marg_cdf2(i),[theta]))+np.log(marg_pdf1(i)) +np.log(marg_pdf2(i))
187 | log_likelihood = lambda theta: -sum([logi(i, theta) for i in range(0,len(data[0]))])
188 |
189 | results = minimize(log_likelihood, copula.parameters_start, method=opti_method, )# options={'maxiter': 300})#.x[0]
190 |
191 | else:
192 | marg_cdf1 = lambda i, loc, scale : marginals[0]["distribution"].cdf(data[0][i], loc, scale)
193 | marg_pdf1 = lambda i, loc, scale : marginals[0]["distribution"].pdf(data[0][i], loc, scale)
194 |
195 | marg_cdf2 = lambda i, loc, scale : marginals[1]["distribution"].cdf(data[1][i], loc, scale)
196 | marg_pdf2 = lambda i, loc, scale : marginals[1]["distribution"].pdf(data[1][i], loc, scale)
197 |
198 | logi = lambda i, theta, loc1, scale1, loc2, scale2: \
199 | np.log(copula.get_pdf(marg_cdf1(i, loc1, scale1),marg_cdf2(i, loc2, scale2),[theta])) \
200 | + np.log(marg_pdf1(i, loc1, scale1)) +np.log(marg_pdf2(i, loc2, scale2))
201 |
202 | def log_likelihood(params):
203 | theta, loc1, scale1, loc2, scale2 = params
204 | return -sum([logi(i, theta, loc1, scale1, loc2, scale2) for i in range(0,len(data[0]))])
205 |
206 |         results = minimize(log_likelihood, (copula.parameters_start, np.array(0), np.array(1), np.array(0), np.array(1)), method=opti_method)
207 |
208 | print("method:", opti_method, "- success:", results.success, ":", results.message)
209 |     if results.success:
210 | return results.x
211 |
212 | print("Optimization failed")
213 | return None
214 |
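The estimation pattern in `fit` above can be sketched standalone: build a negative log-likelihood from a copula density and hand it to `scipy.optimize.minimize` with the `SLSQP` method. The inline Clayton density and the toy data below are hypothetical illustrations, not part of the package API.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: positively dependent pseudo-observations.
np.random.seed(0)
u = np.random.uniform(0.05, 0.95, 200)
v = np.clip(u + np.random.normal(0, 0.1, 200), 0.05, 0.95)

def clayton_pdf(u, v, theta):
    # Clayton copula density for theta > 0.
    return (1 + theta) * (u * v) ** (-1 - theta) \
        * (u**-theta + v**-theta - 1) ** (-1 / theta - 2)

# Negative log-likelihood, minimized over theta with SLSQP (as in the default above).
neg_ll = lambda p: -np.sum(np.log(clayton_pdf(u, v, p[0])))
res = minimize(neg_ll, x0=np.array([1.0]), method="SLSQP", bounds=[(1e-6, None)])
print(res.success, res.x[0] > 0)
```

With strongly dependent data the fitted theta comes out positive, mirroring the sign of the dependence.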
215 | def IAD_dist(copula, data, param):
216 | """
217 | Compute the Integrated Anderson-Darling (IAD) distance between
218 | the parametric copula and the empirical copula with vectorization.
219 |
220 | Info:
221 | This function first computes the empirical copula. It then computes the
222 | theoretical (parametric) copula values using the provided copula function
223 | and parameters. The IAD distance is calculated based on the differences
224 | between these two copulas.
225 | Based on equation 9 in "Crash Sensitivity and the Cross Section of Expected
226 | Stock Returns" (2018) Journal of Financial and Quantitative Analysis
227 |
228 | Args:
229 | copula (function): The copula object, providing a method `get_cdf` to compute the CDF.
230 | data (array-like): The underlying data as a 2D array, where each row is a dimension.
231 | param (array-like): The parameters of the copula.
232 |
233 | Returns:
234 | float: The IAD distance between the empirical and the parametric copulas.
235 | """
236 |
237 | n = len(data[0])
238 |
239 | # Get the order statistics for each dimension
240 | sorted_u = np.sort(data[0])
241 | sorted_v = np.sort(data[1])
242 |
243 | # Create a grid of comparisons for each pair (u, v)
244 | u_grid, v_grid = np.meshgrid(sorted_u, sorted_v, indexing='ij')
245 |
246 | # Count the number of points below the threshold in both dimensions
247 | # Use broadcasting to compare all pairs and count
248 | counts = np.sum((data[0][:, None, None] <= u_grid) & (data[1][:, None, None] <= v_grid), axis=0)
249 | # Compute the empirical copula
250 | C_empirical = counts / n
251 |
252 | # Prepare the grid for computing the parametric copula
253 | x_values, y_values = np.linspace(1/n, 1-1/n, n), np.linspace(1/n, 1-1/n, n)
254 |
255 | # Compute the parametric (theoretical) copula values
256 | u_flat = np.array([[x for x in x_values] for y in y_values]).flatten()
257 | v_flat = np.array([[y for x in x_values] for y in y_values]).flatten()
258 |
259 | C_copula = copula.get_cdf(np.array(u_flat), np.array(v_flat), param)
260 | C_copula = C_copula.reshape((n,n))
261 |
262 | # Calculate the Integrated Anderson-Darling distance
263 | IAD = np.sum(((C_empirical - C_copula) ** 2) / (C_copula - C_copula**2))
264 |
265 | return IAD
266 |
267 |
268 | def AD_dist(copula, data, param):
269 | """
270 |     Compute the Anderson-Darling (AD) distance between the parametric
271 |     copula and the empirical copula with vectorization.
272 |
273 | Same principle as IAD_dist()
274 | """
275 |
276 | n = len(data[0])
277 |
278 | sorted_u = np.sort(data[0])
279 | sorted_v = np.sort(data[1])
280 |
281 | u_grid, v_grid = np.meshgrid(sorted_u, sorted_v, indexing='ij')
282 |
283 | counts = np.sum((data[0][:, None, None] <= u_grid) & (data[1][:, None, None] <= v_grid), axis=0)
284 | C_empirical = counts / n
285 |
286 | x_values, y_values = np.linspace(1/n, 1-1/n, n), np.linspace(1/n, 1-1/n, n)
287 |
288 | u_flat = np.array([[x for x in x_values] for y in y_values]).flatten()
289 | v_flat = np.array([[y for x in x_values] for y in y_values]).flatten()
290 |
291 | C_copula = copula.get_cdf(np.array(u_flat), np.array(v_flat), param)
292 | C_copula = C_copula.reshape((n,n))
293 |
294 | AD = np.max(((C_empirical - C_copula) ** 2) / (C_copula - C_copula**2))
295 | return AD
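The broadcasting step shared by `IAD_dist` and `AD_dist` can be illustrated on a tiny hypothetical sample: the empirical copula at each pair of order statistics is the fraction of observations dominated by that pair.

```python
import numpy as np

# Hypothetical pseudo-observations (for illustration only).
u = np.array([0.1, 0.4, 0.7, 0.9])
v = np.array([0.2, 0.5, 0.6, 0.8])
n = len(u)

# Empirical copula on the grid of order statistics, as in IAD_dist:
sorted_u, sorted_v = np.sort(u), np.sort(v)
u_grid, v_grid = np.meshgrid(sorted_u, sorted_v, indexing="ij")
counts = np.sum((u[:, None, None] <= u_grid) & (v[:, None, None] <= v_grid), axis=0)
C_empirical = counts / n

# The top-right grid point dominates every observation, so C = 1 there.
print(C_empirical[0, 0], C_empirical[-1, -1])  # → 0.25 1.0
```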
--------------------------------------------------------------------------------
/pycop/bivariate/gaussian.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.stats import norm, multivariate_normal
3 | from scipy.special import erfinv
4 | from pycop.bivariate.copula import copula
5 |
6 | class gaussian(copula):
7 | """
8 | # Creates a gaussian copula object
9 |
10 | ...
11 |
12 | Attributes
13 | ----------
14 | family : str
15 | = "gaussian"
16 | bounds_param : list
17 | A list that contains the domain of the parameter(s) in a tuple.
18 |         Example : [(lower, upper)]
19 | parameters_start : array
20 | Value(s) of the initial guess when estimating the copula parameter(s).
21 | It represents the parameter `x0` in the `scipy.optimize.minimize` function.
22 |
23 | Methods
24 | -------
25 | get_cdf(u, v, param)
26 | Computes the Cumulative Distribution Function (CDF).
27 | get_pdf(u, v, param)
28 | Computes the Probability Density Function (PDF).
29 | """
30 |
31 | def __init__(self):
32 |         # the `gaussian` copula class inherits the `copula` class
33 | super().__init__()
34 | self.family = "gaussian"
35 | self.bounds_param = [(-1, 1)]
36 | self.parameters_start = np.array(0)
37 |
38 | def get_cdf(self, u, v, param):
39 | """
40 | # Computes the CDF
41 |
42 | Parameters
43 | ----------
44 | u, v : float
45 | Values of the marginal CDFs
46 | param : list
47 |             The correlation coefficient param[0] ∈ [-1,1].
48 |             Used to define the correlation matrix (square, symmetric and positive definite)
49 | """
50 |
51 | y1 = norm.ppf(u, 0, 1)
52 | y2 = norm.ppf(v, 0, 1)
53 | rho = param[0]
54 |
55 | return multivariate_normal.cdf(np.array([y1, y2]).T, mean=None, cov=[[1, rho], [rho, 1]])
56 |
57 | def get_pdf(self, u, v, param):
58 | """
59 | # Computes the PDF
60 |
61 | Parameters
62 | ----------
63 | u, v : float
64 | Values of the marginal CDFs
65 | param : list
66 |             The correlation coefficient param[0] ∈ [-1,1].
67 |             Used to define the correlation matrix (square, symmetric and positive definite)
68 | """
69 |
70 | rho = param[0]
71 | a = np.sqrt(2) * erfinv(2 * u - 1)
72 | b = np.sqrt(2) * erfinv(2 * v - 1)
73 | det_rho = 1 - rho**2
74 |
75 | return det_rho**-0.5 * np.exp(-((a**2 + b**2) * rho**2 -2 * a * b * rho) / (2 * det_rho))
76 |
77 |
78 |
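A quick sanity check of the closed-form density in `get_pdf` above (reproduced standalone here, not via the package API): at ρ = 0 the Gaussian copula reduces to the independence copula, whose density is identically 1.

```python
import numpy as np
from scipy.special import erfinv

def gaussian_pdf(u, v, rho):
    # Mirrors gaussian.get_pdf: bivariate Gaussian copula density.
    a = np.sqrt(2) * erfinv(2 * u - 1)
    b = np.sqrt(2) * erfinv(2 * v - 1)
    det_rho = 1 - rho**2
    return det_rho**-0.5 * np.exp(-((a**2 + b**2) * rho**2 - 2 * a * b * rho) / (2 * det_rho))

# Independence case: the density is 1 at every point of the unit square.
print(gaussian_pdf(0.3, 0.7, 0.0))  # → 1.0
```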
--------------------------------------------------------------------------------
/pycop/bivariate/mixture.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from pycop.bivariate.copula import copula
3 | from pycop.bivariate.archimedean import archimedean
4 | from pycop.bivariate.gaussian import gaussian
5 | from pycop.bivariate.student import student
6 |
7 |
8 | class mixture(copula):
9 | """
10 |     # Creates a mixture copula object
11 |
12 | ...
13 |
14 | Attributes
15 | ----------
16 |     dim : int
17 |         The number of copulas combined; only 2 or 3 are supported
18 |     mixture_type : str
19 |         The type of mixture, built from the names of the combined copulas
20 | cop : list
21 | A list that contains the copula objects to combine
22 | bounds_param : list
23 | A list that contains the domain of the parameter(s) in a tuple.
24 | Example : [(lower, upper), (lower, upper)]
25 | parameters_start : array
26 | Value(s) of the initial guess when estimating the copula parameter(s).
27 | It represents the parameter `x0` in the `scipy.optimize.minimize` function.
28 |
29 | Methods
30 | -------
31 | get_cdf(u, v, param)
32 | Computes the Cumulative Distribution Function (CDF).
33 | get_pdf(u, v, param)
34 | Computes the Probability Density Function (PDF).
35 | LTDC(w1, theta1)
36 | Computes the Lower Tail Dependence Coefficient (TDC).
37 | UTDC(w1, theta2)
38 | Computes the upper TDC.
39 | """
40 |
41 | def __init__(self, copula_list):
42 | """
43 | Parameters
44 | ----------
45 |         copula_list : list
46 |             A list of strings giving the types of copulas to combine
47 |             Example : ["clayton", "gumbel"]
48 |
49 | Raises
50 | ------
51 | ValueError
52 |             dim : the length of `copula_list` must be equal to 2 or 3.
53 | Mixtures are only available for a combination of 2 or 3 copulas
54 |
55 | copula_list : element must be supported functions.
56 | Mixtures are only available for archimedean and gaussian.
57 |
58 | """
59 |         # the `mixture` copula class inherits the `copula` class
60 | super().__init__()
61 |
62 | Archimedean_families = [
63 | 'clayton', 'gumbel', 'frank', 'joe', 'galambos','fgm', 'plackett',
64 | 'rgumbel', 'rclayton', 'rjoe','rgalambos']
65 |
66 | self.dim = len(copula_list)
67 |
68 |         if self.dim not in (2, 3):
69 |             raise ValueError(
70 |                 "Mixtures are only supported for combinations of 2 or 3 copulas")
71 |
72 | self.cop = []
73 | mixture_type = copula_list[0].capitalize()
74 |
75 | for cop in copula_list[1:]:
76 | mixture_type+= "-"+cop.capitalize()
77 |
78 | self.family = mixture_type + " mixture"
79 | if self.dim ==2:
80 | self.bounds_param = [(0,1)]
81 | self.parameters_start = [np.array(1/self.dim)]
82 | else:
83 | self.bounds_param = [(0,1) for i in range(0, 3)]
84 | self.parameters_start = [np.array(1/self.dim) for i in range(0, 3)]
85 |
86 | for i in range(0,self.dim):
87 |
88 | if copula_list[i] == "gaussian":
89 | self.cop.append(gaussian())
90 | self.bounds_param.append((-1,1))
91 | self.parameters_start.append(np.array(0))
92 |
93 | elif copula_list[i] in Archimedean_families:
94 | cop_mixt = archimedean(family=copula_list[i])
95 | self.cop.append(cop_mixt)
96 | self.bounds_param.append(cop_mixt.bounds_param[0])
97 | self.parameters_start.append(cop_mixt.parameters_start)
98 |             else:
99 |                 print("Archimedean copulas available are: ", Archimedean_families)
100 |                 raise ValueError(
101 |                     "Mixtures are only supported for Archimedean and Gaussian copulas")
102 | self.parameters_start = tuple(self.parameters_start)
103 |
104 | def get_cdf(self, u, v, param):
105 | """
106 | # Computes the CDF
107 |
108 | Parameters
109 | ----------
110 | u, v : float
111 | Values of the marginal CDFs
112 | param : list
113 | A list that contains the parameters of the mixture and the copula.
114 | The element of the list must be ordered as follow, for 2-dimensional mixture :
115 | [
116 |                 weight1 : float, weight1 ∈ [0,1]
117 |                     The weight given to the first copula.
118 | theta1 : float
119 | The theta parameter of the first copula.
120 | theta2 : float
121 | " second.
122 | ]
123 | For a 3-dimensional mixture :
124 | [
125 |                 weight1 : float
126 |                     The weight given to the first copula.
127 | weight2 : float
128 | " second.
129 | weight3 : float
130 | " third.
131 | theta1 : float
132 | The theta parameter of the first copula.
133 | theta2 : float
134 | " second.
135 | theta3 : float
136 | " third.
137 | ]
138 | The sum of the weights must be equal to 1.
139 | """
140 | if self.dim == 2:
141 | cdf = param[0]*(self.cop[0].get_cdf(u,v,[param[1]])) \
142 | +(1-param[0])*(self.cop[1].get_cdf(u,v,[param[2]]))
143 | else:
144 | cdf= param[0]*(self.cop[0].get_cdf(u,v,[param[3]])) \
145 | + param[1]*(self.cop[1].get_cdf(u,v,[param[4]])) \
146 | + param[2]*(self.cop[2].get_cdf(u,v,[param[5]]))
147 | return cdf
148 |
149 | def get_pdf(self, u, v, param):
150 | """
151 |         # Computes the PDF
152 |
153 | Parameters
154 | ----------
155 | u, v : float
156 | Values of the marginal CDFs
157 | param : list
158 | A list that contains the parameters of the mixture and the copula.
159 | See how to order the list in the above method `get_cdf`
160 | """
161 | if self.dim == 2:
162 | pdf = param[0]*(self.cop[0].get_pdf(u,v,[param[1]])) \
163 | +(1-param[0])*(self.cop[1].get_pdf(u,v,[param[2]]))
164 | else:
165 | pdf = param[0]*(self.cop[0].get_pdf(u,v,[param[3]])) \
166 | + param[1]*(self.cop[1].get_pdf(u,v,[param[4]])) \
167 | + param[2]*(self.cop[2].get_pdf(u,v,[param[5]]))
168 |
169 | return pdf
170 |
171 | def LTDC(self, theta, weight):
172 | """
173 |         # Computes the lower TDC
174 |
175 | Parameters
176 | ----------
177 | weight : float
178 | The weight associated to the copula with Lower Tail Dependence
179 | theta : float
180 | The parameter of the copula with Lower Tail Dependence
181 | """
182 | return self.cop[0].LTDC(theta)*weight
183 |
184 | def UTDC(self, theta, weight):
185 | """
186 | # Computes the upper TDC
187 |
188 | Parameters
189 | ----------
190 | weight : float
191 | The weight associated to the copula with Upper Tail Dependence
192 | theta : float
193 | The parameter of the copula with Upper Tail Dependence
194 | """
195 | return self.cop[-1].UTDC(theta)*weight
196 |
197 |
198 |
199 |
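The convex combination computed by `mixture.get_cdf` can be sketched standalone with two Archimedean CDFs written inline (hypothetical helper names, not the package API). Because the weights sum to 1, the mixture value always lies between the two component values.

```python
import numpy as np

# Component copula CDFs (Clayton and Gumbel), written inline for illustration.
clayton_cdf = lambda u, v, t: (u**-t + v**-t - 1) ** (-1 / t)
gumbel_cdf = lambda u, v, t: np.exp(-(((-np.log(u)) ** t + (-np.log(v)) ** t) ** (1 / t)))

def mixture_cdf(u, v, w, t1, t2):
    # Weighted combination, as in mixture.get_cdf for dim == 2:
    # C_mix(u, v) = w * C_clayton(u, v; t1) + (1 - w) * C_gumbel(u, v; t2)
    return w * clayton_cdf(u, v, t1) + (1 - w) * gumbel_cdf(u, v, t2)

val = mixture_cdf(0.5, 0.5, 0.4, 2.0, 1.5)
print(round(val, 4))
```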
--------------------------------------------------------------------------------
/pycop/bivariate/student.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.stats import t
3 | from scipy.special import gamma
4 | from pycop.bivariate.copula import copula
5 |
6 | class student(copula):
7 | """
8 | # Creates a student copula object
9 |
10 | The multivariate student CDF has no analytic expression but it can be
11 | approximated numerically
12 |
13 | ...
14 |
15 | Attributes
16 | ----------
17 | family : str
18 | = "student"
19 | bounds_param : list
20 | A list that contains the domain of the parameter(s) in a tuple.
21 |         Example : [(lower, upper)]
22 | parameters_start : array
23 | Value(s) of the initial guess when estimating the copula parameter(s).
24 | It represents the parameter `x0` in the `scipy.optimize.minimize` function.
25 |
26 | Methods
27 | -------
28 |
29 | get_pdf(u, v, rho)
30 | Computes the Probability Density Function (PDF).
31 | """
32 |
33 | def __init__(self):
34 |         # the `student` copula class inherits the `copula` class
35 | super().__init__()
36 | self.family = "student"
37 | self.bounds_param = [(-1+1e-6, 1-1e-6), (1e-6, None)]
38 | self.parameters_start = (np.array(0), np.array(1))
39 |
40 | def get_pdf(self, u, v, param):
41 | """
42 | # Computes the PDF
43 |
44 | # Source:
45 | Joe, H. (2014). Dependence modeling with copulas. CRC press.
46 | 4.13 Multivariate t - Student's Copula p.181 Equation (4.32)
47 |
48 | Parameters
49 | ----------
50 | u, v : float
51 | Values of the marginal CDFs
52 | param : list
53 | A list that contains the correlation coefficient rho ∈ [-1,1] and
54 | nu > 0, the degrees of freedom.
55 | """
56 |
57 | rho = param[0]
58 | nu = param[1]
59 |
60 | term1 = gamma((nu + 2) / 2) * gamma(nu / 2)
61 | term2 = gamma((nu + 1) / 2) ** 2
62 |
63 | u_ = t.ppf(u, df=nu)
64 | v_ = t.ppf(v, df=nu)
65 |
66 | det_rho = 1-rho**2
67 | multid = (-2 * u_ * v_ * rho + (u_ ** 2) + (v_ ** 2) ) / det_rho
68 | term3 = (1 + multid / nu) ** ((nu + 2) / 2)
69 |
70 | prod1 = (1 + (u_ ** 2) / nu) ** ((nu + 1) / 2)
71 | prod2 = (1 + (v_ ** 2) / nu) ** ((nu + 1) / 2)
72 | prod = prod1 * prod2
73 |
74 | return (1/np.sqrt(det_rho)) * (term1 * prod) / (term2 * term3)
75 |
76 |
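The density from Joe (2014, eq. 4.32) implemented in `get_pdf` above can be checked standalone (hypothetical helper name, not the package API): the Student copula is exchangeable, so swapping u and v leaves the density unchanged.

```python
import numpy as np
from scipy.stats import t
from scipy.special import gamma

def student_pdf(u, v, rho, nu):
    # Mirrors student.get_pdf (Joe 2014, eq. 4.32).
    term1 = gamma((nu + 2) / 2) * gamma(nu / 2)
    term2 = gamma((nu + 1) / 2) ** 2
    u_, v_ = t.ppf(u, df=nu), t.ppf(v, df=nu)
    det_rho = 1 - rho**2
    multid = (-2 * u_ * v_ * rho + u_**2 + v_**2) / det_rho
    term3 = (1 + multid / nu) ** ((nu + 2) / 2)
    prod = ((1 + u_**2 / nu) * (1 + v_**2 / nu)) ** ((nu + 1) / 2)
    return (1 / np.sqrt(det_rho)) * (term1 * prod) / (term2 * term3)

# Exchangeability: c(u, v) = c(v, u).
print(np.isclose(student_pdf(0.2, 0.8, 0.5, 4), student_pdf(0.8, 0.2, 0.5, 4)))  # → True
```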
--------------------------------------------------------------------------------
/pycop/multivariate/gaussian.py:
--------------------------------------------------------------------------------
1 |
2 |
3 | from scipy.stats import t
4 | from scipy.special import gamma
5 | from scipy.stats import norm, multivariate_normal
6 | import numpy as np
7 |
8 |
9 | class gaussian():
10 |
11 | def __init__(self, corrMatrix):
12 | """
13 | Creates a gaussian copula
14 |
15 | corrMatrix: length determines the dimension of random variable
16 |
17 |         the correlation matrix must be square, symmetric
18 |         and positive definite
19 |
20 | """
21 | self.corrMatrix = np.asarray(corrMatrix)
22 | self.n = len(corrMatrix)
23 |
24 | def cdf(self, d):
25 | """
26 | returns the cumulative distribution
27 | d = (U1, ..., Un)
28 |
29 | """
30 | y = norm.ppf(d, 0, 1)
31 |
32 | return multivariate_normal.cdf(y, mean=None, cov=self.corrMatrix)
33 |
34 | def pdf(self, d):
35 | """
36 | returns the density
37 | """
38 | y = norm.ppf(d, 0, 1)
39 |
40 | rho_det = np.linalg.det(self.corrMatrix)
41 | rho_inv = np.linalg.inv(self.corrMatrix)
42 |
43 | return rho_det**(-0.5) * np.exp(-0.5 * np.dot(y, np.dot(rho_inv - np.identity(self.n), y)))
44 |
45 |
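The n-dimensional density in `pdf` above can be sanity-checked standalone (hypothetical helper name, not the package API): with the identity correlation matrix the Gaussian copula is the independence copula, whose density is identically 1.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_pdf(d, corr):
    # Mirrors the multivariate gaussian.pdf for a single point d = (u1, ..., un).
    y = norm.ppf(d)
    rho_det = np.linalg.det(corr)
    rho_inv = np.linalg.inv(corr)
    n = len(corr)
    return rho_det**-0.5 * np.exp(-0.5 * y @ (rho_inv - np.identity(n)) @ y)

# Identity correlation: rho_inv - I vanishes, so the density is exactly 1.
print(gaussian_copula_pdf([0.3, 0.6, 0.9], np.identity(3)))  # → 1.0
```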
--------------------------------------------------------------------------------
/pycop/multivariate/student.py:
--------------------------------------------------------------------------------
1 | from scipy.stats import t
2 | from scipy.special import gamma
3 | from scipy.stats import norm, multivariate_normal
4 | import numpy as np
5 |
6 |
7 | class student():
8 |
9 | def __init__(self, corrMatrix, nu):
10 | self.corrMatrix = np.asarray(corrMatrix)
11 | self.nu = nu
12 | self.n = len(corrMatrix)
13 |
14 | """
15 | The multivariate student distribution CDF has no analytic expression but it can be approximated numerically
16 | """
17 |
18 | def pdf(self, d):
19 | """
20 | returns the density
21 | """
22 | y = t.ppf(d, df=self.nu)
23 |
24 | rho_det = np.linalg.det(self.corrMatrix)
25 | rho_inv = np.linalg.inv(self.corrMatrix)
26 |
27 | A = gamma((self.nu+self.n)/2)*( gamma(self.nu/2)**(self.n-1) )
28 | B = gamma((self.nu+1)/2)**self.n
29 | C = (1 + (np.dot(y, np.dot(rho_inv, y)))/self.nu)**((self.nu+self.n)/2)
30 |
31 |         # product of the marginal t terms
32 |         prod = np.prod([(1 + yi**2/self.nu)**((self.nu+1)/2) for yi in y])
36 |
37 | return rho_det**(-0.5) *(A*prod)/(B*C)
38 |
--------------------------------------------------------------------------------
/pycop/simulation.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import numpy as np
3 | from scipy import linalg
4 | from scipy.stats import norm, t, levy_stable, logser
5 | from scipy.special import gamma, comb
6 | from typing import List
7 |
8 |
9 | def simu_gaussian(n: int, m: int, corr_matrix: np.array):
10 | """
11 | # Gaussian Copula simulations with a given correlation matrix
12 |
13 | Parameters
14 | ----------
15 | n : int
16 | the dimension number of simulated variables
17 | m : int
18 | the sample size
19 | corr_matrix : array
20 | the correlation matrix
21 |
22 | Returns
23 | -------
24 | u : array
25 | the simulated sample
26 |
27 | """
28 | if not all(isinstance(v, int) for v in [n, m]):
29 | raise TypeError("The 'n' and 'm' arguments must both be integer types.")
30 | if not isinstance(corr_matrix, np.ndarray):
31 | raise TypeError("The 'corr_matrix' argument must be a numpy array.")
32 | # Generate n independent standard Gaussian random variables V = (v1 ,..., vn):
33 | v = [np.random.normal(0, 1, m) for i in range(0, n)]
34 |
35 | # Compute the lower triangular Cholesky factorization of the correlation matrix:
36 | l = linalg.cholesky(corr_matrix, lower=True)
37 | y = np.dot(l, v)
38 | u = norm.cdf(y, 0, 1)
39 |
40 | return u
41 |
42 |
43 | def simu_tstudent(n: int, m: int, corr_matrix: np.array, nu: float):
44 | """
45 | # Student Copula with k degrees of freedom and a given correlation matrix
46 |
47 | Parameters
48 | ----------
49 | n : int
50 | the dimension number of simulated variables
51 | m : int
52 | the sample size
53 | corr_matrix : array
54 | the correlation matrix
55 | nu : float
56 | the degree of freedom
57 |
58 | Returns
59 | -------
60 | u : array
61 | the simulated sample
62 |
63 | """
64 | if not all(isinstance(v, int) for v in [n, m]):
65 | raise TypeError("The 'n' and 'm' arguments must both be integer types.")
66 | if not isinstance(corr_matrix, np.ndarray):
67 | raise TypeError("The 'corr_matrix' argument must be a numpy array.")
68 | if not isinstance(nu, (int, float)):
69 | raise TypeError("The 'nu' argument must be a float type.")
70 |
71 | # Generate n independent standard Gaussian random variables V = (v1 ,..., vn):
72 | v = [np.random.normal(0, 1, m) for i in range(0, n)]
73 |
74 | # Compute the lower triangular Cholesky factorization of rho:
75 | l = linalg.cholesky(corr_matrix, lower=True)
76 | z = np.dot(l, v)
77 |
78 | # generate a random variable r, following a chi2-distribution with nu degrees of freedom
79 | r = np.random.chisquare(df=nu,size=m)
80 |
81 | y = np.sqrt(nu/ r)*z
82 | u = t.cdf(y, df=nu, loc=0, scale=1)
83 |
84 | return u
85 |
86 |
87 | def SimuSibuya(alpha: float, m: int):
88 | """
89 | # Sibuya distribution Sibuya(α)
90 | Used for sampling F=Sibuya(α) for Joe copula
91 | The algorithm is given in Proposition 3.2 in Hofert (2011) "Efficiently sampling nested Archimedean copulas"
92 |
93 | Parameters
94 | ----------
95 | alpha : float
96 | the alpha parameter, α = 1/θ
97 | m : int
98 | the sample size
99 |
100 | Returns
101 | -------
102 | X : array
103 | the simulated sample
104 |
105 | """
106 | if not isinstance(alpha, (int, float)):
107 | raise TypeError("The 'alpha' argument must be a float type.")
108 | if not isinstance(m, int):
109 | raise TypeError("The 'm' argument must be an integer type.")
110 |
111 | G_1 = lambda y: ((1-y)*gamma(1-alpha))**(-1/alpha)
112 |     F = lambda n: 1 - ((-1)**n)*comb(alpha-1, n)
113 |
114 | X = np.random.uniform(0, 1, m)
115 |
116 | for i in range(0, len(X)):
117 | if X[i] <= alpha:
118 | X[i] = 1
119 | elif F(np.floor(G_1(X[i]))) < X[i]:
120 | X[i] = np.ceil(G_1(X[i]))
121 | else:
122 | X[i] = np.floor(G_1(X[i]))
123 |
124 | return X
125 |
126 |
127 | def simu_archimedean(family: str, n: int, m: int, theta: float):
128 | """
129 | Archimedean copula simulation
130 |
131 | Parameters
132 | ----------
133 |
134 | family : str
135 | type of the distribution
136 | n : int
137 | the dimension number of simulated variables
138 | m : int
139 | the sample size
140 | theta : float
141 | copula parameter
142 | Clayton: θ ∈ [0, inf)
143 | Gumbel: θ ∈ [1, +inf)
144 | Frank: θ ∈ (-inf, +inf)
145 | Joe: θ ∈ [1, +inf)
146 | AMH: θ ∈ [0, 1)
147 |
148 | Returns
149 | -------
150 | u : array
151 | the simulated sample, array matrix with dim (m, n)
152 |
153 | """
154 | if family not in ["clayton", "gumbel", "frank", "joe", "amh"]:
155 | raise ValueError("The family argument must be one of 'clayton', 'gumbel', 'frank', 'joe' or 'amh'.")
156 | if not all(isinstance(v, int) for v in [n, m]):
157 | raise TypeError("The 'n' and 'm' arguments must both be integer types.")
158 | if not isinstance(theta, (int, float)):
159 | raise TypeError("The 'theta' argument must be a float type.")
160 |
161 | if family == "clayton":
162 | # Generate n independent standard uniform random variables V = (v1 ,..., vn):
163 | v = [np.random.uniform(0, 1, m) for i in range(0, n)]
164 | # generate a random variable x following the gamma distribution gamma(theta**(-1), 1)
165 | X = np.array([np.random.gamma(theta**(-1), scale=1.0) for i in range(0, m)])
166 | phi_t = lambda t: (1+t)**(-1/theta)
167 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)]
168 |
169 | elif family == "gumbel":
170 | v = [np.random.uniform(0, 1, m) for i in range(0, n)]
171 | X = levy_stable.rvs(alpha=1/theta, beta=1, scale=(np.cos(np.pi/(2*theta)))**theta, loc=0, size=m)
172 | phi_t = lambda t: np.exp(-t**(1/theta))
173 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)]
174 |
175 | elif family == "frank":
176 | v = [np.random.uniform(0, 1, m) for i in range(0, n)]
177 | p = 1-np.exp(-theta)
178 | X = logser.rvs(p, loc=0, size=m, random_state=None)
179 | phi_t = lambda t: -np.log(1-np.exp(-t)*(1-np.exp(-theta)))/theta
180 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)]
181 |
182 | elif family == "joe":
183 | alpha = 1/theta
184 | X = SimuSibuya(alpha, m)
185 | v = [np.random.uniform(0, 1, m) for i in range(0, n)]
186 | phi_t = lambda t: (1-(1-np.exp(-t))**(1/theta))
187 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)]
188 |
189 | elif family == "amh":
190 | v = [np.random.uniform(0, 1, m) for i in range(0, n)]
191 | X = np.random.geometric(p=1-theta, size=m)
192 | phi_t = lambda t: (1-theta)/(np.exp(t)-theta)
193 | u = [phi_t(-np.log(v[i])/X) for i in range(0, n)]
194 | return u
195 |
196 |
197 | def simu_mixture(n: int, m: int, combination: List[dict]):
198 | """
199 | Mixture copula simulation
200 |
201 | Parameters
202 | ----------
203 |
204 | n : int
205 | the dimension number of simulated variables
206 | m : int
207 | the sample size
208 |
209 | combination : list
210 | A list of dictionaries that contains information on the copula to combine.
211 |
212 | example:
213 | combination =[
214 | {
215 | "type": "clayton",
216 | "weight": 0.5,
217 | "theta": 4
218 | },
219 | {
220 | "type": "student",
221 | "weight": 0.5,
222 | "corrMatrix": corrMatrix,
223 | "nu":2
224 | }
225 | ]
226 |
227 | Returns
228 | -------
229 | u : array
230 | the simulated sample, array matrix with dim (m, n)
231 |
232 | """
233 | if not all(isinstance(v, int) for v in [n, m]):
234 | raise TypeError("The 'n' and 'm' arguments must both be integer types.")
235 | if not isinstance(combination, list):
236 | raise TypeError("The 'combination' argument must be a list type")
237 | if not all(isinstance(v, dict) for v in combination):
238 | raise TypeError("Each element of the 'combination' argument must be a dict type.")
239 |
240 | v = [np.random.uniform(0, 1, m) for i in range(0, n)]
241 | weights = [comb["weight"] for comb in combination]
242 | #Generate a random sample of indexes of combination types
243 | y = np.array([np.where(ls == 1)[0][0] for ls in np.random.multinomial(n=1, pvals=weights, size=m)])
244 |
245 | for i in range(0, len(combination)):
246 | combinationsize = len(v[0][y == i])
247 |
248 | if combination[i]["type"] == "gaussian":
249 | corr_matrix = combination[i]["corrMatrix"]
250 |
251 | vi = simu_gaussian(n, combinationsize, corr_matrix)
252 | for j in range(0, len(vi)):
253 | v[j][y == i] = vi[j]
254 | elif combination[i]["type"] == "student":
255 | corr_matrix = combination[i]["corrMatrix"]
256 | nu = combination[i]["nu"]
257 | vi = simu_tstudent(n, combinationsize, corr_matrix, nu)
258 |
259 | elif combination[i]["type"] in ["clayton", "gumbel", "frank", "joe", "amh"]:
260 | vi = simu_archimedean(combination[i]["type"], n, combinationsize, combination[i]["theta"])
261 |
262 | for j in range(0, len(vi)):
263 | v[j][y == i] = vi[j]
264 | else:
265 | raise error
266 |
267 | return v
268 |
269 |
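The Clayton branch of `simu_archimedean` follows the Marshall-Olkin construction: draw a gamma frailty X, then pass exponentials scaled by X through the generator inverse. A standalone sketch with hypothetical parameter values:

```python
import numpy as np

np.random.seed(1)
n, m, theta = 2, 5000, 2.0

# Marshall-Olkin sampling for the Clayton family:
v = np.random.uniform(0, 1, (n, m))           # independent uniforms
X = np.random.gamma(1 / theta, 1.0, size=m)   # frailty X ~ Gamma(1/theta, 1)
phi_t = lambda t: (1 + t) ** (-1 / theta)     # generator inverse
u = phi_t(-np.log(v) / X)

print(u.shape)  # → (2, 5000)
```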
--------------------------------------------------------------------------------
/pycop/utils.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | import numpy as np
3 |
4 |
5 | def empirical_density_contourplot(u, v, lims):
6 |
7 | res = 10
8 |
9 | pts = np.array([u,v])
10 |
11 | pts = (pts - lims[0]) * res / (lims[1] - lims[0])
12 | pts = np.round(pts).astype(int)
13 | pts[pts<0] = 0
14 | pts[pts>(res-1)] = res - 1
15 |
16 | Z = np.zeros((res,res))
17 | for i in range(0, len(u)):
18 | Z[pts[0,i],pts[1,i]] += 1.
19 |
20 | #Z /= len(u) * (lims[1]-lims[0])**2 / res**2
21 |
22 | lvls = np.percentile(Z.flatten(), (50, 80, 90,))
23 | x = np.linspace(lims[0],lims[1],res)
24 | y = np.linspace(lims[0],lims[1],res)
25 | X, Y = np.meshgrid(x,y)
26 | CS2 = plt.contour(X,Y,Z, levels=lvls, colors="k", linewidths=0.8)
27 | fmt = {}
28 |
29 | for l,s in zip( CS2.levels, [ "90%", "80%", "50%" ] ):
30 | fmt[l] = s
31 |
32 | plt.clabel(CS2, inline=1, inline_spacing=0, fmt=fmt, fontsize=8)
33 |
34 |
35 | def empiricalplot(u, contour=True):
36 |
37 | minu = min([min(ui) for ui in u])
38 | maxu = max([max(ui) for ui in u])
39 |     setlength = maxu-minu
40 |     lowerticks = [minu, minu+0.2*setlength, minu+0.4*setlength]
41 |     upperticks = [minu+0.6*setlength, minu+0.8*setlength, maxu]
42 |     limticks = [minu-0.1*setlength, maxu+0.1*setlength]
44 |
45 | n=len(u)
46 | for i in range(0,n):
47 | for j in range(0,n):
48 | if i == j:
49 | ax = plt.subplot(n, n, 1+(n+1)*i)
50 | #plt.text(0.5, 0.5, r"$u_%s$" % str(i+1))
51 | plt.xlim(limticks)
52 | plt.ylim(limticks)
53 | plt.xticks(lowerticks, [str("{:.1f}".format(tcks)) for tcks in lowerticks], ha='center')
54 | plt.yticks(upperticks, [str("{:.1f}".format(tcks)) for tcks in upperticks], va='center', ha='left')
55 | ax.tick_params(axis="y",direction="in", pad=-10)
56 | ax.tick_params(axis="x",direction="in", pad=-15)
57 |
58 | ax2 = ax.twinx()
59 | plt.xlim(limticks)
60 | plt.ylim(limticks)
61 | plt.yticks(lowerticks, [str("{:.1f}".format(tcks)) for tcks in lowerticks], va='center', ha='right')
62 | ax2.tick_params(axis="y",direction="in", pad=-10)
63 |
64 | ax3 = ax.twiny()
65 | plt.xlim(limticks)
66 | plt.ylim(limticks)
67 | plt.xticks(upperticks, [str("{:.1f}".format(tcks)) for tcks in upperticks], va='center', ha='center')
68 | ax3.tick_params(axis="x",direction="in", pad=-12)
69 | elif i < j:
70 |                 if contour:
71 | ax = plt.subplot(n, n, n*i+j+1)
72 | empirical_density_contourplot(u[i], u[j], [minu, maxu])
73 | else:
74 | pass
75 |
76 | plt.xlim(limticks)
77 | plt.ylim(limticks)
78 | plt.xticks([])
79 | plt.yticks([])
80 | else:
81 | ax = plt.subplot(n, n, n*i+j+1)
82 | plt.scatter(u[i], u[j], alpha=0.8, facecolors='none', edgecolors='b', s=2)
83 | plt.xlim(limticks)
84 | plt.ylim(limticks)
85 | plt.xticks([])
86 | plt.yticks([])
87 |
88 | plt.subplots_adjust(bottom=0.02, right=0.98, top=0.98, left=0.02, wspace=0.0, hspace=0.0)
89 | plt.show()
90 |
91 |
92 |
93 |
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools>=61.0"]
3 | build-backend = "setuptools.build_meta"
4 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup, find_packages
2 |
3 | setup(
4 | name = 'pycop',
5 | version = '0.0.13',
6 | description = 'Copula for multivariate dependence modeling',
7 | long_description=open('README.md', 'r').read(),
8 | long_description_content_type="text/markdown",
9 | author = 'Maxime N',
10 | author_email = 'maxime.nlc@proton.me',
11 | url = 'https://github.com/maximenc/pycop/',
12 | download_url = 'https://github.com/maximenc/pycop/',
13 | classifiers = [],
14 | include_package_data=True,
15 | packages=find_packages(".")
16 | )
17 |
--------------------------------------------------------------------------------