--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2016 JakeColtman
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pyBMA
2 |
3 | Bayesian Model Averaging in python
4 |
5 | This module is based on the R package BMA and implements Bayesian Model Averaging for the Cox proportional hazards model.
6 |
7 | #### Installation
8 |
9 | pyBMA can be installed from PyPI using pip:
10 |
11 | pip3.5 install pyBMA
12 |
13 | #### How it works
14 |
15 | Given a survival dataset, pyBMA does the following:
16 |
17 | 1 - Uses a leaps and bounds algorithm to sample model space
18 |
19 | 2 - Uses [lifelines](http://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#cox-s-proportional-hazard-model) to fit Cox proportional hazards models and compute each model's log-likelihood.
20 |
21 | 3 - Calculates the posterior likelihood of each model given the data and some priors
22 |
23 | 4 - Performs a weighted average over the models based on the posterior model likelihood
24 |
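Steps 3 and 4 can be illustrated with a minimal sketch. It follows the BIC-based weighting used in `CoxPHFitter._generate_posteriors_from_bic` and `_weight_by_posterior`, but it is not part of the pyBMA API: the function names `posterior_weights` and `average_coefficients`, the plain-dict representation of coefficients, and all the numbers are assumptions made for illustration.

```python
from math import exp


def posterior_weights(bics):
    """Turn BIC values into normalised posterior model probabilities."""
    # Subtracting the smallest BIC keeps exp() away from underflow.
    min_bic = min(bics)
    unnormalised = [exp(-0.5 * (bic - min_bic)) for bic in bics]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def average_coefficients(coefs_by_model, weights):
    """Weighted sum of per-model coefficients.

    A covariate that is absent from a model contributes 0 for that model.
    """
    averaged = {}
    for coefs, weight in zip(coefs_by_model, weights):
        for name, value in coefs.items():
            averaged[name] = averaged.get(name, 0.0) + weight * value
    return averaged


# Hypothetical BICs and coefficients for three candidate models.
bics = [10.0, 12.0, 15.0]
coefs_by_model = [
    {"age": -0.06},
    {"age": -0.05, "prio": 0.09},
    {"prio": 0.10},
]

weights = posterior_weights(bics)
averaged = average_coefficients(coefs_by_model, weights)
print(weights)   # sums to 1; the lowest-BIC model gets the largest weight
print(averaged)
```

Because a missing covariate counts as 0, a variable that only appears in low-weight models ends up with an averaged coefficient close to zero.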
25 | #### How to use it
26 |
27 | The API of pyBMA is designed to mirror that of lifelines to make integration as easy as possible. Compare the two in the snippet below:
28 |
29 | ``` python
30 | ## pyBMA version
31 | from pyBMA.CoxPHFitter import CoxPHFitter
32 | bma_cf = CoxPHFitter()
33 | bma_cf.fit(rossi_dataset, 'week', event_col='arrest')
34 |
35 | ## Lifelines version
36 | from lifelines import CoxPHFitter
37 | cf = CoxPHFitter()
38 | cf.fit(rossi_dataset, 'week', event_col='arrest')
39 | ```
40 |
41 | One addition is that you can specify a prior for each variable. Pass the priors as a numpy array of numbers between 0 and 1, in the same order as the covariates appear in the main dataframe. The prior for a variable is your belief about the probability that the variable is included in the correct model. For example, if you are certain that a variable must appear in the correct model, set its prior to 1; if you consider it as likely as not to be included, choose 0.5. By default, all priors are 0.5.
42 |
43 |
44 | ``` python
45 | ## pyBMA version
46 | import numpy as np
47 | bma_cf = CoxPHFitter()
48 | bma_cf.fit(rossi_dataset, 'week', event_col='arrest', priors=np.array([0.3, 0.6, 0.7, 0.1, 0.9, 0.5, 0.03]))
49 |
50 | ```
51 |
52 | This setup gives more weight to models that contain the 5th variable and much less weight to models that include the last variable, over and above each model's likelihood given the data alone.
53 |
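To see how the per-variable priors turn into a prior for a whole model, here is a minimal sketch. It follows the product rule in `CoxPHModel.prior` (a factor of `p` for an included variable, `1 - p` for an excluded one); the standalone function `model_prior` and the two example models are hypothetical and only serve as illustration.

```python
from functools import reduce
import operator


def model_prior(included, priors):
    """Prior probability of one model.

    Each variable contributes p if it is in the model and 1 - p if it is not.
    """
    contributions = [p if inc else 1 - p for inc, p in zip(included, priors)]
    return reduce(operator.mul, contributions, 1)


priors = [0.3, 0.6, 0.7, 0.1, 0.9, 0.5, 0.03]

# Two hypothetical models that differ only in whether the 5th variable is included.
with_fifth = [True, False, False, False, True, False, False]
without_fifth = [True, False, False, False, False, False, False]

ratio = model_prior(with_fifth, priors) / model_prior(without_fifth, priors)
print(ratio)  # close to 9, i.e. 0.9 / (1 - 0.9)
```

With a prior of 0.9, including the 5th variable makes a model about nine times more probable a priori than the otherwise identical model without it.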
54 | More examples can be found in lifelines_example.py
55 |
56 |
57 |
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.0.dev1.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.0.dev1.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.1.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.1.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev13.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev13.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev14.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev14.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev15.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev15.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev2.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev2.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev3.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev3.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev4.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev4.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev5.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev5.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev6-py2-none-any.whl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev6-py2-none-any.whl
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev6.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev6.zip
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev7-py2-none-any.whl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev7-py2-none-any.whl
--------------------------------------------------------------------------------
/dist/pyBMA-0.1.dev7.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/dist/pyBMA-0.1.dev7.zip
--------------------------------------------------------------------------------
/pyBMA.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
1 | Metadata-Version: 1.0
2 | Name: pyBMA
3 | Version: 0.1.dev7
4 | Summary: Bayesian Model Averaging in Python
5 | Home-page: https://github.com/JakeColtman/pyBMA
6 | Author: Jake Coltman, Jacob Goodwin
7 | Author-email: jakecoltman@sky.com
8 | License: UNKNOWN
9 | Description: Bayesian Model Averaging in Python
10 | Platform: UNKNOWN
11 |
--------------------------------------------------------------------------------
/pyBMA.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
1 | setup.py
2 | pyBMA.egg-info/PKG-INFO
3 | pyBMA.egg-info/SOURCES.txt
4 | pyBMA.egg-info/dependency_links.txt
5 | pyBMA.egg-info/requires.txt
6 | pyBMA.egg-info/top_level.txt
--------------------------------------------------------------------------------
/pyBMA.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/pyBMA.egg-info/requires.txt:
--------------------------------------------------------------------------------
1 | lifelines
2 | pandas
3 | numpy
4 |
--------------------------------------------------------------------------------
/pyBMA.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/pyBMA/CoxPHFitter.py:
--------------------------------------------------------------------------------
1 | from math import exp
2 | import pandas as pd
3 | from pyBMA import CoxPHModel
4 | from numpy.linalg import solve, norm, inv
5 | from itertools import combinations
6 |
7 | class CoxPHFitter:
8 |     """
9 |     Fitter for Cox Proportional Hazards using Bayesian model averaging:
10 |     h(t|x) = h_0(t)*exp(x'*beta)
11 |     Fitting of individual models is done using lifelines
12 |     """
13 |
14 |     def fit(self, df, duration_col, event_col, priors=None):
15 |         """
16 |         Average across models to produce the BMA estimate for coefficients
17 |         Parameters:
18 |           df: a Pandas dataframe.
19 |
20 |             Required columns: duration_col and event_col
21 |
22 |             Other columns: covariates to model
23 |
24 |             duration_col: lifetime of subject in an arbitrary unit.
25 |             event_col: indicator for whether a death event was observed.
26 |                 1: Death observed
27 |                 0: Censored
28 |
29 |           duration_col: name of column holding duration info
30 |           event_col: name of column holding event info
31 |
32 |           priors: A list of length = number of covariates. Indexed by the ordering of covariates in df
33 |               Each element of the list is the probability of the respective variable being included in a correct model
34 |                 e.g. if you are certain a variable should be included, set this to 1
35 |                      if you wish to encourage parsimonious models set the value for all variables to be < 0.5
36 |                      if you want to encourage complex models, set all values to > 0.5
37 |
38 |               Values must lie in the interval [0, 1]
39 |
40 |               default: [0.5] * number of covariates:
41 |                   completely uninformative, all models considered equally likely
42 |         Returns:
43 |           self
44 |         """
45 |
46 |         self.df = df
47 |         self.duration_col = duration_col
48 |         self.event_col = event_col
49 |
50 |         if priors is None:
51 |             # If no prior is given, choose an uninformative one
52 |             self.priors = [0.5] * (len(self.df.columns) - 2)
53 |         else:
54 |             self.priors = priors
55 |
56 |         self.reference_loglik = None
57 |
58 |         # Create a baseline model using all covariates.
59 |         self.full_model = self._create_model(None)
60 |         self._set_reference_loglik()
61 |
62 |         # Generate representative sample of model space
63 |         models = self._generate_model_definnitions()
64 |         models = [self._create_model(x) for x in models]
65 |
66 |         # Process log likelihoods into posterior probabilities
67 |         bics = [x.bayesian_information_critera() for x in models]
68 |         self._generate_posteriors_from_bic(bics)
69 |
70 |         coefficients_by_model = [x.summary()[1] for x in models]
71 |         self.coefficients_weighted = self._weight_by_posterior(coefficients_by_model)
72 |         sterr_by_model = [x.summary()[2] for x in models]
73 |         self.sterr_weighted = self._weight_by_posterior(sterr_by_model)
74 |
75 |         return self
76 |
77 |     @property
78 |     def summary(self):
79 |         """Details of the output.
80 |         Returns
81 |         -------
82 |         df : pd.DataFrame
83 |             Contains columns coef, exp(coef)"""
84 |
85 |         df = self.coefficients_weighted.to_frame()
86 |         df['exp(coef)'] = [exp(x) for x in df['coef']]
87 |         return df
88 |
89 |     def _create_model(self, covariate_names):
90 |         return CoxPHModel.CoxPHModel(self.df, self.duration_col, self.event_col, self.priors, self.reference_loglik,
91 |                                      covariate_names)
92 |
93 |     def _set_reference_loglik(self):
94 |         self.reference_loglik = self.full_model.loglik()
95 |
96 |     def _generate_model_definnitions(self):
97 |         names, coefs, var = self.full_model.summary()
98 |         variance_covariance = inv(-self.full_model._cf._hessian_)
99 |         all_models = []
100 |         for i in range(1, len(names)):
101 |             all_models.append(list(combinations(names, i)))
102 |         all_models = [list(item) for sublist in all_models for item in sublist]
103 |         return all_models
104 |
105 |     def _generate_posteriors_from_bic(self, bics):
106 |         self.posterior_probabilities = []
107 |         min_bic = min(bics)
108 |         summation = sum([exp(-0.5 * (bic - min_bic)) for bic in bics])
109 |         for bic in bics:
110 |             posterior = (exp(-0.5 * (bic - min_bic))) / summation
111 |             self.posterior_probabilities.append(posterior)
112 |
113 |     def _weight_by_posterior(self, values):
114 |         def add_dataframes(dfone, dftwo):
115 |             return dfone.add(dftwo, fill_value=0)
116 |
117 |         output = zip(values, self.posterior_probabilities)
118 |         weighted = [x[0] * x[1] for x in output]
119 |         running_total = weighted[0]
120 |         for i in range(1, len(weighted)):
121 |             running_total = add_dataframes(running_total, weighted[i])
122 |         return running_total
123 |
--------------------------------------------------------------------------------
/pyBMA/CoxPHModel.py:
--------------------------------------------------------------------------------
1 | import lifelines
2 | from functools import reduce
3 | from math import log
4 | import operator
5 |
6 | class CoxPHModel:
7 |     def __init__(self, df, survival_col, cens_col, prior_params, reference_loglik=None, covariate_names=None):
8 |         self.prior_params = prior_params
9 |         self.survival_col = survival_col
10 |         self.cens_col = cens_col
11 |
12 |         all_covariate_columns = [col for col in df.columns if col not in [cens_col, survival_col]]
13 |         if covariate_names is None:
14 |             self.covariate_names = all_covariate_columns
15 |         else:
16 |             self.covariate_names = covariate_names
17 |         self.df = df[self.covariate_names + [self.survival_col, self.cens_col]]
18 |
19 |         self.mask = [x in self.covariate_names for x in all_covariate_columns]
20 |         self._cf = None
21 |         if reference_loglik is None:
22 |             reference_loglik = self.loglik()
23 |         self.reference_loglik = reference_loglik
24 |
25 |     def prior(self):
26 |         parameter_contributions = [x[1] if x[0] else (1 - x[1]) for x in zip(self.mask, self.prior_params)]
27 |         return reduce(operator.mul, parameter_contributions, 1)
28 |
29 |     def _run(self):
30 |         self._cf = lifelines.CoxPHFitter()
31 |         self._cf.fit(self.df, self.survival_col, event_col=self.cens_col, include_likelihood=True)
32 |
33 |     def loglik(self):
34 |         if self._cf is None:
35 |             self._run()
36 |         return self._cf._log_likelihood
37 |
38 |     def summary(self):
39 |         if self._cf is None:
40 |             self._run()
41 |         return self._cf.summary.index, self._cf.summary["coef"], (
42 |             self._cf.summary["se(coef)"] * self._cf.summary["se(coef)"])
43 |
44 |     def bayesian_information_critera(self):
45 |         size = len(self.covariate_names)
46 |         n = self.df.shape[0]
47 |         prior = self.prior()
48 |         log_prior = log(prior) if prior > 0 else float("-inf")  # zero prior: BIC is +inf, so the model gets weight 0
49 |         return (size * log(n)) - (2 * (self.loglik() - self.reference_loglik)) - (2 * log_prior)
50 |
--------------------------------------------------------------------------------
/pyBMA/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/pyBMA/__init__.py
--------------------------------------------------------------------------------
/pyBMA/__pycache__/CoxPHFitter.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/pyBMA/__pycache__/CoxPHFitter.cpython-35.pyc
--------------------------------------------------------------------------------
/pyBMA/__pycache__/CoxPHModel.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/pyBMA/__pycache__/CoxPHModel.cpython-35.pyc
--------------------------------------------------------------------------------
/pyBMA/__pycache__/__init__.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JakeColtman/pyBMA/0d94cab9ca3aa6875ce312e5c54f602626e01b38/pyBMA/__pycache__/__init__.cpython-35.pyc
--------------------------------------------------------------------------------
/pyBMA/lifelines_example.py:
--------------------------------------------------------------------------------
1 | from lifelines.datasets import load_rossi
2 | from pyBMA.CoxPHFitter import CoxPHFitter
3 |
4 | # Replication of http://lifelines.readthedocs.io/en/latest/Survival%20Regression.html
5 |
6 | bmaCox = CoxPHFitter()
7 | print(bmaCox.fit(load_rossi(), "week", event_col= "arrest").summary)
8 |
9 | # If you wanted to have a model with few variables
10 | bmaCox = CoxPHFitter()
11 | print(bmaCox.fit(load_rossi(), "week", event_col= "arrest", priors= [0.2]*7).summary)
12 |
13 | # If you wanted to have a model with many variables
14 | bmaCox = CoxPHFitter()
15 | print(bmaCox.fit(load_rossi(), "week", event_col= "arrest", priors= [0.8]*7).summary)
16 |
17 | # If you're very confident that race has no impact
18 | # rossi columns: fin age race wexp mar paro prio
19 | bmaCox = CoxPHFitter()
20 | print(bmaCox.fit(load_rossi(), "week", event_col= "arrest", priors= [0.5, 0.5, 0.1, 0.5, 0.5, 0.5, 0.5]).summary)
21 |
22 | # If you're very confident that both age and race should be included
23 | # rossi columns: fin age race wexp mar paro prio
24 | bmaCox = CoxPHFitter()
25 | print(bmaCox.fit(load_rossi(), "week", event_col= "arrest", priors= [0.5, 0.9, 0.9, 0.5, 0.5, 0.5, 0.5]).summary)
26 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from os import path
2 | from setuptools import setup
3 |
4 | here = path.abspath(path.dirname(__file__))
5 |
6 | setup(
7 |     name='pyBMA',
8 |     packages=["pyBMA"],
9 |     version='0.1.1',
10 |     description='Bayesian Model Averaging in Python',
11 |     long_description="Bayesian Model Averaging in Python",
12 |     url='https://github.com/JakeColtman/pyBMA',
13 |     author='Jake Coltman, Jacob Goodwin',
14 |     author_email='jakecoltman@sky.com',
15 |     install_requires=[
16 |         'lifelines',
17 |         'pandas',
18 |         'numpy'
19 |     ]
20 | )
21 |
--------------------------------------------------------------------------------