Module auton_survival.datasets
Utility functions to load standard datasets to train and evaluate the Deep Survival Machines models.
Functions
def increase_censoring(e, t, p, random_seed=0)
def load_support()
Helper function to load and preprocess the SUPPORT dataset. The SUPPORT dataset comes from the Vanderbilt University study to estimate survival for seriously ill hospitalized adults [1]. Please refer to http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc for the original data source.
References
[1]: Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.
def load_synthetic_cf_phenotyping()
def load_dataset(dataset='SUPPORT', **kwargs)
Helper function to load datasets to test Survival Analysis models. Currently implemented datasets include:
SUPPORT: This dataset comes from the Vanderbilt University study to estimate survival for seriously ill hospitalized adults [1]. (Refer to http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc for the original data source.)
PBC: The Primary Biliary Cirrhosis dataset [2] is a well-known dataset for evaluating survival analysis models with time-dependent covariates.
FRAMINGHAM: This dataset is a subset of 4,434 participants of the well-known, ongoing Framingham Heart Study [3], which studies the epidemiology of hypertensive and arteriosclerotic cardiovascular disease. It is a popular dataset for longitudinal survival analysis with time-dependent covariates.
SYNTHETIC: This is a non-linear censored dataset for counterfactual time-to-event phenotyping. Introduced in [4], the dataset is generated such that the treatment effect is heterogeneous conditioned on the covariates.
References
[1]: Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.
[2]: Fleming, Thomas R., and David P. Harrington. Counting processes and survival analysis. Vol. 169. John Wiley & Sons, 2011.
[3]: Dawber, Thomas R., Gilcin F. Meadors, and Felix E. Moore Jr. "Epidemiological approaches to heart disease: the Framingham Study." American Journal of Public Health and the Nations Health 41.3 (1951).
[4]: Nagpal, C., Goswami M., Dufendach K., and Artur Dubrawski. "Counterfactual phenotyping for censored Time-to-Events" (2022).
Parameters
dataset : str
    The choice of dataset to load. Currently implemented are 'SUPPORT', 'PBC', 'FRAMINGHAM' and 'SYNTHETIC'.
**kwargs : dict
    Dataset specific keyword arguments.
Returns
tuple : (np.ndarray, np.ndarray, np.ndarray)
    A tuple of the form (x, t, e), where x are the input covariates, t the event times and e the censoring indicators.
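To make the documented call shape concrete, here is a minimal, self-contained sketch. It does not call auton_survival; the names `make_toy_survival_data`, `TOY_LOADERS` and `load_dataset_sketch` are hypothetical stand-ins that only mimic the `dataset` / `**kwargs` signature and the `(x, t, e)` return form described above. The convention that `e == 1` marks an observed event and `e == 0` a censored one is an assumption of this sketch, not something stated in this reference.

```python
import numpy as np


def make_toy_survival_data(n=100, d=3, random_seed=0):
    """Hypothetical stand-in for a dataset loader; returns a toy (x, t, e) tuple."""
    rng = np.random.default_rng(random_seed)
    x = rng.normal(size=(n, d))                   # input covariates
    event_time = rng.exponential(10.0, size=n)    # latent time-to-event
    censor_time = rng.exponential(15.0, size=n)   # latent censoring time
    t = np.minimum(event_time, censor_time)       # observed time
    # Assumed convention: 1 = event observed, 0 = censored.
    e = (event_time <= censor_time).astype(int)
    return x, t, e


# Hypothetical registry mapping a dataset name to its loader.
TOY_LOADERS = {"TOY": make_toy_survival_data}


def load_dataset_sketch(dataset="TOY", **kwargs):
    """Dispatch on the dataset name and forward dataset-specific kwargs."""
    try:
        loader = TOY_LOADERS[dataset]
    except KeyError:
        raise NotImplementedError(f"Dataset {dataset} not implemented.") from None
    return loader(**kwargs)


x, t, e = load_dataset_sketch("TOY", n=200, d=5)
print(x.shape, t.shape, e.shape)
print("fraction censored:", 1.0 - e.mean())
```

Unpacking the result into `x, t, e` mirrors the return form documented above; the dataset-specific keyword arguments (`n`, `d` here) are forwarded unchanged to whichever loader the name selects.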